Found 3 Skills
Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
Conduct Fault Tree Analysis (FTA) to systematically identify and analyze causes of system failures using Boolean logic gates. Top-down deductive method for safety and reliability engineering. Use when analyzing system failures, evaluating safety-critical designs, calculating failure probabilities, identifying minimal cut sets, assessing redundancy effectiveness, or when user mentions "fault tree", "FTA", "system failure analysis", "minimal cut sets", "safety analysis", "failure probability", "AND/OR gates", or needs to trace failure pathways from top event to basic events. Supports qualitative structure analysis and quantitative probability calculations.
Senior engineering analyst for code review, plan review, and automation review. Four-lens review covering architecture, code quality, reliability, and performance, each scored 1-10. Delivers APPROVE, REVISE, or REJECT verdicts. Use when you need a code review, plan review, quality check, or want to verify work before shipping. Part of the architect-system loop. Reads from system/blueprints/ and source code. Outputs to system/reviews/.