Loading...
Loading...
Found 5 Skills
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
Conduct Fault Tree Analysis (FTA) to systematically identify and analyze causes of system failures using Boolean logic gates. Top-down deductive method for safety and reliability engineering. Use when analyzing system failures, evaluating safety-critical designs, calculating failure probabilities, identifying minimal cut sets, assessing redundancy effectiveness, or when user mentions "fault tree", "FTA", "system failure analysis", "minimal cut sets", "safety analysis", "failure probability", "AND/OR gates", or needs to trace failure pathways from top event to basic events. Supports qualitative structure analysis and quantitative probability calculations.
Identify single points of failure, assess recovery capabilities, and produce a prioritized remediation plan aligned with the Well-Architected Reliability pillar.
Senior engineering analyst for code review, plan review, and automation review. Four-lens review covering architecture, code quality, reliability, and performance, each scored 1-10. Delivers APPROVE, REVISE, or REJECT verdicts. Use when you need a code review, plan review, quality check, or want to verify work before shipping. Part of the architect-system loop. Reads from system/blueprints/ and source code. Outputs to system/reviews/.
Run a security and reliability health check on a Portaly Vibe payment integration before deployment. Trigger when the user mentions Portaly health check, payment security audit, pre-deploy check, sentry scan, callback verification audit, integration safety check, or wants to verify their Portaly payment integration is safe to go live.