Loading...
Loading...
Found 3 Skills
Help users create and run AI evaluations. Use when someone is building evals for LLM products, measuring model quality, creating test cases, designing rubrics, or trying to systematically measure AI output quality.
Use when the user asks to "create an evaluator", "create evals", "create a scenario", "write a test scenario", "design a test case", "test my agent", "build eval coverage", "plan a test suite", "create red team tests", "set up test profiles", "configure conditional actions", "write a conditional action evaluator", "build a deterministic test", "design an IVR test", "IVR navigation test", "write a unit test for a voice agent", "build a regression test", "scripted scenario", "scripted voice test", "structured evaluator", "exact flow test", "sequential conditions", "fixed sequence test", or "run evals". Covers individual evaluator design, suite coverage strategy, test profiles, mock-tool data design, conditional actions (deterministic / unit test / regression / IVR navigation flows), and best practices for workflow / red-team / edge-case / deterministic test types.
Start a repo-local OptimizeSpec self-improvement change. Use when the user wants to create evals, optimize an agent with GEPA, define an agent self-improvement loop, or begin an ASI-first evaluation workflow.