Search Results: eval-design

Found 3 Skills

AI & Machine Learningrefoundai/lenny-skills

ai-evals

Help users create and run AI evaluations. Use when someone is building evals for LLM products, measuring model quality, creating test cases, designing rubrics, or trying to systematically measure AI output quality.

🇺🇸|EnglishTranslated

AI & Machine Learningcekura-ai/cekura-skills

cekura-eval-design

Use when the user asks to "create an evaluator", "create evals", "create a scenario", "write a test scenario", "design a test case", "test my agent", "build eval coverage", "plan a test suite", "create red team tests", "set up test profiles", "configure conditional actions", "write a conditional action evaluator", "build a deterministic test", "design an IVR test", "IVR navigation test", "write a unit test for a voice agent", "build a regression test", "scripted scenario", "scripted voice test", "structured evaluator", "exact flow test", "sequential conditions", "fixed sequence test", or "run evals". Covers individual evaluator design, suite coverage strategy, test profiles, mock-tool data design, conditional actions (deterministic / unit test / regression / IVR navigation flows), and best practices for workflow / red-team / edge-case / deterministic test types.

🇺🇸|EnglishTranslated

AI & Machine Learningterminaluse/optimizespec

optimizespec-new

Start a repo-local OptimizeSpec self-improvement change. Use when the user wants to create evals, optimize an agent with GEPA, define an agent self-improvement loop, or begin an ASI-first evaluation workflow.

🇺🇸|EnglishTranslated