Search Results: ai-testing

Found 8 Skills

AI & Machine Learning · microsoft/eval-guide

eval-suite-planner

Produces a concrete eval suite plan grounded in Microsoft's Eval Scenario Library and MS Learn agent evaluation guidance. The plan covers scenario types, evaluation methods, quality signals, thresholds, and priority order, and is produced before any test cases are generated or evals are run.

Testing & QA · microsoft/eval-guide

eval-generator

Generates eval test cases from an eval suite plan (output of /eval-suite-planner) or a plain-English agent description. Supports both single-response and conversation (multi-turn) evaluation modes. Outputs a Copilot Studio test set table, a CSV file for import (single-response only), and a docx report for human review.

AI & Machine Learning · livekit/agent-skills

livekit-agents

Build voice AI agents with LiveKit Cloud and the Agents SDK. Use when the user asks to "build a voice agent", "create a LiveKit agent", "add voice AI", "implement handoffs", "structure agent workflows", or is working with LiveKit Agents SDK. Provides opinionated guidance for the recommended path: LiveKit Cloud + LiveKit Inference. REQUIRES writing tests for all implementations.

AI & Machine Learning · github/awesome-copilot

eval-driven-dev

Instrument Python LLM apps, build golden datasets, write eval-based tests, run them, and root-cause failures, covering the full eval-driven development cycle. Use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM, even if they don't say "evals" explicitly. Use it to confirm that an AI app works correctly, catch regressions after prompt changes, debug why an agent started behaving differently, or validate output quality before shipping.

Testing & QA · cinience/alicloud-skills

alicloud-ai-multimodal-qwen-vl-test

Minimal image-understanding smoke test for Model Studio Qwen VL.

2 scripts/Attention

AI & Machine Learning · axiomhq/skills

writing-evals

Scaffolds evaluation suites for the Axiom AI SDK. Generates eval files, scorers, flag schemas, and config from natural-language descriptions. Use when creating evals, writing scorers, setting up flag schemas, or configuring axiom.config.ts.

16 scripts/Attention

Testing & QA · stablyai/agent-skills

stably-sdk-rules

AI rules for writing tests with Stably Playwright SDK. Use this skill when writing or modifying Playwright tests with Stably AI features. Covers when to use Playwright vs Stably methods, plus minimal patterns for aiAssert, extract, getLocatorsByAI, agent.act, Inbox, and Google auth.

AI & Machine Learning · marcohefti/zero-context-l...

zcl

Orchestrator workflow for running ZeroContext Lab (ZCL) attempts/suites with deterministic artifacts, trace-backed evidence, and fast post-mortems (shim support for "agent only types tool name").