Search Results: prompt-testing

Found 11 Skills

AI & Machine Learning | patricio0312rev/skills

prompt-regression-tester

Compares old vs new prompts across test cases with diff summaries, stability metrics, breakage analysis, and fix suggestions. Use for "prompt testing", "A/B testing prompts", "prompt versioning", or "quality regression".

AI & Machine Learning | daymade/claude-code-skill...

promptfoo-evaluation

Configures and runs LLM evaluation using the Promptfoo framework. Use when setting up prompt testing, creating evaluation configs (promptfooconfig.yaml), writing Python custom assertions, implementing llm-rubric for LLM-as-judge, or managing few-shot examples in prompts. Triggers on keywords like "promptfoo", "eval", "LLM evaluation", "prompt testing", or "model comparison".

AI & Machine Learning | hamelsmu/evals-skills

validate-evaluator

Calibrate an LLM judge against human labels using data splits, TPR/TNR, and bias correction. Use after writing a judge prompt (write-judge-prompt) when you need to verify alignment before trusting its outputs. Do NOT use for code-based evaluators (those are deterministic; test with standard unit tests).

AI & Machine Learning | garrytan/gstack

benchmark-models

Cross-model benchmark for gstack skills. Runs the same prompt through Claude, GPT (via Codex CLI), and Gemini side-by-side — compares latency, tokens, cost, and optionally quality via LLM judge. Answers "which model is actually best for this skill?" with data instead of vibes. Separate from /benchmark, which measures web page performance. Use when: "benchmark models", "compare models", "which model is best for X", "cross-model comparison", "model shootout". (gstack) Voice triggers (speech-to-text aliases): "compare models", "model shootout", "which model is best".

AI & Machine Learning | kriscard/kriscard-claude-...

prompt-engineer

AI/LLM: Use when crafting system prompts, optimizing LLM outputs, or improving agent instructions. NOT for general coding.

AI & Machine Learning | neolabhq/context-engineer...

customaize-agent:test-prompt

Use when creating or editing any prompt (commands, hooks, skills, subagent instructions) to verify that it produces the desired behavior. Applies the RED-GREEN-REFACTOR cycle to prompt engineering, using subagents for isolated testing.

AI & Machine Learning | pixel-process-ug/superkit...

writing-skills

Use when creating new skills, commands, or agent definitions for Claude Code, including writing SKILL.md files, defining triggers, and testing skill behavior.

AI & Machine Learning | phrazzld/claude-config

llm-evaluation

LLM prompt testing, evaluation, and CI/CD quality gates using Promptfoo. Invoke when:
- Setting up prompt evaluation or regression testing
- Integrating LLM testing into CI/CD pipelines
- Configuring security testing (red teaming, jailbreaks)
- Comparing prompt or model performance
- Building evaluation suites for RAG, factuality, or safety

Keywords: promptfoo, llm evaluation, prompt testing, red team, CI/CD, regression testing

AI & Machine Learning | lidessen/moniro

prompt-lab

Test, validate, and improve agent instructions (CLAUDE.md, system prompts) using sub-agents as experiment subjects. Measures instruction compliance, context decay, and constraint strength. Use for "test prompt", "validate instructions", "prompt effectiveness", "instruction decay", or when designing robust agent behaviors.

AI & Machine Learning | wyattowalsh/agents

prompt-engineer

Comprehensive prompt and context engineering for any AI system. Four modes: (1) Craft new prompts from scratch, (2) Analyze existing prompts with diagnostic scoring and optional improvement, (3) Convert prompts between model families (Claude/GPT/Gemini/Llama), (4) Evaluate prompts with test suites and rubrics. Adapts all recommendations to model class (instruction-following vs reasoning). Validates findings against current documentation. Use for system prompts, agent prompts, RAG pipelines, tool definitions, or any LLM context design. NOT for running prompts, generating content, or building agents.

AI & Machine Learning | neolabhq/context-engineer...

test-prompt