Search Results: llm-as-judge

Found 40 Skills

Code Quality | neolabhq/context-engineer...

critique

Comprehensive multi-perspective review using specialized judges with debate and consensus building

AI & Machine Learning | neolabhq/context-engineer...

customaize-agent:agent-evaluation

Evaluate and improve Claude Code commands, skills, and agents. Use when testing prompt effectiveness, validating context engineering choices, or measuring improvement quality.

AI & Machine Learning | vuralserhat86/antigravity...

llm_evaluation

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

AI & Machine Learning | langchain-ai/langchain-sk...

langsmith-evaluator

Use this skill for ANY question about CREATING evaluators. Covers creating custom metrics, LLM-as-judge evaluators, code-based evaluators, and uploading evaluation logic to LangSmith. Includes basic usage of evaluators to run evaluations.
