Search Results: benchmark-assessment

Found 1 Skill

AI & Machine Learning | ancoleman/ai-design-compo...

evaluating-llms

Evaluate LLM systems using automated metrics, LLM-as-judge, and benchmarks. Use when testing prompt quality, validating RAG pipelines, measuring safety (hallucinations, bias), or comparing models for production deployment.

9 scripts/Attention