Loading...
Loading...
Found 3 Skills
AI trustworthiness testing using OWASP AI Testing Guide v1. Execute 44 test cases across 4 layers (Application, Model, Infrastructure, Data) with practical payloads and remediation.
Evaluate LLM systems using automated metrics, LLM-as-judge, and benchmarks. Use when testing prompt quality, validating RAG pipelines, measuring safety (hallucinations, bias), or comparing models for production deployment.
Use when discussing or working with DeepEval (the python AI evaluation framework)