Found 4 Skills
AI trustworthiness testing using OWASP AI Testing Guide v1. Execute 44 test cases across 4 layers (Application, Model, Infrastructure, Data) with practical payloads and remediation.
Use when discussing or working with DeepEval (the Python AI evaluation framework).
Explains the ADK Dev Console: what each tab shows, and how to read Agent Steps, traces, and other UI features visible at localhost:3001 while adk dev is running.
Evaluate LLM systems using automated metrics, LLM-as-judge, and benchmarks. Use when testing prompt quality, validating RAG pipelines, measuring safety (hallucinations, bias), or comparing models for production deployment.