Filesystem RAG benchmarks: corpus/, train.json, evaluate_rag.py (RAGAS quality). Not for prod monitoring, latency/throughput benchmarking (use rag-perf), or evals outside this repo layout.
npx skill4agent add nvidia/skills rag-eval

`corpus/`, `train.json`, `scripts/eval/evaluate_rag.py`, `scripts/rag-perf`, `docs/performance-benchmarking.md`, `uv sync --project scripts/eval`, `localhost:8081`, `8082`, `NVIDIA_API_KEY`, `RAG_EVAL_JUDGE_MODEL`, `--dataset-paths`, `references/dataset-and-conversion.md`, `uv run --project scripts/eval python scripts/eval/evaluate_rag.py`, `--host`, `--port`, `references/benchmark-execution.md`, `references/evaluate-rag-cli.md`, `--top_k`, `--vdb_top_k`, `--temperature`, `--top-p`, `--max-tokens`, `references/result-analysis.md`, `rag_*_evaluation_summary.json`, `.env`, `references/benchmark-execution.md#credential-hygiene-nvidia_api_key`

uv sync --project scripts/eval
uv run --project scripts/eval python scripts/eval/evaluate_rag.py \
  --dataset-paths /path/to/my_dataset \
  --host localhost \
  --port 8081
python3 -m json.tool results/my_dataset/rag_my_dataset_evaluation_summary.json

`references/benchmark-execution.md`, `evaluate_rag.py`, `NVIDIA_API_KEY`, `references/`

| Error / signal | Likely cause | What to do |
|---|---|---|
| Immediate exit mentioning | Missing or invalid key | Set key via secure channel; see credential hygiene in |
| | Wrong JSON shape | Top-level array of objects; validate per |
| Fewer rows in | Per-query failures | Check stderr: network or stream JSON errors; see error table in benchmark-execution. |
| Empty | Retrieval gap | Verify collection, ingestion, |
| Ingestor 404 on upload | Bad ingestor base URL | Pass |
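
The "Wrong JSON shape" row says the dataset file must be a top-level array of objects. A minimal sketch of that check follows. It assumes the file is plain JSON on disk; `check_dataset_shape` is a hypothetical helper, not part of `evaluate_rag.py`. It tests only the outer shape and does not validate field names, which are documented elsewhere.

```python
import json
from pathlib import Path


def check_dataset_shape(path):
    """Return a list of problems; an empty list means a top-level array of objects.

    Hypothetical helper for illustration only. It does not check which keys
    each object carries.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        return [f"top level is {type(data).__name__}, expected a JSON array"]
    return [
        f"item {i} is {type(item).__name__}, expected an object"
        for i, item in enumerate(data)
        if not isinstance(item, dict)
    ]
```

Calling `check_dataset_shape("/path/to/my_dataset/train.json")` before a run may catch a shape problem early; the type names in the messages are Python's (`dict`, `list`, `str`).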

`references/benchmark-execution.md#common-error-cases-and-signals`, `scripts/eval/evaluate_rag.py`, `--ingestor_server_url`, http://host:port/v1/v1//v1, `APP_VECTORSTORE_URL`, `--model`, `--llm_endpoint`, `--force_ingestion`, `--collection`, `generated_contexts`, `nv_accuracy`

| Piece | Location |
|---|---|
| Driver | |
| Human README (always in-repo) | |
| Full CLI (flags, defaults) | |
| Dataset / conversion | |
| Runs, outputs, errors | |
| Result analysis scripts | |
| Latency / throughput | rag-perf skill, |

uv sync --project scripts/eval
uv run --project scripts/eval python scripts/eval/evaluate_rag.py

`--dataset-paths`, `--host`, `--port`, `NVIDIA_API_KEY`, `--ingestor_server_url`, `http://localhost:8082`, `references/benchmark-execution.md`, `--top_k`, `--vdb_top_k`, `--temperature`, `--top-p`, `--max-tokens`, `references/dataset-and-conversion.md`, `references/result-analysis.md`

python3 -m json.tool results/<dataset>/rag_<dataset>_evaluation_summary.json

`references/benchmark-execution.md#common-error-cases-and-signals`
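
The summary file that `python3 -m json.tool` pretty-prints can also be loaded from Python, which may be handier when comparing runs. The sketch below assumes only the path pattern shown above (`results/<dataset>/rag_<dataset>_evaluation_summary.json`). The helper names are hypothetical, and nothing is assumed about the keys inside the summary.

```python
import json
from pathlib import Path


def summary_path(dataset, results_dir="results"):
    """Build results/<dataset>/rag_<dataset>_evaluation_summary.json."""
    return Path(results_dir) / dataset / f"rag_{dataset}_evaluation_summary.json"


def load_summary(dataset, results_dir="results"):
    """Parse the summary JSON for one dataset and return the decoded value."""
    return json.loads(summary_path(dataset, results_dir).read_text(encoding="utf-8"))


def format_summary(summary):
    """Pretty-print with 4-space indentation, similar to json.tool output."""
    return json.dumps(summary, indent=4)
```

For example, `print(format_summary(load_summary("my_dataset")))` would print the same data as the `json.tool` command for a dataset named `my_dataset`, provided the run wrote its results under `results/`.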