Performance benchmarking for a deployed NVIDIA RAG Blueprint server: profiling pass + aiperf load test driven by a single YAML config. Not for accuracy / RAGAS scoring (use rag-eval) or for deploying / repairing services (use rag-blueprint).
- `npx skill4agent add nvidia/skills rag-perf`
- `rag-perf -c <config>`, `--help`, `--version`
- `uv sync --project scripts/rag-perf`
- `uv sync --project scripts/rag-perf --extra dev`, `pytest-asyncio`
- `http://localhost:8081`, `nvidia_rag`
- `pip install -e ./scripts/rag-perf`, `aiperf.plugins`
- `synthetic.llm_url`, `http://localhost:8999/v1/chat/completions`, `NVIDIA_API_KEY`
- `scripts/rag-perf/configs/quick_profile.yaml`, `single_run.yaml`, `sweep.yaml`
- `load.concurrency`, `rag.vdb_top_k`, `rag.reranker_top_k`, `int | list[int]`
- `rag.collection_names: ["<collection_name>"]`, `GET /v1/collections`, `<collection_name>`
- `uv run --project scripts/rag-perf rag-perf -c scripts/rag-perf/configs/single_run.yaml`
- `-c / --config`, `--help`, `--version`
- `references/output-and-analysis.md`, `iterations=1`, `iter_<i>/<point>/...`
- `references/output-and-analysis.md`, `results.json`, `results.csv`, `report.md`
- `references/output-and-analysis.md#summarising-results-to-the-user`, `llm_ttft_ms`
- `docs/performance-benchmarking.md`
- `aiperf.enabled: false`, `load.iterations`, `load.sleep_between_points_s: 60`
- `uv run --project scripts/rag-perf rag-perf -c scripts/rag-perf/configs/quick_profile.yaml`
- `rag-perf-results/quick_profile/run_<ts>/{profile_report.md, profile_results.json, profiling/}`
- `aiperf_rag_on/`, `profile_*`, `aiperf.enabled: false`
- `uv run --project scripts/rag-perf rag-perf -c scripts/rag-perf/configs/single_run.yaml`
- `run_<ts>/{report.md, results.json, results.csv, profiling/, aiperf_rag_on/}`
- `uv run --project scripts/rag-perf rag-perf -c scripts/rag-perf/configs/sweep.yaml`
- `run_<ts>/iter_1/<CR:_VDB-K:_RERANKER-K:_…>/{profiling,aiperf_rag_on}/`
- `report.md`, `results.json`, `results.csv`
- `uv sync --project scripts/rag-perf --extra dev  # one-time, installs pytest-asyncio`
- `uv run --project scripts/rag-perf python -m pytest tests/unit/test_rag_perf/`
- `load.concurrency`, `rag.vdb_top_k`, `rag.reranker_top_k`, `int | list[int]`
- `input.file`, `input.synthetic`, `synthetic`, `.jsonl`, `.csv`
- `synthetic.disable_thinking: true`, `content`, `reasoning_content`
- `AiperfRunner._base_aiperf_cmd`, `scripts/rag-perf/rag_perf/runner.py`
- `references/`

| Error / signal | Likely cause | What to do |
|---|---|---|
| | Both | Pick one. The XOR validator runs at YAML load time. |
| | Extension other than | Rename or convert. |
| | e.g. | Each concurrency maps to a unique point dir; dedupe. |
| | YAML had | aiperf rejects warmup=0; minimum is 1. |
| | Reasoning model used CoT and ran out of tokens | Set |
| | Bad URL, server down, wrong collection | Verify |
| Per-iteration | Some requests timed out / errored mid-run | Check rag-server logs, raise |
| | LLM endpoint rejected a request mid-generation | Partial JSONL is at |
| | Collection mismatch between | Run |
| Tests error with | Dev extras missing | |
| CI: | rag-perf package missing from CI venv | Add |
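
Several rows in the table above describe checks that fail at config load time: `input.file` and `input.synthetic` both set, an unsupported input file extension, and duplicate concurrency values. The sketch below shows one way such checks could look. It is not the rag-perf schema code: the function names are hypothetical, and the allowed extensions (`.jsonl`, `.csv`) are an assumption drawn from the identifiers mentioned in this document.

```python
from pathlib import PurePosixPath

# Assumption: these are the accepted input.file extensions.
ALLOWED_INPUT_SUFFIXES = {".jsonl", ".csv"}


def input_errors(input_cfg: dict) -> list[str]:
    """Return validation messages for a hypothetical `input` config section."""
    errors = []
    has_file = input_cfg.get("file") is not None
    has_synthetic = input_cfg.get("synthetic") is not None
    # XOR check: exactly one of the two sources must be set.
    if has_file == has_synthetic:
        errors.append("set exactly one of input.file or input.synthetic")
    if has_file:
        suffix = PurePosixPath(str(input_cfg["file"])).suffix.lower()
        if suffix not in ALLOWED_INPUT_SUFFIXES:
            errors.append(f"unsupported input.file extension: {suffix!r}")
    return errors


def find_duplicates(values: list[int]) -> list[int]:
    """Return values that occur more than once, in the order they first repeat."""
    seen, dupes = set(), []
    for value in values:
        if value in seen and value not in dupes:
            dupes.append(value)
        seen.add(value)
    return dupes
```

Calling `input_errors({"file": "queries.jsonl"})` returns an empty list, while passing both keys or neither returns one message. `find_duplicates` can be run over a concurrency list before a sweep to spot values that would map to the same point directory.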
- `scripts/rag-perf/examples/queries.jsonl`
- `scripts/rag-perf/prompts/default_prompts.yaml`
- `scripts/rag-perf/`
- `rag.collection_names`, `["<collection_name>"]`, `Citation count (mean): 0`
- `load.concurrency_list`, `rag.vdb_top_k_list`, `rag.reranker_top_k_list`
- `aiperf.enabled: false`, `profile_report.md`, `profile_results.json`, `profile_results.csv`
- `\n $ python -m aiperf profile -m ... --endpoint-type nvidia_rag ...`
- `--endpoint-type nvidia_rag`, `scripts/rag-perf/rag_perf/plugin/nvidia_rag.py`, `/v1/generate`, `metrics`, `nvidia_rag`
- `uv sync --project scripts/rag-perf`, `uv pip install -e ./scripts/rag-perf`
- `[1, 4]`, `vdb_top_k`, `CR:1_ISL:50_OSL:512_VDB-K:20_RERANKER-K:4_Model:...`
- `output.cluster`, `output.gpu`, `output.experiment_name`
- `load.iterations > 1`, `iter_<i>/`, `n_points × iterations`

| Piece | Location |
|---|---|
| Driver | |
| Schema | |
| Orchestrator | |
| aiperf plugin | |
| User-facing doc | |
| Presets | |
| Sample queries | |
| Synthetic prompts | |
| Config schema details | |
| Synthetic-query generation | |
| Output layout & metric semantics | |
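
The sweepable fields `load.concurrency`, `rag.vdb_top_k`, and `rag.reranker_top_k` are typed `int | list[int]`, and the run count is described as `n_points × iterations`. The sketch below illustrates that arithmetic. It assumes the points are the Cartesian product of the three fields, which this document does not confirm, and the helper names are hypothetical.

```python
from itertools import product


def as_list(value):
    """Normalise an `int | list[int]` field to a list of ints."""
    return [value] if isinstance(value, int) else list(value)


def expand_points(concurrency, vdb_top_k, reranker_top_k):
    """Build sweep points, assuming a Cartesian product of the three fields."""
    return list(
        product(as_list(concurrency), as_list(vdb_top_k), as_list(reranker_top_k))
    )


# One list-valued field and two scalar fields give two points.
points = expand_points(concurrency=[1, 4], vdb_top_k=20, reranker_top_k=4)
iterations = 2
total_runs = len(points) * iterations  # n_points × iterations

print(points)      # [(1, 20, 4), (4, 20, 4)]
print(total_runs)  # 4
```

Under this assumption, adding a second list-valued field multiplies the number of points, so `load.iterations` and `load.sleep_between_points_s` have a direct effect on total wall-clock time.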
- `uv sync --project scripts/rag-perf`
- `scripts/rag-perf/configs/<preset>.yaml`, `rag.collection_names`
- `uv run --project scripts/rag-perf rag-perf -c <config>`
- `output.dir/run_<ts>/`, `references/output-and-analysis.md`
- `results.csv`, `references/output-and-analysis.md#summarising-results-to-the-user`, `llm_ttft_ms`
- `quick_profile.yaml`, `aiperf.enabled: false`
- `single_run.yaml`
- `sweep.yaml`
- `references/output-and-analysis.md`
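
When summarising a run, `results.csv` and the `llm_ttft_ms` metric are the items named above. As a rough sketch, the rows could be ranked by that metric as follows. This assumes `llm_ttft_ms` is a column in `results.csv` and that a `concurrency` column exists. Neither is confirmed here, and the sample rows are placeholders rather than measured results.

```python
import csv
import io


def rank_by_metric(csv_text: str, metric: str = "llm_ttft_ms") -> list[dict]:
    """Parse CSV text and return rows sorted by the given numeric column, lowest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: float(row[metric]))


# Placeholder rows for illustration only; not measured results.
sample = "concurrency,llm_ttft_ms\n1,100.0\n4,250.0\n2,175.0\n"

ranked = rank_by_metric(sample)
best = ranked[0]
print(best["concurrency"], best["llm_ttft_ms"])
```

For a real run, the text would come from reading the `results.csv` file under the run directory. `references/output-and-analysis.md` remains the authority on what each column means.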