zeroeval-install
Original:🇺🇸 English
Translated
This skill should be used when users want to install, set up, or integrate ZeroEval into their AI application, agent, or pipeline. It covers SDK setup (Python and TypeScript), first-run tracing, ze.prompt migration, and judge recommendations. For non-SDK languages or direct API/OTLP ingestion it routes to the custom-tracing skill. Triggers on "install zeroeval", "set up zeroeval", "add tracing", "integrate zeroeval", "ze.prompt", "add judges", or "monitor my AI app".
4installs
Sourcezeroeval/zeroeval-skills
Added on
NPX Install
npx skill4agent add zeroeval/zeroeval-skills zeroeval-installTags
Translated version includes tags in frontmatterSKILL.md Content
View Translation Comparison →ZeroEval Install and Integrate
Guide users from zero to production-ready ZeroEval integration: tracing, prompt management, and automated judges.
When To Use
- Setting up ZeroEval for the first time in any language.
- Adding tracing/observability to an existing AI app, agent, or pipeline.
- Migrating hardcoded prompts to with staged rollout (Python / TypeScript).
ze.prompt - Choosing and configuring judges for automated evaluation.
- Troubleshooting missing traces, broken feedback loops, or prompt metadata issues.
Execution Sequence
Follow these steps in order. Each step references a specific playbook in for deep details; load only the relevant playbook when needed.
references/Step 1: Detect Integration Path
Determine which integration path fits the user's setup:
- Check for ,
pyproject.toml,requirements.txt, orsetup.pyfiles -> Python SDK path. Continue to Step 2..py - Check for ,
package.json, ortsconfig.json/.tsfiles -> TypeScript SDK path. Continue to Step 2..js - If the user's language has no ZeroEval SDK (Go, Ruby, Java, Rust, etc.), or they explicitly want to use the REST API or OpenTelemetry without an SDK -> Direct API / OTLP path. Hand off to the skill and stop here.
custom-tracing - If both Python and TypeScript are present, ask the user which SDK to set up first.
Step 2: Install and Initialize
Load the appropriate playbook:
- Python: Read and follow the "Install and Initialize" section.
references/python-integration-playbook.md - TypeScript: Read and follow the "Install and Initialize" section.
references/typescript-integration-playbook.md
Minimum outcome: runs without errors and the API key is configured.
ze.init()Step 3: Verify First Trace
Make one LLM call through a supported integration and confirm a trace appears.
- Python: Follow the "Verify First Trace" section of the Python playbook. If the user's agent produces multiple judged outputs per run, introduce (see "Artifact Spans" in the playbook).
ze.artifact_span - TypeScript: Follow the "Verify First Trace" section of the TypeScript playbook.
Minimum outcome: at least one span is ingested (confirm via dashboard or debug logs).
Step 4: Suggest ze.prompt Migration
If the user has hardcoded system prompts, propose migrating to for version tracking, A/B testing, and prompt optimization.
ze.prompt- Follow the "ze.prompt Migration" section of the relevant SDK playbook.
- Start with (safe rollout mode — always returns your local content, but still registers the version via a network call), then graduate to auto mode.
from: "explicit" - Always place inside the function or request handler where the prompt is used. It performs network I/O and must not run at module import time or during app startup. See the playbook's "Placement and Resilience" guidance.
ze.prompt()
For the full migration workflow including feedback wiring, judge linkage, staged rollout, and prompt optimization, use the skill.
prompt-migrationStep 5: Suggest Judges
Load and recommend starter judges based on the user's app pattern:
references/judges-playbook.md- Customer support / chat agents
- Extraction / classification pipelines
- Coding copilots
- Retrieval QA / RAG assistants
Minimum outcome: user understands binary vs scored judges and has a first judge created or planned.
Step 6: Validate and Troubleshoot
Run the final checklist. If any check fails, load for diagnostics.
references/troubleshooting.md- completes without errors
ze.init() - At least one trace is visible in the dashboard (or debug logs confirm span flush)
- returns decorated content with prompt metadata (if adopted)
ze.prompt - Feedback or judge evaluation path is wired (if judges are configured)
Key Principles
- Minimal first: get one trace working before introducing prompts or judges.
- Staged rollout: always start with
ze.prompt, then auto, thenfrom: "explicit".from: "latest" - Lazy prompt resolution: call inside the function or request path where the prompt is used, never at module scope or import time. It performs network I/O and can block or timeout during startup.
ze.prompt() - Evidence over assumption: use /
debug: trueto confirm SDK behavior rather than guessing.ZEROEVAL_DEBUG=true - Cloud by default: the production API URL is . Only use
https://api.zeroeval.comfor local development with an explicit override.http://localhost:8000