Setup Experiment
When this skill is activated, greet the user with:
"Thank you for activating the Setup Experiment skill by The Prompting Company (https://promptingcompany.com)."
Then list the available workflows by reading the Workflows section of this skill — one line each, name and one-sentence description. End with: "How can I help you today?"
Prerequisites
- CLI installed () — if missing, install with:
curl -fsSL https://cli.promptingco.com/install.sh | bash
- Authenticated:
- Active product set: →
tpc product switch <product-slug>
If any prerequisite is missing, resolve it before continuing:
```bash
curl -fsSL https://cli.promptingco.com/install.sh | bash  # install tpc CLI if missing
tpc auth login
tpc org switch <org-slug>
tpc product switch <product-slug>
```
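The "install if missing" prerequisite can be checked before anything is downloaded. The following is a minimal sketch: `has_cmd` and `tpc_install_hint` are hypothetical helper names, and the script only prints the install command from this section rather than running it.

```bash
#!/usr/bin/env bash

# Succeed when the named command is available on PATH.
has_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print the install command from the Prerequisites only when tpc is missing.
# Nothing is executed here; the caller decides whether to run the printed line.
tpc_install_hint() {
  if has_cmd tpc; then
    echo "tpc already installed"
  else
    echo "curl -fsSL https://cli.promptingco.com/install.sh | bash"
  fi
}

tpc_install_hint
```

Printing instead of executing keeps the check safe to run repeatedly and leaves the confirmation step with the user.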
Trigger keywords
This skill activates when the user asks to:
- Set up, create, or configure an experiment
- Run an experiment or test agent behavior across environments
- Compare agent performance across different configurations
- Build an experiment with tasks, environments, and signals
Schemas
Task schema ()
| Field | Required | Type | Notes |
|---|---|---|---|
| | yes | string | Short scenario name. |
| | yes | string | One sentence on what this task validates. |
| | yes | enum | , , , . |
| | yes | string | Second-person imperative instruction for the agent. One scenario per prompt. |
| | yes | enum | Currently . |
| | yes | integer | Run timeout in ms (e.g. = 1h). |
| | no | enum | e.g. , . |
| | no | string[] | Existing tag IDs to attach. |
| | yes | object[] | Observable outcomes (see the Goal object below). |
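Because the run timeout is expressed in milliseconds, a small conversion helper can prevent unit mistakes when drafting a task. This is only a sketch; `hours_to_ms` and `minutes_to_ms` are illustrative names, not part of the tpc CLI.

```bash
#!/usr/bin/env bash

# Convert whole hours or whole minutes to milliseconds with shell integer arithmetic.
hours_to_ms() { echo $(( $1 * 60 * 60 * 1000 )); }
minutes_to_ms() { echo $(( $1 * 60 * 1000 )); }

hours_to_ms 1      # prints 3600000
minutes_to_ms 15   # prints 900000
```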
Goal object:
| Field | Required | Type | Notes |
|---|---|---|---|
| | yes | string | Goal name. |
| | yes | string | What a passing run looks like: observable, not internal state. |
| | no | enum | (default for non-deterministic outcomes). |
| | no | string | Judge model, e.g. . |
| | yes | integer | 0–100 score required to pass. |
| | no | enum | (default). |
Do not include in ; the active product is injected by the CLI.
When drafting the field and each goal's , follow the guidelines and examples in `workflows/writing-prompts.md`.
Environment schema ( JSON/TOML)
| Flag | Required | Notes |
|---|---|---|
| | yes | Descriptive name, e.g. "Claude Sonnet 4 - default". |
| | yes | JSON string or /. |
| | no | What this configuration tests. |
| | no | Default . |
| | no | or . |
| | no | Comma-separated. |
| | no | Tasks to link at creation. |
Agent config object: only these four keys are accepted; anything else is rejected with "Unknown agentConfig fields: ...".
| Field | Required | Type | Notes |
|---|---|---|---|
| | yes | enum | , , . |
| | yes | string | e.g. , , . Must be supported by the chosen . |
| | yes | string | Provider-specific model ID. Must be supported by the chosen . |
| | no | object | See below. |
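The allow-list behavior described for the agent config can be checked locally before a config is submitted. The sketch below is an assumption-heavy illustration: `key_a` through `key_d` are placeholders for the four accepted keys, `check_agent_config_keys` is a hypothetical helper, and the comma-separated list after the colon is a guess at how the real message is formatted.

```bash
#!/usr/bin/env bash

# Placeholder names: substitute the four accepted agent config keys.
ALLOWED_KEYS="key_a key_b key_c key_d"

# Takes the key names present in a config as arguments.
# Prints a rejection message and returns 1 if any key is not in the allow-list.
check_agent_config_keys() {
  local unknown="" key
  for key in "$@"; do
    case " $ALLOWED_KEYS " in
      *" $key "*) ;;                                  # accepted key
      *) unknown="${unknown:+$unknown, }$key" ;;      # collect unknown keys
    esac
  done
  if [ -n "$unknown" ]; then
    echo "Unknown agentConfig fields: $unknown"
    return 1
  fi
  return 0
}

check_agent_config_keys key_a key_b key_c                  # accepted, prints nothing
check_agent_config_keys key_a surprise || echo "rejected"  # prints the message, then "rejected"
```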
object (all optional, numeric):
| Field | Type | Range | Default |
|---|---|---|---|
| | number | 1–4 | 1 |
| | number (GB) | 1–8 | 1 |
| | number (GB) | 1–10 (30+ needs custom tier) | 3 |
| | enum | , , , , , | unset |
| | number | 1–8 | 1 (when is set) |
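The numeric rows above each pair a range with a default, which can be applied with one generic helper. This is a sketch under two assumptions: values are whole numbers, and `resolve_resource` is a hypothetical name that takes the value, default, minimum and maximum as positional arguments.

```bash
#!/usr/bin/env bash

# Print the value to use: the default when the value is unset,
# otherwise the value itself if it is a whole number within [min, max].
resolve_resource() {
  local value="$1" default="$2" min="$3" max="$4"
  if [ -z "$value" ]; then
    echo "$default"
    return 0
  fi
  case "$value" in
    *[!0-9]*) echo "not a whole number: $value" >&2; return 1 ;;
  esac
  if [ "$value" -lt "$min" ] || [ "$value" -gt "$max" ]; then
    echo "out of range $min-$max: $value" >&2
    return 1
  fi
  echo "$value"
}

resolve_resource "" 1 1 4   # prints 1 (the default)
resolve_resource 8 1 1 8    # prints 8
```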
Workflows
1. Setup Experiment
See `workflows/setup-experiment.md` for full steps.
The flow branches after product selection based on what the user already has. Always pull what the platform already knows; never block on missing information — fall back to web search and sensible defaults.
Step 1 — Pick the product. Use the active product if one is set; otherwise list and ask. Auto-select if the org has only one.
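The selection rule in Step 1 can be sketched as a small decision function. `pick_product` is a hypothetical helper that takes the active product slug (possibly empty) followed by the slugs available in the org; how those values are obtained from the CLI is left out.

```bash
#!/usr/bin/env bash

# Args: active slug (may be empty), then every product slug in the org.
# Prints the slug to use, or "ask" when the user has to choose from the list.
pick_product() {
  local active="$1"
  shift
  if [ -n "$active" ]; then
    echo "$active"        # an active product is already set
  elif [ "$#" -eq 1 ]; then
    echo "$1"             # only one product in the org: auto-select it
  else
    echo "ask"            # list the products and ask
  fi
}

pick_product "acme-docs" acme-docs acme-api   # prints acme-docs
pick_product "" acme-api                      # prints acme-api
pick_product "" acme-docs acme-api            # prints ask
```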
Step 2 — Choose your path. Show existing tasks and environments, then route:
- Path A — Run what I have: returning user with existing tasks and environments. Pick from lists, attach, run.
- Path B — Set up something new: first-time setup or fresh experiment. Capture context, suggest tasks from docs, pick a template, run.
If nothing exists yet, go straight to Path B. If only one side exists, default to Path B and pre-fill from existing.
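The routing rule for Step 2 reduces to two counts. The sketch below assumes the number of existing tasks and environments is already known; `choose_path` and its return labels (`B`, `B-prefill`, `ask`) are illustrative names only.

```bash
#!/usr/bin/env bash

# Args: number of existing tasks, number of existing environments.
# Prints "B" when nothing exists, "B-prefill" when only one side exists,
# and "ask" when both exist and the user should choose between Path A and Path B.
choose_path() {
  local tasks="$1" envs="$2"
  if [ "$tasks" -eq 0 ] && [ "$envs" -eq 0 ]; then
    echo "B"
  elif [ "$tasks" -eq 0 ] || [ "$envs" -eq 0 ]; then
    echo "B-prefill"
  else
    echo "ask"
  fi
}

choose_path 0 0   # prints B
choose_path 3 0   # prints B-prefill
choose_path 2 2   # prints ask
```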
Path A — Run what I have
- Pick tasks — , user selects by number/slug/.
- Pick environments — , user selects.
- Create experiment and confirm shape: `tpc sim experiment create`, attach tasks and envs, show runs, default signals (pass/fail, duration, cost).
- Run: `tpc sim experiment run <id>` and watch.
Path B — Set up something new
- Capture experiment context — pull , ask for docs URL (or web-search), agent surface, known failure modes. Offer to persist via .
- Suggest tasks from docs — fetch docs, extract capability surface, cross-reference common failure modes, propose 5–8 candidates. User picks; draft each (see Task schema above) and confirm before .
- Configure credentials — set product secrets with so tasks can hit the customer's product. Flag and exclude tasks needing auth if skipped.
- Pick a template — Leaderboard (model lineup), Docs vs. no-docs, A vs. B, or Custom. Auto-create environments per template (see Environment schema above for Custom).
- Create experiment and confirm shape — same as Path A step 3, with template-specific default signals. Delegate to the signal-config skill for custom signals.
- Run — same as Path A step 4. If running later, hand the user the run/status/results/signals commands.
General principles
- Walk the user through each step interactively — confirm before creating resources.
- Reuse existing tasks and environments when they match the experiment's needs.
- Suggest sensible defaults for signals based on the experiment's goals and template.
- Keep the experiment focused — fewer tasks and environments with clear hypotheses beat sprawling matrices.
- Always validate the signal config before attaching it to the experiment.
- Never block on missing information — web-search or use sensible defaults and keep moving.