setup-experiment

Setup Experiment

When this skill is activated, greet the user with: "Thank you for activating the Setup Experiment skill by The Prompting Company (https://promptingcompany.com)."

Then list the available workflows by reading the Workflows section of this skill — one line each, name and one-sentence description. End with: "How can I help you today?"


Prerequisites


`tpc` CLI installed (verify with `tpc --version`). If it is missing, install it with `curl -fsSL https://cli.promptingco.com/install.sh | bash`.
Authenticated: `tpc auth whoami`
Active product set: `tpc product list` → `tpc product switch <product-slug>`

If any prerequisite is missing, resolve it before continuing:

```bash
curl -fsSL https://cli.promptingco.com/install.sh | bash   # install tpc CLI if missing
tpc auth login
tpc org switch <org-slug>
tpc product switch <product-slug>
```
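The first two checks can also be scripted. The sketch below is a minimal Python example, assuming that `tpc --version` and `tpc auth whoami` exit with status 0 on success and non-zero otherwise (this is not stated above). It does not parse `tpc product list`, so confirming the active product stays a manual step. `check_prerequisites` and its injectable `run` callable are hypothetical helpers, not part of `tpc`.

```python
import shutil
import subprocess
from typing import Callable, Dict, List

# Checks taken from the Prerequisites list, in dependency order.
CHECKS: Dict[str, List[str]] = {
    "installed": ["tpc", "--version"],
    "authenticated": ["tpc", "auth", "whoami"],
}


def default_runner(argv: List[str]) -> int:
    """Run a command quietly and return its exit code (127 if the binary is absent)."""
    if shutil.which(argv[0]) is None:
        return 127
    return subprocess.run(argv, capture_output=True).returncode


def check_prerequisites(run: Callable[[List[str]], int] = default_runner) -> Dict[str, bool]:
    """Report which prerequisites pass. Assumption: exit code 0 means the check passed."""
    results: Dict[str, bool] = {}
    for label, argv in CHECKS.items():
        results[label] = run(argv) == 0
        if label == "installed" and not results[label]:
            # Without the CLI, the later checks cannot pass either.
            return {name: False for name in CHECKS}
    return results
```

Calling `check_prerequisites()` with no arguments runs the real commands; passing a fake `run` makes the routing testable without the CLI installed.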


Trigger keywords


This skill activates when the user asks to:

Set up, create, or configure an experiment
Run an experiment or test agent behavior across environments
Compare agent performance across different configurations
Build an experiment with tasks, environments, and signals


Schemas


Task schema (`task.json`)

| Field | Required | Type | Notes |
|---|---|---|---|
| `name` | yes | string | Short scenario name. |
| `description` | yes | string | One sentence on what this task validates. |
| `category` | yes | enum | `coding`, `research`, `documentation`, `analysis`. |
| `prompt` | yes | string | Second-person imperative instruction for the agent. One scenario per prompt. |
| `taskType` | yes | enum | Currently only `cli_execution`. |
| `timeLimitMs` | yes | integer | Run timeout in ms (e.g. `3600000` = 1 h). |
| `successType` | no | enum | e.g. `runs_reliably`, `implements_spec_reliably`. |
| `tagIds` | no | string[] | IDs of existing tags to attach. |
| `goals` | yes | object[] | Observable outcomes; see the goal object below. |

Goal object:

| Field | Required | Type | Notes |
|---|---|---|---|
| `name` | yes | string | Goal name. |
| `description` | yes | string | What a passing run looks like: observable outcomes, not internal state. |
| `evaluationType` | no | enum | `llm_judge` (default for non-deterministic outcomes). |
| `model` | no | string | Judge model, e.g. `claude-sonnet-4-6`. |
| `passingThreshold` | yes | integer | Score (0–100) required to pass. |
| `scoringMethod` | no | enum | `weighted_average` (default). |

Do not include a `product` field in `task.json`; the CLI injects the active product.

When drafting the `prompt` field and each goal's `description`, follow the guidelines and examples in `workflows/writing-prompts.md`.
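A draft `task.json` can be checked against the two tables before running `tpc sim task create`. This is a minimal Python sketch: `validate_task` is a hypothetical helper that encodes only the rules in the tables plus the no-`product` rule, and the example task, its wording, and its `passingThreshold` of 70 are illustrative placeholders. The CLI remains the authority on what it accepts.

```python
import json
from typing import Any, Dict, List

CATEGORIES = {"coding", "research", "documentation", "analysis"}
TASK_REQUIRED = ["name", "description", "category", "prompt", "taskType", "timeLimitMs", "goals"]
GOAL_REQUIRED = ["name", "description", "passingThreshold"]


def validate_task(task: Dict[str, Any]) -> List[str]:
    """Return problems found in a draft task; an empty list means it matches the tables."""
    errors = [f"missing required field: {f}" for f in TASK_REQUIRED if f not in task]
    if "product" in task:
        errors.append("remove 'product': the CLI injects the active product")
    if "category" in task and task["category"] not in CATEGORIES:
        errors.append(f"unknown category: {task['category']}")
    if "taskType" in task and task["taskType"] != "cli_execution":
        errors.append("taskType must currently be 'cli_execution'")
    limit = task.get("timeLimitMs")
    if limit is not None and (not isinstance(limit, int) or limit <= 0):
        errors.append("timeLimitMs must be a positive integer (milliseconds)")
    for i, goal in enumerate(task.get("goals", [])):
        errors += [f"goal {i}: missing {f}" for f in GOAL_REQUIRED if f not in goal]
        score = goal.get("passingThreshold")
        if score is not None and not (isinstance(score, int) and 0 <= score <= 100):
            errors.append(f"goal {i}: passingThreshold must be an integer from 0 to 100")
    return errors


# Illustrative draft only: the scenario, wording, and threshold are placeholders.
draft = {
    "name": "Quickstart install",
    "description": "Checks that the agent can follow the quickstart to a working setup.",
    "category": "documentation",
    "prompt": "Follow the product's quickstart guide and get the example project running.",
    "taskType": "cli_execution",
    "timeLimitMs": 3_600_000,  # 1 hour
    "goals": [
        {
            "name": "Example runs",
            "description": "The example command exits successfully and prints its documented output.",
            "evaluationType": "llm_judge",
            "passingThreshold": 70,
        }
    ],
}

problems = validate_task(draft)
print(json.dumps(draft, indent=2) if not problems else "\n".join(problems))
```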


Environment schema (`--agent-config` JSON/TOML)

`tpc sim env create` flags:

| Flag | Required | Notes |
|---|---|---|
| `--name` | yes | Descriptive name, e.g. `"Claude Sonnet 4 - default"`. |
| `--agent-config` | yes | JSON string or `@file.json` / `@file.toml`. |
| `--description` | no | What this configuration tests. |
| `--enabled` | no | Default `true`. |
| `--schedule` | no | `7d` or `14d`. |
| `--tag-ids` | no | Comma-separated. |
| `--task-ids` | no | Tasks to link at creation. |
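Because `--agent-config` takes a JSON string, building the argument list in code avoids hand-quoting JSON in a shell. A minimal Python sketch that uses only the flags in the table above; `env_create_argv` is a hypothetical helper, and the environment name and model ID in the usage lines are illustrative.

```python
import json
import shlex
from typing import Any, Dict, List, Optional


def env_create_argv(
    name: str,
    agent_config: Dict[str, Any],
    description: Optional[str] = None,
    schedule: Optional[str] = None,
    tag_ids: Optional[List[str]] = None,
) -> List[str]:
    """Build argv for `tpc sim env create` from the documented flags only."""
    if schedule is not None and schedule not in ("7d", "14d"):
        raise ValueError("--schedule must be 7d or 14d")
    argv = [
        "tpc", "sim", "env", "create",
        "--name", name,
        # Compact JSON string; the CLI also accepts @file.json or @file.toml here.
        "--agent-config", json.dumps(agent_config, separators=(",", ":")),
    ]
    if description:
        argv += ["--description", description]
    if schedule:
        argv += ["--schedule", schedule]
    if tag_ids:
        argv += ["--tag-ids", ",".join(tag_ids)]  # comma-separated, per the table
    return argv


config = {"harness": "claude", "provider": "anthropic", "model": "claude-sonnet-4-6"}
# Print a copy-pasteable shell command; subprocess.run(argv) would execute it instead.
print(shlex.join(env_create_argv("Claude Sonnet 4.6 - default", config, schedule="7d")))
```

Passing the argv list to `subprocess.run` (rather than a single string) sidesteps shell quoting entirely.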

Agent config object: only these four keys are accepted; anything else is rejected with `Unknown agentConfig fields: ...`.

| Field | Required | Type | Notes |
|---|---|---|---|
| `harness` | yes | enum | `claude`, `codex`, `opencode`. |
| `provider` | yes | string | e.g. `anthropic`, `openai`, `fireworks`. Must be supported by the chosen `harness`. |
| `model` | yes | string | Provider-specific model ID. Must be supported by the chosen `harness`. |
| `sandboxResources` | no | object | See below. |

`sandboxResources` object (all fields optional; all numeric except `gpu`, which is an enum):

| Field | Type | Range | Default |
|---|---|---|---|
| `cpu` | number | 1–4 | 1 |
| `memory` | number (GB) | 1–8 | 1 |
| `disk` | number (GB) | 1–10 (30+ needs a custom tier) | 3 |
| `gpu` | enum | `T4`, `L4`, `A10G`, `A100`, `A100-80GB`, `H100` | unset |
| `gpuCount` | number | 1–8 | 1 (when `gpu` is set) |
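The agent config rules above can be checked locally before calling the CLI. A minimal Python sketch; `validate_agent_config` is a hypothetical helper. It cannot tell whether a `provider`/`model` pair is supported by the chosen `harness` (only the platform knows that), and it treats disks above 10 GB as errors because custom tiers are out of scope. The formatting of the unknown-field list after the CLI's message prefix is assumed.

```python
from typing import Any, Dict, List

AGENT_KEYS = {"harness", "provider", "model", "sandboxResources"}
HARNESSES = {"claude", "codex", "opencode"}
GPUS = {"T4", "L4", "A10G", "A100", "A100-80GB", "H100"}
# Standard-tier ranges from the sandboxResources table; larger disks need a custom tier.
RANGES = {"cpu": (1, 4), "memory": (1, 8), "disk": (1, 10), "gpuCount": (1, 8)}


def validate_agent_config(cfg: Dict[str, Any]) -> List[str]:
    """Return local problems; provider/model support per harness is not checked here."""
    errors: List[str] = []
    unknown = sorted(set(cfg) - AGENT_KEYS)
    if unknown:
        # Prefix mirrors the CLI's rejection message; the list formatting is assumed.
        errors.append("Unknown agentConfig fields: " + ", ".join(unknown))
    for key in ("harness", "provider", "model"):
        if not cfg.get(key):
            errors.append(f"missing required field: {key}")
    if cfg.get("harness") and cfg["harness"] not in HARNESSES:
        errors.append(f"unsupported harness: {cfg['harness']}")
    resources = cfg.get("sandboxResources") or {}
    for key, (low, high) in RANGES.items():
        value = resources.get(key)
        if value is not None and not (isinstance(value, (int, float)) and low <= value <= high):
            errors.append(f"sandboxResources.{key} must be between {low} and {high}")
    if "gpu" in resources and resources["gpu"] not in GPUS:
        errors.append(f"unsupported gpu: {resources['gpu']}")
    return errors
```

For example, adding a `temperature` key to an otherwise valid config yields a single `Unknown agentConfig fields: temperature` entry, which is the same class of mistake the CLI rejects.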


Workflows


1. Setup Experiment

See `workflows/setup-experiment.md` for full steps.

The flow branches after product selection based on what the user already has. Always pull what the platform already knows; never block on missing information — fall back to web search and sensible defaults.

Step 1 — Pick the product. Use the active product if one is set; otherwise list and ask. Auto-select if the org has only one.

Step 2 — Choose your path. Show existing tasks and environments, then route:

Path A — Run what I have: returning user with existing tasks and environments. Pick from lists, attach, run.
Path B — Set up something new: first-time setup or fresh experiment. Capture context, suggest tasks from docs, pick a template, run.

If nothing exists yet, go straight to Path B. If only tasks or only environments exist, default to Path B and pre-fill it from what already exists.
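The Step 1 and Step 2 routing rules can be written as two small functions. A Python sketch in which `pick_product` and `choose_path` are hypothetical names; returning `"ask"` stands for showing the existing tasks and environments and letting the user pick a path.

```python
from typing import List, Optional


def pick_product(active: Optional[str], products: List[str]) -> Optional[str]:
    """Step 1: prefer the active product, auto-select a sole product, else return None."""
    if active:
        return active
    if len(products) == 1:
        return products[0]
    return None  # caller lists the products and asks the user


def choose_path(task_count: int, env_count: int, user_choice: Optional[str] = None) -> str:
    """Step 2: return "A", "B", or "ask" when the user still has to pick."""
    if task_count == 0 or env_count == 0:
        # Nothing exists, or only one side does: Path B, pre-filled from what exists.
        return "B"
    if user_choice in ("A", "B"):
        return user_choice
    return "ask"  # show existing tasks and environments, then let the user route
```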


Path A — Run what I have


1. Pick tasks: run `tpc sim task list`; the user selects by number, slug, or `all`.
2. Pick environments: run `tpc sim env list`; the user selects.
3. Create experiment and confirm shape: run `tpc sim experiment create`, attach the tasks and environments, and show the `N × M` runs (N tasks × M environments) and the default signals (pass/fail, duration, cost).
4. Run: `tpc sim experiment run <id>`, then watch.
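Path A's selection and confirmation steps reduce to resolving the user's picks and counting one run per (task, environment) pair. A Python sketch with hypothetical helpers; accepting comma-separated picks such as `1, sdk-upgrade` is an assumption about input format, and the slugs are made up.

```python
import itertools
from typing import List, Tuple


def parse_selection(choice: str, slugs: List[str]) -> List[str]:
    """Resolve 'all', 1-based list numbers, or slugs (comma-separated) to slugs."""
    if choice.strip().lower() == "all":
        return list(slugs)
    picked: List[str] = []
    for token in (t.strip() for t in choice.split(",")):
        if not token:
            continue
        if token.isdigit():
            index = int(token)
            if not 1 <= index <= len(slugs):
                raise ValueError(f"no item numbered {index}")
            slug = slugs[index - 1]
        elif token in slugs:
            slug = token
        else:
            raise ValueError(f"unknown selection: {token}")
        if slug not in picked:  # ignore duplicates such as "1, quickstart"
            picked.append(slug)
    return picked


def run_matrix(task_ids: List[str], env_ids: List[str]) -> List[Tuple[str, str]]:
    """One run per (task, environment) pair, i.e. N x M runs."""
    return list(itertools.product(task_ids, env_ids))


def confirm_shape(task_ids: List[str], env_ids: List[str]) -> str:
    runs = run_matrix(task_ids, env_ids)
    return (
        f"{len(task_ids)} tasks × {len(env_ids)} environments = {len(runs)} runs; "
        "default signals: pass/fail, duration, cost"
    )


tasks = parse_selection("1, 3", ["quickstart", "auth-flow", "sdk-upgrade"])
print(confirm_shape(tasks, ["env-sonnet", "env-codex"]))
```

Showing this summary before `tpc sim experiment run` makes the cost of the matrix explicit, which supports keeping experiments small.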


Path B — Set up something new


1. Capture experiment context: pull `tpc product get`, then ask for the docs URL (or web-search for it), the agent surface, and known failure modes. Offer to persist them via `tpc product update`.
2. Suggest tasks from docs: fetch the docs, extract the capability surface, cross-reference common failure modes, and propose 5–8 candidates. The user picks; draft each `task.json` (see Task schema above) and confirm it before running `tpc sim task create`.
3. Configure credentials: set product secrets with `tpc product secret set` so tasks can reach the customer's product. If the user skips this step, flag and exclude tasks that need auth.
4. Pick a template: Leaderboard (model lineup), Docs vs. no-docs, A vs. B, or Custom. Auto-create environments per template (for Custom, see Environment schema above).
5. Create experiment and confirm shape: same as Path A step 3, with template-specific default signals. Delegate custom signals to the signal-config skill.
6. Run: same as Path A step 4. If the user wants to run later, hand them the run/status/results/signals commands.
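For the Leaderboard template, each model in the lineup becomes its own environment with an otherwise identical agent config. A Python sketch; the lineup, the `<openai-model-id>` placeholder, and the naming scheme are hypothetical, and each model ID still has to be one the chosen harness supports.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical lineup of (harness, provider, model). Confirm each model ID is
# supported by its harness before creating anything.
LINEUP: List[Tuple[str, str, str]] = [
    ("claude", "anthropic", "claude-sonnet-4-6"),
    ("codex", "openai", "<openai-model-id>"),
]


def leaderboard_envs(
    lineup: List[Tuple[str, str, str]],
    sandbox: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """One environment per model; every entry shares the same sandboxResources."""
    envs = []
    for harness, provider, model in lineup:
        config: Dict[str, Any] = {"harness": harness, "provider": provider, "model": model}
        if sandbox:
            config["sandboxResources"] = dict(sandbox)
        envs.append({"name": f"{model} ({harness})", "agentConfig": config})
    return envs


def env_commands(envs: List[Dict[str, Any]]) -> List[List[str]]:
    """Turn each environment into a `tpc sim env create` argv (not executed here)."""
    return [
        ["tpc", "sim", "env", "create", "--name", env["name"],
         "--agent-config", json.dumps(env["agentConfig"])]
        for env in envs
    ]


for argv in env_commands(leaderboard_envs(LINEUP, sandbox={"cpu": 2, "memory": 4})):
    print(argv)
```

Holding `sandboxResources` constant across the lineup keeps the model as the only variable, in line with the focused-experiment principle.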


General principles


Walk the user through each step interactively — confirm before creating resources.
Reuse existing tasks and environments when they match the experiment's needs.
Suggest sensible defaults for signals based on the experiment's goals and template.
Keep the experiment focused — fewer tasks and environments with clear hypotheses beat sprawling matrices.
Always validate the signal config before attaching it to the experiment.
Never block on missing information — web-search or use sensible defaults and keep moving.

