setup-experiment
Setup Experiment
When this skill is activated, greet the user with:
"Thank you for activating the Setup Experiment skill by The Prompting Company (https://promptingcompany.com)."
Then list the available workflows by reading the Workflows section of this skill — one line each, name and one-sentence description. End with: "How can I help you today?"
Prerequisites
- `tpc` CLI installed (check with `tpc --version`); if missing, install with: `curl -fsSL https://cli.promptingco.com/install.sh | bash`
- Authenticated: `tpc auth whoami`
- Active product set: `tpc product list` → `tpc product switch <product-slug>`
If any prerequisite is missing, resolve it before continuing:
```bash
curl -fsSL https://cli.promptingco.com/install.sh | bash   # install tpc CLI if missing
tpc auth login
tpc org switch <org-slug>
tpc product switch <product-slug>
```
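The prerequisite checks can be scripted so the first unmet one is reported before anything else runs. The following is a minimal sketch, not part of the skill: `check_prereqs` and its messages are hypothetical, and it assumes `tpc auth whoami` exits non-zero when no user is logged in, which the source does not confirm.

```bash
# Sketch: report the first unmet prerequisite.
# check_prereqs is a hypothetical helper, not a tpc command.
check_prereqs() {
  cli="${1:-tpc}"   # CLI binary to probe; defaults to tpc

  # 1. Is the CLI on PATH?
  if ! command -v "$cli" >/dev/null 2>&1; then
    echo "missing: $cli CLI (install it first)"
    return 1
  fi

  # 2. Is a user logged in? Assumption: whoami fails when unauthenticated.
  if ! "$cli" auth whoami >/dev/null 2>&1; then
    echo "missing: authentication (run: $cli auth login)"
    return 2
  fi

  # 3. The active product is left to the user to confirm by eye.
  echo "ok: CLI and auth look fine; confirm the active product with: $cli product list"
  return 0
}
```

Calling `check_prereqs` with no argument probes `tpc`; passing another binary name is only useful for trying the function out on a machine without the CLI.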
Trigger keywords
This skill activates when the user asks to:
- Set up, create, or configure an experiment
- Run an experiment or test agent behavior across environments
- Compare agent performance across different configurations
- Build an experiment with tasks, environments, and signals
Schemas
Task schema (`task.json`)

| Field | Required | Type | Notes |
|---|---|---|---|
|  | yes | string | Short scenario name. |
|  | yes | string | One sentence on what this task validates. |
|  | yes | enum |  |
| `prompt` | yes | string | Second-person imperative instruction for the agent. One scenario per prompt. |
|  | yes | enum | Currently |
|  | yes | integer | Run timeout in ms (e.g. |
|  | no | enum | e.g. |
|  | no | string[] | Existing tag IDs to attach. |
|  | yes | object[] | Observable outcomes (see below). |
Goal object:
| Field | Required | Type | Notes |
|---|---|---|---|
|  | yes | string | Goal name. |
| `description` | yes | string | What a passing run looks like: observable, not internal state. |
|  | no | enum |  |
|  | no | string | Judge model, e.g. |
|  | yes | integer | 0–100 score required to pass. |
|  | no | enum |  |
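The pass threshold in the goal object can be read as a simple comparison between a judged score and the required score. The sketch below is illustrative only: `goal_passes` is a hypothetical helper, and it assumes the comparison is inclusive (a score equal to the threshold passes), which the source does not state.

```bash
# Sketch: decide whether a goal passes.
# Assumption: inclusive comparison (score >= threshold).
goal_passes() {
  # $1 = judged score, $2 = required score; both integers in 0-100
  for n in "$1" "$2"; do
    case "$n" in
      ''|*[!0-9]*) echo "not an integer: $n" >&2; return 2 ;;
    esac
    [ "$n" -le 100 ] || { echo "out of range 0-100: $n" >&2; return 2; }
  done
  [ "$1" -ge "$2" ]   # exit status 0 = pass, 1 = fail
}
```

For example, `goal_passes 80 70` succeeds, `goal_passes 69 70` fails with status 1, and an out-of-range value such as `goal_passes 101 70` is rejected with status 2.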
Do not include `product` in `task.json`; the active product is injected by the CLI.
When drafting the `prompt` field and each goal's `description`, follow the guidelines and examples in `workflows/writing-prompts.md`.
Environment schema (`--agent-config` JSON/TOML)
`tpc sim env create` flags:

| Flag | Required | Notes |
|---|---|---|
|  | yes | Descriptive name, e.g. |
| `--agent-config` | yes | JSON string or |
|  | no | What this configuration tests. |
|  | no | Default |
|  | no |  |
|  | no | Comma-separated. |
|  | no | Tasks to link at creation. |
Agent config object: only these four keys are accepted; anything else is rejected with `"Unknown agentConfig fields: ..."`.

| Field | Required | Type | Notes |
|---|---|---|---|
|  | yes | enum |  |
|  | yes | string | e.g. |
|  | yes | string | Provider-specific model ID. Must be supported by the chosen |
| `sandboxResources` | no | object | See below. |
`sandboxResources`:

| Field | Type | Range | Default |
|---|---|---|---|
|  | number | 1–4 | 1 |
|  | number (GB) | 1–8 | 1 |
|  | number (GB) | 1–10 (30+ needs custom tier) | 3 |
|  | enum |  | unset |
|  | number | 1–8 | 1 (when |
Workflows
1. Setup Experiment
See `workflows/setup-experiment.md` for full steps.
The flow branches after product selection based on what the user already has. Always pull what the platform already knows; never block on missing information: fall back to web search and sensible defaults.
Step 1 — Pick the product. Use the active product if one is set; otherwise list and ask. Auto-select if the org has only one.
Step 2 — Choose your path. Show existing tasks and environments, then route:
- Path A — Run what I have: returning user with existing tasks and environments. Pick from lists, attach, run.
- Path B — Set up something new: first-time setup or fresh experiment. Capture context, suggest tasks from docs, pick a template, run.
If nothing exists yet, go straight to Path B. If only one side exists, default to Path B and pre-fill from existing.
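The selection and routing rules in Steps 1 and 2 reduce to two small decisions. The sketch below expresses them as shell functions; `select_product` and `choose_path` are hypothetical names, and the returned labels (`use:<slug>`, `auto-select`, `ask`, `B`, `B-prefill`) are made up for illustration rather than taken from the CLI.

```bash
# Sketch of Step 1: which product to use.
select_product() {
  # $1 = active product slug (may be empty), $2 = number of products in the org
  if [ -n "$1" ]; then
    echo "use:$1"          # an active product is set: use it
  elif [ "$2" -eq 1 ]; then
    echo "auto-select"     # only one product in the org: pick it
  else
    echo "ask"             # otherwise list products and ask the user
  fi
}

# Sketch of Step 2: which path to take.
choose_path() {
  # $1 = number of existing tasks, $2 = number of existing environments
  if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
    echo "B"               # nothing exists yet: straight to Path B
  elif [ "$1" -eq 0 ] || [ "$2" -eq 0 ]; then
    echo "B-prefill"       # only one side exists: Path B, pre-filled
  else
    echo "ask"             # both exist: show them and let the user choose A or B
  fi
}
```

With both tasks and environments present, the choice stays with the user, which matches the instruction to show what exists before routing.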
Path A — Run what I have
1. Pick tasks: `tpc sim task list`, user selects by number/slug/`all`.
2. Pick environments: `tpc sim env list`, user selects.
3. Create experiment and confirm shape: `tpc sim experiment create`, attach tasks and envs, show `N × M` runs, default signals (pass/fail, duration, cost).
4. Run: `tpc sim experiment run <id>` and watch.
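The N × M shape shown to the user before running is the cross product of the selected tasks and environments. A minimal sketch of that expansion follows; the task slugs and environment names are made-up placeholders, and `run_matrix` is a hypothetical helper rather than anything the CLI provides.

```bash
# Sketch: expand tasks x environments into the run list (N x M).
# The slugs below are placeholders, not real resources.
tasks="signup-flow create-api-key export-report"   # N = 3
envs="baseline with-docs"                          # M = 2

run_matrix() {
  for task in $tasks; do
    for env in $envs; do
      echo "$task @ $env"    # one run per (task, environment) pair
    done
  done
}

run_matrix
echo "total runs: $(run_matrix | wc -l | tr -d ' ')"
```

With three tasks and two environments this lists six pairs and ends with `total runs: 6`, the number to confirm with the user before starting the run.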
Path B — Set up something new
1. Capture experiment context: pull `tpc product get`, ask for docs URL (or web-search), agent surface, known failure modes. Offer to persist via `tpc product update`.
2. Suggest tasks from docs: fetch docs, extract capability surface, cross-reference common failure modes, propose 5–8 candidates. User picks; draft each `task.json` (see Task schema above) and confirm before `tpc sim task create`.
3. Configure credentials: set product secrets with `tpc product secret set` so tasks can hit the customer's product. Flag and exclude tasks needing auth if skipped.
4. Pick a template: Leaderboard (model lineup), Docs vs. no-docs, A vs. B, or Custom. Auto-create environments per template (see Environment schema above for Custom).
5. Create experiment and confirm shape: same as Path A step 3, with template-specific default signals. Delegate to the signal-config skill for custom signals.
6. Run: same as Path A step 4. If running later, hand the user the run/status/results/signals commands.
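The "flag and exclude" rule in the credentials step can be pictured as a filter over the candidate tasks. The sketch below is an assumption-laden illustration: the `slug:needs_auth` line format, the example slugs, and the `filter_tasks` helper are all invented here and are not a tpc format or command.

```bash
# Sketch: drop tasks that need auth when the credentials step was skipped.
# Input: one "slug:needs_auth" line per task (made-up format, yes/no).
filter_tasks() {
  # $1 = "yes" if product secrets were configured, "no" if the step was skipped
  while IFS=: read -r slug needs_auth; do
    if [ "$needs_auth" = "yes" ] && [ "$1" = "no" ]; then
      echo "excluded (needs auth): $slug" >&2   # flag it for the user
    else
      echo "$slug"                               # keep it in the experiment
    fi
  done
}

printf '%s\n' "read-public-docs:no" "create-api-key:yes" | filter_tasks no
```

Here only `read-public-docs` is printed to standard output, while the excluded task is reported on standard error so the user can see what was dropped and why.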
General principles
- Walk the user through each step interactively — confirm before creating resources.
- Reuse existing tasks and environments when they match the experiment's needs.
- Suggest sensible defaults for signals based on the experiment's goals and template.
- Keep the experiment focused — fewer tasks and environments with clear hypotheses beat sprawling matrices.
- Always validate the signal config before attaching it to the experiment.
- Never block on missing information — web-search or use sensible defaults and keep moving.