run-test-plan


Run Test Plan

Execute a YAML test plan: run setup commands, wait for health checks, then run each test sequentially. Stop on the first failure and emit rich debug output.

Prerequisites

- agent-browser skill: Browser tests require the `agent-browser:agent-browser` skill to be available

Arguments

- `--plan <path>`: Path to the test plan (default: `docs/testing/test-plan.yaml`)
- `--skip-setup`: Skip setup commands and health checks (for re-running after a failure)
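The two arguments could be parsed along the following lines. This is a minimal sketch rather than the skill's actual implementation: the function name `parse_args` and the variables `PLAN` and `SKIP_SETUP` are hypothetical names chosen for illustration.

```bash
# Hypothetical helper: parse --plan <path> and --skip-setup.
# PLAN and SKIP_SETUP are illustrative variable names, not part of the skill.
parse_args() {
  PLAN="docs/testing/test-plan.yaml"   # documented default
  SKIP_SETUP=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --plan)
        if [ "$#" -lt 2 ]; then
          echo "Error: --plan requires a path" >&2
          return 2
        fi
        PLAN=$2
        shift 2
        ;;
      --skip-setup)
        SKIP_SETUP=1
        shift
        ;;
      *)
        echo "Error: unknown argument: $1" >&2
        return 2
        ;;
    esac
  done
}

parse_args --plan custom/plan.yaml --skip-setup
echo "plan=$PLAN skip_setup=$SKIP_SETUP"
```

Rejecting unknown arguments keeps a mistyped flag from silently falling back to the default plan.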

Step 1: Parse Test Plan

Read and validate the test plan:

```bash
# Check file exists
ls docs/testing/test-plan.yaml || { echo "Error: Test plan not found"; exit 1; }

# Validate YAML
python3 -c "import yaml; yaml.safe_load(open('docs/testing/test-plan.yaml'))" || { echo "Error: Invalid YAML"; exit 1; }
```

Extract from the YAML:
- `setup.commands`: List of setup commands
- `setup.health_checks`: List of URLs to poll
- `tests`: Array of test cases

Step 2: Run Setup (unless --skip-setup)

2a. Check Prerequisites

If `setup.prerequisites` exists, verify each one:

```bash
# For each prerequisite in setup.prerequisites
<prerequisite.check> || { echo "Prerequisite not met: <prerequisite.name>"; exit 1; }
```

2b. Set Environment Variables

If `setup.env` exists, export each variable. Values that use `${VAR}` syntax should be resolved from the current environment:

```bash
# For each key/value in setup.env
export <key>="<value>"
```
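One way to resolve `${VAR}` references before exporting is sketched below. It assumes that only the braced `${NAME}` form needs expanding and that unset variables should expand to an empty string; neither detail is specified above. The helper name `resolve_env_refs` and the `API_HOST`/`BASE_URL` variables are hypothetical.

```bash
# Hypothetical helper: expand ${VAR} references in a setup.env value using
# the current environment. Unset variables expand to an empty string.
# The body runs in a subshell ( ... ) so its working variables do not leak.
resolve_env_refs() (
  open='${'
  close='}'
  rest=$1
  out=
  while :; do
    case $rest in
      *"$open"*"$close"*) ;;   # at least one ${...} reference remains
      *) break ;;
    esac
    out=$out${rest%%"$open"*}  # copy the text before the reference
    rest=${rest#*"$open"}
    name=${rest%%"$close"*}    # text between ${ and }
    rest=${rest#*"$close"}
    case $name in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        out=$out$open$name$close   # not a plain variable name: keep as-is
        ;;
      *)
        # eval is safe here because $name was validated just above
        eval "out=\$out\${$name-}"
        ;;
    esac
  done
  printf '%s\n' "$out$rest"
)

# Example: export one key/value pair from setup.env
export API_HOST=localhost
export BASE_URL="$(resolve_env_refs 'http://${API_HOST}:8080')"
echo "$BASE_URL"
```

Expanding only validated variable names avoids passing the whole value through `eval`, which would also execute any command substitution embedded in the plan.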

2c. Build

If `setup.build` exists, execute the build commands sequentially:

```bash
# For each command in setup.build
<command> || { echo "Build failed: <command>"; exit 1; }
```

2d. Start Services

If `setup.services` exists, start long-running processes and wait for health checks:

```bash
# The log and PID files below live in .beagle/, so make sure it exists
mkdir -p .beagle

# For each service in setup.services
nohup <service.command> > .beagle/service-<index>.log 2>&1 &
echo $! > .beagle/service-<index>.pid
```

For each service with a `health_check`, poll until ready:

```bash
timeout=<service.health_check.timeout or 30>
url=<service.health_check.url>
elapsed=0

while [ "$elapsed" -lt "$timeout" ]; do
  # Treat 200, 301, and 302 responses as "ready"
  if curl -s -o /dev/null -w "%{http_code}" "$url" | grep -qE "^(200|301|302)"; then
    echo "✓ Health check passed: $url"
    break
  fi
  sleep 2
  elapsed=$((elapsed + 2))
done

if [ "$elapsed" -ge "$timeout" ]; then
  echo "✗ Health check timeout: $url"
  exit 1
fi
```
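The polling loop can also be factored into a reusable function, which helps when a plan defines several services. The sketch below is one possible shape, not the skill's implementation: `wait_until` and `http_ready` are hypothetical names, and the URL in the usage comment is a placeholder.

```bash
# Hypothetical helper: run a probe command every <interval> seconds until it
# succeeds or <timeout> seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
  wait_timeout=$1
  wait_interval=$2
  shift 2
  wait_elapsed=0
  while [ "$wait_elapsed" -lt "$wait_timeout" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$wait_interval"
    wait_elapsed=$((wait_elapsed + wait_interval))
  done
  return 1
}

# Probe in the style of the health check loop: succeed on HTTP 200, 301, or 302.
http_ready() {
  curl -s -o /dev/null -w "%{http_code}" "$1" | grep -qE "^(200|301|302)"
}

# Usage (the URL is a placeholder):
#   wait_until 30 2 http_ready "http://localhost:3000/health" || exit 1
```

Passing the probe as a command keeps the timeout logic independent of curl, so the same function can wait on a file, a port, or a database.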

2e. Legacy Setup Format

If the plan uses the older flat format (`setup.commands` and `setup.health_checks` instead of `prerequisites`, `build`, and `services`), fall back to executing `setup.commands` sequentially and polling `setup.health_checks` as before.

Step 4: Execute Tests Sequentially

For each test in the plan:

4a. Log Test Start

```markdown
Running: TC-XX - <test.name>

Context: <test.context>
```

4b. Execute Steps

For each step in `test.steps`, determine the step type and execute accordingly:

**Shell commands (`run:` steps):**

The most common step type. Execute the command via Bash and capture stdout, stderr, and exit code:

```bash
# Execute the command, capture output and exit code
<command> 2>&1
echo "EXIT_CODE: $?"
```

Capture all output for evaluation in step 4c. Shell steps cover:
- CLI binary invocations (e.g., `./target/debug/myapp status --all`)
- Database queries (e.g., `psql "${DATABASE_URL}" -c "SELECT ..."`)
- File inspection (e.g., `ls -la /path/to/expected/output`)
- Process lifecycle checks (e.g., `timeout 5 ./myapp 2>&1 || true`)
- Any other command a human would type in a terminal
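If stdout and stderr need to be evaluated separately, a shell step can be wrapped as sketched here. This is an illustration under assumptions: `run_step` and the `STEP_OUT`, `STEP_ERR`, and `STEP_EXIT` variables are hypothetical names, and the sample command stands in for a real `run:` step.

```bash
# Hypothetical helper: run one shell step and record stdout, stderr, and the
# exit code separately. STEP_OUT, STEP_ERR, and STEP_EXIT are illustrative names.
run_step() {
  step_out_file=$(mktemp)
  step_err_file=$(mktemp)
  STEP_EXIT=0
  bash -c "$1" >"$step_out_file" 2>"$step_err_file" || STEP_EXIT=$?
  STEP_OUT=$(cat "$step_out_file")
  STEP_ERR=$(cat "$step_err_file")
  rm -f "$step_out_file" "$step_err_file"
}

# Sample step: writes to both streams and exits non-zero
run_step 'echo "ready"; echo "warning: slow start" >&2; exit 3'
echo "stdout: $STEP_OUT"
echo "stderr: $STEP_ERR"
echo "EXIT_CODE: $STEP_EXIT"
```

Recording the exit code with `|| STEP_EXIT=$?` keeps a failing step from aborting the runner, so the failure can still be evaluated and reported.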

**curl actions (`action: curl` steps):**

```bash
curl -X <method> \
  -H "Content-Type: application/json" \
  <additional headers> \
  -d '<body>' \
  "<url>" \
  -o response.json \
  -w "%{http_code}" > status_code.txt

# Capture response for evaluation
cat response.json
cat status_code.txt
```


**agent-browser CLI actions:**

Steps starting with `agent-browser` are browser automation commands:

```bash
# Navigate

# Snapshot interactive elements (always do before interacting)
agent-browser snapshot -i

# Interact using refs from snapshot output (@e1, @e2, etc.)
agent-browser fill @<ref> "<value>"
agent-browser click @<ref>

# Wait for conditions
agent-browser wait --url "<pattern>"
agent-browser wait --text "<text>"
agent-browser wait --load networkidle

# Capture evidence
agent-browser screenshot docs/testing/evidence/<test.id>.png
```

**Important:** Always run `agent-browser snapshot -i` before interacting with elements to get valid refs, and re-snapshot after navigation or significant DOM changes.

Save screenshots to `docs/testing/evidence/<test.id>.png`.

4c. Evaluate Result

Using agent reasoning, compare the actual outcome against `test.expected`:

- Read the expected behavior description
- Compare with the actual response/screenshot
- Determine PASS or FAIL

4d. On PASS

```markdown
✓ TC-XX PASSED: <test.name>
```

Continue to the next test.

4e. On FAIL

Stop immediately. Go to Step 6.

Step 5: On All Tests Pass

```markdown
Test Results: ALL PASSED

| ID | Name | Result |
|----|------|--------|
| TC-01 | <name> | ✓ PASS |
| TC-02 | <name> | ✓ PASS |
| ... | ... | ... |

Total: N/N tests passed

Evidence

Screenshots saved to `docs/testing/evidence/`

Cleanup

Stopping background services...
```

Clean up:

```bash
# Kill background services
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
  if [ -f "$pidfile" ]; then
    kill "$(cat "$pidfile")" 2>/dev/null
    rm "$pidfile"
  fi
done
```

Step 6: On Failure - Generate Debug Prompt

When a test fails, generate rich debug output:

6a. Gather Context

```bash
# Get changed files relevant to the failure
git diff --name-only "$(git merge-base HEAD origin/main)"..HEAD

# Get recent changes in files mentioned in test.context
git diff "$(git merge-base HEAD origin/main)"..HEAD -- <relevant_files>
```

6b. Output Debug Report

````markdown
Test Failure: TC-XX - <test.name>

What Failed

Test: <test.name>
Expected: <test.expected>
Actual: <Describe what actually happened - response code, error message, screenshot description>

Relevant Changes in This PR

<For each file mentioned in test.context or related to the failure:>
- `<file>` (lines X-Y) - <brief description of changes>

Evidence

<If screenshot exists:>
- Screenshot: `docs/testing/evidence/<test.id>.png`

<If API response:>
- Status code: <code>
- Response body:
```json
<response>
```

Error Details

<If error message in response or logs:>
```
<error message>
```

Suggested Investigation

<Based on the error, suggest 2-3 specific things to check:>
- <First thing to check based on error type>
- <Second thing related to changed files>
- <Third thing about environment/setup>

Debug Session Prompt

Copy this to start a new Claude session:

I'm debugging a test failure in branch `<branch>`.

Test: <test.name>
Error: <brief error description>

<Summarize what the test was checking and why it failed>

Relevant files: <List changed files related to this test>

Help me investigate why <specific failure reason>.
````

6c. Preserve Evidence

```bash
# Ensure evidence directory exists
mkdir -p docs/testing/evidence

# Save failure context
cat > docs/testing/evidence/<test.id>-failure.md << 'EOF'
# Failure Report: <test.id>

<Full debug report content>
EOF
```

6d. Cleanup and Exit

```bash
# Kill background services
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
  if [ -f "$pidfile" ]; then
    kill "$(cat "$pidfile")" 2>/dev/null
    rm "$pidfile"
  fi
done
```

Test Results Summary Table

Always output a summary table showing progress:

```markdown
Test Results

| ID | Name | Result |
|----|------|--------|
| TC-01 | <name> | ✓ PASS |
| TC-02 | <name> | ✗ FAIL |
| TC-03 | <name> | - SKIP |

Passed: 1/3 Failed: TC-02
```

Tests after a failure are marked as SKIP (not executed).
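The SKIP rule can be applied mechanically when rendering the table. The sketch below assumes results are available as tab-separated `ID`, `Name`, `PASS|FAIL` lines, which is an input format invented for this example; `summarize_results` is likewise a hypothetical name.

```bash
# Hypothetical helper: read "ID<TAB>Name<TAB>PASS|FAIL" lines on stdin and
# print the summary table. Every test after the first FAIL is reported as SKIP.
summarize_results() {
  awk -F '\t' '
    BEGIN {
      print "| ID | Name | Result |"
      print "|----|------|--------|"
    }
    {
      if (failed != "") {
        status = "- SKIP"
      } else if ($3 == "PASS") {
        status = "✓ PASS"; passed++
      } else {
        status = "✗ FAIL"; failed = $1
      }
      printf "| %s | %s | %s |\n", $1, $2, status
    }
    END {
      printf "\nPassed: %d/%d", passed, NR
      if (failed != "") printf " Failed: %s", failed
      printf "\n"
    }'
}

printf 'TC-01\tLogin\tPASS\nTC-02\tCheckout\tFAIL\nTC-03\tLogout\tPASS\n' | summarize_results
```

In the sample input TC-03 is listed as PASS but is still reported as SKIP, because nothing after the first failure is executed.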

Verification

Before completing:

```bash
# Verify evidence directory exists
ls -la docs/testing/evidence/

# List captured evidence
ls docs/testing/evidence/*.png docs/testing/evidence/*.md 2>/dev/null
```

**Verification Checklist:**
- [ ] Setup commands executed successfully
- [ ] Health checks passed before test execution
- [ ] Each executed test has a recorded result
- [ ] Evidence captured in `docs/testing/evidence/`
- [ ] On failure: debug prompt includes expected vs actual
- [ ] On failure: relevant PR changes listed
- [ ] Background processes cleaned up
- [ ] Failure evidence preserved for debugging
- [ ] Debug prompt is copy-paste ready for new sessions

Rules

- Stop on first test failure (do not continue to other tests)
- Always capture evidence (screenshots, responses)
- Include file:line references in debug prompts when possible
- Use the `--skip-setup` flag to re-run after fixing issues
- Never hardcode secrets; use environment variables
- Clean up background processes even on failure
- Preserve failure evidence for debugging
- Make debug prompts copy-paste ready for new sessions
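One way to honor the rule about cleaning up even on failure is to register the cleanup with `trap`, so it runs however the script exits. This is a sketch under assumptions: `cleanup_services` and the `PID_DIR` variable are hypothetical names, and `.beagle` is the directory used for PID files in the setup step.

```bash
# Sketch: register cleanup with trap so it also runs when the script exits
# early. PID_DIR is a hypothetical variable; .beagle holds the PID files.
PID_DIR=${PID_DIR:-.beagle}

cleanup_services() {
  for pidfile in "$PID_DIR"/service-*.pid "$PID_DIR"/dev-server.pid; do
    if [ -f "$pidfile" ]; then
      # Ignore errors from processes that have already exited
      kill "$(cat "$pidfile")" 2>/dev/null || true
      rm -f "$pidfile"
    fi
  done
}

# EXIT fires on normal completion and on `exit 1` after a failed test.
trap cleanup_services EXIT
```

With the trap in place, the failure path does not need its own copy of the cleanup loop, although calling `cleanup_services` explicitly remains harmless.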