Execute a YAML test plan, run setup commands, health checks, and each test sequentially. Stop on first failure with rich debug output.
执行YAML测试计划,依次运行设置命令、健康检查和各项测试。首次失败时停止并输出丰富的调试信息。
- agent-browser skill: Browser tests require the
agent-browser:agent-browser
skill to be available
- agent-browser skill:浏览器测试需要
agent-browser:agent-browser
技能可用
- : Path to test plan (default:
docs/testing/test-plan.yaml
)
- : Skip setup commands and health checks (for re-running after failure)
- :测试计划路径(默认值:
docs/testing/test-plan.yaml
)
- :跳过设置命令和健康检查(用于失败后重新运行)
Step 1: Parse Test Plan
步骤1:解析测试计划
Read and validate the test plan:
Check file exists
检查文件是否存在
ls docs/testing/test-plan.yaml || { echo "Error: Test plan not found"; exit 1; }
ls docs/testing/test-plan.yaml || { echo "Error: Test plan not found"; exit 1; }
python3 -c "import yaml; yaml.safe_load(open('docs/testing/test-plan.yaml'))" || { echo "Error: Invalid YAML"; exit 1; }
Extract from the YAML:
- `setup.commands`: List of setup commands
- `setup.health_checks`: List of URLs to poll
- `tests`: Array of test cases
python3 -c "import yaml; yaml.safe_load(open('docs/testing/test-plan.yaml'))" || { echo "Error: Invalid YAML"; exit 1; }
从YAML中提取内容:
- `setup.commands`:设置命令列表
- `setup.health_checks`:轮询的URL列表
- `tests`:测试用例数组
Step 2: Run Setup (unless --skip-setup)
步骤2:运行设置(除非使用--skip-setup)
2a. Check Prerequisites
2a. 检查前置条件
If
exists, verify each one:
For each prerequisite in setup.prerequisites
针对setup.prerequisites中的每一项前置条件
<prerequisite.check> || { echo "Prerequisite not met: <prerequisite.name>"; exit 1; }
<prerequisite.check> || { echo "Prerequisite not met: <prerequisite.name>"; exit 1; }
2b. Set Environment Variables
2b. 设置环境变量
If
exists, export each variable. Variables using
syntax should be resolved from the current environment:
如果存在
,导出每个变量。使用
语法的变量应从当前环境解析:
For each key/value in setup.env
针对setup.env中的每一组键值对
If
exists, execute build commands sequentially:
For each command in setup.build
针对setup.build中的每一条命令
<command> || { echo "Build failed: <command>"; exit 1; }
<command> || { echo "Build failed: <command>"; exit 1; }
2d. Start Services
2d. 启动服务
If
exists, start long-running processes and wait for health checks:
For each service in setup.services
针对setup.services中的每一项服务
nohup <service.command> > .beagle/service-<index>.log 2>&1 &
echo $! > .beagle/service-<index>.pid
For each service with a `health_check`, poll until ready:
```bash
timeout=<service.health_check.timeout or 30>
url=<service.health_check.url>
elapsed=0
while [ $elapsed -lt $timeout ]; do
if curl -s -o /dev/null -w "%{http_code}" "$url" | grep -qE "^(200|301|302)"; then
echo "✓ Health check passed: $url"
break
fi
sleep 2
elapsed=$((elapsed + 2))
done
if [ $elapsed -ge $timeout ]; then
echo "✗ Health check timeout: $url"
exit 1
fi
nohup <service.command> > .beagle/service-<index>.log 2>&1 &
echo $! > .beagle/service-<index>.pid
对于带有`health_check`的服务,轮询直到就绪:
```bash
timeout=<service.health_check.timeout or 30>
url=<service.health_check.url>
elapsed=0
while [ $elapsed -lt $timeout ]; do
if curl -s -o /dev/null -w "%{http_code}" "$url" | grep -qE "^(200|301|302)"; then
echo "✓ Health check passed: $url"
break
fi
sleep 2
elapsed=$((elapsed + 2))
done
if [ $elapsed -ge $timeout ]; then
echo "✗ Health check timeout: $url"
exit 1
fi
2e. Legacy Setup Format
2e. 旧版设置格式
If the plan uses the older flat format (
+
instead of
/
/
), fall back to executing
sequentially and polling
as before.
如果计划使用旧版扁平格式(
+
而非
/
/
),则回退到依次执行
并按之前的方式轮询
。
Step 4: Execute Tests Sequentially
步骤4:依次执行测试
For each test in the plan:
4a. Log Test Start
4a. 记录测试开始
Running: TC-XX - <test.name>
运行中:TC-XX - <test.name>
4b. Execute Steps
4b. 执行步骤
For each step in
, determine the step type and execute accordingly:
The most common step type. Execute the command via Bash and capture stdout, stderr, and exit code:
最常见的步骤类型。通过Bash执行命令并捕获标准输出、标准错误和退出码:
Execute the command, capture output and exit code
执行命令,捕获输出和退出码
<command> 2>&1
echo "EXIT_CODE: $?"
Capture all output for evaluation in step 4c. Shell steps cover:
- CLI binary invocations (e.g., `./target/debug/myapp status --all`)
- Database queries (e.g., `psql "${DATABASE_URL}" -c "SELECT ..."`)
- File inspection (e.g., `ls -la /path/to/expected/output`)
- Process lifecycle checks (e.g., `timeout 5 ./myapp 2>&1 || true`)
- Any other command a human would type in a terminal
**curl actions (`action: curl` steps):**
```bash
curl -X <method> \
-H "Content-Type: application/json" \
<additional headers> \
-d '<body>' \
"<url>" \
-o response.json \
-w "%{http_code}" > status_code.txt
<command> 2>&1
echo "EXIT_CODE: $?"
捕获所有输出以便在步骤4c中评估。Shell步骤涵盖:
- CLI二进制调用(例如:`./target/debug/myapp status --all`)
- 数据库查询(例如:`psql "${DATABASE_URL}" -c "SELECT ..."`)
- 文件检查(例如:`ls -la /path/to/expected/output`)
- 进程生命周期检查(例如:`timeout 5 ./myapp 2>&1 || true`)
- 人类在终端中输入的任何其他命令
**curl操作(`action: curl`步骤):**
```bash
curl -X <method> \
-H "Content-Type: application/json" \
<additional headers> \
-d '<body>' \
"<url>" \
-o response.json \
-w "%{http_code}" > status_code.txt
Capture response for evaluation
捕获响应以便评估
cat response.json
cat status_code.txt
**agent-browser CLI actions:**
Steps starting with `agent-browser` are browser automation commands:
```bash
cat response.json
cat status_code.txt
**agent-browser CLI操作:**
以`agent-browser`开头的步骤是浏览器自动化命令:
```bash
Snapshot interactive elements (always do before interacting)
快照交互元素(交互前必须执行)
agent-browser snapshot -i
agent-browser snapshot -i
Interact using refs from snapshot output (@e1, @e2, etc.)
使用快照输出中的引用(@e1、@e2等)进行交互
agent-browser fill @<ref> "<value>"
agent-browser click @<ref>
agent-browser fill @<ref> "<value>"
agent-browser click @<ref>
agent-browser wait --url "<pattern>"
agent-browser wait --text "<text>"
agent-browser wait --load networkidle
agent-browser wait --url "<pattern>"
agent-browser wait --text "<text>"
agent-browser wait --load networkidle
agent-browser screenshot docs/testing/evidence/<test.id>.png
**Important:** Always run `agent-browser snapshot -i` before interacting with elements to get valid refs, and re-snapshot after navigation or significant DOM changes.
Save screenshots to `docs/testing/evidence/<test.id>.png`
agent-browser screenshot docs/testing/evidence/<test.id>.png
**重要提示:** 在与元素交互前务必运行`agent-browser snapshot -i`以获取有效引用,在导航或DOM发生重大变化后重新快照。
将截图保存到`docs/testing/evidence/<test.id>.png`
4c. Evaluate Result
4c. 评估结果
Using agent reasoning, compare actual outcome against
:
- Read the expected behavior description
- Compare with actual response/screenshot
- Determine PASS or FAIL
- 读取预期行为描述
- 与实际响应/截图对比
- 判断通过或失败
markdown
✓ TC-XX PASSED: <test.name>
Continue to next test.
markdown
✓ TC-XX 通过:<test.name>
继续执行下一个测试。
Stop immediately. Go to Step 6.
Step 5: On All Tests Pass
步骤5:所有测试通过时
Test Results: ALL PASSED
测试结果:全部通过
| ID | Name | Result |
|---|
| TC-01 | <name> | ✓ PASS |
| TC-02 | <name> | ✓ PASS |
| ... | ... | ... |
Total: N/N tests passed
| ID | 名称 | 结果 |
|---|
| TC-01 | <name> | ✓ 通过 |
| TC-02 | <name> | ✓ 通过 |
| ... | ... | ... |
总计: N/N 测试通过
Stopping background services...
Kill background services
终止后台服务
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
if [ -f "$pidfile" ]; then
kill $(cat "$pidfile") 2>/dev/null
rm "$pidfile"
fi
done
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
if [ -f "$pidfile" ]; then
kill $(cat "$pidfile") 2>/dev/null
rm "$pidfile"
fi
done
Step 6: On Failure - Generate Debug Prompt
步骤6:测试失败时 - 生成调试提示
When a test fails, generate rich debug output:
6a. Gather Context
6a. 收集上下文
Get changed files relevant to the failure
获取与失败相关的变更文件
git diff --name-only $(git merge-base HEAD origin/main)..HEAD
git diff --name-only $(git merge-base HEAD origin/main)..HEAD
Get recent changes in files mentioned in test.context
获取test.context中提及文件的近期变更
git diff $(git merge-base HEAD origin/main)..HEAD -- <relevant_files>
git diff $(git merge-base HEAD origin/main)..HEAD -- <relevant_files>
6b. Output Debug Report
6b. 输出调试报告
Test Failure: TC-XX - <test.name>
测试失败:TC-XX - <test.name>
Test: <test.name>
Expected:
<test.expected>
Actual:
<Describe what actually happened - response code, error message, screenshot description>
测试: <test.name>
预期:
<test.expected>
实际:
<描述实际发生的情况 - 响应码、错误信息、截图说明>
Relevant Changes in This PR
本次PR中的相关变更
<For each file mentioned in test.context or related to the failure:>
- `<file>` (lines X-Y) - <brief description of changes>
<针对test.context中提及或与失败相关的每个文件:>
<If screenshot exists:>
- Screenshot: `docs/testing/evidence/<test.id>.png`
<If API response:>
- Status code: <code>
- Response body:
```json
<response>
```
<如果存在截图:>
- 截图:
docs/testing/evidence/<test.id>.png
<如果存在API响应:>
<If error message in response or logs:>
```
<error message>
```
Suggested Investigation
建议的调查方向
<Based on the error, suggest 2-3 specific things to check:>
-
<First thing to check based on error type>
-
<Second thing related to changed files>
- <Third thing about environment/setup>
<基于错误,建议2-3项具体检查内容:>
- <基于错误类型的首要检查内容>
- <与变更文件相关的次要检查内容>
- <关于环境/设置的检查内容>
Debug Session Prompt
调试会话提示
Copy this to start a new Claude session:
I'm debugging a test failure in branch
.
Test: <test.name>
Error: <brief error description>
<Summarize what the test was checking and what went wrong>
Relevant files:
<List changed files related to this test>
Help me investigate why <specific failure reason>.
复制以下内容启动新的Claude会话:
测试: <test.name>
错误: <错误简要描述>
<总结测试的检查内容及失败原因>
相关文件:
<列出与该测试相关的变更文件>
请帮助我调查为什么会出现<具体失败原因>。
6c. Preserve Evidence
6c. 保存证据
Ensure evidence directory exists
确保证据目录存在
mkdir -p docs/testing/evidence
mkdir -p docs/testing/evidence
Save failure context
保存失败上下文
cat > docs/testing/evidence/<test.id>-failure.md << 'EOF'
cat > docs/testing/evidence/<test.id>-failure.md << 'EOF'
Failure Report: <test.id>
失败报告:<test.id>
<Full debug report content>
EOF
```
6d. Cleanup and Exit
6d. 清理并退出
Kill background services
终止后台服务
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
if [ -f "$pidfile" ]; then
kill $(cat "$pidfile") 2>/dev/null
rm "$pidfile"
fi
done
for pidfile in .beagle/service-*.pid .beagle/dev-server.pid; do
if [ -f "$pidfile" ]; then
kill $(cat "$pidfile") 2>/dev/null
rm "$pidfile"
fi
done
Test Results Summary Table
测试结果汇总表
Always output a summary table showing progress:
| ID | Name | Result |
|---|
| TC-01 | <name> | ✓ PASS |
| TC-02 | <name> | ✗ FAIL |
| TC-03 | <name> | - SKIP |
Passed: 1/3
Failed: TC-02
Tests after a failure are marked as SKIP (not executed).
| ID | 名称 | 结果 |
|---|
| TC-01 | <name> | ✓ 通过 |
| TC-02 | <name> | ✗ 失败 |
| TC-03 | <name> | - 跳过 |
通过: 1/3
失败: TC-02
Verify evidence directory exists
验证证据目录存在
ls -la docs/testing/evidence/
ls -la docs/testing/evidence/
List captured evidence
列出捕获的证据
ls docs/testing/evidence/.png docs/testing/evidence/.md 2>/dev/null
**Verification Checklist:**
- [ ] Setup commands executed successfully
- [ ] Health checks passed before test execution
- [ ] Each executed test has recorded result
- [ ] Evidence captured in `docs/testing/evidence/`
- [ ] On failure: debug prompt includes expected vs actual
- [ ] On failure: relevant PR changes listed
- [ ] Background processes cleaned up
ls docs/testing/evidence/.png docs/testing/evidence/.md 2>/dev/null
**验证清单:**
- [ ] 设置命令执行成功
- [ ] 测试执行前健康检查通过
- [ ] 每个已执行测试都记录了结果
- [ ] 证据已捕获至`docs/testing/evidence/`
- [ ] 失败时:调试提示包含预期与实际对比
- [ ] 失败时:列出了相关PR变更
- [ ] 后台进程已清理
- [ ] 失败证据已保存用于调试
- [ ] 调试提示可直接复制粘贴到新会话
- Stop on first test failure (do not continue to other tests)
- Always capture evidence (screenshots, responses)
- Include file:line references in debug prompts when possible
- Use flag to re-run after fixing issues
- Never hardcode secrets - use environment variables
- Clean up background processes even on failure
- Preserve failure evidence for debugging
- Make debug prompts copy-paste ready for new sessions
- 首次测试失败即停止(不继续执行其他测试)
- 始终捕获证据(截图、响应)
- 调试提示中尽可能包含文件:行号引用
- 使用标志在修复问题后重新运行
- 切勿硬编码密钥 - 使用环境变量
- 即使失败也要清理后台进程
- 保存失败证据用于调试
- 使调试提示可直接复制粘贴到新会话