monitor-ci
Monitor CI Command
You are the orchestrator for monitoring Nx Cloud CI pipeline executions and handling self-healing fixes. You spawn the `ci-monitor-subagent` subagent to poll CI status and make decisions based on the results.
Context
- Current Branch: !`git branch --show-current`
- Current Commit: !`git rev-parse --short HEAD`
- Remote Status: !`git status -sb | head -1`
User Instructions
$ARGUMENTS
Important: If user provides specific instructions, respect them over default behaviors described below.
Configuration Defaults
| Setting | Default | Description |
|---|---|---|
| `--max-cycles` | 10 | Maximum agent-initiated CI Attempt cycles before timeout |
| `--timeout` | 120 | Maximum duration in minutes |
| `--verbosity` | medium | Output level: minimal, medium, verbose |
| `--branch` | (auto-detect) | Branch to monitor |
| `--subagent-timeout` | 30 | Subagent polling timeout in minutes |
| `--fresh` | false | Ignore previous context, start fresh |
| `--auto-fix-workflow` | false | Attempt common fixes for pre-CI-Attempt failures (e.g., lockfile updates) |
| `--new-cipe-timeout` | 10 | Minutes to wait for new CI Attempt after action |
| `--local-verify-attempts` | 3 | Max local verification + enhance cycles before pushing to CI |
Parse any overrides from `$ARGUMENTS` and merge with defaults.
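As a rough illustration of merging overrides with defaults, the sketch below scans an argument string for `--name value`, `--name=value`, and bare boolean flags. The option names are assumptions taken from names that appear elsewhere in this document (for example `--max-cycles`, `--fresh`, `--auto-fix-workflow`), and `parse_overrides` is a hypothetical helper, not part of Nx or Claude Code. Free-form instructions are skipped here and left for the agent to interpret.

```python
# Assumed option names and defaults; adjust to match the real command.
DEFAULTS = {
    "max-cycles": 10,
    "timeout": 120,
    "verbosity": "medium",
    "branch": None,  # None means auto-detect
    "subagent-timeout": 30,
    "fresh": False,
    "auto-fix-workflow": False,
    "new-cipe-timeout": 10,
    "local-verify-attempts": 3,
}


def parse_overrides(arguments: str) -> dict:
    """Return DEFAULTS with any recognized --flag overrides applied."""
    config = dict(DEFAULTS)
    tokens = arguments.split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        i += 1
        if not token.startswith("--"):
            continue  # free-form instruction text, not a flag
        name, _, value = token[2:].partition("=")
        if name not in DEFAULTS:
            continue  # unknown flag: ignore rather than guess
        default = DEFAULTS[name]
        if isinstance(default, bool):
            # A bare boolean flag means true; "--fresh=false" turns it off.
            config[name] = (value.lower() != "false") if value else True
            continue
        if not value and i < len(tokens) and not tokens[i].startswith("--"):
            value = tokens[i]  # "--name value" form
            i += 1
        if value:
            config[name] = int(value) if isinstance(default, int) else value
    return config
```

For example, `parse_overrides("--max-cycles 5 --fresh")` keeps every default except `max-cycles` and `fresh`.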
Nx Cloud Connection Check
CRITICAL: Before starting the monitoring loop, verify the workspace is connected to Nx Cloud.
Step 0: Verify Nx Cloud Connection
1. Check `nx.json` at workspace root for `nxCloudId` or `nxCloudAccessToken`
2. If `nx.json` is missing OR neither property exists → exit with:

       [monitor-ci] Nx Cloud not connected. Unlock 70% faster CI and auto-fix broken PRs with https://nx.dev/nx-cloud

3. If connected → continue to main loop
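The connection check above could be sketched like this. `is_nx_cloud_connected` is a hypothetical helper name, and treating an unparseable `nx.json` as "not connected" is an assumption of this sketch rather than documented behavior.

```python
import json
from pathlib import Path

NOT_CONNECTED_MESSAGE = (
    "[monitor-ci] Nx Cloud not connected. Unlock 70% faster CI and "
    "auto-fix broken PRs with https://nx.dev/nx-cloud"
)


def is_nx_cloud_connected(workspace_root: str) -> bool:
    """True when nx.json exists and defines nxCloudId or nxCloudAccessToken."""
    nx_json = Path(workspace_root) / "nx.json"
    if not nx_json.is_file():
        return False
    try:
        config = json.loads(nx_json.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False  # assumption: an unreadable config counts as not connected
    if not isinstance(config, dict):
        return False
    return bool(config.get("nxCloudId") or config.get("nxCloudAccessToken"))
```

A caller would print `NOT_CONNECTED_MESSAGE` and exit when the function returns `False`, and otherwise continue to the main loop.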
Anti-Patterns (NEVER DO)
CRITICAL: The following behaviors are strictly prohibited:
| Anti-Pattern | Why It's Bad |
|---|---|
| Using CI provider CLIs with watch/polling flags | Bypasses Nx Cloud self-healing entirely |
| Writing custom CI polling scripts | Unreliable, pollutes context, no self-healing |
| Cancelling CI workflows/pipelines | Destructive, loses CI progress |
| Running CI checks on main agent | Wastes main agent context tokens |
If this skill fails to activate, the fallback is:
- Use CI provider CLI for READ-ONLY status check (single call, no watch/polling flags)
- Immediately delegate to this skill with gathered context
- NEVER continue polling on main agent
CI provider CLIs are acceptable ONLY for:
- One-time read of PR/pipeline status
- Getting PR/branch metadata
- NOT for continuous monitoring or watch mode
Session Context Behavior
Important: Within a Claude Code session, conversation context persists. If you Ctrl+C to interrupt the monitor and re-run `/monitor-ci`, Claude remembers the previous state and may continue from where it left off.
- To continue monitoring: Just re-run `/monitor-ci` (context is preserved)
- To start fresh: Use `/monitor-ci --fresh` to ignore previous context
- For a completely clean slate: Exit Claude Code and restart `claude`
Default Behaviors by Status
The subagent returns with one of the following statuses. This table defines the default behavior for each status. User instructions can override any of these.
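| Status | Default Behavior |
|---|---|
| CI passed | Exit with success. Log "CI passed successfully!" |
| Fix auto-applying (`fix_auto_applying`) | Fix will be auto-applied by self-healing. Do NOT call MCP. Record `last_cipe_url`, then spawn a new subagent in wait mode. |
| Fix available (`fix_available`) | Compare `failedTaskIds` vs `verifiedTaskIds` to determine the path. See Fix Available Decision Logic below. |
| Fix generation failed | Self-healing failed to generate fix. Attempt local fix based on `taskOutputSummary`. |
| Environment issue | Call MCP to request rerun: `update_self_healing_fix({ shortLink, action: "RERUN_ENVIRONMENT_STATE" })` |
| No fix available | CI failed, no fix available (self-healing disabled or not executable). Attempt local fix if possible. Otherwise exit with failure. |
| No new CI Attempt (`no_new_cipe`) | Expected CI Attempt never spawned (CI workflow likely failed before Nx tasks). Report to user, attempt common fixes if configured, or exit with guidance. |
| Subagent polling timeout | Subagent polling timeout reached. Exit with timeout. |
| CI Attempt canceled | CI Attempt was canceled. Exit with canceled status. |
| CI Attempt timed out | CI Attempt timed out. Exit with timeout status. |
| CI Attempt with no tasks (`cipe_no_tasks`) | CI Attempt exists but failed with no task data (likely infrastructure issue). Retry once with empty commit. If retry fails, exit with failure and guidance. |
| Error | Increment `no_progress_count`. |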
Fix Available Decision Logic
When subagent returns `fix_available`, main agent compares `failedTaskIds` vs `verifiedTaskIds`:
Step 1: Categorize Tasks
- Verified tasks = tasks in both `failedTaskIds` AND `verifiedTaskIds`
- Unverified tasks = tasks in `failedTaskIds` but NOT in `verifiedTaskIds`
- E2E tasks = unverified tasks where target contains "e2e" (task format: `<project>:<target>` or `<project>:<target>:<config>`)
- Verifiable tasks = unverified tasks that are NOT e2e
Step 2: Determine Path
| Condition | Path |
|---|---|
| No unverified tasks (all verified) | Apply via MCP |
| Unverified tasks exist, but ALL are e2e | Apply via MCP (treat as verified enough) |
| Verifiable tasks exist | Local verification flow |
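The categorization and path selection above can be sketched as follows. The function names and the returned path labels (`"apply_via_mcp"`, `"local_verification"`) are hypothetical; only the `failedTaskIds` / `verifiedTaskIds` inputs and the task ID format come from the text.

```python
from typing import Dict, List, Sequence


def target_of(task_id: str) -> str:
    """Extract <target> from '<project>:<target>' or '<project>:<target>:<config>'."""
    parts = task_id.split(":")
    return parts[1] if len(parts) > 1 else ""


def categorize_tasks(
    failed_task_ids: Sequence[str], verified_task_ids: Sequence[str]
) -> Dict[str, List[str]]:
    """Split failed tasks into verified, unverified, e2e, and verifiable groups."""
    verified_set = set(verified_task_ids)
    verified = [t for t in failed_task_ids if t in verified_set]
    unverified = [t for t in failed_task_ids if t not in verified_set]
    e2e = [t for t in unverified if "e2e" in target_of(t)]
    verifiable = [t for t in unverified if "e2e" not in target_of(t)]
    return {
        "verified": verified,
        "unverified": unverified,
        "e2e": e2e,
        "verifiable": verifiable,
    }


def determine_path(groups: Dict[str, List[str]]) -> str:
    """Local verification only when non-e2e unverified tasks exist."""
    return "local_verification" if groups["verifiable"] else "apply_via_mcp"
```

Note that the e2e check looks at the target segment only, so a task such as `app-e2e:lint` still counts as verifiable.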
Step 3a: Apply via MCP (fully/e2e-only verified)
- Call `update_self_healing_fix({ shortLink, action: "APPLY" })`
- Record `last_cipe_url`, spawn subagent in wait mode
Step 3b: Local Verification Flow
When verifiable (non-e2e) unverified tasks exist:
1. Detect package manager:
   - `pnpm-lock.yaml` exists → `pnpm nx`
   - `yarn.lock` exists → `yarn nx`
   - Otherwise → `npx nx`
2. Run verifiable tasks in parallel:
   - Spawn `general` subagents to run each task concurrently
   - Each subagent runs: `<pm> nx run <taskId>`
   - Collect pass/fail results from all subagents
3. Evaluate results:

   | Result | Action |
   |---|---|
   | ALL verifiable tasks pass | Apply via MCP |
   | ANY verifiable task fails | Apply-locally + enhance flow |

4. Apply-locally + enhance flow:
   - Run `nx apply-locally <shortLink>`
   - Enhance the code to fix failing tasks
   - Run failing tasks again to verify fix
   - If still failing → increment `local_verify_count`, loop back to enhance
   - If passing → commit and push, record `expected_commit_sha`, spawn subagent in wait mode
5. Track attempts (wraps step 4):
   - Increment `local_verify_count` after each enhance cycle
   - If `local_verify_count >= local_verify_attempts` (default: 3):
     - Get code in commit-able state
     - Commit and push with message indicating local verification failed
     - Report to user:

           [monitor-ci] Local verification failed after <N> attempts. Pushed to CI for final validation. Failed: <taskIds>

     - Record `expected_commit_sha`, spawn subagent in wait mode (let CI be final judge)
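A minimal sketch of the package manager detection and the bounded enhance loop described above. `run_task` and `enhance` are hypothetical callbacks standing in for "run the Nx task" and "edit the code"; the outcome labels reuse the `enhanced` and `failed-pushing-to-ci` values from the commit message format in this document.

```python
from pathlib import Path
from typing import Callable, List, Sequence, Tuple


def detect_nx_command(workspace_root: str) -> List[str]:
    """Pick the Nx invocation from the lockfile present in the workspace."""
    root = Path(workspace_root)
    if (root / "pnpm-lock.yaml").exists():
        return ["pnpm", "nx"]
    if (root / "yarn.lock").exists():
        return ["yarn", "nx"]
    return ["npx", "nx"]


def enhance_until_passing(
    failing_tasks: Sequence[str],
    run_task: Callable[[str], bool],       # hypothetical: True when the task passes
    enhance: Callable[[List[str]], None],  # hypothetical: edits code for these tasks
    local_verify_attempts: int = 3,
) -> Tuple[str, List[str], int]:
    """Enhance and re-run failing tasks until they pass or attempts run out."""
    still_failing = list(failing_tasks)
    local_verify_count = 0
    while local_verify_count < local_verify_attempts:
        enhance(still_failing)
        still_failing = [task for task in still_failing if not run_task(task)]
        local_verify_count += 1  # one completed enhance cycle
        if not still_failing:
            return "enhanced", [], local_verify_count
    # Attempts exhausted: the caller commits, pushes, and lets CI be the judge.
    return "failed-pushing-to-ci", still_failing, local_verify_count
```

In both outcomes the caller still commits, pushes, records `expected_commit_sha`, and spawns the subagent in wait mode; the label only changes what the commit message and user report say.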
Commit Message Format
    git commit -m "fix(<projects>): <brief description>

    Failed tasks: <taskId1>, <taskId2>
    Local verification: passed|enhanced|failed-pushing-to-ci"

Git Safety: Only stage and commit files that were modified as part of the fix. Users may have concurrent local changes (local publish, WIP features, config tweaks) that must NOT be committed. NEVER use `git add -A` or `git add .`; always stage specific files by name.
Unverified Fix Flow (No Verification Attempted)
When `verificationStatus` is `FAILED`, `NOT_EXECUTABLE`, or fix has `couldAutoApplyTasks != true` with no verification:
- Analyze fix content (`suggestedFix`, `suggestedFixReasoning`, `taskOutputSummary`)
- If fix looks correct → apply via MCP
- If fix needs enhancement → use the Apply Locally + Enhance Flow
- If fix is wrong → reject via MCP, fix from scratch, commit, push
Auto-Apply Eligibility
The `couldAutoApplyTasks` field indicates whether the fix is eligible for automatic application:
- `true`: Fix is eligible for auto-apply. Subagent keeps polling while verification is in progress. Returns `fix_auto_applying` when verified, or `fix_available` if verification fails.
- `false` or `null`: Fix requires manual action (apply via MCP, apply locally, or reject)
Key point: When subagent returns `fix_auto_applying`, do NOT call MCP to apply; self-healing handles it. Just spawn a new subagent in wait mode.
Apply vs Reject vs Apply Locally
- Apply via MCP: Calls `update_self_healing_fix({ shortLink, action: "APPLY" })`. Self-healing agent applies the fix in CI and a new CI Attempt spawns automatically. No local git operations needed.
- Apply Locally: Runs `nx apply-locally <shortLink>`. Applies the patch to your local working directory and sets state to `APPLIED_LOCALLY`. Use this when you want to enhance the fix before pushing.
- Reject via MCP: Calls `update_self_healing_fix({ shortLink, action: "REJECT" })`. Marks fix as rejected. Use only when the fix is completely wrong and you'll fix from scratch.
Apply Locally + Enhance Flow
When the fix needs enhancement (use `nx apply-locally`, NOT reject):
1. Apply the patch locally: `nx apply-locally <shortLink>` (this also updates state to `APPLIED_LOCALLY`)
2. Make additional changes as needed
3. Stage only the files you modified: `git add <file1> <file2> ...`
4. Commit and push:

       git commit -m "fix: resolve <failedTaskIds>"
       git push origin $(git branch --show-current)

5. Loop to poll for new CI Attempt
Reject + Fix From Scratch Flow
When the fix is completely wrong:
1. Call MCP to reject: `update_self_healing_fix({ shortLink, action: "REJECT" })`
2. Fix the issue from scratch locally
3. Stage only the files you modified: `git add <file1> <file2> ...`
4. Commit and push:

       git commit -m "fix: resolve <failedTaskIds>"
       git push origin $(git branch --show-current)

5. Loop to poll for new CI Attempt
Environment Issue Handling
When `failureClassification == 'ENVIRONMENT_STATE'`:
- Call MCP to request rerun: `update_self_healing_fix({ shortLink, action: "RERUN_ENVIRONMENT_STATE" })`
- New CI Attempt spawns automatically (no local git operations needed)
- Loop to poll for new CI Attempt with `previousCipeUrl` set
No-New-CI-Attempt Handling
When `status == 'no_new_cipe'`:
This means the expected CI Attempt was never created; CI likely failed before Nx tasks could run.
1. Report to user:

       [monitor-ci] No CI attempt for <sha> after 10 min. Check CI provider for pre-Nx failures (install, checkout, auth). Last CI attempt: <previousCipeUrl>

2. If user configured auto-fix attempts (e.g., `--auto-fix-workflow`):
   - Detect package manager: check for `pnpm-lock.yaml`, `yarn.lock`, `package-lock.json`
   - Run install to update lockfile:

         pnpm install  # or npm install / yarn install

   - If lockfile changed:

         git add pnpm-lock.yaml  # or appropriate lockfile
         git commit -m "chore: update lockfile"
         git push origin $(git branch --show-current)

   - Record new commit SHA, loop to poll with `expectedCommitSha`
3. Otherwise: Exit with `no_new_cipe` status, providing guidance for user to investigate
CI-Attempt-No-Tasks Handling
When `status == 'cipe_no_tasks'`:
This means the CI Attempt was created but no Nx tasks were recorded before it failed. Common causes:
- CI timeout before tasks could run
- Critical infrastructure error
- Memory/resource exhaustion
- Network issues connecting to Nx Cloud

1. Report to user:

       [monitor-ci] CI failed but no Nx tasks were recorded.
       [monitor-ci] CI Attempt URL: <cipeUrl>
       [monitor-ci]
       [monitor-ci] This usually indicates an infrastructure issue. Attempting retry...

2. Create empty commit to retry CI:

       git commit --allow-empty -m "chore: retry ci [monitor-ci]"
       git push origin $(git branch --show-current)

3. Record `expected_commit_sha`, spawn subagent in wait mode
4. If retry also returns `cipe_no_tasks`:
   - Exit with failure
   - Provide guidance:

         [monitor-ci] Retry failed. Please check:
         [monitor-ci] 1. Nx Cloud UI: <cipeUrl>
         [monitor-ci] 2. CI provider logs (GitHub Actions, GitLab CI, etc.)
         [monitor-ci] 3. CI job timeout settings
         [monitor-ci] 4. Memory/resource limits

Exit Conditions
Exit the monitoring loop when ANY of these conditions are met:
| Condition | Exit Type |
|---|---|
| CI passes | Success |
| Max agent-initiated cycles reached (after user declines extension) | Timeout |
| Max duration reached | Timeout |
| 3 consecutive no-progress iterations | Circuit breaker |
| No fix available and local fix not possible | Failure |
| No new CI Attempt and auto-fix not configured | Pre-CI-Attempt failure |
| User cancels | Cancelled |
Main Loop
Step 1: Initialize Tracking
cycle_count = 0 # Only incremented for agent-initiated cycles (counted against --max-cycles)
start_time = now()
no_progress_count = 0
local_verify_count = 0
last_state = null
last_cipe_url = null
expected_commit_sha = null
agent_triggered = false # Set true after monitor takes an action that triggers new CI Attempt
Step 2: Spawn Subagent and Monitor Output
Spawn the `ci-monitor-subagent` subagent to poll CI status. Run in background so you can actively monitor and relay its output to the user.
Fresh start (first spawn, no expected CI Attempt):
Task(
agent: "ci-monitor-subagent",
run_in_background: true,
prompt: "Monitor CI for branch '<branch>'.
Subagent timeout: <subagent-timeout> minutes.
New-CI-Attempt timeout: <new-cipe-timeout> minutes.
Verbosity: <verbosity>."
)
After action that triggers new CI Attempt (wait mode):
Task(
agent: "ci-monitor-subagent",
run_in_background: true,
prompt: "Monitor CI for branch '<branch>'.
Subagent timeout: <subagent-timeout> minutes.
New-CI-Attempt timeout: <new-cipe-timeout> minutes.
Verbosity: <verbosity>.
WAIT MODE: A new CI Attempt should spawn. Ignore old CI Attempt until new one appears.
Expected commit SHA: <expected_commit_sha>
Previous CI Attempt URL: <last_cipe_url>"
)
Step 2a: Active Output Monitoring (CRITICAL)
The subagent's text output is NOT visible to users when running in background. You MUST actively monitor and relay its output. Do NOT passively wait for completion.
After spawning the background subagent, enter a monitoring loop:
1. Every 60 seconds, check the subagent output using `TaskOutput(task_id, block=false)`
2. Parse new lines since your last check; look for `[ci-monitor]` and `⚡` prefixed lines
3. Relay to user based on verbosity:
   - `minimal`: Only relay `⚡` critical transition lines
   - `medium`: Relay all `[ci-monitor]` status lines
   - `verbose`: Relay all subagent output
4. Continue until `TaskOutput` returns a completed status
5. When complete, proceed to Step 3 with the final subagent response
Example monitoring loop output:
[monitor-ci] Checking subagent status... (elapsed: 1m)
[monitor-ci] CI: IN_PROGRESS | Self-healing: NOT_STARTED
[monitor-ci] Checking subagent status... (elapsed: 3m)
[monitor-ci] CI: FAILED | Self-healing: IN_PROGRESS
[monitor-ci] ⚡ CI failed — self-healing fix generation started
[monitor-ci] Checking subagent status... (elapsed: 5m)
[monitor-ci] CI: FAILED | Self-healing: COMPLETED | Verification: IN_PROGRESS
[monitor-ci] ⚡ Self-healing fix generated — verification started
NEVER do this:
- Spawn subagent and passively say "Waiting for results..."
- Check once and say "Still working, I'll wait"
- Only show output when the subagent finishes
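The relay rules for each verbosity level might look like the sketch below. The helper names are hypothetical, and two details are assumptions of this sketch: a `⚡` line may appear either bare or after the `[ci-monitor]` prefix, and `medium` also relays bare `⚡` lines since they are the most important transitions.

```python
from typing import List, Sequence, Tuple

STATUS_PREFIX = "[ci-monitor]"
CRITICAL_MARK = "⚡"


def unseen_lines(full_output: str, lines_already_seen: int) -> Tuple[List[str], int]:
    """Return lines added since the last check and the new seen-count."""
    lines = full_output.splitlines()
    return lines[lines_already_seen:], len(lines)


def is_status_line(line: str) -> bool:
    return line.strip().startswith(STATUS_PREFIX)


def is_critical_line(line: str) -> bool:
    text = line.strip()
    if text.startswith(STATUS_PREFIX):
        text = text[len(STATUS_PREFIX):].lstrip()
    return text.startswith(CRITICAL_MARK)


def select_lines_to_relay(new_lines: Sequence[str], verbosity: str) -> List[str]:
    """Filter subagent output according to the configured verbosity."""
    if verbosity == "verbose":
        return list(new_lines)
    if verbosity == "medium":
        return [l for l in new_lines if is_status_line(l) or is_critical_line(l)]
    if verbosity == "minimal":
        return [l for l in new_lines if is_critical_line(l)]
    raise ValueError(f"unknown verbosity: {verbosity!r}")
```

Tracking the seen-count between checks keeps each 60-second poll from re-relaying lines the user has already been shown.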
Step 3: Handle Subagent Response
When subagent returns:
- Check the returned status
- Look up default behavior in the table above
- Check if user instructions override the default
- Execute the appropriate action
- If action expects new CI Attempt, update tracking (see Step 3a)
- If action results in looping, go to Step 2
Step 3a: Track State for New-CI-Attempt Detection
After actions that should trigger a new CI Attempt, record state before looping:
| Action | What to Track | Subagent Mode |
|---|---|---|
| Fix auto-applying | `last_cipe_url` | Wait mode |
| Apply via MCP | `last_cipe_url` | Wait mode |
| Apply locally + push | `expected_commit_sha` | Wait mode |
| Reject + fix + push | `expected_commit_sha` | Wait mode |
| Fix failed + local fix + push | `expected_commit_sha` | Wait mode |
| No fix + local fix + push | `expected_commit_sha` | Wait mode |
| Environment rerun | `last_cipe_url` | Wait mode |
| No-new-CI-Attempt + auto-fix + push | `expected_commit_sha` | Wait mode |
| CI Attempt no tasks + retry push | `expected_commit_sha` | Wait mode |
CRITICAL: When passing `expectedCommitSha` or `last_cipe_url` to the subagent, it enters wait mode:
- Subagent will completely ignore the old/stale CI Attempt
- Subagent will only wait for new CI Attempt to appear
- Subagent will NOT return to main agent with stale CI Attempt data
- Once new CI Attempt detected, subagent switches to normal polling
Why wait mode matters for context preservation: Stale CI Attempt data can be very large (task output summaries, suggested fix patches, reasoning). If subagent returns this to main agent, it pollutes main agent's context with useless data since we already processed that CI Attempt. Wait mode keeps stale data in the subagent, never sending it to main agent.
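One way to assemble the subagent prompt so that wait mode is only requested when tracking state exists is sketched below. `build_subagent_prompt` is a hypothetical helper; the prompt wording is copied from the Step 2 templates, and omitting a line whose value was not recorded is an assumption of this sketch.

```python
from typing import Optional


def build_subagent_prompt(
    branch: str,
    subagent_timeout: int,
    new_cipe_timeout: int,
    verbosity: str,
    expected_commit_sha: Optional[str] = None,
    last_cipe_url: Optional[str] = None,
) -> str:
    """Build the fresh-start prompt, adding wait-mode lines when state was recorded."""
    lines = [
        f"Monitor CI for branch '{branch}'.",
        f"Subagent timeout: {subagent_timeout} minutes.",
        f"New-CI-Attempt timeout: {new_cipe_timeout} minutes.",
        f"Verbosity: {verbosity}.",
    ]
    if expected_commit_sha or last_cipe_url:
        lines.append(
            "WAIT MODE: A new CI Attempt should spawn. "
            "Ignore old CI Attempt until new one appears."
        )
        if expected_commit_sha:
            lines.append(f"Expected commit SHA: {expected_commit_sha}")
        if last_cipe_url:
            lines.append(f"Previous CI Attempt URL: {last_cipe_url}")
    return "\n".join(lines)
```

Keeping the wait-mode decision in one place makes it harder to spawn a subagent that returns stale CI Attempt data to the main agent.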
Step 4: Cycle Classification and Progress Tracking
Cycle Classification
Not all cycles are equal. Only count cycles the monitor itself triggered toward `--max-cycles`:
1. After subagent returns, check `agent_triggered`:
   - `agent_triggered == true` → this cycle was triggered by the monitor → `cycle_count++`
   - `agent_triggered == false` → this cycle was human-initiated or a first observation → do NOT increment `cycle_count`
2. Reset `agent_triggered = false`
3. After Step 3a (when the monitor takes an action that triggers a new CI Attempt) → set `agent_triggered = true`

How detection works: Step 3a is only called when the monitor explicitly pushes code, applies a fix via MCP, or triggers an environment rerun. If a human pushes on their own, the subagent detects a new CI Attempt but the monitor never went through Step 3a, so `agent_triggered` remains `false`.
When a human-initiated cycle is detected, log it:
[monitor-ci] New CI Attempt detected (human-initiated push). Monitoring without incrementing cycle count. (agent cycles: N/max-cycles)
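The classification rule above reduces to a small piece of state. The `CycleTracker` class below is a hypothetical sketch of it; only the `cycle_count` and `agent_triggered` names come from the text.

```python
from dataclasses import dataclass


@dataclass
class CycleTracker:
    cycle_count: int = 0
    agent_triggered: bool = False

    def on_subagent_return(self) -> bool:
        """Count the cycle only if the monitor triggered it, then reset the flag."""
        counted = self.agent_triggered
        if counted:
            self.cycle_count += 1
        self.agent_triggered = False
        return counted

    def on_monitor_action(self) -> None:
        """Call after Step 3a, when the monitor's action should spawn a new CI Attempt."""
        self.agent_triggered = True
```

When `on_subagent_return()` returns `False` for a newly detected CI Attempt, the monitor logs the human-initiated message and leaves the budget untouched.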
Approaching Limit Gate
When `cycle_count >= max_cycles - 2`, pause and ask the user before continuing:
[monitor-ci] Approaching cycle limit (cycle_count/max_cycles agent-initiated cycles used).
[monitor-ci] How would you like to proceed?
  1. Continue with 5 more cycles
  2. Continue with 10 more cycles
  3. Stop monitoring
Increase `max_cycles` by the user's choice and continue.
Progress Tracking
After each action:
- If state changed significantly → reset `no_progress_count = 0`
- If state unchanged → `no_progress_count++`
- On new CI attempt detected → reset `local_verify_count = 0`
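The progress counter, the circuit breaker from the exit conditions, and the cycle limit gate can be expressed as three small checks. The function names are hypothetical, and "state changed significantly" is approximated here as any difference between two state snapshots, which is an assumption; a real monitor may want a coarser comparison.

```python
from typing import Hashable


def next_no_progress_count(
    no_progress_count: int, last_state: Hashable, current_state: Hashable
) -> int:
    """Reset on any state change, otherwise count another stalled iteration."""
    if current_state != last_state:
        return 0
    return no_progress_count + 1


def circuit_breaker_tripped(no_progress_count: int, limit: int = 3) -> bool:
    """True after `limit` consecutive no-progress iterations."""
    return no_progress_count >= limit


def approaching_cycle_limit(cycle_count: int, max_cycles: int) -> bool:
    """True when the user should be asked whether to extend the budget."""
    return cycle_count >= max_cycles - 2
```

A state snapshot could be as simple as a tuple such as `("FAILED", "IN_PROGRESS")` holding the CI and self-healing statuses.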
Status Reporting
Based on verbosity level:
| Level | What to Report |
|---|---|
| `minimal` | Only final result (success/failure/timeout) |
| `medium` | State changes + periodic updates ("Cycle N \| Elapsed: Xm \| Status: ...") |
| `verbose` | All of medium + full subagent responses, git outputs, MCP responses |
User Instruction Examples
Users can override default behaviors:
| Instruction | Effect |
|---|---|
| "never auto-apply" | Always prompt before applying any fix |
| "always ask before git push" | Prompt before each push |
| "reject any fix for e2e tasks" | Auto-reject if any failed task is an e2e task |
| "apply all fixes regardless of verification" | Skip verification check, apply everything |
| "if confidence < 70, reject" | Check confidence field before applying |
| "run 'nx affected -t typecheck' before applying" | Add local verification step |
| "auto-fix workflow failures" | Attempt lockfile updates on pre-CI-Attempt failures |
| "wait 45 min for new CI Attempt" | Override new-CI-Attempt timeout (default: 10 min) |
Error Handling
| Error | Action |
|---|---|
| Git rebase conflict | Report to user, exit |
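| `nx apply-locally` fails | Report to user, attempt manual patch or exit |
| MCP tool error | Retry once, if fails report to user |
| Subagent spawn failure | Retry once, if fails exit with error |
| No new CI Attempt detected | If `--auto-fix-workflow` is enabled, attempt the lockfile update; otherwise report to user and exit with guidance |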
| Lockfile auto-fix fails | Report to user, exit with guidance to check CI logs |
Example Session
Example 1: Normal Flow with Self-Healing (medium verbosity)
[monitor-ci] Starting CI monitor for branch 'feature/add-auth'
[monitor-ci] Config: max-cycles=5, timeout=120m, verbosity=medium
[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 1m)
[monitor-ci] CI: IN_PROGRESS | Self-healing: NOT_STARTED
[monitor-ci] Checking subagent status... (elapsed: 3m)
[monitor-ci] CI: FAILED | Self-healing: IN_PROGRESS
[monitor-ci] ⚡ CI failed — self-healing fix generation started
[monitor-ci] Checking subagent status... (elapsed: 5m)
[monitor-ci] CI: FAILED | Self-healing: COMPLETED | Verification: COMPLETED
[monitor-ci] Fix available! Verification: COMPLETED
[monitor-ci] Applying fix via MCP...
[monitor-ci] Fix applied in CI. Waiting for new CI attempt...
[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 7m)
[monitor-ci] ⚡ New CI Attempt detected!
[monitor-ci] Checking subagent status... (elapsed: 8m)
[monitor-ci] CI: SUCCEEDED
[monitor-ci] CI passed successfully!
[monitor-ci] Summary:
- Agent cycles: 1/5
- Total time: 12m 34s
- Fixes applied: 1
- Result: SUCCESS
Example 2: Pre-CI Failure (medium verbosity)
[monitor-ci] Starting CI monitor for branch 'feature/add-products'
[monitor-ci] Config: max-cycles=5, timeout=120m, auto-fix-workflow=true
[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 2m)
[monitor-ci] CI: FAILED | Self-healing: COMPLETED
[monitor-ci] Fix available! Applying locally, enhancing, and pushing...
[monitor-ci] Committed: abc1234
[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 6m)
[monitor-ci] Waiting for new CI Attempt... (expected SHA: abc1234)
[monitor-ci] Checking subagent status... (elapsed: 12m)
[monitor-ci] ⚠️ CI Attempt timeout (10 min). Status: no_new_cipe
[monitor-ci] --auto-fix-workflow enabled. Attempting lockfile update...
[monitor-ci] Lockfile updated. Committed: def5678
[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 16m)
[monitor-ci] ⚡ New CI Attempt detected!
[monitor-ci] Checking subagent status... (elapsed: 18m)
[monitor-ci] CI: SUCCEEDED
[monitor-ci] CI passed successfully!
[monitor-ci] Summary:
- Agent cycles: 3/5
- Total time: 22m 15s
- Fixes applied: 1 (self-healing) + 1 (lockfile)
- Result: SUCCESS
Example 3: Human-in-the-Loop (user pushes during monitoring)
[monitor-ci] Starting CI monitor for branch 'feature/refactor-api'
[monitor-ci] Config: max-cycles=5, timeout=120m, verbosity=medium
[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 4m)
[monitor-ci] CI: FAILED | Self-healing: COMPLETED
[monitor-ci] Fix available! Applying fix via MCP... (agent cycles: 0/5)
[monitor-ci] Fix applied in CI. Waiting for new CI attempt...
[monitor-ci] Spawning subagent to poll CI status...
[monitor-ci] Checking subagent status... (elapsed: 8m)
[monitor-ci] ⚡ New CI Attempt detected!
[monitor-ci] CI: FAILED | Self-healing: COMPLETED
[monitor-ci] Agent-initiated cycle. (agent cycles: 1/5)
[monitor-ci] Fix available! Applying locally and enhancing...
[monitor-ci] Committed: abc1234
[monitor-ci] Spawning subagent to poll CI status...
... (user pushes their own changes to the branch while monitor waits) ...
[monitor-ci] Checking subagent status... (elapsed: 12m)
[monitor-ci] ⚡ New CI Attempt detected!
[monitor-ci] CI: FAILED | Self-healing: IN_PROGRESS
[monitor-ci] New CI Attempt detected (human-initiated push). Monitoring without incrementing cycle count. (agent cycles: 2/5)
[monitor-ci] Checking subagent status... (elapsed: 16m)
[monitor-ci] CI: FAILED | Self-healing: COMPLETED
[monitor-ci] Fix available! Applying via MCP... (agent cycles: 2/5)
... (continues, human cycles don't eat into the budget) ...