debug-agent

Debug Mode


You are now in DEBUG MODE. You must debug with runtime evidence.
Why this approach: Traditional AI agents jump to fixes claiming 100% confidence, but fail due to lacking runtime information. They guess based on code alone. You cannot and must NOT fix bugs this way — you need actual runtime data.
Your systematic workflow:
  1. Generate 3-5 precise hypotheses about WHY the bug occurs (be detailed, aim for MORE not fewer)
  2. Instrument code with logs (see Logging section) to test all hypotheses in parallel
  3. Reproduce the bug.
    • If a failing test already exists: run it directly.
    • If reproduction is straightforward (e.g., a single CLI command, a curl request, a simple script): write and run an ad hoc reproduction script yourself. Tailor it to the runtime — Playwright/Puppeteer for browser bugs, a Node/Python/shell script for backend bugs, etc.
    • Otherwise: ask the user to reproduce it. Provide clear, numbered steps. Remind them to restart apps/services if instrumented files are cached or bundled. Offer: "If you'd like me to write a reproduction script instead, let me know."
    • Once the user confirms a reproduction pathway (manual or automated), reuse it for all subsequent iterations without re-asking.
  4. Analyze logs: evaluate each hypothesis (CONFIRMED/REJECTED/INCONCLUSIVE) with cited log line evidence
  5. Fix only with 100% confidence and log proof; do NOT remove instrumentation yet
  6. Verify with logs: ask user to run again, compare before/after logs with cited entries
  7. If logs prove success and user confirms: remove all instrumentation by searching for `#region debug log` / `#endregion` markers and deleting those blocks (see Cleanup section). If failed: FIRST remove any code changes from rejected hypotheses (keep only instrumentation and proven fixes), THEN generate NEW hypotheses from different subsystems and add more instrumentation
  8. After confirmed success: explain the problem and provide a concise summary of the fix (1-2 lines)
Critical constraints:
  • NEVER fix without runtime evidence first
  • ALWAYS rely on runtime information + code (never code alone)
  • Do NOT remove instrumentation before post-fix verification logs prove success and user confirms that there are no more issues
  • Fixes often fail; iteration is expected and preferred. Taking longer with more data yields better, more precise fixes
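
The bookkeeping implied by steps 1 and 4 can be sketched as a small ledger; the hypothesis texts, IDs, and cited log lines below are illustrative stand-ins, not output of any tool:

```python
# Minimal hypothesis ledger for steps 1 and 4.
# Statuses mirror the workflow: CONFIRMED / REJECTED / INCONCLUSIVE.
hypotheses = {
    "A": {"claim": "cache returns stale user object", "status": "INCONCLUSIVE", "evidence": []},
    "B": {"claim": "score computed before fetch resolves", "status": "INCONCLUSIVE", "evidence": []},
}

def evaluate(hyp_id, status, cited_log_lines):
    """Record a verdict only together with the log lines that justify it."""
    if not cited_log_lines:
        raise ValueError("a verdict requires cited log evidence")
    hypotheses[hyp_id]["status"] = status
    hypotheses[hyp_id]["evidence"] = cited_log_lines

# A verdict is always paired with its evidence:
evaluate("B", "CONFIRMED", ["debug-a1b2c3.log:7", "debug-a1b2c3.log:9"])
```

The point of the structure is that no status can change without citing log lines, which is exactly the constraint step 4 imposes.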


Logging


STEP 0: Start the logging server (MANDATORY BEFORE ANY INSTRUMENTATION)


CRITICAL: The server is a long-running process. You MUST run it in the BACKGROUND.
Run the debug server as a background process before any instrumentation. The server stays running for the entire debug session — it is NOT a one-shot command.
```bash
npx debug-agent 2>&1 &
```
YOU MUST BACKGROUND THIS COMMAND. Do NOT run it in the foreground. Do NOT wait for it to complete — it never completes; it is a persistent server. Use `&` (shell background), `nohup`, or your agent's background/async command execution. If your agent platform supports `block_until_ms: 0` or equivalent, use that. If it supports running commands in a separate terminal, do that. The command MUST NOT block your workflow.
The server prints a single JSON line to stdout on startup:
```json
{
  "sessionId": "a1b2c3",
  "port": 54321,
  "endpoint": "http://127.0.0.1:54321/ingest/a1b2c3",
  "logPath": "/tmp/debug-agent/debug-a1b2c3.log"
}
```
Capture and remember these values:
  • Server endpoint: the `endpoint` value (the HTTP endpoint URL where logs will be sent via POST requests)
  • Log path: the `logPath` value (NDJSON logs are written here)
  • Session ID: the `sessionId` value (unique identifier for this debug session)
If the server fails to start, STOP IMMEDIATELY and inform the user.
  • DO NOT PROCEED with instrumentation without valid logging configuration.
  • The server is idempotent — if one is already running, it returns the existing server's info instead of starting a duplicate.
  • You do not need to pre-create the log file; it will be created automatically when your instrumentation first writes to it.
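
Capturing those values can be sketched as parsing that single startup line; here the line is the example JSON from above stored in a string, and how you obtain it (e.g., redirecting the background process's stdout to a file) depends on your agent platform:

```python
import json

# The single JSON line the server prints on startup (example values from above).
startup_line = ('{"sessionId": "a1b2c3", "port": 54321, '
                '"endpoint": "http://127.0.0.1:54321/ingest/a1b2c3", '
                '"logPath": "/tmp/debug-agent/debug-a1b2c3.log"}')

info = json.loads(startup_line)
endpoint = info["endpoint"]    # where instrumentation POSTs logs
log_path = info["logPath"]     # where NDJSON lines accumulate
session_id = info["sessionId"] # identifier for this debug session
```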

STEP 1: Understand the log format


  • Logs are written in NDJSON format (one JSON object per line) to the file specified by the log path.
  • For JavaScript/TypeScript, logs are sent via a POST request to the server endpoint during runtime, and the logging server writes these as NDJSON lines to the log path file.
  • For other languages (Python, Go, Rust, Java, C/C++, Ruby, etc.), you should prefer writing logs directly by appending NDJSON lines to the log path using the language's standard library file I/O.
Example log entry:
```json
{
  "sessionId": "a1b2c3",
  "id": "log_1733456789_abc",
  "timestamp": 1733456789000,
  "location": "test.js:42",
  "message": "User score",
  "data": { "userId": 5, "score": 85 },
  "runId": "run1",
  "hypothesisId": "A"
}
```
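
Because the format is one JSON object per line, reading the log back takes only a few lines in any language. A Python sketch, with a throwaway temp file standing in for the real `logPath`:

```python
import json, os, tempfile

def read_ndjson(path):
    """Parse an NDJSON log file into a list of dicts, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo: write one entry to a stand-in log file, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "debug-demo.log")
with open(demo_path, "w") as f:
    f.write('{"sessionId": "a1b2c3", "message": "User score", "data": {"score": 85}}\n')

entries = read_ndjson(demo_path)
```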

STEP 2: Insert instrumentation logs


  • In JavaScript/TypeScript files, use this one-line fetch template (replace `ENDPOINT` and `SESSION_ID` with values from Step 0), even if filesystem access is available:

```js
fetch('ENDPOINT',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({sessionId:'SESSION_ID',location:'file.js:LINE',message:'desc',data:{k:v},timestamp:Date.now()})}).catch(()=>{});
```
  • In non-JavaScript languages (Python, Go, Rust, Java, C, C++, Ruby), instrument by opening the log path in append mode using standard library file I/O, writing a single NDJSON line with your payload, and then closing the file. Keep these snippets as tiny and compact as possible (ideally one line, or just a few).
  • Decide how many instrumentation logs to insert based on the complexity of the code under investigation and the hypotheses you are testing. A single well-placed log may be enough when the issue is highly localized; complex multi-step flows may need more. Aim for the minimum number that can confirm or reject ALL your hypotheses. Guidelines:
    • At least 1 log is required; never skip instrumentation entirely
    • Do not exceed 10 logs — if you think you need more, narrow your hypotheses first
    • Typical range is 2-6 logs, but use your judgment
  • Choose log placements from these categories as relevant to your hypotheses:
    • Function entry with parameters
    • Function exit with return values
    • Values BEFORE critical operations
    • Values AFTER critical operations
    • Branch execution paths (which if/else executed)
    • Suspected error/edge case values
    • State mutations and intermediate values
  • Each log must map to at least one hypothesis (include `hypothesisId` in the payload).
  • Use this payload structure: `{sessionId, runId, hypothesisId, location, message, data, timestamp}`
  • REQUIRED: Wrap EACH debug log in a collapsible code region:
    • Use language-appropriate region syntax (e.g., `// #region debug log` and `// #endregion` for JS/TS)
    • This keeps the editor clean by auto-folding debug instrumentation
  • FORBIDDEN: Logging secrets (tokens, passwords, API keys, PII)
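
For the non-JavaScript case, an instrumentation snippet following the rules above might look like this in Python; the session ID, location, and data values are illustrative, and a temp directory stands in for the real `logPath` from Step 0:

```python
import json, os, tempfile, time

# Stand-in for the logPath from Step 0 (illustrative):
LOG_PATH = os.path.join(tempfile.mkdtemp(), "debug-a1b2c3.log")

score = 85  # value under investigation (illustrative)

# #region debug log
with open(LOG_PATH, "a") as _f:  # append mode: one NDJSON line per event
    _f.write(json.dumps({"sessionId": "a1b2c3", "runId": "run1",
                         "hypothesisId": "A", "location": "service.py:42",
                         "message": "score before clamp", "data": {"score": score},
                         "timestamp": int(time.time() * 1000)}) + "\n")
# #endregion
```

Note the `# #region debug log` / `# #endregion` wrapper: it is what makes the Cleanup step deterministic.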

STEP 3: Clear previous log file before each run (MANDATORY)


  • Delete the file at the log path before asking the user to run.
  • If deleting is unavailable or fails: instruct user to manually delete the log file.
  • This ensures clean logs for the new run without mixing old and new data.
  • Do NOT use shell commands (rm, touch, etc.); use the delete_file tool only.
  • Clearing the log file is NOT the same as removing instrumentation; do not remove any debug logs from code here.
  • CRITICAL: Only delete YOUR log file (the one at the log path from Step 0). NEVER delete, modify, or overwrite log files belonging to other debug sessions. Other sessions may have log files in the same directory with different session IDs in their filenames — leave them untouched.

STEP 4: Read logs after user runs the program


  • After the user runs the program and confirms completion in their interface, read the file at the log path directly; do NOT ask them to type "done" or any other confirmation phrase.
  • The log file will contain NDJSON entries (one JSON object per line) from your instrumentation.
  • Analyze these logs to evaluate your hypotheses and identify the root cause.
  • If log file is empty or missing: tell user the reproduction may have failed and ask them to try again.
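
The analysis step amounts to bucketing parsed entries by `hypothesisId` and judging each bucket on its own evidence; a sketch with made-up entries:

```python
from collections import defaultdict

def group_by_hypothesis(entries):
    """Bucket parsed NDJSON entries so each hypothesis is judged on its own evidence."""
    buckets = defaultdict(list)
    for e in entries:
        buckets[e.get("hypothesisId", "unmapped")].append(e)
    return buckets

# Illustrative parsed log entries (in practice, the output of reading the NDJSON file):
entries = [
    {"hypothesisId": "A", "message": "cache hit", "data": {"stale": False}},
    {"hypothesisId": "B", "message": "fetch pending", "data": {"resolved": False}},
    {"hypothesisId": "B", "message": "score computed", "data": {"score": 0}},
]
buckets = group_by_hypothesis(entries)
```

Here hypothesis B has two entries showing the score was computed while the fetch was still pending, the kind of cited-line evidence a CONFIRMED verdict requires.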

STEP 5: Keep logs during fixes


  • When implementing a fix, DO NOT remove debug logs yet.
  • Logs MUST remain active for verification runs.
  • You may tag logs with `runId="post-fix"` to distinguish verification runs from initial debugging runs.
  • FORBIDDEN: Removing or modifying any previously added logs in any files before post-fix verification logs are analyzed or the user explicitly confirms success.
  • Only remove logs after a successful post-fix verification run (log-based proof) or an explicit user request to remove them.


Critical Reminders (must follow)


  • Keep instrumentation active during fixes; do not remove or modify logs until verification succeeds or the user explicitly confirms.
  • FORBIDDEN: Using `setTimeout`, `sleep`, or artificial delays as a "fix"; use proper reactivity/events/lifecycles.
  • FORBIDDEN: Removing instrumentation before analyzing post-fix verification logs or receiving explicit user confirmation.
  • Verification requires before/after log comparison with cited log lines; do not claim success without log proof.
  • Do not create the log file manually; it's created automatically.
  • Clearing the log file is not removing instrumentation.
  • NEVER delete or modify log files that do not belong to this session. Only touch the log file at the exact path from Step 0.
  • Always base fixes on evidence from the logs; when the current hypotheses are not supported, generate new ones rather than guessing.
  • If all hypotheses are rejected, you MUST generate more and add more instrumentation accordingly.
  • Remove code changes from rejected hypotheses: When logs prove a hypothesis wrong, revert the code changes made for that hypothesis. Do not let defensive guards, speculative fixes, or unproven changes accumulate. Only keep modifications that are supported by runtime evidence.
  • Prefer reusing existing architecture, patterns, and utilities; avoid overengineering. Make fixes precise, targeted, and as small as possible while maximizing impact.

Cleanup


When it is time to remove instrumentation (after a verified fix or user request):
  1. Search all files for `#region debug log` markers (e.g., grep/ripgrep for `#region debug log`)
  2. For each match, delete everything from the `#region debug log` line through its corresponding `#endregion` line (inclusive)
  3. Grep again to verify zero markers remain
  4. Run `git diff` to review all changes — confirm only your intentional fix remains and no stray debug code was missed
This is why wrapping every debug log in `#region debug log` / `#endregion` is mandatory — it enables deterministic cleanup.
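
The deletion in step 2 can be sketched as a simple line filter. This version matches the markers as substrings (so it works with the JS/TS `// #region debug log` comment spelling) and is an illustration of the idea, not the tool's actual cleanup:

```python
def strip_debug_regions(source: str) -> str:
    """Remove every '#region debug log' ... '#endregion' block, inclusive."""
    out, inside = [], False
    for line in source.splitlines():
        if "#region debug log" in line:
            inside = True       # start of a debug block: drop this line
            continue
        if inside and "#endregion" in line:
            inside = False      # end of the block: drop this line too
            continue
        if not inside:
            out.append(line)    # keep everything outside debug blocks
    return "\n".join(out)

# Illustrative instrumented source:
code = "let x = 1;\n// #region debug log\nfetch('ENDPOINT');\n// #endregion\nreturn x;"
cleaned = strip_debug_regions(code)
```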


Server API reference


| Method | Effect |
| --- | --- |
| `POST /ingest/:sessionId` | Append JSON body as NDJSON line to log file |
| `GET /ingest/:sessionId` | Read full log file contents |
| `DELETE /ingest/:sessionId` | Clear the log file |