job-babysitter
Job Babysitter
Purpose
Stop manually polling long-running background jobs. Instead of running dozens to hundreds of `ls -lh` / `ps` checks while guessing at completion, start one background watcher that detects the terminal state via plateau heuristics, then routes a verdict (done, needs-attention, or blocked) with the exact next command.

Think of it as a night-shift nurse for background jobs: it checks vitals on a schedule and escalates only when something is actually wrong.
When to use
Use when a job will run long enough that babysitting it by hand wastes attention:
- Media encodes / transcodes (ffmpeg, video-transcribe, audio extraction)
- Embedding or vector-DB builds (qmd embed, index builds)
- Batch agent / LLM pipelines run in the background
- Browser / scrape daemons (real-browser, agent-browser) prone to hanging
Do NOT use for jobs that finish in seconds, or where a single `Bash` call already returns the result.
Core principle: stay thin, lean on the harness
This skill orchestrates Claude Code's own primitives; do not reimplement them:
- Start the watcher with `run_in_background: true`. When it exits, the harness re-invokes the agent automatically, so no manual polling loop is needed.
- The watcher (`scripts/watch_job.py`) owns the deterministic part: poll with backoff, detect plateau, distinguish done from stuck, emit a verdict JSON.
- The skill's value is the per-job-type heuristics, the safe-recovery playbook, and notification routing, all in `references/playbook.md`.
Workflow
1. Identify the job's signals
Determine what can be watched, in order of reliability:
- PID: the process ID (most reliable completion signal). Get it from the job's launch, `pgrep`, or `ps`.
- Output file: a file that grows as the job progresses (e.g. ffmpeg target).
- Log file: a log that gets appended (e.g. an embed progress log).

Read `references/playbook.md` § "Completion heuristics by job type" to pick flags for the specific job type (ffmpeg, embed, batch, browser).
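
The signals above could be sampled with standard-library calls alone. The following is a minimal sketch, not code taken from `scripts/watch_job.py`: the helper names `pid_alive` and `file_size` are hypothetical, and the PID check assumes a POSIX system.

```python
import os
from typing import Optional


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is delivered
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def file_size(path: str) -> Optional[int]:
    """Return the size in bytes, or None if the file has not been created yet."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return None
```

Note that a process which has exited but has not yet been reaped by its parent (a zombie) still reports as alive with this check, which is one reason to corroborate the PID with a file or log signal.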
2. Launch the watcher in the background
Run with `run_in_background: true`. Always pass `--pid` when known; add file/log signals as corroboration. Write the verdict to a known path.

```bash
scripts/watch_job.py \
  --label "lab05 stream encode" \
  --pid <PID> \
  --output-file /path/to/output.mp4 \
  --plateau-bytes 65536 --plateau-polls 5 --stuck-after 120 \
  --max-wait 7200 \
  --verdict-out /tmp/job-babysitter-<label>.json
```

The watcher prints a one-line JSON heartbeat per poll (tail it for live progress) and writes the final verdict JSON to `--verdict-out` on exit.

Tuning lives in the playbook; sensible defaults: `--interval 10` (backs off to 60), `--plateau-polls 4`, `--stuck-after 300`, `--max-wait 7200`.
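
How the plateau and interval flags might interact can be sketched as follows. This is an assumption about their semantics, not the actual logic of `scripts/watch_job.py`: a poll counts as flat when the file grew by less than `plateau_bytes` since the previous poll, a plateau is declared after `plateau_polls` consecutive flat polls, and the poll interval doubles up to a ceiling. The class and function names, and the doubling factor, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlateauDetector:
    plateau_bytes: int = 65536  # growth below this many bytes counts as a flat poll
    plateau_polls: int = 4      # consecutive flat polls required to call a plateau
    _last_size: Optional[int] = field(default=None, init=False)
    _flat_polls: int = field(default=0, init=False)

    def observe(self, size: int) -> bool:
        """Record one size sample; return True once the plateau is reached."""
        if self._last_size is not None and size - self._last_size < self.plateau_bytes:
            self._flat_polls += 1
        else:
            self._flat_polls = 0  # first sample or meaningful growth resets the streak
        self._last_size = size
        return self._flat_polls >= self.plateau_polls


def next_interval(current: float, ceiling: float = 60.0, factor: float = 2.0) -> float:
    """Back off the poll interval geometrically, capped at the ceiling."""
    return min(current * factor, ceiling)
```

Requiring several consecutive flat polls, rather than one, keeps a single slow poll from being mistaken for a stall.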
3. On watcher exit, read the verdict and route it
The harness re-invokes the agent when the background watcher finishes. Read the verdict JSON. It has `status` ∈ {done, needs-attention, blocked}, a `reason`, `suggested_next`, elapsed time, and final size.
- done → verify the output is real (see the job-type "Done check" in the playbook, e.g. `ffprobe` for media, count match for embeds), then proceed with the original task.
- needs-attention → the job plateaued while still alive (possibly wedged). Follow the recovery playbook: diagnose read-only FIRST. Never kill or run destructive recovery (pkill, WAL checkpoint, VACUUM) without asking the user.
- blocked → the watcher gave up after `--max-wait`. Report honestly: "gave up waiting" ≠ "failed". Offer to re-check or extend the ceiling.
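
The routing above can be sketched as a small lookup. Only the `status` field and its three values come from this document; the function names `load_verdict` and `route_verdict` and the wording of the action strings are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical action summaries, paraphrasing the three routes described above.
ACTIONS = {
    "done": "verify the output, then continue the original task",
    "needs-attention": "diagnose read-only, then ask the user before any recovery",
    "blocked": "report 'gave up waiting' and offer to re-check or extend --max-wait",
}


def load_verdict(path: str) -> dict:
    """Parse the verdict JSON that the watcher wrote to --verdict-out."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def route_verdict(verdict: dict) -> str:
    """Map the verdict's status to the next step; reject unknown statuses."""
    status = verdict.get("status")
    if status not in ACTIONS:
        raise ValueError(f"unexpected verdict status: {status!r}")
    return ACTIONS[status]
```

Raising on an unknown status, instead of defaulting to "done", keeps the agent from implying a success the watcher never reported.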
4. Notify per the chosen channel
Default to in-session resume. If the user picked a channel (Telegram, voice/TTS, desktop notification), route per `references/playbook.md` § "Notification routing". Always include the status emoji, label, elapsed time, and the exact next command.
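
A notification carrying those four pieces could be assembled as in the sketch below. The emoji chosen for each status, the message layout, and the function names are assumptions for illustration; the playbook may specify different ones.

```python
# Assumed emoji per status; this document does not say which emoji to use.
STATUS_EMOJI = {"done": "✅", "needs-attention": "⚠️", "blocked": "⏳"}


def format_elapsed(seconds: int) -> str:
    """Render a duration in seconds as e.g. '1h02m05s'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


def format_notification(status: str, label: str, elapsed_s: int, next_command: str) -> str:
    """Build a two-line message: status summary, then the exact next command."""
    emoji = STATUS_EMOJI.get(status, "❓")  # fall back visibly on unknown statuses
    return f"{emoji} {label}: {status} after {format_elapsed(elapsed_s)}\nNext: {next_command}"
```

For example, `format_notification("done", "lab05 stream encode", 3725, "ffprobe /path/to/output.mp4")` yields a first line of `✅ lab05 stream encode: done after 1h02m05s` followed by `Next: ffprobe /path/to/output.mp4`.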
Guardrails (non-negotiable)
- Never act on a single slow poll. "Stuck" requires plateau AND elapsed past `--stuck-after`; the watcher already enforces this before returning needs-attention.
- Ask before any destructive recovery: `pkill`, `kill`, WAL checkpoint, `VACUUM`, daemon restart. Diagnose read-only first.
- Report honestly. Distinguish done from "gave up waiting" from "wedged". Never imply a success the watcher did not observe.
- Poll with backoff, not tight loops. The watcher handles this; never wrap it in a manual fast-polling loop.
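
The first and third guardrails can be read as a decision rule. The sketch below is one possible reading, not the watcher's actual code: it assumes `elapsed` is measured in seconds from the start of watching, that an exited PID maps to done, and that the function name `classify` is hypothetical.

```python
from typing import Optional


def classify(alive: bool, plateaued: bool, elapsed: float,
             stuck_after: float = 300.0, max_wait: float = 7200.0) -> Optional[str]:
    """Return a verdict status, or None to keep polling."""
    if not alive:
        return "done"  # process exited; the output still needs its own "Done check"
    if plateaued and elapsed >= stuck_after:
        return "needs-attention"  # alive, flat, AND past the grace period
    if elapsed >= max_wait:
        return "blocked"  # gave up waiting; says nothing about success or failure
    return None  # a plateau alone, or slowness alone, is not enough to act on
```

Under these assumptions, a job that has plateaued after 60 seconds with the default `stuck_after` of 300 returns `None`, so the watcher keeps polling instead of raising an alarm.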
Resources
- `scripts/watch_job.py`: background watcher (plateau detection, stuck-vs-done logic, verdict JSON). Stdlib only, Python 3.11+.
- `references/playbook.md`: per-job-type completion heuristics, the safe-recovery table, and notification routing. Load when picking watcher flags or handling a needs-attention/blocked verdict.