droid-control

Droid Control

Automate terminals and browsers. Three routing decisions, then atoms guide you the rest of the way.

Ground rules

  1. Real apps, real environments. Non-deterministic behavior (LLM responses, network latency, variable output) is expected. Handle it with `wait` / `wait-idle`. Never substitute fixtures or mocked data.
  2. Commit to execution. Once you've chosen a driver, run the plan. If something fails mid-run, recover and retry; don't re-evaluate the approach.
  3. Atoms are self-contained. Load one and follow its mechanics. No cross-referencing needed.
  4. `tctl` is the ONLY way to launch recorded sessions. `tctl` manages recording by wrapping `asciinema rec` around the PTY; raw `tuistory` has no recording capability and never will. Never call `tuistory launch` directly; unknown flags crash `tuistory-relay`. Always resolve `TCTL` to its absolute filesystem path before use, especially when delegating to workers (they don't inherit `${DROID_PLUGIN_ROOT}`).
  5. Isolate every run. Multiple droids may be filming simultaneously on the same machine. Session names and output paths share a global namespace (`/tmp/tctl-sessions/`). At the start of every workflow, generate a run ID (`RUN_ID=$(date +%s)-$$` or similar) and use it as a prefix for all session names and a scoped temp directory for all output files:

     ```bash
     RUN_ID="$(date +%s)-$$"
     RUN_DIR="$(mktemp -d /tmp/droid-run-${RUN_ID}-XXXXXX)"
     # Session names: -s ${RUN_ID}-before, -s ${RUN_ID}-after
     # Output paths: ${RUN_DIR}/before.cast, ${RUN_DIR}/after.cast
     ```

     Never use bare session names like `-s demo`, `-s before`, `-s after`: they will collide with concurrent runs.
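
The isolation rule can be wrapped in two small helpers so that no command is ever handed a bare name. This is a minimal sketch, not part of `tctl`: `session_name` and `cast_path` are hypothetical helper names introduced here for illustration.

```shell
#!/usr/bin/env bash
set -eu

# One run ID and one scoped directory per workflow, as described in rule 5.
RUN_ID="$(date +%s)-$$"
RUN_DIR="$(mktemp -d "/tmp/droid-run-${RUN_ID}-XXXXXX")"

# Hypothetical helper: prefix a short label with the run ID.
session_name() {
  printf '%s-%s\n' "${RUN_ID}" "$1"
}

# Hypothetical helper: place a recording inside the scoped run directory.
cast_path() {
  printf '%s/%s.cast\n' "${RUN_DIR}" "$1"
}

echo "session: $(session_name before)"
echo "output:  $(cast_path before)"
```

Passing `"$(session_name before)"` to `-s` and `"$(cast_path before)"` to `--record` keeps every name inside the run's own namespace.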

Routing

Three independent lookups. Do all three, then load the union of skills they produce.

1. Target route — what are you driving?

| Target | Load these skills |
|---|---|
| Droid CLI (`droid-dev`, `droid exec`) | droid-cli + tuistory backend via `${DROID_PLUGIN_ROOT}/bin/tctl` |
| Droid CLI (real terminal proof) | true-input + droid-cli |
| Other terminal TUI | tuistory backend via `${DROID_PLUGIN_ROOT}/bin/tctl` |
| Other terminal TUI (real terminal proof) | true-input |
| Web page or Electron app | agent-browser |
| Raw terminal byte sequences | true-input + pty-capture |

tuistory is the default for terminal work. Use true-input only when you need real terminal rendering evidence.

2. Stage route — what does the workflow need?

Every workflow passes through stages. Load the atoms for each stage you'll use.

| Stage | Skill | When to load |
|---|---|---|
| Capture | capture | Always: every workflow records or captures something |
| Compose | compose | When the deliverable is a produced artifact (video, annotated screenshots, comparison image) |
| Verify | verify | Always: every deliverable gets checked against commitments |

3. Artifact route — does compose need polish tools?

Only relevant when compose is loaded.

| Artifact need | Also load |
|---|---|
| Showcase polish (window chrome, branded frame, cinematic background) | showcase |
| Effects and keystroke overlays | (compose handles this; they're fields in the Remotion props JSON) |

Workflow shape

Command (intent + commitments)
  → Target route (load driver atoms)
  → Capture (record / screenshot / byte-capture)
  → Compose (assemble deliverable, if needed)
  → Verify (check against commitments)
  → Report
Commands declare what to produce. Atoms own how.

Layout default

Default: `single`. One clip showing the target/final state. Pick this unless the deliverable is fundamentally a comparison.

| Case | Layout |
|---|---|
| Brand-new feature (no meaningful prior state) | `single` |
| Bug fix, single-clip proof of the working path | `single` |
| Walkthrough / tutorial / readme hero | `single` |
| Regression proof (broken vs fixed) | `side-by-side` |
| Behavior-preserving refactor (visual parity is the point) | `side-by-side` |
| User explicitly asks for a comparison | `side-by-side` |

Do not synthesize a "before" state to justify `side-by-side`. If there is no real baseline, use `single`.
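
The layout rule reduces to two questions: is the deliverable fundamentally a comparison, and does a real baseline exist? A minimal sketch, assuming a hypothetical `choose_layout` helper that takes both answers as `yes` or `no`:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper. Arguments:
#   $1 = "yes" if the deliverable is fundamentally a comparison
#   $2 = "yes" if a real baseline exists to record
choose_layout() {
  if [ "$1" = "yes" ] && [ "$2" = "yes" ]; then
    echo "side-by-side"
  else
    echo "single"
  fi
}

choose_layout no no     # brand-new feature
choose_layout yes yes   # regression proof with a real broken baseline
choose_layout yes no    # comparison requested, but nothing real to compare against
```

The three calls print `single`, `side-by-side`, and `single`: without a real baseline the layout falls back to `single` rather than inventing a "before" state.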

Delegation

The parent agent plans and orchestrates. Mechanical work runs in worker subagents via the Task tool. This keeps the parent's context clean and enables parallelism.

What to delegate

| Task | Delegate? | Why |
|---|---|---|
| Capture clip (single layout) | YES | Worker runs the interaction script end-to-end and returns the `.cast` path |
| Capture both clips (comparison layout) | YES, with `run_in_background=true` for each | Branches are independent; run in parallel |
| Remotion render | YES | Needs only props JSON, clip paths, output path. Runs `render-showcase.sh` (handles .cast conversion, fidelity profiles, duration detection, cleanup) |
| Planning, interaction scripting | NO (parent) | Requires PR context and editorial judgment |
| Layout and prop construction | NO (parent) | Requires editorial decisions about effects, timing, labels |
| Verification | NO (parent) | Requires commitment context |
| Single ffprobe / file-existence check | NO (inline) | Too trivial for subagent overhead |

How to delegate

Step 0: Resolve paths and generate a run ID. Workers don't inherit `${DROID_PLUGIN_ROOT}`. Resolve once, paste everywhere:

```bash
TCTL="$(realpath "${DROID_PLUGIN_ROOT}/bin/tctl")"
RENDER="$(realpath "${DROID_PLUGIN_ROOT}/scripts/render-showcase.sh")"
RUN_ID="$(date +%s)-$$"
RUN_DIR="$(mktemp -d /tmp/droid-run-${RUN_ID}-XXXXXX)"
```

Use `${RUN_DIR}` for all output files (recordings, props, rendered video). Use `${RUN_ID}-` as a prefix for all session names. Never use bare names like `-s before` or hardcoded paths like `/tmp/before.cast`.

Give workers exact commands with the resolved absolute paths: not abstract instructions, not `tuistory`, not `${DROID_PLUGIN_ROOT}`. The parent does the thinking; the worker executes:

Task prompt for a capture worker:
  "Run these commands in order. Report the output file path and any errors.
   1. /abs/path/to/bin/tctl launch "droid-dev" -s 1712345678-42-before --backend tuistory \
        --repo-root /abs/path/to/baseline/worktree \
        --cols 120 --rows 36 --record /tmp/droid-run-1712345678-42-xxxx/before.cast \
        --env FORCE_COLOR=3 --env COLORTERM=truecolor
   2. /abs/path/to/bin/tctl -s 1712345678-42-before wait ">" --timeout 15000
   3. /abs/path/to/bin/tctl -s 1712345678-42-before type "hello world"
   4. /abs/path/to/bin/tctl -s 1712345678-42-before press enter
   5. /abs/path/to/bin/tctl -s 1712345678-42-before wait-idle
   6. /abs/path/to/bin/tctl -s 1712345678-42-before close"
Task prompt for a Remotion render worker:
  "Run this command. Report the output file path and any errors.
   /abs/path/to/scripts/render-showcase.sh \
     --props /tmp/droid-run-1712345678-42-xxxx/showcase-props.json \
     --output /tmp/droid-run-1712345678-42-xxxx/demo.mp4 \
     /tmp/droid-run-1712345678-42-xxxx/before.cast /tmp/droid-run-1712345678-42-xxxx/after.cast"
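
One way for the parent to produce such a prompt is to fill a template from already-resolved values and refuse anything that is not an absolute path. This is a sketch under assumptions: `build_capture_prompt` is a hypothetical helper, the sample arguments are placeholders, and the `tctl` commands are copied from the capture example above rather than from any other reference.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper.
# Usage: build_capture_prompt TCTL_ABS SESSION REPO_ROOT_ABS CAST_ABS
build_capture_prompt() {
  tctl="$1"; session="$2"; repo_root="$3"; cast="$4"

  # Workers cannot expand ${DROID_PLUGIN_ROOT}, so reject relative paths early.
  for p in "${tctl}" "${repo_root}" "${cast}"; do
    case "${p}" in
      /*) ;;
      *) echo "not an absolute path: ${p}" >&2; return 1 ;;
    esac
  done

  # In an unquoted here-document, "\\" emits one literal backslash.
  cat <<EOF
Run these commands in order. Report the output file path and any errors.
1. ${tctl} launch "droid-dev" -s ${session} --backend tuistory \\
     --repo-root ${repo_root} \\
     --cols 120 --rows 36 --record ${cast} \\
     --env FORCE_COLOR=3 --env COLORTERM=truecolor
2. ${tctl} -s ${session} wait ">" --timeout 15000
3. ${tctl} -s ${session} type "hello world"
4. ${tctl} -s ${session} press enter
5. ${tctl} -s ${session} wait-idle
6. ${tctl} -s ${session} close
EOF
}

# Placeholder values; in a real run these come from Step 0.
build_capture_prompt /abs/path/to/bin/tctl 1712345678-42-before \
  /abs/path/to/baseline/worktree /tmp/droid-run-1712345678-42-xxxx/before.cast
```

Generating the text from variables keeps the session name and output path consistent across all six commands, which is easy to get wrong when pasting by hand.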

Parallel capture pattern (comparison flows only)

Only applicable when the Layout default table above selects `side-by-side`. For a `single` layout, launch one capture worker and skip this section.

For before/after comparison demos, launch both capture workers simultaneously:

1. Parent constructs the interaction script (identical for both branches)
2. Launch worker A: capture the baseline/reference branch with `--repo-root` set to that worktree
3. Launch worker B: capture the candidate/change branch with `--repo-root` set to that worktree
4. Wait for both to complete (TaskOutput)
5. Collect .cast paths from results
6. Continue to compose
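
The fork/join shape of these steps can be illustrated with plain shell background jobs. Note the substitution: real workers are launched through the Task tool with `run_in_background=true`, not with `&`, and `capture_branch` below is a hypothetical stand-in that only creates an empty placeholder file instead of driving `tctl`.

```shell
#!/usr/bin/env bash
set -eu

RUN_ID="$(date +%s)-$$"
RUN_DIR="$(mktemp -d "/tmp/droid-run-${RUN_ID}-XXXXXX")"

# Hypothetical stand-in for one capture worker. A real worker would run the
# tctl command list with --repo-root set to "$2"; this one ignores "$2" and
# just creates a placeholder so the flow can be run anywhere.
capture_branch() {
  out="${RUN_DIR}/$1.cast"
  : > "${out}"
  printf '%s\n' "${out}"
}

# Steps 2-3: start both branches at once, each reporting its output path.
capture_branch before /abs/path/to/baseline/worktree > "${RUN_DIR}/before.path" &
pid_a=$!
capture_branch after /abs/path/to/candidate/worktree > "${RUN_DIR}/after.path" &
pid_b=$!

# Step 4: wait for both; a failed worker aborts the run under `set -e`.
wait "${pid_a}"
wait "${pid_b}"

# Step 5: collect the .cast paths reported by each worker.
BEFORE_CAST="$(cat "${RUN_DIR}/before.path")"
AFTER_CAST="$(cat "${RUN_DIR}/after.path")"

# Step 6: hand both paths to compose.
echo "compose inputs: ${BEFORE_CAST} ${AFTER_CAST}"
```

Waiting on each job by PID, rather than a bare `wait`, surfaces the exit status of each branch separately, so a failed capture is not silently composed.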

Shared tooling

Terminal drivers use the unified `tctl` wrapper. agent-browser has its own CLI and does not use `tctl`.

Drivers can be combined in one workflow, e.g., `tctl` for a CLI and `agent-browser` for a web UI it interacts with.

Prerequisites

| Stage | Platform | Required | Optional |
|---|---|---|---|
| tuistory | All | `tuistory`, `asciinema`, `agg` | `tmux` |
| true-input | Linux/Wayland | `cage`, `wtype`, Wayland terminal, `/dev/dri/*` | `grim`, `wf-recorder` |
| true-input | Windows (KVM) | `libvirt`, `qemu`, KVM VM with SPICE + SSH, `DROID_VM_*` env vars | `virt-manager` |
| true-input | macOS (QEMU) | `qemu`, `socat`, macOS VM with SSH, `DROID_MAC_*` env vars | |
| agent-browser | All | `agent-browser` (+ `agent-browser install`) | |
| compose | All | `ffmpeg`, `ffprobe`, `agg` | |
| showcase | All | Node.js (>= 18), Chrome/Chromium | |
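
Before starting a workflow it can help to check that the required commands are actually on `PATH`. A minimal sketch, assuming a hypothetical `missing_tools` helper; the command names passed to it are the required entries from the tuistory and compose rows above.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "${tool}" >/dev/null 2>&1 || printf '%s\n' "${tool}"
  done
}

# Required commands for the tuistory and compose rows of the table.
missing="$(missing_tools tuistory asciinema agg ffmpeg ffprobe)"
if [ -n "${missing}" ]; then
  # ${missing} is intentionally unquoted so the names print on one line.
  echo "missing prerequisites:" ${missing}
else
  echo "all tuistory + compose prerequisites found"
fi
```

The same helper can be called with the true-input or showcase commands; non-command prerequisites such as `/dev/dri/*` or the `DROID_VM_*` variables need their own checks.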

Install commands

```bash
# tuistory driver + recording
npm install -g tuistory                                # virtual PTY driver
pip install asciinema                                  # terminal recording (tctl wraps this)
cargo install --git https://github.com/asciinema/agg   # .cast -> .gif converter (compose needs this)

# true-input driver (Linux/Wayland)
sudo apt-get install -y cage wtype         # required: headless compositor + keystroke injection
sudo apt-get install -y grim wf-recorder   # optional: screenshots + video recording

# agent-browser driver
agent-browser install                      # one-time: downloads bundled Chromium

# compose + showcase (video rendering)
sudo apt-get install -y ffmpeg                       # video processing (includes ffprobe)
cd "${DROID_PLUGIN_ROOT}/remotion" && npm install    # Remotion dependencies
# Chrome or Chromium must be installed for Remotion rendering
```