🚨 CRITICAL: macOS Daemon Workaround & Gemini CLI Usage 🚨
When using `agent-tui` in this macOS environment, the default background daemonization process crashes, causing `Connection refused (os error 61)` errors.
**You MUST start the daemon manually, shielded from TTY hangups, before running any `agent-tui` commands.** Detaching the process from the shell is insufficient on its own; you must use `tmux` to provide a fully isolated pseudo-terminal.
To support parallel runs, only restart the daemon if it is not currently running:
# Check if the daemon is alive; start it in tmux if it is not
if ! agent-tui sessions >/dev/null 2>&1; then
  tmux kill-session -t agent-tui 2>/dev/null || true
  agent-tui daemon stop 2>/dev/null || true
  rm -f /tmp/agent-tui*
  tmux new-session -d -s agent-tui 'agent-tui daemon start --foreground > /tmp/agent-tui-daemon.log 2>&1'
  sleep 1
fi
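The fixed `sleep 1` in the workaround can race with a slow daemon start. A minimal sketch of a readiness poll follows, assuming that `agent-tui sessions` keeps exiting non-zero until the daemon accepts connections (the liveness check above already relies on this). The helper name `wait_for_daemon` and its default attempt count and delay are hypothetical, not part of agent-tui.

```shell
# Poll until `agent-tui sessions` succeeds or the attempt budget runs out.
# wait_for_daemon is a hypothetical helper, not an agent-tui command.
wait_for_daemon() {
  attempts="${1:-10}"   # maximum number of probes
  delay="${2:-1}"       # seconds to sleep between probes
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if agent-tui sessions >/dev/null 2>&1; then
      return 0          # daemon answered
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "daemon did not come up; see /tmp/agent-tui-daemon.log" >&2
  return 1
}
```

Calling `wait_for_daemon` in place of `sleep 1` would make the start-up step fail loudly instead of letting the next command hit `Connection refused`.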
Session ID vs PID (Crucial for Reconnection)
When `agent-tui run` returns JSON, it includes both a session ID and a PID. The PID is purely informational (the OS process ID of the child command). You **do not** use the PID to reconnect or issue commands. You must always use the session ID.
If the daemon crashes (`Connection refused (os error 61)`), the pseudo-terminal is destroyed. Even if the child process survives as an orphan, you cannot reconnect to it. You must restart the daemon using the workaround above and start a completely new session.
Testing the Gemini CLI
When testing the Gemini CLI with `agent-tui`, there are several strict requirements to ensure deterministic and accurate behavior:
- Build Before Running: The launch command runs the built JS files, not TypeScript. You MUST rebuild the project after making code changes and before launching the CLI with `agent-tui run`.
- Bypass Trust Modals: Always pass `GEMINI_CLI_TRUST_WORKSPACE=true` in the environment. If you don't, any new project-level agents or extensions will trigger a full-screen "Acknowledge and Enable" modal. This modal steals focus, swallows automation keystrokes, and causes commands to time out.
- Isolated Environments: If you need to test without real user credentials or existing agents interfering, isolate the global settings using `GEMINI_CLI_HOME=<some-test-dir>`.
- Testing State Deltas (e.g., Reloads): If you are testing features that report deltas (e.g., outputting "1 new local subagent"), you MUST:
  - Start the CLI first so it establishes its baseline registry.
  - Use a separate shell command (outside of `agent-tui`) to write the new agent / file.
  - Use `agent-tui type` and `agent-tui press` to trigger the command inside the running session.
  - If you add the files before starting the CLI, they become part of the baseline and won't trigger the delta logic.
# Example: Standard isolated run (sandboxed config + bypass trust modals)
env GEMINI_CLI_TRUST_WORKSPACE=true GEMINI_CLI_HOME=test-gemini-home agent-tui run -d "$(pwd)" node packages/cli/dist/index.js
Terminal Automation Mastery
- Supported OS: macOS or Linux (Windows not supported yet).
- Verify install: confirm that the `agent-tui` command is on your `PATH`.
If not installed, use one of:
Recommended: one-line install (macOS/Linux)
npm i -g agent-tui
pnpm add -g agent-tui
bun add -g agent-tui
Philosophy: Why Terminal Automation Is Different
Terminal UIs are stateless from the observer's perspective. Unlike web browsers with a persistent DOM, terminal automation works with a constantly-refreshed character grid. This fundamental difference shapes everything:
| Web Automation | Terminal Automation |
|---|---|
| DOM persists across interactions | Screen buffer is redrawn constantly |
| Selectors are stable | Text positions may shift |
| Query once, act many times | Must re-verify before EVERY action |
| Network events signal completion | Must detect visual stability |
The Core Insight: agent-tui gives you vision without memory. Each screenshot is a fresh observation. Previous state means nothing after the UI changes. This isn't a limitation—it's the nature of terminal interaction.
Mental Model: The Feedback Loop
Think of terminal automation as a closed-loop control system:
┌──────────────────────────────────────────────┐
│ │
▼ │
OBSERVE ──► DECIDE ──► ACT ──► WAIT ──► VERIFY ───┘
│ │
│ │
└─────── NEVER skip ◄────────────────────┘
Each phase is mandatory. Skipping verification is the #1 cause of flaky automation.
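One pass through the feedback loop, for a single atomic action, might look like the sketch below. It uses only the `agent-tui` subcommands that appear in this document (`wait --stable`, `screenshot`) and assumes they exit non-zero on failure. The wrapper name `act_and_observe` is hypothetical.

```shell
# One loop iteration for exactly one action.
# act_and_observe is a hypothetical wrapper, not an agent-tui command.
act_and_observe() {
  # ACT: a single agent-tui action, e.g. `type "hello"` or `press Enter`.
  agent-tui "$@" || return 1
  # WAIT: let the screen settle before looking at it.
  agent-tui wait --stable || return 1
  # OBSERVE: print a fresh screenshot for the next DECIDE/VERIFY step.
  agent-tui screenshot
}
```

For example, `act_and_observe press Enter` sends one keypress and returns the settled screen. The caller still decides the next action from that output rather than queuing several actions in advance.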
The "Fresh Eyes" Principle
Every time you need to interact with the UI:
- Take a fresh screenshot — your previous one is now stale
- Locate your target visually — text positions may have changed
- Verify the state — the UI may have changed unexpectedly
- Act only when stable — animations and loading states cause failures
This feels slower, but it's the only reliable approach. Optimistic reuse of stale state causes intermittent failures that are painful to debug.
Critical Rules (Non-Negotiable)
RULE 1: Atomic Execution (No Pipelining)
You are FORBIDDEN from chaining commands with `&&` (e.g., `type "x" && press Enter && wait`). Modals or UI updates can intercept your keystrokes. You MUST execute one atomic action, wait, screenshot, and verify before taking the next action in a new turn.
RULE 2: Re-snapshot after EVERY action
The UI state is invalidated by any change. Always take a fresh screenshot before acting again.
RULE 3: Never act on unstable UI
If the UI is animating, loading, or transitioning, run `agent-tui wait --stable` first. Acting during transitions causes race conditions.
RULE 4: Verify before claiming success
Use `wait "expected text" --assert` to confirm outcomes. Don't assume an action worked; prove it.
RULE 5: Error Recovery
If a `wait` command times out, DO NOT blindly restart or kill the session. Execute `agent-tui screenshot` to visually diagnose what unexpected UI element (modal, error dialog, lost focus) intercepted the flow.
RULE 6: Clean up sessions
Always end with `agent-tui kill`. Orphaned sessions consume resources and can interfere with future runs.
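Rule 5 can be sketched as a small wrapper that captures the screen when a wait fails instead of tearing the session down. This assumes `agent-tui wait "text" --assert` exits non-zero when the text does not appear, as the "fails if not found" wording in this document suggests. The helper name `wait_or_diagnose` is hypothetical.

```shell
# Assert that a text appears; on failure, show the screen rather than
# restarting or killing the session.
# wait_or_diagnose is a hypothetical helper, not an agent-tui command.
wait_or_diagnose() {
  if agent-tui wait "$1" --assert; then
    return 0
  fi
  echo "wait for '$1' failed; current screen follows:" >&2
  # Diagnose visually: a modal, error dialog, or lost focus may be the cause.
  agent-tui screenshot >&2
  return 1
}
```

The session is left running after a failure, so the unexpected modal or dialog can be handled and the flow resumed.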
Which Screenshot Mode?
Use `agent-tui screenshot --format json` when parsing automation output, or plain `agent-tui screenshot` for human-readable text.
What are you waiting for?
│
├─► Specific text to appear
│ └─► `wait "text" --assert` (fails if not found)
│
├─► Specific text to disappear
│ └─► `wait "text" --gone --assert`
│
├─► UI to stop changing (animations, loading)
│ └─► `wait --stable`
│
└─► Multiple conditions
└─► Chain waits sequentially
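For the "multiple conditions" branch, chaining waits sequentially could be sketched as below. It assumes `agent-tui wait "text" --assert` exits non-zero when the text is not found. The helper name `wait_for_all` is hypothetical.

```shell
# Assert each expected text in order; stop at the first one that fails.
# wait_for_all is a hypothetical helper, not an agent-tui command.
wait_for_all() {
  for expected in "$@"; do
    if ! agent-tui wait "$expected" --assert; then
      echo "timed out waiting for: $expected" >&2
      return 1
    fi
  done
  return 0
}
```

A call such as `wait_for_all "Loading complete" "Ready"` only succeeds if both texts were observed, in that order.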
What do you need to do?
│
├─► Type text into the terminal
│ └─► `type "text"`
│
├─► Send keyboard shortcuts/navigation
│ └─► `press Ctrl+C` or `press ArrowDown Enter`
The canonical automation loop:
# 1. START: Launch the TUI app
agent-tui run <command> [-- args...]
# 2. OBSERVE: Get current UI state
agent-tui screenshot --format json
# 3. DECIDE: Based on text, determine next action
# (This happens in your head/code)
# 4. ACT: Execute the action
agent-tui type "text"
agent-tui press Enter
# 5. WAIT: Synchronize with UI changes
agent-tui wait "Expected" --assert # or wait --stable
# 6. VERIFY: Confirm the outcome (often combined with step 5)
# If verification fails, handle the error
# 7. REPEAT: Go back to step 2 until done
# 8. CLEANUP: Always clean up
agent-tui kill
Anti-Patterns (What NOT to Do)
❌ Acting During Animation/Loading
# WRONG: Acting immediately on dynamic UI
agent-tui run my-app
agent-tui screenshot --format json # UI might still be loading!
agent-tui type "value" # ❌ Might miss the input field
# RIGHT: Wait for stability first
agent-tui run my-app
agent-tui wait --stable # Let UI settle
agent-tui screenshot --format json # Now it's reliable
agent-tui type "value"
❌ Assuming Success Without Verification
# WRONG: Assuming the type worked
agent-tui type "value"
agent-tui press Enter
# ...proceed as if success... # ❌ What if it failed silently?
# RIGHT: Verify the outcome
agent-tui type "value"
agent-tui press Enter
agent-tui wait "Success" --assert # ✓ Proves the action worked
# WRONG: Forgetting to kill the session
# script ends # ❌ Session left running!
# RIGHT: Always clean up
agent-tui kill # ✓ Clean exit
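In a script, the cleanup pattern above can be made automatic with a shell `trap`, so the session is killed on every exit path, including early failures. This is a sketch under the assumption that `agent-tui kill` with no arguments ends the current session, as the examples in this document indicate. The function name `cleanup` is illustrative.

```shell
# Register cleanup before launching anything, so it runs on every exit path.
cleanup() {
  # Ignore failures: the session may already be gone.
  agent-tui kill >/dev/null 2>&1 || true
}
trap cleanup EXIT

# ... run, act, wait, and verify steps go here ...
```

The trap does not alter the script's own exit status, so a failed assertion still surfaces as a failure after the session has been cleaned up.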
Before You Start: Clarify Requirements
Before automating any TUI, gather this information:
- Command: What exactly to run, including any arguments?
- Success criteria: What text/state indicates success?
- Input sequence: What keystrokes/data to enter, in what order?
- Safety: Is it safe to submit forms, delete data, etc.?
- Auth: Does it need login? Test credentials?
- Live preview: Does the user want to watch? (`agent-tui live start --open`)
If any of these are unclear, ask before running.
Demo Mode: Showing What agent-tui Can Do
When a user asks what agent-tui is, wants a demo, or asks "show me how it works":
- Don't explain—demonstrate. Actions speak louder than words.
- Use the live preview so they can watch in real-time.
- Run a widely available TUI that shows dynamic real-time updates.
Quick demo trigger phrases:
- "What is agent-tui?" / "What does agent-tui do?"
- "Demo agent-tui" / "Show me agent-tui"
- "How does agent-tui work?" / "See it in action"
| Symptom | Diagnosis | Solution |
|---|---|---|
| "Text not found" | Stale view or text moved | Re-snapshot, locate text again |
| Wait times out | UI didn't reach expected state | Check screenshot, verify expectations |
| "Daemon not running" | Daemon crashed or not started | Restart the daemon (see the macOS workaround above) |
| Unexpected layout | Wrong terminal size | `agent-tui resize --cols 120 --rows 40` |
| Session unresponsive | App crashed or hung | `agent-tui kill`, then re-run |
| Repeated failures | Something fundamentally wrong | Stop after 3-5 attempts, ask user |
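The "stop after 3-5 attempts" guidance in the last row can be enforced with a bounded retry rather than an open-ended loop. The sketch below is generic shell; the helper name `retry_bounded` is hypothetical and it works with any command, not only `agent-tui`.

```shell
# Run a command up to N times, then give up so a human can be asked.
# retry_bounded is a hypothetical helper, not an agent-tui command.
retry_bounded() {
  max="$1"
  shift
  n=1
  while [ "$n" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $n/$max failed: $*" >&2
    n=$((n + 1))
  done
  echo "giving up after $max attempts; ask the user how to proceed" >&2
  return 1
}
```

For instance, `retry_bounded 3 agent-tui wait "Ready" --assert` would try the wait at most three times before reporting failure.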
Self-Discovery: Use --help
You don't need to memorize every flag. The CLI is self-documenting:
agent-tui --help # List all commands
agent-tui run --help # Options for 'run'
agent-tui screenshot --help # Options for 'screenshot'
agent-tui wait --help # Options for 'wait'
**When in doubt, ask the CLI.** This skill teaches when and why to use commands. For exact flags and syntax, `--help` is authoritative.
agent-tui run <cmd> [-- args] # Launch TUI under control
agent-tui screenshot # Plain text view
agent-tui screenshot --format json # Machine-readable output
agent-tui press Enter # Press key(s)
agent-tui press Ctrl+C # Keyboard shortcuts
agent-tui type "text" # Type text
agent-tui wait "text" --assert # Wait for text, fail if not found
agent-tui wait "text" --gone --assert # Wait for text to disappear, fail if it remains
agent-tui wait --stable # Wait for UI to stop changing
agent-tui sessions # List active sessions
agent-tui live start --open # Start live preview
agent-tui kill # End current session