screenclaw

Core Rules

- Coordinates must come from the `XxY` grid intersections on ScreenClaw screenshots. Never use internal visual coordinates or guesses.
- Keep one fixed `session_id` per session, and make all public calls only through `scripts/screenclaw.py|ps1|sh`.
- Read `references/api/{endpoint}.md` before calling an endpoint; on a parameter error or an API error, go back to that document and correct the call.
- For first-time dynamic coordinates, high-risk coordinates, or whenever the target or the grid numbers are hard to read, first crop and zoom or verify with a marker.
- API success only means the instruction was sent, not that the interface reached the goal; always take a screenshot to verify after an operation.
- On receiving `SELF_CHECK_REQUIRED`, read and run the self-check procedure in `references/self_check.md` so that key context is reloaded, then use `self_check` to summarize what has been executed and retry the screenshot.

Mental Model

- External facts take priority: screenshots, markers, API responses, and the window list are facts, and your intuition and preconceived conclusions must yield to them. If the marker is not on the target, the coordinate is wrong.
- Falsify first, then operate: treat every candidate coordinate as possibly wrong. Use crop or marker to look for counterexamples and confirm where the marked point actually lands before operating.
- On failure, go back to the screenshot: when an operation fails, the result is not what you expected, or repeated fine-tuning has no effect, first take a new screenshot, crop it, and re-read the documents and parameters. Do not keep guessing coordinates or repeating clicks.

Fixed Work Cycle

Understand goal -> health -> config -> get_window_list -> screenshot -> read coordinates -> marker verification -> operation -> screenshot verification
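The cycle above can be sketched as a small driver function. This is only an illustration under stated assumptions: `call` is a hypothetical stand-in for whatever invokes the ScreenClaw script, the `marker` and `coordinate` keyword names are made up for the sketch, and `pick`, `marker_ok`, and `goal_ok` represent the judgments made by reading the returned images. The "config" step means reading `references/config.md`, so it has no call here.

```python
from typing import Callable

# Hypothetical caller: call(endpoint, **params) -> dict. Not the real script API.
Call = Callable[..., dict]


def run_cycle(
    call: Call,
    pick: Callable[[dict], str],
    marker_ok: Callable[[dict], bool],
    goal_ok: Callable[[dict], bool],
    action: str = "click",
) -> bool:
    """Run one pass of the work cycle; return True only if the goal is verified."""
    call("health")                                   # service available?
    call("get_window_list")                          # find the target window
    shot = call("screenshot", coordinate_type="grid")
    coord = pick(shot)                               # read an XxY label off the grid

    # Marker verification: try to falsify the candidate before acting on it.
    preview = call("screenshot", coordinate_type="grid", marker=coord)
    if not marker_ok(preview):
        return False                                 # wrong coordinate, do not operate

    call(action, coordinate=coord)                   # parameter name is an assumption
    # API success is not proof; verify against a fresh screenshot.
    return goal_ok(call("screenshot", coordinate_type="no"))
```

When the function returns `False`, the intended reaction is to go back to the screenshot step rather than retry the same coordinate.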

1. Initialization

- Reply in the user's language.
- Read `references/config.md` for the rules governing `api_url`, `token`, `ai_app_type`, and `session_id`.
- Read `scripts/README.md` to learn the unified script entry point and the dot-path format.
- Call `health` to confirm the service is available.
- Search `references/scenarios/` for a matching template.
- For complex, multi-step, or high-risk tasks, keep a short plan of 2-5 steps first; simple single-step tasks do not require a to-do list.

2. Get Target Window

- Call `get_window_list` to find the main window and any child windows.
- For a new process, or when the window is uncertain, take a screenshot of each candidate window and record its content and the usable `window_id/main_window_id`.
- When a later operation fails, first check whether the wrong window was selected, and only then switch operation mode.

3. Screenshot and Read Coordinates

- Use `screenshot coordinate_type=grid` to locate coordinates.
- Use `coordinate_type=no` when analyzing content or showing the image to the user.
- By default, rely on the server-side adaptive grid parameters first.
- When no intersection covers the target, read `references/api/screenshot.md` and adjust `grid.density_x/y`.
- When numbers or elements are hard to read, use `crop_zoom_screenshot` or adjust the number parameters.
- For first-time dynamic coordinates or high-risk coordinates, first use `crop_zoom_screenshot` to see the area clearly, then verify in reverse with `marker.0.x/y`.
- In marker verification, first find where the marker actually lands on the image, describe what is there, and only then judge whether that is the target. Do not start by assuming the candidate coordinate is correct.
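Parameters such as `marker.0.x/y` and `grid.density_x/y` are written as dot paths. The authoritative format is whatever `scripts/README.md` defines; the sketch below only illustrates one plausible reading, in which dot paths expand into nested JSON and all-numeric segments become list indices. The function names and the sample values are hypothetical.

```python
from typing import Any


def expand_dot_paths(flat: dict[str, Any]) -> Any:
    """Expand {'marker.0.x': 50} into {'marker': [{'x': 50}]} (assumed semantics)."""
    root: dict[str, Any] = {}
    for path, value in flat.items():
        keys = path.split(".")
        node = root
        for key in keys[:-1]:
            node = node.setdefault(key, {})   # create intermediate levels on demand
        node[keys[-1]] = value
    return _listify(root)


def _listify(node: Any) -> Any:
    """Turn dicts whose keys are all digits into lists, recursively."""
    if not isinstance(node, dict):
        return node
    converted = {key: _listify(child) for key, child in node.items()}
    if converted and all(key.isdigit() for key in converted):
        return [converted[key] for key in sorted(converted, key=int)]
    return converted


# Illustrative values only; real ranges come from references/api/screenshot.md.
payload = expand_dot_paths({"marker.0.x": 50, "marker.0.y": 35, "grid.density_x": 20})
```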

4. Operation and Verification

- Prefer the `background` operation mode.
- Consider `hijack` only when `background` has no effect or physical input is required.
- For scenarios that need continuous physical input, such as an explicit user request, real-time game control, or a Chinese input method candidate panel, read `references/api/delegated.md` and then enter delegated mode.
- Use single-step calls while exploring; use `batch` only once the flow is stable and you need to observe hover, menu, or operation results at the instant they appear.
- Take a screenshot to verify after every operation; if verification fails, return to the screenshot and coordinate-reading step.
- On receiving `SELF_CHECK_REQUIRED`, you must update the current plan or the next action.
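The mode preference described in this step can be written as a small decision function. This is a minimal sketch, not part of ScreenClaw: the function and its boolean inputs are hypothetical, and only the names `background`, `hijack`, and `delegated` come from the text.

```python
def choose_mode(
    background_failed: bool = False,
    needs_physical_input: bool = False,
    continuous_physical_input: bool = False,
) -> str:
    """Pick an operation mode following the stated preference order."""
    if continuous_physical_input:
        # User request, real-time games, IME candidate panels:
        # read references/api/delegated.md first, then delegate.
        return "delegated"
    if background_failed or needs_physical_input:
        return "hijack"
    return "background"  # the default and preferred mode
```

For example, `choose_mode()` returns `"background"`, and `choose_mode(background_failed=True)` returns `"hijack"`.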

Coordinate Concept

Coordinates on screenshots use the format `XxY`. For example, `50x35` means 50% from the left edge and 35% from the top edge. The `x` is the coordinate separator, not a multiplication sign. The valid coordinate for a target element is that of a grid intersection that falls on the element.
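To make the percentage reading concrete, the sketch below parses an `XxY` label and converts it to a pixel position for a given image size. It is an illustration only: the helper names are hypothetical, accepting decimal values is an assumption, and the pixel conversion is shown for intuition rather than as something the API requires.

```python
import re

# One or more digits, optional decimal part, separated by a literal "x".
_COORD = re.compile(r"(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)")


def parse_grid_coord(label: str) -> tuple[float, float]:
    """Split an 'XxY' label into (x_percent, y_percent)."""
    match = _COORD.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not an XxY coordinate: {label!r}")
    x, y = float(match.group(1)), float(match.group(2))
    if not (0 <= x <= 100 and 0 <= y <= 100):
        raise ValueError(f"percentages must be within 0-100: {label!r}")
    return x, y


def to_pixels(label: str, width: int, height: int) -> tuple[int, int]:
    """Map percentages to pixels: x from the left edge, y from the top edge."""
    x, y = parse_grid_coord(label)
    return round(width * x / 100), round(height * y / 100)


print(to_pixels("50x35", 1920, 1080))  # (960, 378)
```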

API Index

Read the corresponding document `references/api/{endpoint}.md` before calling an API.

| API | Method | When to use | Reference document |
| --- | --- | --- | --- |
| health | GET | Check the service before starting a task | `references/api/health.md` |
| get_window_list | POST | Find the target window to be controlled | `references/api/get_window_list.md` |
| screenshot | POST | With grid: locate coordinates. Without grid: analyze the interface or keep a record. With marker points: preview a coordinate position | `references/api/screenshot.md` |
| crop_zoom_screenshot | POST | Crop any screenshot and zoom in to see details (such as coordinate numbers) | `references/api/crop_zoom_screenshot.md` |
| scroll_screenshot | POST | Scrolling long screenshot: record long pages and long content, and understand the target window as a whole | `references/api/scroll_screenshot.md` |
| click | POST | Single click: trigger a button or enter a page | `references/api/click.md` |
| long_press | POST | Long press: trigger certain functions | `references/api/long_press.md` |
| swipe | POST | Touch-style swipe: move the page up, down, left, or right | `references/api/swipe.md` |
| drag | POST | Drag an element: hold the mouse button and move | `references/api/drag.md` |
| scroll | POST | Mouse-wheel scroll: move the page up or down | `references/api/scroll.md` |
| right_click | POST | Right click: open a context menu | `references/api/right_click.md` |
| hover | POST | Trigger a hover effect; combine with a screenshot to capture it | `references/api/hover.md` |
| mouse_move | POST | Mouse movement for game camera control; hijack or delegated mode only | `references/api/mouse_move.md` |
| input_text | POST | Input text. With coordinates, clicks first and then types; without coordinates, types directly | `references/api/input_text.md` |
| press_key | POST | Press a key or key combination. With coordinates, clicks first and then presses; without coordinates, presses directly | `references/api/press_key.md` |
| wait | POST | Wait for UI animations or page loading | `references/api/wait.md` |
| batch | POST | Combined instructions: run several single-step operations as one sequence for efficiency | `references/api/batch.md` |
| delegated | POST | Enter or exit delegated mode at the user's explicit request | `references/api/delegated.md` |
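The index can also be kept as a small lookup so that the "read the document first" rule is easy to follow programmatically. This is a sketch: the endpoint names and HTTP methods are copied from the table, while `ENDPOINTS` and `doc_path` are hypothetical helper names.

```python
# Endpoint -> HTTP method, as listed in the API index.
ENDPOINTS: dict[str, str] = {
    "health": "GET",
    **{
        name: "POST"
        for name in (
            "get_window_list", "screenshot", "crop_zoom_screenshot",
            "scroll_screenshot", "click", "long_press", "swipe", "drag",
            "scroll", "right_click", "hover", "mouse_move", "input_text",
            "press_key", "wait", "batch", "delegated",
        )
    },
}


def doc_path(endpoint: str) -> str:
    """Return the reference document to read before calling an endpoint."""
    if endpoint not in ENDPOINTS:
        raise KeyError(f"unknown endpoint: {endpoint}")
    return f"references/api/{endpoint}.md"
```

For example, `doc_path("click")` gives `references/api/click.md`, and an unknown name raises `KeyError` instead of producing a path to a document that does not exist.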

Script Downgrade

Downgrade path:

scripts/screenclaw.py -> scripts/screenclaw.ps1 -> scripts/screenclaw.sh -> curl

Identify the cause before downgrading:

| Error type | Handling |
| --- | --- |
| Parameter error | Fix the parameters and re-run the same script; do not downgrade |
| API business error | Read the corresponding API document and the server `message`; do not downgrade |
| Environment error, such as Python not being installed | Downgrade to PowerShell or shell |
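The downgrade rules can be expressed as a function that decides which entry point to try next. This is a minimal sketch: the chain is taken from the path above, but the error-type labels (`"parameter"`, `"api_business"`, `"environment"`) and the function name are hypothetical, chosen only for the example.

```python
DOWNGRADE_CHAIN = [
    "scripts/screenclaw.py",
    "scripts/screenclaw.ps1",
    "scripts/screenclaw.sh",
    "curl",
]


def next_entry(current: str, error_type: str) -> str:
    """Return the entry point to use after an error of the given type."""
    if error_type in ("parameter", "api_business"):
        # Fix the call or re-read the API document; the script itself is fine.
        return current
    if error_type == "environment":
        index = DOWNGRADE_CHAIN.index(current)
        if index + 1 >= len(DOWNGRADE_CHAIN):
            raise RuntimeError("no further entry point to downgrade to")
        return DOWNGRADE_CHAIN[index + 1]
    raise ValueError(f"unknown error type: {error_type!r}")
```

Only environment errors move down the chain; every other error keeps the same script so that the real cause is fixed rather than hidden.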

Reference Documents

- `references/config.md` - Connection configuration, `ai_app_type`, `session_id`
- `scripts/README.md` - Unified script entry point and dot-path format
- `references/self_check.md` - Reload checklist for long-running self-checks
- `references/api/*.md` - Parameters and troubleshooting for each API
- `references/scenarios/` - Scenario templates and application knowledge