screenclaw

Core Rules

  1. Coordinates must come from the `XxY` grid intersections on ScreenClaw screenshots; never use internal visual coordinates or guess by feel.
  2. Fix one `session_id` per session, and make every public call through `scripts/screenclaw.py|ps1|sh` only.
  3. Read `references/api/{endpoint}.md` before calling an endpoint; on a parameter error or API error, go back to that document and correct the call.
  4. For first-time dynamic coordinates, high-risk coordinates, or an unclear target or number, first crop and zoom or verify with a marker.
  5. API success only means the instruction was sent, not that the interface reached the goal; always take a screenshot to verify after an operation.
  6. On receiving `SELF_CHECK_REQUIRED`, read and run the self-check procedure in `references/self_check.md` so that key context is reloaded, then use `self_check` to summarize what has been done and retry the screenshot.

Mental Model

  1. External facts come first: screenshots, markers, API responses, and the window list are facts; your intuition and preset conclusions must yield to them. If the marker is not on the target, the coordinate is wrong.
  2. Falsify first, then operate: treat every candidate coordinate as possibly wrong. Use a crop or a marker to look for counterexamples and confirm where the marked point actually lands before operating.
  3. Go back to the screenshot on failure: when an operation fails, the result is not what you expected, or repeated fine-tuning has no effect, first take a new screenshot, crop, and reread the documents and parameters; do not keep guessing coordinates or repeating clicks.

Fixed Work Cycle

    understand goal -> health -> config -> get_window_list -> screenshot -> read coordinates -> marker verification -> operation -> screenshot verification
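
The cycle above can be read as a loop that returns to the screenshot step whenever a verification step fails. The following is a minimal sketch of that control flow only, not ScreenClaw's implementation: the step callables, the `run_cycle` name, and the `max_retries` limit are hypothetical stand-ins for whatever actually performs each step.

```python
from typing import Callable, Dict, List

# Step names mirror the cycle above; "understand goal" happens before the loop.
CYCLE = [
    "health", "config", "get_window_list", "screenshot",
    "read_coordinates", "marker_verification", "operation",
    "screenshot_verification",
]
# Steps whose failure sends the loop back to a fresh screenshot.
VERIFY_STEPS = {"marker_verification", "screenshot_verification"}


def run_cycle(steps: Dict[str, Callable[[], bool]], max_retries: int = 3) -> List[str]:
    """Run the steps in order and return a log of 'step:ok' / 'step:fail' entries.

    Each callable is a hypothetical stand-in that returns True on success.
    A failed verification step restarts from "screenshot" (up to max_retries
    times); any other failure raises RuntimeError.
    """
    log: List[str] = []
    retries = 0
    i = 0
    while i < len(CYCLE):
        name = CYCLE[i]
        ok = steps[name]()
        log.append(f"{name}:{'ok' if ok else 'fail'}")
        if ok:
            i += 1
        elif name in VERIFY_STEPS and retries < max_retries:
            retries += 1
            i = CYCLE.index("screenshot")  # re-screenshot instead of guessing
        else:
            raise RuntimeError(f"step {name!r} failed")
    return log
```

A caller would pass one callable per step name; capping the retries keeps a persistently wrong coordinate from turning into an endless click loop.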

1. Initialization

  • Reply in the user's language.
  • Read `references/config.md` for the `api_url`, `token`, `ai_app_type`, and `session_id` rules.
  • Read `scripts/README.md` to learn the unified script entry point and the dot-path format.
  • Call `health` to confirm the service is available.
  • Search `references/scenarios/` for a matching template.
  • For complex, multi-step, or high-risk tasks, first keep a short plan of 2-5 steps; simple single-step tasks do not require a to-do list.

2. Get Target Window

  • Call `get_window_list` to find the main window and any child windows.
  • When the process is new or the window is uncertain, take a screenshot of each candidate window and record its content and the usable `window_id/main_window_id`.
  • When a later operation fails, first check whether the wrong window was selected before switching operation modes.

3. Screenshot and Read Coordinates

  • Use `screenshot coordinate_type=grid` to locate coordinates.
  • Use `coordinate_type=no` to analyze content or show the image to the user.
  • By default, rely on the server-side adaptive grid parameters first.
  • When no intersection covers the target, read `references/api/screenshot.md` and adjust `grid.density_x/y`.
  • When numbers or elements are hard to read, use `crop_zoom_screenshot` or adjust the number parameters.
  • For first-time dynamic coordinates or high-risk coordinates, first use `crop_zoom_screenshot` to see the local area clearly, then verify in reverse with `marker.0.x/y`.
  • In marker verification, first find where the marked point actually lands on the image and describe what is there, then judge whether it matches the target; do not start by assuming the candidate coordinate is correct.
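
To illustrate what "no intersection covers the target" means, here is a small sketch that lists the grid intersections falling inside a target's bounding box. It assumes a hypothetical evenly spaced grid with integer spacing in percent; the real grid is controlled by `grid.density_x/y`, whose exact semantics are defined in `references/api/screenshot.md` and are not assumed here. The function name and the box layout are illustrative only.

```python
from typing import List, Tuple

# Bounding box in percent of the screenshot: (left, top, right, bottom).
Box = Tuple[int, int, int, int]


def covering_intersections(box: Box, step_x: int, step_y: int) -> List[str]:
    """Return the XxY intersections of a hypothetical even grid inside `box`.

    step_x / step_y are assumed grid spacings in percent (not real API
    parameters). An empty result means no intersection covers the target,
    so a denser grid would be needed.
    """
    left, top, right, bottom = box
    xs = [x for x in range(0, 101, step_x) if left <= x <= right]
    ys = [y for y in range(0, 101, step_y) if top <= y <= bottom]
    return [f"{x}x{y}" for y in ys for x in xs]


# A wide button is covered by several intersections on a 5% grid.
print(covering_intersections((42, 30, 58, 38), 5, 5))
# A small icon falls between the 5% lines but is covered on a 2% grid.
print(covering_intersections((51, 31, 54, 34), 5, 5))
print(covering_intersections((51, 31, 54, 34), 2, 2))
```

Any coordinate returned this way is still only a candidate; it should go through crop or marker verification before it is used in an operation.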

4. Operation and Verification

  • Prefer the `background` operation mode.
  • Consider `hijack` only when `background` has no effect or physical input is required.
  • For scenarios with continuous physical input, such as an explicit user request, real-time game control, or a Chinese input method candidate panel, read `references/api/delegated.md` and then enter delegated mode.
  • Make single-step calls while exploring; use `batch` only once the flow is stable and you need to observe a hover, menu, or operation result at the instant it appears.
  • Take a screenshot to verify after every operation; if verification fails, return to the screenshot and coordinate-reading step.
  • On receiving `SELF_CHECK_REQUIRED`, you must update the current plan or the next action.
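
The mode preference in the list above can be written as a short decision function. This is a sketch under assumptions: the function name and its boolean flags are hypothetical, and the returned strings simply reuse the mode and endpoint names from this document rather than any confirmed parameter values.

```python
def choose_mode(
    background_failed: bool = False,
    needs_physical_input: bool = False,
    continuous_physical_input: bool = False,
) -> str:
    """Pick an operation mode following the preference order described above.

    The flags are hypothetical inputs the caller would determine from the
    task: continuous physical input (explicit user request, real-time game
    control, IME candidate panel) leads to delegated mode; otherwise hijack
    is used only when background has no effect or physical input is required.
    """
    if continuous_physical_input:
        return "delegated"  # read references/api/delegated.md first
    if background_failed or needs_physical_input:
        return "hijack"
    return "background"


print(choose_mode())                                 # default preference
print(choose_mode(background_failed=True))           # background had no effect
print(choose_mode(continuous_physical_input=True))   # e.g. real-time game
```

Whatever mode is chosen, the operation still has to be followed by a verification screenshot.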

Coordinate Concept

Coordinates on a screenshot use the format `XxY`. For example, `50x35` means 50% from the left edge and 35% from the top edge. The `x` is the coordinate separator, not a multiplication sign. The valid coordinate for a target element is the coordinate of a grid intersection that falls on that element.
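
As a worked example of the format, the sketch below parses an `XxY` string into percentages and converts it to a pixel position for a given image size. The helper names are hypothetical, and two details are assumptions rather than documented behavior: that values stay within 0 to 100, and that decimal values may appear. The pixel conversion is only for illustration; the text above does not say the API accepts pixels.

```python
import re
from typing import Tuple

# "x" separates the two percentages; it is not a multiplication sign.
_COORD = re.compile(r"^(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)$")


def parse_grid_coordinate(text: str) -> Tuple[float, float]:
    """Parse 'XxY' into (percent from left edge, percent from top edge)."""
    match = _COORD.match(text.strip())
    if match is None:
        raise ValueError(f"not an XxY coordinate: {text!r}")
    x, y = float(match.group(1)), float(match.group(2))
    # Assumption: percentages are bounded by the screenshot edges.
    if not (0 <= x <= 100 and 0 <= y <= 100):
        raise ValueError(f"percentages out of range: {text!r}")
    return x, y


def to_pixels(text: str, width: int, height: int) -> Tuple[int, int]:
    """Convert 'XxY' to a pixel position on an image of the given size."""
    x, y = parse_grid_coordinate(text)
    return round(width * x / 100), round(height * y / 100)


print(parse_grid_coordinate("50x35"))   # (50.0, 35.0)
print(to_pixels("50x35", 1920, 1080))   # (960, 378)
```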

API Index

Read the corresponding document, `references/api/{endpoint}.md`, before executing an API.

| API | Method | Applicable scenario | Reference document |
| --- | --- | --- | --- |
| `health` | GET | Check the service before starting a task | `references/api/health.md` |
| `get_window_list` | POST | Find the target window to control | `references/api/get_window_list.md` |
| `screenshot` | POST | With a grid: locate coordinates. Without a grid: analyze the interface or keep a record. With marker points: preview a coordinate position | `references/api/screenshot.md` |
| `crop_zoom_screenshot` | POST | Crop any screenshot and zoom in to see details (such as coordinate numbers) | `references/api/crop_zoom_screenshot.md` |
| `scroll_screenshot` | POST | Take a long scrolling screenshot to record long pages or content and understand the target window as a whole | `references/api/scroll_screenshot.md` |
| `click` | POST | Single click to trigger a button or enter a page | `references/api/click.md` |
| `long_press` | POST | Long press to trigger certain functions | `references/api/long_press.md` |
| `swipe` | POST | Touch-style swipe to move the page up, down, left, or right | `references/api/swipe.md` |
| `drag` | POST | Drag an element by holding the mouse button and moving | `references/api/drag.md` |
| `scroll` | POST | Mouse-wheel scroll to move the page up or down | `references/api/scroll.md` |
| `right_click` | POST | Right click to open a context menu | `references/api/right_click.md` |
| `hover` | POST | Trigger a hover effect; pair with a screenshot to capture it | `references/api/hover.md` |
| `mouse_move` | POST | Move the mouse, for example to control a game camera; hijack/delegated only | `references/api/mouse_move.md` |
| `input_text` | POST | Input text. With coordinates, clicks first and then types; without coordinates, types directly | `references/api/input_text.md` |
| `press_key` | POST | Press a key or key combination. With coordinates, clicks first and then presses; without coordinates, presses directly | `references/api/press_key.md` |
| `wait` | POST | Wait for a UI animation or page load | `references/api/wait.md` |
| `batch` | POST | Combine instructions to run consecutive steps; several single-step operations can run together for efficiency | `references/api/batch.md` |
| `delegated` | POST | The user explicitly asks to enter or exit delegated mode | `references/api/delegated.md` |

Script Fallback

Fallback path:

    scripts/screenclaw.py -> scripts/screenclaw.ps1 -> scripts/screenclaw.sh -> curl

Determine the cause before falling back:

| Error type | Handling |
| --- | --- |
| Parameter error | Correct the parameters and rerun the same script; do not fall back |
| API business error | Read the corresponding API document and the server message; do not fall back |
| Environment error, such as Python not being installed | Fall back to PowerShell or shell |
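
The decision table above can be sketched as a function that moves along the fallback (downgrade) path only for environment errors. The `ErrorKind` enum and `next_entry_point` function are hypothetical names for illustration; how an error is classified into one of the three kinds is left to the caller.

```python
from enum import Enum
from typing import Optional


class ErrorKind(Enum):
    """Hypothetical labels for the three error types in the table above."""
    PARAMETER = "parameter"
    API_BUSINESS = "api_business"
    ENVIRONMENT = "environment"


# Order taken from the fallback path above.
FALLBACK_CHAIN = [
    "scripts/screenclaw.py",
    "scripts/screenclaw.ps1",
    "scripts/screenclaw.sh",
    "curl",
]


def next_entry_point(current: str, error: ErrorKind) -> Optional[str]:
    """Return the entry point to use after `error` occurred with `current`.

    Parameter and API business errors keep the same entry point (fix the
    call instead of falling back). Environment errors move one step down
    the chain; None means the chain is exhausted.
    """
    if error is not ErrorKind.ENVIRONMENT:
        return current
    index = FALLBACK_CHAIN.index(current)
    if index + 1 < len(FALLBACK_CHAIN):
        return FALLBACK_CHAIN[index + 1]
    return None


print(next_entry_point("scripts/screenclaw.py", ErrorKind.PARAMETER))
print(next_entry_point("scripts/screenclaw.py", ErrorKind.ENVIRONMENT))
```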

Reference Documents

  • `references/config.md` - connection configuration, `ai_app_type`, `session_id`
  • `scripts/README.md` - unified script entry point and dot-path format
  • `references/self_check.md` - self-check reload checklist for long-running tasks
  • `references/api/*.md` - parameters and troubleshooting for each API
  • `references/scenarios/` - scenario templates and application knowledge