Obtain screenshots with coordinate grids via screenclaw, read the coordinate numbers on the image, then call APIs such as click, input, and key press to control software and simulate human visual interaction.

Use this skill in the following scenarios:

- You need to operate software visually
- The user's goal requires understanding the interface, reading element coordinates, or verifying UI status through screenshots
- Scenarios not covered by Playwright, CLI, or application-specific APIs, such as hidden buttons, games, captchas, and any ordinary desktop software
- The user requests automated operation of software that has no dedicated skill
npx skill4agent add ginsing1226/screenclaw screenclaw

XxY, session_id, scripts/screenclaw.py|ps1|sh, references/api/{endpoint}.md, SELF_CHECK_REQUIRED, references/self_check.md, self_check

Understand Goal -> health -> config -> get_window_list -> screenshot -> Read Coordinates -> Marker Verification -> Operation -> Screenshot Verification

references/config.md, api_url, token, ai_app_type, session_id, scripts/README.md, health, references/scenarios/, get_window_list, window_id/main_window_id, screenshot coordinate_type=grid, coordinate_type=no, references/api/screenshot.md, grid.density_x/y, crop_zoom_screenshot, crop_zoom_screenshot, marker.0.x/y, background, hijack, background, references/api/delegated.md, batch, SELF_CHECK_REQUIRED, XxY, 50x35, x

Read the corresponding document before executing the API: references/api/{endpoint}.md
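The workflow order above (Understand Goal through Screenshot Verification) can be sketched as a fixed step list that is always executed in sequence. This is only an illustration: the snake_case step names, the `run_workflow` function, and the handler signature are hypothetical and not part of screenclaw itself.

```python
from typing import Callable, Dict, List

# Step names mirror the documented workflow order.
# ASSUMPTION: the snake_case spellings are illustrative, not screenclaw identifiers.
WORKFLOW: List[str] = [
    "understand_goal",
    "health",
    "config",
    "get_window_list",
    "screenshot",
    "read_coordinates",
    "marker_verification",
    "operation",
    "screenshot_verification",
]


def run_workflow(handlers: Dict[str, Callable[[dict], None]]) -> dict:
    """Run every workflow step in order, sharing one state dict.

    A missing handler raises KeyError, so no step is skipped silently.
    """
    state: dict = {"completed": []}
    for step in WORKFLOW:
        handlers[step](state)  # hypothetical handler: mutates state in place
        state["completed"].append(step)
    return state
```

Keeping the order in one list makes it harder to skip the marker verification before an operation or the screenshot verification after it.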
| API | Method | Applicable Scenario | Reference Document |
|---|---|---|---|
| health | GET | Check service availability before starting tasks | |
| get_window_list | POST | Find the target window to be controlled | |
| screenshot | POST | Take a screenshot: with a grid for coordinate positioning, without a grid for interface analysis and record keeping, or with marker points to preview coordinate positions | |
| crop_zoom_screenshot | POST | Crop any screenshot and zoom in to see details (such as coordinate numbers) | |
| scroll_screenshot | POST | Scroll to take long screenshots, record long pages and content, and understand the target window as a whole | |
| click | POST | Single click to trigger buttons/enter pages | |
| long_press | POST | Long press to trigger certain functions | |
| swipe | POST | Touch-style swipe to move pages up, down, left, or right | |
| drag | POST | Drag elements by holding the mouse and moving | |
| scroll | POST | Mouse wheel scroll to move pages up or down | |
| right_click | POST | Right click to open context menus | |
| hover | POST | Trigger hover effects; combine with a screenshot to capture the hover state | |
| mouse_move | POST | Move the mouse, e.g. for game perspective control; only for hijack/delegation | |
| input_text | POST | Input text. If coordinates are provided, click there first and then input; otherwise input directly | |
| press_key | POST | Press a key or key combination. If coordinates are provided, click there first and then press; otherwise press directly | |
| wait | POST | Wait for UI animations/page loading | |
| batch | POST | Combine multiple single-step operations into one call to execute consecutive steps more efficiently | |
| delegated | POST | User actively requests to enter/exit delegation mode | |
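As a minimal sketch of how the method column might be applied, the helper below builds (but does not send) a request for a given endpoint using only the Python standard library. The endpoint names and methods come from the table. The URL layout `{api_url}/{endpoint}`, the JSON body, and the `build_request` name are assumptions; consult `references/api/{endpoint}.md` for the real request shape.

```python
import json
import urllib.request
from typing import Optional

# Methods as listed in the table: health is GET, everything else is POST.
GET_ENDPOINTS = {"health"}
POST_ENDPOINTS = {
    "get_window_list", "screenshot", "crop_zoom_screenshot", "scroll_screenshot",
    "click", "long_press", "swipe", "drag", "scroll", "right_click", "hover",
    "mouse_move", "input_text", "press_key", "wait", "batch", "delegated",
}


def build_request(api_url: str, endpoint: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build an unsent request for one endpoint from the table."""
    if endpoint in GET_ENDPOINTS:
        method, data = "GET", None
    elif endpoint in POST_ENDPOINTS:
        method = "POST"
        # ASSUMPTION: POST bodies are JSON; the source does not state the format.
        data = json.dumps(payload or {}).encode("utf-8")
    else:
        raise ValueError(f"unknown endpoint: {endpoint}")
    # ASSUMPTION: endpoints live directly under api_url; the real layout may differ.
    url = f"{api_url.rstrip('/')}/{endpoint}"
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

Rejecting unknown endpoint names early keeps a typo from turning into a confusing server-side error.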
scripts/screenclaw.py -> scripts/screenclaw.ps1 -> scripts/screenclaw.sh -> curl

| Error Type | Handling Method |
|---|---|
| Parameter error | Correct the parameters and re-run the same script; do not fall back to another script |
| API business error | Read the corresponding API document and the server message; do not fall back to another script |
| Environment error (e.g. Python is not installed) | Fall back to PowerShell or shell |
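The error-handling rules in the table can be sketched as a small decision function over the script chain (Python, then PowerShell, then shell, then curl). The error-type labels (`"parameter"`, `"api_business"`, `"environment"`) and the `next_runner` name are hypothetical, and treating curl as the last step for environment errors is an assumption drawn from the chain rather than from the table.

```python
from typing import Optional

# Script chain in the documented order.
RUNNERS = [
    "scripts/screenclaw.py",
    "scripts/screenclaw.ps1",
    "scripts/screenclaw.sh",
    "curl",
]


def next_runner(error_type: str, current: str) -> Optional[str]:
    """Return the runner to use after an error, or None if none is left.

    ASSUMPTION: the error_type labels are illustrative names for the
    three rows of the error table.
    """
    if error_type in ("parameter", "api_business"):
        # Fix the parameters or read the API document; keep the same runner.
        return current
    if error_type == "environment":
        # Only environment problems move to the next runner in the chain.
        index = RUNNERS.index(current)
        return RUNNERS[index + 1] if index + 1 < len(RUNNERS) else None
    raise ValueError(f"unknown error type: {error_type}")
```

For example, `next_runner("environment", "scripts/screenclaw.py")` selects the PowerShell script, while a parameter error keeps the current runner so the corrected call is retried in place.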
references/config.md, ai_app_type, session_id, scripts/README.md, references/self_check.md, references/api/*.md, references/scenarios/