Loading...
Loading...
Found 18 Skills
When you need to control macOS desktop apps (Atlas, Chrome, Finder) → automate via AppleScript.
Let agents control many desktop software directly from the cli, with one pip install, and no MCP servers.
Expert in AT-SPI2 (Assistive Technology Service Provider Interface) for Linux desktop automation. Specializes in accessible automation of GTK/Qt applications via D-Bus accessibility interface. HIGH-RISK skill requiring security controls for system-wide access.
Control the mouse, keyboard, and read screen content via accessibility. Use this skill when the user asks to click somewhere on screen, type text into an app, move the mouse, press keyboard shortcuts, read what's on screen, get the accessibility tree of the current window, automate desktop interactions, or control the computer.
Desktop automation via native OS accessibility trees using the agent-desktop CLI. Use when an AI agent needs to observe, interact with, or automate desktop applications (click buttons, fill forms, navigate menus, read UI state, toggle checkboxes, scroll, drag, type text, take screenshots, manage windows, use clipboard). Covers 50 commands across observation, interaction, keyboard/mouse, app lifecycle, clipboard, and wait. Triggers on: "click button", "fill form", "open app", "read UI", "automate desktop", "accessibility tree", "snapshot app", "type into field", "navigate menu", "toggle checkbox", "take screenshot", "desktop automation", "agent-desktop", or any desktop GUI interaction task. Supports macOS (Phase 1), with Windows and Linux planned.
基于 PyAutoGUI 的桌面自动化工具。支持截图、鼠标键盘控制、图像识别、颜色检测等操作。当用户需要自动化操作桌面应用、执行 UI 测试或重复性任务时触发。
Build AI agents that interact with computers like humans do - viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives. Critical focus on sandboxing, security, and handling the unique challenges of vision-based control. Use when: computer use, desktop automation agent, screen control AI, vision-based agent, GUI automation.
Non-interactive X11 desktop control for AI agents. Use when the task involves controlling a Linux desktop - clicking, typing, reading windows, waiting for UI state, or taking screenshots inside a sandbox or VM.
macOS native app automation CLI for AI agents. Use when the user needs to interact with macOS desktop applications, including opening apps, clicking buttons, toggling settings, filling forms, reading UI state, automating System Settings, controlling Finder, Safari, or any native app.
Expert in Windows UI Automation (UIA) and Win32 APIs for desktop automation. Specializes in accessible, secure automation of Windows applications including element discovery, input simulation, and process interaction. HIGH-RISK skill requiring strict security controls for system access.
Vision-driven desktop automation using Midscene. Control your desktop (macOS, Windows, Linux) with natural language commands. Operates entirely from screenshots — no DOM or accessibility labels required. Can interact with all visible elements on screen regardless of technology stack. ⚠️ Takes over the user's real mouse and keyboard. For web apps, prefer "Browser Automation" instead. Only use this for desktop-native apps (Electron, Qt, native macOS/Windows/Linux) that cannot run in a browser. Triggers: open app, press key, desktop, computer, click on screen, type text, screenshot desktop, launch application, switch window, desktop automation, control computer, mouse click, keyboard shortcut, screen capture, find on screen, read screen, verify window, close app, test Electron app Powered by Midscene.js (https://midscenejs.com)
Computer Use Agent (CUA) for macOS automation using TuriX. Use when you need to perform visual tasks on the desktop, such as opening apps, clicking buttons, or navigating UIs that don't have a CLI or API.