Loading...
Loading...
Found 351 Skills
Understand images with Alibaba Cloud Model Studio Qwen VL models (qwen3-vl-plus/qwen3-vl-flash and latest aliases). Use when building image Q&A, visual analysis, OCR-like extraction, chart/table reading, or screenshot understanding workflows.
Extract actionable Linear tickets from ambiguous input — Slack conversations, call transcripts, screenshots, meeting notes, or any unstructured material. Proposes tickets in a scratchpad file for user review, then creates them in Linear on approval. Use when the user wants to turn conversations, transcripts, screenshots, or notes into Linear tickets. Also use when user says "create tickets from this", "send to linear", "make issues from this call/chat", or provides raw material and asks for tickets.
Remote KVM control via PiKVM REST API. Use for controlling remote computers through PiKVM - taking screenshots, moving mouse, clicking, typing text, pressing keys, keyboard shortcuts, scrolling, or power management.
Visual design intelligence and UI aesthetics. Integrates: chrome-devtools, ai-multimodal, media-processing. Capabilities: design analysis, visual hierarchy, color theory, typography, micro-interactions, animation, design systems, accessibility. Actions: analyze, design, create, capture, evaluate, implement UI aesthetics. Keywords: Dribbble, Behance, Mobbin, design inspiration, visual hierarchy, color palette, typography, spacing, animation, micro-interaction, design system, style guide, accessibility, WCAG, contrast ratio, golden ratio, whitespace, visual rhythm. Use when: building beautiful UIs, analyzing design inspiration, implementing visual hierarchy, adding animations/micro-interactions, creating design systems, evaluating aesthetic quality, capturing design screenshots.
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.
Reverse-engineer an application's design system from its codebase and screenshots. Use when asked to analyse visual design, extract a colour palette, document UI patterns, identify typography and spacing systems, audit design consistency, or understand the design language of a frontend codebase.
QA web testing skill using Chrome DevTools MCP tools for visual regression testing, responsive breakpoint validation, and CSS layout debugging. Use this skill whenever the user asks to "test a page", "check breakpoints", "verify responsive layout", "QA this page", "test CSS at different viewports", "check for layout bugs", "verify the fix", or wants to visually inspect a web page at specific viewport widths. Also triggers when the user provides a URL and asks to take screenshots, compare layouts, or inspect element dimensions. Works with any Chrome DevTools MCP-connected browser session on localhost or staging environments.
A tool for generating application materials for Chinese software copyright, which directly outputs LaTeX source files and compiles them into PDF. It automatically analyzes project code to generate four materials: copyright registration application form, source code document (30 pages at the beginning and 30 at the end, total 60 pages), user manual, and design specification, including headers and footers, screenshot placeholders, and information consistency checks. Applicable to various software projects such as WeChat Mini Programs, Web applications, mobile Apps, and desktop applications. This Skill must be used when users mention software copyright, software copyright application, or copyright registration. It should also be used when users need to prepare copyright materials or generate software copyright documents for any software project.
Expert in extracting text from images using Tesseract, EasyOCR, PaddleOCR, Google Vision, AWS Textract, Claude Vision. Trigger: When extracting text from images, screenshots, scanned documents, or PDFs.
Controls Windows Remote Desktop sessions for automation, testing, and remote administration. Use when the user needs to connect to Windows machines via RDP, take screenshots, click, type, or interact with remote Windows desktops.
Create polished frontend interfaces from designs/screenshots with high design quality. Use for web components, pages, replicating UI designs, extracting design guidelines, avoiding AI slop aesthetics.
Cross-platform operating system automation and screen control toolkit. Use when users need screenshots, mouse/keyboard control, visual recognition, window management, browser automation, or desktop automation tasks. Supports macOS 12+ and Windows 10+. On macOS, uses AppleScript, pyautogui, and OpenCV. On Windows, uses pywinauto, pyautogui, and OpenCV (no Hammerspoon equivalent).