Found 485 Skills
Multimodal AI processing via the Google Gemini API (2M-token context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.
Systematically explore and test a mobile app on iOS/Android with agent-device to find bugs, UX issues, and other problems. Use when asked to "dogfood", "QA", "exploratory test", "find issues", "bug hunt", or "test this app" on mobile. Produces a structured report with reproducible evidence: screenshots, optional repro videos, and detailed steps for every issue.
Expert in extracting text from images using Tesseract, EasyOCR, PaddleOCR, Google Vision, AWS Textract, Claude Vision. Trigger: When extracting text from images, screenshots, scanned documents, or PDFs.
OCR skill using the PaddleOCR model via the SiliconFlow API. Use this skill when the user asks to "recognize text from an image", "extract text from a photo", "OCR this image", "read text from screenshot", or mentions "PaddleOCR", "image text recognition", or "text extraction from images".
Write marketing copy and App Store / Google Play listings (ASO keywords, titles, subtitles, short+long descriptions, feature bullets, release notes), plus screenshot caption sets and text-to-image prompt templates for generating store screenshot backgrounds/promo visuals. Use when asked to: write/refresh app marketing copy, craft app store metadata, brainstorm taglines/value props, produce ad/landing/email copy, or generate prompts for screenshot/creative generation.
Build and test the longest uncovered user journey from spec.md. Reads the product spec, checks existing journeys, picks the longest untested path, writes a UI test with screenshots at every step, then runs 3 polish rounds (testability → refactor UI test → UI review) until everything is clean. Use when the user says "next journey", "add journey", "test the next flow", "journey builder", or "cover more user paths".
Deep UI walkthrough with screenshot-based analysis across all pages and viewports (desktop + tablet + mobile). Delivers per-page improvement pitches grounded in what you actually see. Use when user says 'review the UI', 'pitch UI improvements', 'how does this look', 'UX audit', 'walk through the app'.
Use kuri-agent to automate Chrome — navigate pages, interact with elements via a11y refs, capture screenshots, run security audits, enumerate cookies/JWTs, probe for IDOR vulnerabilities, and make authenticated fetches. Use when the user wants to automate a browser, test a web app, scrape data, or run security trajectories against a live site.
Extracts the full design soul, system, and agent rules from reference UI images. Use this skill when the user provides screenshots, Figma exports, or any UI reference images and wants the agent to design with the same soul, taste, feeling, and personality — not just copy colors and spacing. Marrow reads beneath the surface: it extracts the living core of a design — the decisions, proportions, restraint, and emotional intent that make a UI feel the way it does. Triggers on: /marrow, /extract-ui, /design-from-ref, /read-design, or any prompt like "extract the design system from these images", "make it look and feel like this", "get the rules from this UI", "build with the same soul", "match this design". Always use this skill when images are provided alongside a request to replicate, match, or be inspired by a design.
Extract comprehensive, production-ready JSON design specifications from visual inputs using a 7-pass serial architecture with cross-validation. Use when converting screenshots, mockups, or design exports into structured design tokens, component specs, accessibility analysis, and developer handoff artifacts.
Control macOS applications with Pi agents using semantic Accessibility API targets and optional screenshots.
Turn an existing HTML page, landing page, oral script, memo draft, result table, or structured source material into a Xiaohongshu card-style image note. Use this when the user wants page-by-page card planning, cover copy, card text, or a design-ready Xiaohongshu image-and-text brief based on source material rather than a plain note written from scratch. This skill is especially for 3:4 Xiaohongshu cards that may mix image-led pages with high-density memo pages, using strong information hierarchy and screenshot-worthy text density rather than generic sparse carousel copy.