Loading...
Loading...
Found 351 Skills
Build high-quality visual Web artifacts using HTML/CSS/JavaScript/React — web pages, landing pages, dashboards, interactive prototypes, HTML slide decks, animated demos, UI mockups, data visualizations, and more. Use this skill whenever the user's request involves a visual, interactive, or front-end deliverable, including: - Creating web pages, landing pages, dashboards, marketing pages - Building interactive prototypes or UI mockups (with device frames) - Building HTML slide decks / presentations - Creating CSS/JS animations or timeline-driven animated demos - Turning design mockups, screenshots, or PRDs into interactive implementations - Data visualization (Chart.js / D3, etc.) - Design system / UI Kit exploration Even if the user doesn't explicitly say "HTML" or "web page," this skill applies whenever the intent is to produce something visual, interactive, or presentational. Not applicable: pure back-end logic, CLI tools, data-processing scripts, non-visual code tasks, command-line debugging.
Create aesthetically beautiful interfaces following proven design principles. Use when building UI/UX, analyzing designs from inspiration sites, generating design images with ai-multimodal, implementing visual hierarchy and color theory, adding micro-interactions, or creating design documentation. Includes workflows for capturing and analyzing inspiration screenshots with chrome-devtools and ai-multimodal, iterative design image generation until aesthetic standards are met, and comprehensive design system guidance covering BEAUTIFUL (aesthetic principles), RIGHT (functionality/accessibility), SATISFYING (micro-interactions), and PEAK (storytelling) stages. Integrates with chrome-devtools, ai-multimodal, media-processing, ui-styling, and web-frameworks skills.
Browser automation using Vercel's agent-browser CLI. Use when you need to interact with web pages, fill forms, take screenshots, or scrape data. Alternative to Playwright MCP - uses Bash commands with ref-based element selection. Triggers on "browse website", "fill form", "click button", "take screenshot", "scrape page", "web automation".
Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from websites, take screenshots, fill forms, click buttons, or interact with web applications.
Canonical Playwright hub for E2E tests and ad-hoc browser automation. Use when the user explicitly mentions "Playwright", "@playwright/test", "npx playwright", "playwright.config.ts", "PWDEBUG", "trace viewer", or "toHaveScreenshot". Avoid using for generic browser automation unless Playwright is requested, and avoid using for pure web scraping.
Generate interactive TiddlyWiki-style HTML software manuals with screenshots, API docs, and multi-level code examples. Use when creating user guides, software documentation, or API references. Triggers on "software manual", "user guide", "generate manual", "create docs".
Convert images (screenshots, photos, whiteboard) to Mermaid or DOT/Graphviz diagrams
Process and generate multimedia content using Google Gemini API for better vision capabilities. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handle multiple images), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image with Imagen 4, editing, composition, refinement), generate videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of default vision capabilities of Claude, only fallback to Claude's vision capabilities if needed), processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.
Conduct comprehensive, multi-round research that produces rich visual reports. Use when asked for "deep research", "comprehensive analysis", "compare frameworks", "evaluate options", "research the state of X", or any task requiring investigation across 10+ sources. NOT for quick lookups — this is a 5-15 minute deep dive that produces a briefing-quality artifact with screenshots, diagrams, tables, and cited findings.
Plan, implement, and debug frontend tests: unit/integration/E2E/visual/a11y. Use for Playwright/Cypress/Vitest/Jest/RTL, flaky test triage, CI stabilization, and canvas/WebGL games (Phaser) needing deterministic input + screenshot/state assertions.
Guide for extracting code or pseudocode from images using OCR and implementing it correctly. This skill should be used when tasks involve reading code, pseudocode, or algorithms from images (PNG, JPG, screenshots) and executing or implementing the extracted logic.
AI-powered E2E testing for any app — Flutter, React Native, iOS, Android, Electron, Tauri, KMP, .NET MAUI. Test 8 platforms with natural language through MCP. No test code needed. Just describe what to test and the agent sees screenshots, taps elements, enters text, scrolls, and verifies UI state automatically.