Loading...
Loading...
Found 25 Skills
Create aesthetically beautiful interfaces following proven design principles. Use when building UI/UX, analyzing designs from inspiration sites, generating design images with ai-multimodal, implementing visual hierarchy and color theory, adding micro-interactions, or creating design documentation. Includes workflows for capturing and analyzing inspiration screenshots with chrome-devtools and ai-multimodal, iterative design image generation until aesthetic standards are met, and comprehensive design system guidance covering BEAUTIFUL (aesthetic principles), RIGHT (functionality/accessibility), SATISFYING (micro-interactions), and PEAK (storytelling) stages. Integrates with chrome-devtools, ai-multimodal, media-processing, ui-styling, and web-frameworks skills.
Visual design intelligence and UI aesthetics. Integrates: chrome-devtools, ai-multimodal, media-processing. Capabilities: design analysis, visual hierarchy, color theory, typography, micro-interactions, animation, design systems, accessibility. Actions: analyze, design, create, capture, evaluate, implement UI aesthetics. Keywords: Dribbble, Behance, Mobbin, design inspiration, visual hierarchy, color palette, typography, spacing, animation, micro-interaction, design system, style guide, accessibility, WCAG, contrast ratio, golden ratio, whitespace, visual rhythm. Use when: building beautiful UIs, analyzing design inspiration, implementing visual hierarchy, adding animations/micro-interactions, creating design systems, evaluating aesthetic quality, capturing design screenshots.
FFmpeg-based media transcoding workflows with preset-driven conversions, batch processing, and safe backups for web/mobile/archive outputs.
Extends an image canvas by adding padding on all sides with a solid background color. Use when you need to add borders, margins, or expand the canvas area around an image.
Adds visual descriptions to transcripts by extracting and analyzing video frames with ffmpeg. Creates visual transcript with periodic visual descriptions of the video clip. Use when all files have audio transcripts present (transcript) but don't yet have visual transcripts created (visual_transcript).
Convert video clips to optimized GIFs with speed control, cropping, text overlays, and file size optimization. Create perfect GIFs for social media, documentation, and presentations.
Background knowledge for droid-control workflows -- not invoked directly. Video assembly via Remotion — title cards, layout, transitions, effects, and showcase polish.
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (captioning, object detection, OCR, visual Q&A, segmentation), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.
Generate FFmpeg commands from natural language video editing requests - cut, trim, convert, compress, change aspect ratio, extract audio, and more.
Extract frames from videos at specific timestamps or intervals, find best frames, and generate thumbnail grids for previews.
Remotion Best Practices - Create Videos with React
Video Transcript Extraction Expert (based on Doubao Video Understanding Model). Supports links from Bilibili, Douyin, Xiaohongshu, YouTube, or local video files. Runs entirely in the background (headless) on the user's computer, no pop-ups, no requirement to log in to video platforms. Outputs strict verbatim transcripts with "semantic segmentation + paragraph-level timestamps" (retains colloquial words, internet memes, pauses). Long videos are automatically segmented to avoid being summarized by the model. Trigger scenarios: - User says "generate transcript", "extract transcript", "convert to text", "video to text" - User says "dictate video", "extract video copy", "video subtitles" - User uses the /video-transcript command - User pastes a video link (Bilibili/Douyin/Xiaohongshu/YouTube) with the intention of getting a text version