Loading...
Loading...
Found 292 Skills
Modal imperative API guide. Use when creating modal dialogs using createModal from @lobehub/ui. Triggers on modal component implementation or dialog creation tasks.
Use when adding popup forms, confirmations, or detail views to Wix dashboards; creating reusable dialog components across dashboard pages; showing context-specific information in overlays; opening modals from dashboard pages; or passing data between dashboard pages and modals. Do NOT use for static content (use dashboard pages instead) or site-facing UI (use site widgets/embedded scripts).
Vision, audio, and multimodal LLM integration patterns. Use when processing images, transcribing audio, generating speech, or building multimodal AI pipelines.
Generate images, video, speech, and transcribe audio using Aliyun Bailian models.
Use when "Modal", "serverless GPU", "cloud GPU", "deploy ML model", or asking about "serverless containers", "GPU compute", "batch processing", "scheduled jobs", "autoscaling ML"
Run GPU workloads on Modal — training, fine-tuning, inference, batch processing. Zero-config serverless: no SSH, no Docker, auto scale-to-zero. Use when user says "modal run", "modal training", "modal inference", "deploy to modal", "need a GPU", "run on modal", "serverless GPU", or needs remote GPU compute.
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.
基于多模态AI的图片识别与分析。当用户想分析、描述、从图片URL中提取信息、image recognition, image analysis, image description, image content understanding, OCR text recognition, visual Q&A时触发此技能。当用户提到图片识别、图片分析、图片描述、识别图片内容、分析产品图、从图片中读取文字、描述图片、提取视觉内容或理解照片内容时触发。当用户提供图片URL并就其视觉内容提问时,即使未明确说"图片识别",也应触发此技能。
Process multimodal inputs (images, video, audio, PDFs) with Gemini 3 Pro. Covers image understanding, video analysis, audio processing, document extraction, media resolution control, OCR, and token optimization. Use when analyzing images, processing video, transcribing audio, extracting PDF content, or working with multimodal data.
Process and generate multimedia content using Google Gemini API for better vision capabilities. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handle multiple images), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image with Imagen 4, editing, composition, refinement), generate videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of default vision capabilities of Claude, only fallback to Claude's vision capabilities if needed), processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.
Use when animating modals, dialogs, popovers, or overlay content to create smooth entrances and exits
Use when visual reasoning is needed with Alibaba Cloud Model Studio QVQ models, including step-by-step image reasoning, chart analysis, and visually grounded problem solving.