Found 293 Skills
MiniMax multimodal model skill: use MiniMax multimodal models to create speech, music, video, and images. Covers TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract). Use when the user mentions MiniMax, multimodal generation, or wants speech/music/video/image AI, MiniMax APIs, or FFmpeg workflows alongside MiniMax outputs.
Fine-tune Gemma 4 and 3n models with audio, images, and text on Apple Silicon using PyTorch and Metal Performance Shaders.
Quality gate via second model. Spawn a different AI model to review work before committing. Includes refusal routing: if one model refuses, silently switch to the next.
Comprehensive Modal.com platform knowledge covering all features, pricing, and best practices
Vision and multimodal capabilities for Claude including image analysis, PDF processing, and document understanding. Activate for image input, base64 encoding, multiple images, and visual analysis.
Minimal multimodal embedding smoke test for Model Studio VL embedding models.
CLIP, SigLIP 2, Voyage multimodal-3 patterns for image+text retrieval, cross-modal search, and multimodal document chunking. Use when building RAG with images, implementing visual search, or hybrid retrieval.
Use when the user mentions "CLIP", "Whisper", "Stable Diffusion", "SDXL", "speech-to-text", "text-to-image", "image generation", "transcription", "zero-shot classification", "image-text similarity", "inpainting", or "ControlNet".
Analyze media files (PDFs, images, diagrams) that require interpretation beyond raw text. Extracts specific information or summaries from documents, describes visual content. Use for document analysis, image understanding, diagram interpretation, chart analysis, table extraction, and any media requiring visual or contextual interpretation beyond literal text extraction.
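Multimodal product image similarity analysis and grouping. Trigger this skill when the user mentions product image similarity, visual grouping, finding products that look alike, image-based deduplication, detecting competitors' identical products, clustering identical products, grouping by appearance, image similarity, product image comparison, visual clustering, same-style recognition, appearance deduplication, or image grouping. Even if the user does not explicitly say "image similarity", trigger this skill whenever the intent involves comparing main product images, visual clustering, identifying visually identical or similar products, or post-processing a product list by visual features such as appearance, color, or composition.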
Run Python code in the cloud with serverless containers, GPUs, and autoscaling. Use when deploying ML models, running batch processing jobs, scheduling compute-intensive tasks, or serving APIs that require GPU acceleration or dynamic scaling.
Implement modals in Umbraco backoffice using official docs