Run 150+ AI apps via the inference.sh CLI: image generation, video creation, LLMs, search, 3D, and Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, searching the web, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok
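A minimal sketch of driving the CLI from a Python script. The `run` subcommand, the app identifier, and the flag names below are assumptions for illustration, not confirmed `infsh` syntax; check `infsh --help` for the real interface.

```python
# Hypothetical sketch: invoking the inference.sh CLI from Python.
# The "run" subcommand, app id, and flag names are assumptions,
# not verified against the real infsh interface.
import subprocess

result = subprocess.run(
    ["infsh", "run", "flux-dev",           # assumed: app identifier
     "--prompt", "a lighthouse at dusk",   # assumed flag name
     "--output", "out.png"],               # assumed flag name
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```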
PixVerse CLI — generate AI videos and images from the command line. Supports PixVerse, Veo, Sora, Kling, Hailuo, Wan, and more video models; Nano Banana (Gemini), Seedream, Qwen image models; and PixVerse's rich effect template library. Start here.
Generate AI videos using the Pollo AI API. Supports 13 leading models (Kling, Sora, Runway, Veo, Pixverse, Hailuo, Vidu, Luma, Pika, Wan, Seedance, Hunyuan, Pollo) with 50+ versions. It also supports task polling, credit cost estimation, and credit balance checks. Use this skill whenever the user wants to generate an AI video from text or an image, use any AI video model, check Pollo credits, or mentions Pollo AI, pollo.ai, or any of the supported model names. Even if the user just says "generate a video" or "make me a short clip" without mentioning Pollo, this skill should be used.
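A hedged sketch of the submit-then-poll flow the entry describes. The base URL, endpoint paths, payload fields, and status values are assumptions made for illustration; consult pollo.ai's API documentation for the real names.

```python
# Hedged sketch of a text-to-video request with task polling against the
# Pollo AI REST API. Endpoint, fields, and status shape are assumptions.
import os
import time
import requests

API_KEY = os.environ["POLLO_API_KEY"]   # assumed env var name
BASE = "https://pollo.ai/api/v1"        # assumed base URL
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

task = requests.post(
    f"{BASE}/generations",              # assumed endpoint
    headers=HEADERS,
    json={"model": "kling-v2", "prompt": "a paper boat on a rainy street"},
).json()

# Poll until the video finishes rendering (assumed task/status shape).
while task.get("status") not in ("succeeded", "failed"):
    time.sleep(5)
    task = requests.get(f"{BASE}/generations/{task['id']}", headers=HEADERS).json()

print(task.get("video_url"))
```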
Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.
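The entry covers the MCP server; the same fal.ai endpoints are also reachable through fal's official Python client (`pip install fal-client`, with `FAL_KEY` set). A minimal sketch; the model slug and response shape follow fal's common pattern but should be checked against the target model's docs.

```python
# Sketch using fal's Python client as an alternative to the MCP server.
# Requires the FAL_KEY environment variable to be set.
import fal_client

result = fal_client.subscribe(
    "fal-ai/flux/dev",                  # illustrative model slug
    arguments={"prompt": "a watercolor fox in the snow"},
)
print(result["images"][0]["url"])       # typical fal image response shape
```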
Direct high-fidelity cinematic video with AI — translates creative intent into technical cinematographic directives for Veo3, Kling, and Luma video models via muapi.ai
Generate AI images, videos, music, and audio from the terminal via muapi.ai — supports 100+ models including Flux, Midjourney v7, Kling 3.0, Veo3, and Suno V5
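A hedged sketch of an image request against muapi.ai. The endpoint path, auth header, and payload fields are assumptions for illustration only; the real API surface may differ.

```python
# Hedged sketch of a muapi.ai image generation call.
# Endpoint, header, and payload fields are assumptions.
import os
import requests

resp = requests.post(
    "https://api.muapi.ai/v1/images/generations",    # assumed endpoint
    headers={"x-api-key": os.environ["MUAPI_KEY"]},  # assumed auth header
    json={"model": "flux-dev", "prompt": "isometric city at night"},
)
print(resp.json())
```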
Professional AI Video Storyboard Designer. This skill must be used when users want to create videos, write storyboard scripts, generate AI video prompts, or plan video content structures. It covers all video types: short videos, commercials, educational content, brand videos, vlogs, micro-films, etc. Even if users only say "Help me make a video" or "I want to create content on the theme of XX", this skill should be triggered. It outputs professional storyboard designs plus prompts that can be used directly in mainstream AI video tools such as Seedance 2.0 (Jimeng), Sora, Kling, Runway, and Veo. Of these, Seedance 2.0 additionally supports specialized output using its multimodal @ reference syntax.
Generate videos from text prompts or animate static images using ModelsLab's v7 Video Fusion API. Supports text-to-video, image-to-video, video-to-video, lip-sync, and motion control with 40+ models including Seedance, Wan, Veo, Sora, Kling, and Hailuo.
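A hedged sketch of a text-to-video call. ModelsLab's APIs typically take the API key in the JSON body rather than a header, but the v7 path and field names here are assumptions; verify them against the Video Fusion docs.

```python
# Hedged sketch of a ModelsLab text-to-video request.
# The v7 path and field names are assumptions for illustration.
import os
import requests

resp = requests.post(
    "https://modelslab.com/api/v7/video-fusion/text2video",  # assumed path
    json={
        "key": os.environ["MODELSLAB_API_KEY"],  # key in body, per ModelsLab convention
        "model_id": "seedance",                  # assumed model identifier
        "prompt": "drone shot over a glacier",
    },
)
print(resp.json())
```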
Process and generate multimedia content using the Google Gemini API, which offers stronger vision capabilities. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understanding images (stronger image analysis than Claude models: captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, multiple images), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), generating images (text-to-image with Imagen 4, editing, composition, refinement), and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (prefer this over Claude's default vision; fall back to Claude's vision only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.
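A minimal image-understanding example with the google-genai SDK (`pip install google-genai`, `GEMINI_API_KEY` set in the environment). The model name is illustrative; swap in whichever Gemini model you target.

```python
# Upload an image and ask Gemini to analyze it via the google-genai SDK.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment
image = client.files.upload(file="screenshot.png")
response = client.models.generate_content(
    model="gemini-2.5-flash",  # illustrative model name
    contents=[image, "Describe the UI layout and extract all visible text."],
)
print(response.text)
```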
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech, and audio understanding) to build end-to-end multimodal media workflows and code templates covering both generation and understanding.
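On the generation side, a sketch of Nano Banana image output through the same SDK. The model name follows Google's published naming for Nano Banana but may change; image bytes come back as inline data parts.

```python
# Generate an image with Nano Banana and write the returned bytes to disk.
from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # "Nano Banana"; verify current name
    contents="A banana wearing a tiny astronaut helmet, studio lighting",
)
for part in response.candidates[0].content.parts:
    if part.inline_data:             # image parts carry raw bytes
        with open("banana.png", "wb") as f:
            f.write(part.inline_data.data)
```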
Generate videos from text and image prompts via Together AI. 15+ models including Veo 2/3, Sora 2, Kling 2.1, Hailuo 02, Seedance, PixVerse, Vidu. Supports text-to-video, image-to-video, keyframe control, and reference images. Use when users want to generate videos, create video content, animate images, or work with any video generation task.
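A heavily hedged sketch of a Together AI video request. Together's chat and image endpoints are well documented, but this video path and payload are assumptions for illustration, not confirmed API surface.

```python
# Hedged sketch of a text-to-video request to Together AI.
# The /v1/videos path and payload fields are assumptions.
import os
import requests

resp = requests.post(
    "https://api.together.xyz/v1/videos",   # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={"model": "veo-3", "prompt": "timelapse of a blooming desert"},
)
print(resp.json())
```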
Complete Google Gemini API reference for 2026. Use whenever writing code that calls Gemini models. Covers the google-genai SDK, Gemini 3/3.1 models, thought signatures, thinking config, Interactions API, File Search (managed RAG), Computer Use, URL Context, Nano Banana image gen, Live API, ephemeral tokens, TTS, Veo video gen, Lyria music gen, and all tools. ALWAYS prefer `from google import genai` over any legacy import. Use this skill for ANY Gemini API question, even simple ones.
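The canonical minimal call matching the preferred import above; the model name is illustrative, so substitute your target Gemini model.

```python
# Minimal google-genai call using the preferred `from google import genai`.
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash",  # illustrative; use your target Gemini model
    contents="Explain thought signatures in one sentence.",
)
print(response.text)
```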