Loading...
Loading...
Found 85 Skills
Agent-IM Conversation Skill - Create sessions, send messages such as image/video generation requests via OpenAPI, and query session progress. This skill is activated when users need to generate images/videos or query current session messages.
Generate on-brand marketing images via Codex's built-in image_generation tool. Trigger when user asks to: (a) create marketing assets (ad / logo / slide / product-mockup / scene / lighting-transform / LinkedIn-or-social carousel) for a specific brand, (b) extract or build a brand profile (DESIGN.md) from URL / Tailwind config / tokens.json / Figma Variables / CSS custom props / description / existing brand asset, (c) maintain on-brand consistency across multiple image jobs for the same brand. Do NOT trigger for: UI code generation, frontend reference imagery (use imagegen-frontend-web instead), video generation, or general image editing without brand context.
Create, repair, validate, preview, and package Codex-compatible animated pet spritesheets from character art, screenshots, generated images, or visual references. Use when a user wants to hatch a Codex pet, create a custom animated pet, or build a built-in pet asset with an 8x9 atlas, transparent unused cells, row-by-row animation prompts, QA contact sheets, preview videos, and pet.json packaging. This skill composes the installed $imagegen system skill for visual generation and uses bundled scripts for deterministic spritesheet assembly.
BFL FLUX API integration guide covering endpoints, async polling patterns, rate limiting, error handling, webhooks, and regional endpoints with Python and TypeScript code examples.
Usage guide for the `flux` CLI (@vforsh/flux) to generate and edit images via Black Forest Labs (BFL) FLUX API. Use for setting up the BFL API key, choosing models, generating images, editing with references, inpaint/outpaint, waiting for results, checking credits, using --plain/--json output, and handling common API errors (402/403/429).
Generate images using ModelScope Z-Image models (Z-Image-Turbo, Z-Image, Z-Image-Edit). Use when user asks to generate images, create artwork, or requests image generation functionality. Supports async generation with polling and optional LoRA configurations. IMPORTANT - Model Selection Rule: If the user explicitly mentions "Z-Image-Turbo" in their prompt, use "Tongyi-MAI/Z-Image-Turbo"; if they explicitly mention "Z-Image" (without Turbo), use "Tongyi-MAI/Z-Image"; otherwise, use the default "Tongyi-MAI/Z-Image-Turbo".
MiniMax multimodal model skill — use MiniMax Multi-Modal models for speech, music, video, and image. Create voice, music, video, and images with MiniMax AI: TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract). Use when the user mentions MiniMax, multimodal generation, or wants speech/music/video/image AI, MiniMax APIs, or FFmpeg workflows alongside MiniMax outputs.
Generate images and videos with Kling O3 — Kling's most powerful model family. Text-to-image, text-to-video, image-to-video, and video-to-video editing. Use when the user requests "Kling", "Kling O3", "Best quality video", "Kling image", "Kling video editing".
Generate images using Minimax image-01, triggered when the user says "Generate images with Minimax".
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (captioning, object detection, OCR, visual Q&A, segmentation), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.
Image generation skill using Gemini Web. Generates images from text prompts via Google Gemini. Also supports text generation. Use as the image generation backend for other skills like cover-image, xhs-images, article-illustrator.
External AI API integration with retry logic, rate limiting, content safety detection, and multi-turn conversation support for image generation.