Jimeng AI Generation Skill
Overview
The Jimeng Skill enables AI-driven image and video generation through jimeng-mcp-server, an MCP (Model Context Protocol) server integrated with Jimeng AI's multimodal generation capabilities. This skill allows you to create visual content directly through natural language instructions.
Core Capabilities:
- 🎨 Text-to-Image: Generate high-quality images from text descriptions
- 🎭 Image Synthesis: Intelligently merge and blend multiple images
- 🎬 Text-to-Video: Create short videos from text prompts
- 🎞️ Image-to-Video: Add animation effects to static images
When to Use This Skill:
- Users ask to generate, create, or produce images or videos
- Users mention "Jimeng" (in any capitalization) or request AI visual content generation
- Users provide text descriptions and expect visual outputs
- Users want to combine, merge or synthesize multiple images
- Users want to add animation or motion effects to static images
Prerequisites
Before using this skill, ensure jimeng-mcp-server is properly configured:
- Server Must Be Running, in one of the following modes:
  - stdio Mode: Configured in MCP clients (Claude Desktop, Cherry Studio)
  - SSE Mode: Run as an HTTP server with SSE transport
  - HTTP Mode: Run as a REST API server
- Environment Variables Configured:
  - JIMENG_API_KEY: Your Jimeng API key (the sessionid value from Jimeng website cookies)
  - API endpoint (default: http://127.0.0.1:8001)
  - Model name (default: jimeng-4.5)
- Backend API Running: The jimeng-free-api-all Docker container must be active
For detailed setup instructions, refer to references/setup_guide.md.
Quick Start
Basic Usage Workflow
When users request image or video generation, follow this workflow:
- Identify Task Type based on user input
- Extract Required Parameters from the request
- Call the Corresponding jimeng-mcp-server Tool
- Return Generated Content URLs to the user
Example Requests
Text-to-Image:
User: "Generate an image with Jimeng: Shiba Inu under cherry blossom trees"
→ Use the text_to_image tool with parameter prompt="Shiba Inu under cherry blossom trees"
Image Synthesis:
User: "Help me synthesize these two images, with the style leaning towards the first one"
→ Use the image_composition tool and provide image URLs
Text-to-Video:
User: "Create a 5-second video: Scene of a pony crossing a river"
→ Use the text_to_video tool and set the prompt (the tool lists no duration parameter; generated clips typically run a few seconds)
Image-to-Video:
User: "Add animation effects to this image"
→ Use the image_to_video tool and provide the image URL
Core Capabilities
1. Text-to-Image
Generate images from text descriptions using the Jimeng 4.5 engine.
Parameters:
- prompt (required): Text description of the desired image
- model (optional): Model version (default: jimeng-4.5)
- ratio (optional): Image aspect ratio ("1:1", "4:3", "3:4", "16:9", "9:16")
- resolution (optional): Resolution preset ("1k", "2k", "4k", default: 2k)
- Negative prompt (optional): Elements to avoid in the generated image
Common Aspect Ratios:
- 16:9 → Landscape/widescreen (video covers, banners)
- 1:1 → Square (avatars, social media)
- 9:16 → Portrait/mobile screen (short video covers)
- 4:3 → Standard landscape (blog illustrations)
- 3:4 → Standard portrait (portrait photos)
Usage Example:
```python
# User request: "Generate an image: Beach at sunset with coconut trees"
{
    "model": "jimeng-4.5",
    "prompt": "Beach at sunset with coconut trees",
    "ratio": "16:9",
    "resolution": "2k"
}
```
Return Result:
Returns an array containing multiple image URLs, which can be displayed or downloaded.
Tips:
- Higher resolution (4k) is suitable for print and high-quality displays
- Lower resolution (1k) is suitable for quick previews
- Use descriptive prompts for better results
- Specify art style, lighting, and atmosphere to enhance control
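The parameter rules above can be checked before the tool is called. The following is a minimal sketch, not part of jimeng-mcp-server: `build_text_to_image_args` is a hypothetical helper name, and the allowed values are copied from the parameter list in this section.

```python
# Allowed values copied from the Text-to-Image parameter list.
ALLOWED_RATIOS = {"1:1", "4:3", "3:4", "16:9", "9:16"}
ALLOWED_RESOLUTIONS = {"1k", "2k", "4k"}


def build_text_to_image_args(prompt, ratio=None, resolution="2k", model="jimeng-4.5"):
    """Hypothetical helper: validate and assemble text_to_image arguments."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if ratio is not None and ratio not in ALLOWED_RATIOS:
        raise ValueError(f"unsupported ratio: {ratio!r}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    args = {"model": model, "prompt": prompt.strip(), "resolution": resolution}
    if ratio is not None:
        # The source states no default ratio, so it is sent only when given.
        args["ratio"] = ratio
    return args
```

Rejecting an unsupported value locally gives the user a clearer message than a failed generation request would.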
2. Image Synthesis
Merge and blend multiple images through intelligent fusion.
Parameters:
- prompt (required): Description of how to synthesize the images
- images (required): Array of 2-5 image URLs to synthesize
- model (optional): Model version (default: jimeng-4.5)
- ratio (optional): Output image aspect ratio ("1:1", "4:3", "3:4", "16:9", "9:16")
- resolution (optional): Resolution preset ("1k", "2k", "4k", default: 2k)
Usage Example:
```python
# User request: "Synthesize these two images, retaining the style of the first one"
{
    "model": "jimeng-4.5",
    "prompt": "Seamlessly blend the two images while maintaining the artistic style of the first image",
    "images": [
        "https://example.com/image1.jpg",
        "https://example.com/image2.jpg"
    ],
    "ratio": "4:3",
    "resolution": "2k"
}
```
Usage Scenarios:
- Blend portraits with backgrounds
- Style transfer between images
- Create artistic composite works
- Merge elements from multiple photos
Tips:
- Provide clear synthesis instructions in the prompt
- Images should have compatible resolutions
- Describe the desired blending style (seamless, artistic, realistic)
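Because synthesis accepts 2-5 image URLs, it can help to check the list before calling the tool. This sketch uses only the standard library. `validate_composition_images` is a hypothetical name, and treating only http(s) URLs as valid is an assumption rather than a documented rule.

```python
from urllib.parse import urlparse


def validate_composition_images(images):
    """Hypothetical pre-check: return a list of problems (empty when none)."""
    problems = []
    if not 2 <= len(images) <= 5:
        problems.append(f"expected 2-5 images, got {len(images)}")
    for url in images:
        parsed = urlparse(url)
        # Assumption: the backend needs absolute http(s) URLs it can fetch.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an http(s) URL: {url}")
    return problems
```

An empty list means the input passed these structural checks. It does not prove the URLs are publicly reachable.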
3. Text-to-Video
Create short videos from text descriptions.
Parameters:
- prompt (required): Text description of the video scene
- model (optional): Model version (default: jimeng-video-3.0)
- ratio (optional): Video aspect ratio ("16:9", "9:16", "4:3", "3:4", "1:1")
- resolution (optional): Preset resolution ("480p", "720p", "1080p")
Resolution Presets:
- "480p" → Quick preview
- "720p" → Balanced quality/speed (recommended)
- "1080p" → High quality
Usage Example:
```python
# User request: "Generate a 5-second video: Kitten fishing"
{
    "model": "jimeng-video-3.0",
    "prompt": "An orange kitten sitting by the river, holding a fishing rod and focusing on fishing, sunny weather",
    "ratio": "16:9",
    "resolution": "720p"
}
```
Video Features:
- Duration: Typically 3-5 seconds
- Format: MP4
- Generation Time: 30-60 seconds
- Frame Rate: 24-30 fps
Tips:
- Include scene details, actions, and atmosphere
- Keep prompts focused on a single clear action
- Specify time of day, weather, or mood for better results
- Start with 720p to balance quality and speed
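The resolution presets can be tied to the user's goal rather than chosen by hand. In this sketch, the goal names ("preview", "balanced", "high") and the helper `build_text_to_video_args` are hypothetical. The preset values and the model default come from this section.

```python
# Goal names are illustrative; preset values come from the Resolution Presets list.
VIDEO_PRESETS = {"preview": "480p", "balanced": "720p", "high": "1080p"}


def build_text_to_video_args(prompt, goal="balanced", ratio=None, model="jimeng-video-3.0"):
    """Hypothetical helper: pick a resolution preset from the stated goal."""
    if goal not in VIDEO_PRESETS:
        raise ValueError(f"unknown goal: {goal!r}")
    args = {"model": model, "prompt": prompt, "resolution": VIDEO_PRESETS[goal]}
    if ratio is not None:
        args["ratio"] = ratio
    return args
```

Defaulting to "balanced" follows the tip to start at 720p.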
4. Image-to-Video Animation
Add motion and animation effects to static images.
Parameters:
- prompt (required): Description of the desired animation effect
- file_paths (required): Array of image URLs to animate
- model (optional): Model version (default: jimeng-video-3.0)
- ratio (optional): Video aspect ratio ("16:9", "9:16", "4:3", "3:4", "1:1")
- resolution (optional): Preset resolution ("480p", "720p", "1080p")
Usage Example:
```python
# User request: "Animate this photo with gentle camera zoom"
{
    "model": "jimeng-video-3.0",
    "prompt": "Add gentle motion effects and natural camera zoom to create a cinematic feel",
    "file_paths": ["https://example.com/photo.jpg"],
    "ratio": "16:9",
    "resolution": "720p"
}
```
Animation Types:
- Character motion
- Camera movements
- Scene transitions
- Environmental effects (wind, rain, etc.)
Tips:
- Describe the desired type of motion
- Consider image content when selecting effects
- Portrait photos suit subtle movements
- Landscape photos suit pan/zoom effects
Workflow Guide
Decision Tree
Receive User Request
│
├─ Contains "generate image" or "create image"?
│   └─ Yes → Use text_to_image
│
├─ Contains "synthesize" or "merge/blend images"?
│   └─ Yes → Use image_composition
│
├─ Contains "generate video" or "create video"?
│   └─ Yes → Use text_to_video
│
└─ Contains "animate" or "add motion to an image"?
    └─ Yes → Use image_to_video
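A keyword router along these lines might look like the sketch below. `select_tool` is a hypothetical function and the keyword sets are illustrative. Synthesis and animation are checked first, an assumption made so that a request such as "animate this image" is not routed to text_to_image just because it mentions an image.

```python
import re


def select_tool(request):
    """Hypothetical router: map a user request to a tool name, or None."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    if words & {"synthesize", "merge", "blend", "combine"}:
        return "image_composition"
    if words & {"animate", "animation"}:
        return "image_to_video"
    if words & {"video", "videos"}:
        return "text_to_video"
    if words & {"image", "images", "picture", "pictures"}:
        return "text_to_image"
    return None  # Ambiguous: ask the user what they want to generate.
```

Plain keyword matching is brittle, so a `None` result is best handled by asking a clarifying question.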
Parameter Extraction
When processing user requests:
- Extract Prompt: User's description of the desired content
- Identify Aspect Ratio: Extract size preferences (landscape/portrait/square) corresponding to the ratio parameter
- Parse Resolution Requirements: Look for quality requirements corresponding to the resolution parameter
- Collect Image URLs: For synthesis and animation tasks
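The extraction steps can be sketched with regular expressions. `extract_params` and the word-to-ratio hints are assumptions for illustration. The hints follow the Common Aspect Ratios list, with "portrait" mapped to 9:16 even though 3:4 would also be a reasonable reading.

```python
import re

# Assumed mapping from orientation words to ratios.
RATIO_HINTS = {"landscape": "16:9", "widescreen": "16:9", "square": "1:1", "portrait": "9:16"}


def extract_params(request):
    """Hypothetical extractor for ratio, resolution, and image URLs."""
    text = request.lower()
    params = {}
    explicit = re.search(r"\b(1:1|4:3|3:4|16:9|9:16)\b", text)
    if explicit:
        params["ratio"] = explicit.group(1)
    else:
        for word, ratio in RATIO_HINTS.items():
            if word in text:
                params["ratio"] = ratio
                break
    resolution = re.search(r"\b(1k|2k|4k|480p|720p|1080p)\b", text)
    if resolution:
        params["resolution"] = resolution.group(1)
    # Strip trailing punctuation that often follows a URL in a sentence.
    params["image_urls"] = [u.rstrip(".,)") for u in re.findall(r"https?://\S+", request)]
    return params
```

An explicit ratio such as "16:9" takes precedence over an orientation word.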
Error Handling
If tool execution fails:
- Check Server Status: Verify if jimeng-mcp-server is running
- Validate API Key: Ensure JIMENG_API_KEY is configured
- Check Parameters: Confirm all required fields are provided
- Check Image URLs: Verify URLs for synthesis/animation are accessible
- Report Errors Clearly: Explain the problem and suggest solutions
Common Errors:
- API key missing: Set JIMENG_API_KEY in the environment
- Backend unreachable: Start the jimeng-free-api-all Docker container
- Image URL inaccessible: Ensure the URL is publicly accessible
- Timeout: Large videos may take 60+ seconds
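The "Check Parameters" step can run before any call reaches the server. This sketch assumes the required fields are exactly those marked required in the parameter lists of this document. `missing_fields` is a hypothetical helper.

```python
# Required fields per tool, as listed in the Core Capabilities section.
REQUIRED_FIELDS = {
    "text_to_image": ("prompt",),
    "image_composition": ("prompt", "images"),
    "text_to_video": ("prompt",),
    "image_to_video": ("prompt", "file_paths"),
}


def missing_fields(tool, args):
    """Return the required field names that are absent or empty in args."""
    if tool not in REQUIRED_FIELDS:
        raise ValueError(f"unknown tool: {tool!r}")
    return [name for name in REQUIRED_FIELDS[tool] if not args.get(name)]
```

For example, `missing_fields("image_to_video", {"prompt": "gentle zoom"})` reports that `file_paths` still has to be supplied.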
Advanced Usage
Combine Multiple Tools
For complex creative tasks, tools can be used in a chain:
Example: Create Animated Artwork
- Use text_to_image to generate a base image
- Use image_to_video to add animation to the result
Example: Synthesize and Optimize
- Use image_composition to synthesize images
- Generate variants with adjusted prompts
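The "Create Animated Artwork" chain can be sketched as follows. `call_tool` is a hypothetical callable that stands in for whatever MCP client invokes the tools, and it is assumed to return a list of URLs for both tools. The source confirms that return shape only for text_to_image.

```python
def create_animated_artwork(call_tool, image_prompt, motion_prompt):
    """Chain text_to_image into image_to_video.

    call_tool(name, arguments) is a hypothetical stand-in for the MCP client.
    """
    image_urls = call_tool("text_to_image", {"prompt": image_prompt})
    if not image_urls:
        raise RuntimeError("text_to_image returned no images")
    base_image = image_urls[0]  # Use the first candidate as the base frame.
    video_urls = call_tool(
        "image_to_video",
        {"prompt": motion_prompt, "file_paths": [base_image]},
    )
    return base_image, video_urls
```

Passing the client in as a parameter keeps the chain testable without a running server.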
Optimization Tips
Speed Up Generation:
- Use lower resolution (720p instead of 1080p, or 1k instead of 2k)
- Keep prompts concise yet descriptive
Improve Quality:
- Use detailed, specific prompts
- Select appropriate ratio based on the scene
- Use higher resolution (2k or 4k)
- Specify art style and techniques
- Include lighting and atmosphere descriptions
Batch Processing
When users request multiple generations:
- Process requests sequentially (one at a time)
- Provide progress updates for each item
- Collect all results before final response
- Consider resource limits (API quotas)
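A sequential batch loop following these points might look like this sketch. `run_batch` and the `call_tool` callable are hypothetical, and `max_jobs` is an illustrative guard for quota limits rather than a documented setting.

```python
def run_batch(call_tool, jobs, on_progress=print, max_jobs=None):
    """Run (tool, args) jobs one at a time and collect every outcome."""
    if max_jobs is not None and len(jobs) > max_jobs:
        raise ValueError(f"batch of {len(jobs)} exceeds limit of {max_jobs}")
    results = []
    for index, (tool, args) in enumerate(jobs, start=1):
        on_progress(f"[{index}/{len(jobs)}] running {tool}")
        try:
            output = call_tool(tool, args)
            results.append({"tool": tool, "ok": True, "output": output})
        except Exception as exc:
            # Record the failure and continue so one bad item does not end the batch.
            results.append({"tool": tool, "ok": False, "error": str(exc)})
    return results
```

Collecting failures alongside successes lets the final response report every item at once.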
Troubleshooting
Server Connection Issues
Symptom: Tool returns connection errors
Solutions:
- Check that the jimeng-free-api-all Docker container is running
- Verify server accessibility:
```bash
curl http://127.0.0.1:8001/health
```
- Restart the Docker container if needed
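The same health check can be done from Python with the standard library. `backend_is_up` is a hypothetical helper. It assumes the /health endpoint answers with a 2xx status when the backend is ready, which this document does not state.

```python
import urllib.error
import urllib.request


def backend_is_up(url="http://127.0.0.1:8001/health", timeout=3.0):
    """Return True if the health URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False
```

Running this check before generation gives a clear "backend not running" message instead of a late tool failure.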
API Key Issues
Symptom: "Invalid API key" or authentication errors
Solutions:
- Verify JIMENG_API_KEY in the .env file
- Obtain a new API key from Jimeng website cookies (sessionid value)
- Ensure the key format is correct (no extra spaces or quotes)
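The format checks above can be automated. `api_key_problem` is a hypothetical helper that inspects only the shape of JIMENG_API_KEY. It cannot tell whether the key is still valid on the Jimeng side.

```python
import os


def api_key_problem(env=None):
    """Return a description of the key's format problem, or None if it looks fine."""
    env = os.environ if env is None else env
    key = env.get("JIMENG_API_KEY")
    if not key:
        return "JIMENG_API_KEY is not set or empty"
    if key != key.strip():
        return "JIMENG_API_KEY has leading or trailing whitespace"
    if key[0] in "'\"" or key[-1] in "'\"":
        return "JIMENG_API_KEY is wrapped in quotes"
    return None
```

Accepting a mapping makes the check usable with values parsed from a .env file as well as the process environment.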
Generation Quality Issues
Symptom: Poor quality or unexpected results
Solutions:
- Optimize prompts with more specific details
- Adjust the ratio parameter to select an appropriate aspect ratio
- Try different resolution settings
- Add a negative prompt to exclude unwanted elements
Timeout Errors
Symptom: Generation takes too long or times out
Solutions:
- Video generation typically takes 30-60 seconds - please be patient
- If timeouts persist, try lower resolution
- Check server resource usage
- Verify network connection to Jimeng API
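The "try lower resolution" advice can be expressed as a fallback loop. In this sketch, `generate` is a hypothetical callable that raises `TimeoutError` when a video request times out. Starting from 720p when no resolution is given is an assumption based on the recommendation above, not a documented default.

```python
# Video presets from highest to lowest quality.
VIDEO_RESOLUTIONS = ["1080p", "720p", "480p"]


def generate_with_fallback(generate, args):
    """Retry at progressively lower resolutions after each timeout."""
    start = VIDEO_RESOLUTIONS.index(args.get("resolution", "720p"))
    for resolution in VIDEO_RESOLUTIONS[start:]:
        attempt = {**args, "resolution": resolution}
        try:
            return resolution, generate(attempt)
        except TimeoutError:
            continue  # Step down one preset and try again.
    raise TimeoutError("generation timed out at every resolution")
```

Returning the resolution that succeeded lets the caller tell the user that quality was reduced.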
Resources
references/
- setup_guide.md: Detailed installation and configuration instructions
- Complete API documentation for all tools
Best Practices
- Always Verify Server Status Before Attempting Generation
- Use Appropriate Resolution Based on Use Case and Speed Requirements (ratio controls aspect ratio, resolution controls clarity)
- Provide Detailed Prompts for Better Generation Quality
- Handle Errors Gracefully and Provide Clear User Feedback
- Consider Rate Limits When Processing Multiple Requests
- Test with Simple Prompts Before Complex Synthesis
- Cache Frequently Used Parameters such as preferred ratio and resolution
Limitations
- Free Tier Limits: Official Jimeng API allows 66 credits per day
- Video Duration: Typically limited to 3-10 seconds
- Generation Time: Videos may take 30-60 seconds to generate
- Image Synthesis: Best results with 2-3 images, maximum 5 images supported
- Server Dependency: Requires jimeng-free-api-all backend to run
- Network Requirements: Internet access required to call Jimeng API