muapi-ugc-video-factory
UGC Video Factory
Turn a person photo + product photo (+ optional script & environment) into a vertical 9:16 UGC-style video ad with native dialogue audio.
A three-stage pipeline:
- GPT writes a director-grade ultra-realistic lifestyle photography prompt from your inputs.
- Nano-Banana Pro Edit fuses the person + product into a single hero photo (1K, 9:16).
- Seedance 2.0 VIP Image-to-Video animates the hero photo into a 10s vertical UGC clip with synced spoken audio.
Inputs
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| `person` | image_url | yes | - | Photo of the person who will appear in the ad (face + upper body works best). |
| `product` | image_url | yes | - | Clear photo of the product (preferably on a neutral background, with logo/text legible). |
| `script` | text | no | | The exact line the on-screen person will say (keep it short: 1–2 sentences fit 10s comfortably). |
| `environment` | text | no | | Scene / context where the person is using the product (e.g. "bathroom mirror, morning routine", "coffee shop window seat"). |

If `person` or `product` is missing, ask the user to upload them (`muapi upload file <path>`) or offer to generate placeholders before continuing.
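The required/optional split above can be checked before any model call is made. The following is a minimal sketch, assuming the four inputs are collected as plain strings; the `UgcInputs` class and `missing_required` function are hypothetical helpers, not part of the muapi CLI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UgcInputs:
    """Inputs from the table above; this class is a hypothetical helper."""
    person: Optional[str] = None   # image_url, required
    product: Optional[str] = None  # image_url, required
    script: str = ""               # text, optional (empty default is an assumption)
    environment: str = ""          # text, optional (empty default is an assumption)


def missing_required(inputs: UgcInputs) -> list[str]:
    """Return the names of required inputs that are absent or blank."""
    required = {"person": inputs.person, "product": inputs.product}
    return [name for name, value in required.items() if not (value and value.strip())]
```

An agent could call `missing_required(...)` first and, if the list is non-empty, ask the user for exactly those uploads.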
Steps
Run the three steps sequentially — each step's output feeds the next.
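As a rough illustration of how each step's output feeds the next, the sketch below chains three stages and pauses for user approval of the hero image before the video step. Every callable passed in (`write_prompt`, `make_hero_image`, `approve`, `make_video`) is a hypothetical stand-in for whatever the executing agent actually uses; none of them is a muapi API.

```python
from typing import Callable


def run_pipeline(
    person: str,
    product: str,
    script: str,
    environment: str,
    write_prompt: Callable[[str, str, str], str],
    make_hero_image: Callable[[str, str, str], str],
    approve: Callable[[str], bool],
    make_video: Callable[[str, str], str],
) -> str:
    """Chain the three steps; every callable is a hypothetical stand-in."""
    step1_prompt = write_prompt(person, product, environment)    # Step 1
    hero_image = make_hero_image(person, product, step1_prompt)  # Step 2
    if not approve(hero_image):                                  # user approval gate
        raise RuntimeError("hero image rejected; stopping before the video step")
    return make_video(hero_image, script)                        # Step 3
```

Injecting the stages as functions keeps the ordering logic testable without network access.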
Step 1: Director Prompt (GPT)
Use a GPT model (`gpt-5.1`, or whichever chat model is available to the executing agent) with temperature 0 and a limit of roughly 200 tokens to produce the hero-image prompt.

System prompt:

    You are a helpful assistant.

User prompt (substitute `{{person}}`, `{{product}}`, `{{environment}}`):

    Uploaded images are being analyzed. Ultra-realistic lifestyle photography with {{person}} and {{product}} and {{environment}}.
    If the product is wearable (e.g., hat, glasses, hooded sweatshirt), the person wears the product naturally.
    If the product is carried in the hand (e.g., cream, bottle, thermos), the person holds the product naturally.
    The product is clearly visible and is the main focus of the image. The logo or text on the product must be legible.
    The person has a natural and modern look with a minimalist style.
    The scene is consistent with the context of the product's use: {{environment}}.
    Lighting: soft natural daylight.
    Background: clean, aesthetic, slightly blurred (shallow depth of field).
    Style: high-end commercial lifestyle photography, realistic textures, 4K quality, vertical 9:16 composition, social-media advertising style. The background and environment should be appropriate to the product (e.g. a woman with a serum could be at home). The person's facial details and the product must remain unchanged.

Capture the GPT response as `{{step1_prompt}}`.
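The placeholder substitution in this step can be sketched as below. This is an illustrative helper only: `render` and `CHAT_SETTINGS` are hypothetical names, and the setting keys (`temperature`, `max_tokens`) are assumptions that depend on the chat client in use.

```python
import re

# Matches {{name}} placeholders as used throughout this recipe.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace each {{name}} with values[name]; fail loudly on unknown names."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder: {name}")
        return values[name]
    return PLACEHOLDER.sub(substitute, template)


# Sampling settings from the step description; the key names are assumptions.
CHAT_SETTINGS = {"temperature": 0, "max_tokens": 200}

opening = "Ultra-realistic lifestyle photography with {{person}} and {{product}} and {{environment}}."
print(render(opening, {
    "person": "https://example.com/person.jpg",
    "product": "https://example.com/product.jpg",
    "environment": "coffee shop window seat",
}))
```

Raising on an unknown placeholder, rather than leaving it in place, avoids sending a prompt that still contains a literal `{{environment}}`.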
Step 2: Hero Image (Nano-Banana Pro Edit)
Submit a `muapi image edit` call against the `nano-banana-pro-edit` model:
- Reference images (`image_urls`): `[ {{person}}, {{product}} ]` (order matters; person first).
- Prompt: `{{step1_prompt}}` from Step 1.
- Aspect ratio: `9:16`
- Num images: `1`
- Resolution: `1K`
- Output format: `jpeg`

Capture the resulting image URL as `{{hero_image}}`. Briefly show it to the user for approval before kicking off the video step.
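The Step 2 settings can be gathered in one place before the call is issued. The sketch below is only an illustration: `image_urls` is the name given in this recipe, while every other key name (`model`, `prompt`, `aspect_ratio`, `num_images`, `resolution`, `output_format`) is an assumption and may not match the real CLI flags or API fields.

```python
def hero_image_request(person_url: str, product_url: str, step1_prompt: str) -> dict:
    """Collect the Step 2 settings (key names other than image_urls are assumed)."""
    return {
        "model": "nano-banana-pro-edit",
        "image_urls": [person_url, product_url],  # order matters: person first
        "prompt": step1_prompt,
        "aspect_ratio": "9:16",
        "num_images": 1,
        "resolution": "1K",
        "output_format": "jpeg",
    }
```

Building the list inside the function makes it harder to pass the product image first by accident.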
Step 3: UGC Video (Seedance 2.0 VIP Image-to-Video)
Submit a `muapi video from-image` call against `seedance-2-vip-image-to-video` (or the `seedance-2-vip-image-to-video-fast` variant if the executing agent wants lower latency).
- Start image: `{{hero_image}}` from Step 2.
- Aspect ratio: `9:16`
- Duration: `10` seconds.
- Generate audio: `true` (native dialogue).
- CFG scale: `0.5`
- Negative prompt: `blur, distort, low quality`
- Prompt (substitute `{{script}}`):

      Create a 10-second vertical UGC-style video (9:16).
      A person is interacting naturally with their setting and product.
      The product is used naturally:
      - If wearable → the person is wearing it.
      - If handheld → the person is holding or applying it.
      The video is a single, uninterrupted shot. No cuts. No color changes. No text on screen.
      The person looks directly at the camera with a relaxed and natural expression.
      They interact comfortably with the product using their hands (adjusting, holding, pointing).
      They say in a natural, conversational tone:
      "{{script}}"
      Subtle hand gestures while speaking.
      End with a small smile or nod.
      Style: authentic UGC, handheld phone feel, light natural movement, soft daylight, shallow depth of field, TikTok/Reels aesthetic.

Poll the result with `muapi predict wait <request_id>` and download the finished video to the user's outputs directory.
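The Step 3 settings, including the optional switch to the fast variant, could be collected along these lines. Treat it as a sketch under stated assumptions: the model names come from this recipe, but all key names (`image_url`, `duration`, `generate_audio`, `cfg_scale`, `negative_prompt`, and so on) are hypothetical and may differ from the actual CLI flags or API fields.

```python
def video_request(hero_image_url: str, prompt: str, fast: bool = False) -> dict:
    """Collect the Step 3 settings; key names are assumptions, values are from the recipe."""
    model = "seedance-2-vip-image-to-video"
    if fast:
        model += "-fast"  # lower-latency variant named in this recipe
    return {
        "model": model,
        "image_url": hero_image_url,  # start image from Step 2
        "prompt": prompt,             # Step 3 prompt with {{script}} already substituted
        "aspect_ratio": "9:16",
        "duration": 10,               # seconds
        "generate_audio": True,       # native dialogue
        "cfg_scale": 0.5,
        "negative_prompt": "blur, distort, low quality",
    }
```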
Notes
- The VIP tier supports 9:16 and durations of 4–15s; 10s is the sweet spot for a 1–2 sentence script.
- Keep the script short: Seedance 2.0 will compress longer scripts and clip words.
- Seedance VIP tolerates realistic human faces in references (unlike the Chinese tier), making it the right choice for UGC.
- If you want lower latency at the same quality, swap to `seedance-2-vip-image-to-video-fast`.
- For multi-shot ads, generate several `{{hero_image}}` variations in Step 2 and animate each independently; Seedance VIP does not support multi-image i2v at 9:16 with audio.
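The duration range and the 1–2 sentence guideline from these notes can be turned into a quick pre-flight check. This is a rough heuristic only: `count_sentences` and `check_script` are hypothetical helpers, and the punctuation-based sentence split is an assumption that will miscount unusual text.

```python
import re

MIN_DURATION_S, MAX_DURATION_S = 4, 15  # range stated in the notes


def count_sentences(script: str) -> int:
    """Rough count: split on ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", script.strip())
    return len([p for p in parts if p])


def check_script(script: str, duration_s: int = 10, max_sentences: int = 2) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        problems.append(f"duration {duration_s}s is outside {MIN_DURATION_S}-{MAX_DURATION_S}s")
    if count_sentences(script) > max_sentences:
        problems.append(f"script has more than {max_sentences} sentences")
    return problems
```

A non-empty result is a prompt to ask the user to trim the script, not a hard failure.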
Trigger Keywords
- ugc video factory
- ugc video ad
- person plus product video
- talking product ad
- ugc reel
- lifestyle product video
- vertical ugc video

Notes for the Executing Agent
- This recipe is LLM-orchestrated: read each phase, gather any missing inputs from the user, then call `muapi` CLI commands. Run `muapi auth configure` first if `MUAPI_API_KEY` is unset.
- For local files supplied by the user, upload them first: `muapi upload file <path> --output-json --jq '.url'`.
- Substitute `{{input_name}}` placeholders with the user's actual inputs before issuing each call.
- If the `muapi` CLI does not yet alias `nano-banana-pro-edit` or `seedance-2-vip-image-to-video`, fall back to the raw API: `curl -X POST https://api.muapi.ai/api/v1/<endpoint> -H "x-api-key: $MUAPI_API_KEY" -H 'content-type: application/json' -d '{...}'`, then poll with `muapi predict wait <request_id>`.
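For agents that prefer to issue the raw API fallback from code instead of `curl`, the request can be assembled as sketched below. The base URL and the two headers are taken from the fallback note; `build_request` is a hypothetical helper, and the endpoint name and payload fields are deliberately left as parameters because this recipe does not specify them.

```python
import json
import urllib.request

API_BASE = "https://api.muapi.ai/api/v1"  # base URL from the fallback note


def build_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request described in the fallback note."""
    return urllib.request.Request(
        url=f"{API_BASE}/{endpoint.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "content-type": "application/json"},
        method="POST",
    )


# Sending is left to the caller, e.g. urllib.request.urlopen(req); the endpoint
# name and payload fields must come from the muapi documentation.
```

Separating construction from sending lets the request be inspected or logged before any credits are spent.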