cinematic-cutscene

cinematic-cutscene — Locked-Look Story Beat in 5-10 Seconds

A cutscene is a non-interactive video clip the game plays back at a fixed moment — opening, story beat, character intro, ending. Quality is dominated by two things: continuity (the character on screen looks like the character in the game) and shot length (every clip past 10s drifts in identity, hands, and physics). This skill enforces both — generate a reference image first to lock the look, then image-to-video each shot, then chain shots in sequence rather than asking the model for one long take.
If the user wants 5 seconds of marketing footage, that's video/trailer-shot. If they want a 3-second seamless background loop, that's video/animated-loop. This skill is for narrative beats with a defined start, middle, and end, usually with dialogue or VO.

When to use

  • "Generate the opening cinematic for the game."
  • "I need a cutscene where the witch turns to the player and warns them about the curse."
  • "Make a 10-second character intro for the boss."
  • "Add an ending sequence after the player defeats the dragon."
  • "Cinematic where the village burns and the protagonist runs."
  • The user has a defined moment with story content — not a vibe loop, not a marketing splash.

When NOT to use

  • "5-second slow-mo combat shot for the trailer": use video/trailer-shot.
  • "Looping splash screen background": use video/animated-loop.
  • "Animated logo backdrop on the title screen": use video/animated-loop.
  • Real-time in-engine cinematic with the actual gameplay characters and camera: that's a Godot AnimationPlayer / Timeline job, not generated video. Route to summer:scene-composition.
  • Voice-only narration over a static image: generate the dialogue with summer_generate_audio and use a static TextureRect; no video needed.

Steps

1. Read the soul file and any prior cutscenes

Read .summer/GameSoul.md
summer_search_assets(query="cutscene", filter={ kind: "video" })
summer_search_assets(query="<character name> reference", filter={ kind: "image" })
If a reference image of the character already exists from a prior concept-art or character-portrait run, reuse it as the imageUrl. That single decision is the difference between the character looking like themselves and looking like a stranger.

2. Plan the shot list

Cutscenes longer than 10 seconds are multiple shots, not one take. Ask the user to break it down:
Want this as one 10s shot, or three shots (e.g. wide → close on face → reaction)? Each shot is 5-10s. I'll generate a reference image once, then image-to-video each shot off of it.
Default if the user is vague: 3 shots, 5s each, ~$1.50 total on kling. Confirm before spending.
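
The default estimate above is plain arithmetic over the approximate per-clip prices listed in step 4. A minimal Python sketch, assuming those prices apply once per generated clip; `PRICE_PER_CLIP` and `estimate_cost` are illustrative names, not part of any Summer tool:

```python
# Approximate per-clip prices in USD, copied from the model table in step 4.
# Assumption: each generated shot is billed once at its model's listed price.
PRICE_PER_CLIP = {
    "ltx": 0.10,
    "kling": 0.50,
    "kling-turbo": 0.30,
    "veo3": 1.00,
    "minimax": 0.40,
}


def estimate_cost(shot_models):
    """Return the estimated total in USD for a list of model names, one per shot."""
    return round(sum(PRICE_PER_CLIP[m] for m in shot_models), 2)


# Default plan when the user is vague: three 5s shots on kling.
default_plan = ["kling"] * 3
print(f"~${estimate_cost(default_plan):.2f}")
```

Quoting a number like this before the first video call makes the "confirm before spending" step concrete.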

3. Lock the look — generate a reference image first

For any character or hero scene, generate a reference still with summer_generate_image before generating video. This still drives imageUrl on every subsequent shot, so the character is consistent across cuts.
summer_generate_image(
  prompt="<subject>, <setting>, cinematic lighting, film still, <art style from GameSoul.md>",
  model="nano-banana-2",
  options={ image_size: "landscape_16_9" }
)
Show the user the still and confirm it's the right look before video-ing it. Regenerating a $0.05 still beats regenerating a $0.50 video.

4. Pick the model

| Model | Cost | Speed | When |
|---|---|---|---|
| ltx | ~$0.10 | ~30s | Iteration, blocking shots, B-roll, throwaway tests |
| kling | ~$0.50 | 2-4 min | Hero shots, character cutscenes, anything the player will sit and watch |
| kling-turbo | ~$0.30 | 1-2 min | Same as kling when iteration speed matters more than the last 10% of quality |
| veo3 | ~$1.00 | 3-5 min | Pitch decks, premium dialogue scenes with synced lip motion, short-form ad |
| minimax | ~$0.40 | 2-3 min | Stylized / anime-leaning content; better at non-photoreal looks than kling |
Default policy: ltx first to validate the prompt and shot framing. If it lands the composition but quality is rough, escalate to kling. Only reach for veo3 if dialogue lip-sync matters and the user has approved the cost.
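
The default policy can be written down as a small decision function. This is only a sketch of the escalation order described above; `pick_model` and its flag names are hypothetical and do not correspond to any tool parameter:

```python
def pick_model(composition_validated, needs_lip_sync=False, cost_approved=False):
    """Pick a video model following the default escalation policy."""
    if not composition_validated:
        # Cheap pass first: validate the prompt and shot framing on ltx.
        return "ltx"
    if needs_lip_sync and cost_approved:
        # veo3 only when synced dialogue matters and the user approved the cost.
        return "veo3"
    # Composition landed: escalate to kling for the final-quality take.
    return "kling"


print(pick_model(composition_validated=False))
print(pick_model(composition_validated=True))
print(pick_model(composition_validated=True, needs_lip_sync=True, cost_approved=True))
```

Note that lip-sync without cost approval still falls back to kling here, which matches the rule that veo3 needs explicit approval.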

5. Generate each shot

summer_generate_video(
  prompt="<subject does <action>, <camera move>, <lighting>, cinematic, 16mm film grain>",
  model="kling",
  imageUrl="<reference image fileUrl from step 3>",
  duration=5,
  aspectRatio="16:9"
)
Returns { asset: { fileUrl } }. Show the user the URL and ask:
Shot 1 of 3 done. Land or regenerate? If land, I'll move to shot 2.
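
For a multi-shot cutscene, the same call is repeated once per shot with only the prompt changing. A hedged Python sketch that builds the argument sets without calling any tool; `build_shot_requests` is an illustrative helper, the URL is a placeholder, and the keys mirror the parameters shown in the call above:

```python
def build_shot_requests(reference_url, shot_prompts, model="kling",
                        duration=5, aspect_ratio="16:9"):
    """Build one argument dict per shot; every shot reuses the same reference image."""
    return [
        {
            "prompt": prompt,
            "model": model,
            "imageUrl": reference_url,  # identical on every shot to lock the look
            "duration": duration,
            "aspectRatio": aspect_ratio,
        }
        for prompt in shot_prompts
    ]


shots = build_shot_requests(
    "https://example.com/witch_reference.png",  # placeholder, not a real asset
    [
        "wide shot of a candlelit hut interior, slow dolly-in, cinematic",
        "young witch turns slowly toward camera, shallow depth of field, cinematic",
        "close-up of the witch, eyes narrowing, candlelight, 16mm film grain",
    ],
)
for i, args in enumerate(shots, start=1):
    print(f"Shot {i} of {len(shots)}: {args['prompt']}")
```

Generating the shots one at a time from this list keeps the "land or regenerate?" check between each paid call.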

6. Generate dialogue audio (if the scene has dialogue)

Cutscene dialogue is TTS, not in the video model. The video model can render mouth motion that looks like talking, but the audio comes from summer_generate_audio. Generate it separately, then sync the two in Godot's AnimationPlayer or your cinematics/ controller scene.
summer_generate_audio(
  capability="text_to_speech",
  text="They'll come for you at dawn. Run while you can.",
  voiceId="<from audio bible — see audio/voice-line>"
)
Lip-sync caveat: if the video shows the character clearly mouthing words and the audio is a different cadence, viewers notice. Either (a) keep the camera off the character's face during dialogue, (b) accept the asynchrony for a stylized look, or (c) use veo3 and prompt the dialogue text directly into the video prompt for synced motion.

7. Import and wire as a VideoStreamPlayer

summer_import_from_url(url="<fileUrl>", path="assets/video/cinematics/intro_shot_01.mp4")
Build a controller scene at cinematics/Intro.tscn:
summer_add_node(parent=".", type="Control", name="Intro")
summer_add_node(parent="./Intro", type="ColorRect", name="Fade")   # black, alpha 1.0 → 0.0
summer_add_node(parent="./Intro", type="VideoStreamPlayer", name="Video")
summer_set_prop(path="./Intro/Video", property="stream", value="res://assets/video/cinematics/intro_shot_01.mp4")
summer_set_prop(path="./Intro/Video", property="autoplay", value=true)
summer_set_prop(path="./Intro/Video", property="expand", value=true)
summer_set_prop(path="./Intro/Fade", property="anchors_preset", value=15)
Attach a script that uses a Tween to fade the ColorRect from black to clear over 0.5s on _ready, then back to black when Video.finished fires, then calls queue_free() on the scene. For multi-shot cutscenes, queue the next VideoStreamPlayer in the finished signal handler.
For dialogue, add an AudioStreamPlayer sibling with the TTS clip and call play() in _ready after a small delay matching where the line lands in the video.
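
For multi-shot cutscenes, the dialogue delay is the total length of the earlier shots plus the offset inside the shot where the line lands. A small Python sketch of that arithmetic, assuming shots play back to back with no gap between them; `dialogue_delay` is a hypothetical planning helper, not engine code:

```python
def dialogue_delay(shot_durations, shot_index, offset_in_shot):
    """Seconds from the start of the cutscene until the line should begin.

    Assumes each shot starts the instant the previous one finishes.
    """
    if not 0 <= offset_in_shot <= shot_durations[shot_index]:
        raise ValueError("offset must fall inside the chosen shot")
    return sum(shot_durations[:shot_index]) + offset_in_shot


# Three 5s shots; the line lands 1.5s into the second shot.
print(dialogue_delay([5, 5, 5], shot_index=1, offset_in_shot=1.5))  # 6.5
```

The resulting number is what the delay before play() would need to be if the audio player lives on the controller scene rather than on an individual shot.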

Reference card — prompts that work

Pattern: <subject> + <action> + <camera move> + <lighting> + <stylistic anchor>. Keep prompts under 50 words; over-prompting confuses the model. Always pair with imageUrl to lock the character.
| Goal | Model | Prompt | Cost | Duration |
|---|---|---|---|---|
| Opening establishing shot | kling | wide shot of a fog-shrouded medieval village at dawn, slow dolly-in toward the church spire, warm low sun, cinematic, anamorphic lens flare | $0.50 | 5s |
| Character intro (hero turns to camera) | kling | young witch with raven on shoulder turns slowly toward camera, candlelit interior, shallow depth of field, cinematic film still | $0.50 | 5s |
| Dialogue close-up (no synced lips) | kling | close-up of a grizzled knight, eyes downcast then looking up, firelight on his face, dust motes, 16mm film grain | $0.50 | 5s |
| Dialogue close-up (synced lips) | veo3 | close-up of the witch saying "they will come for you at dawn", candlelight, cinematic, shallow DOF | $1.00 | 5s |
| Action beat (village burns) | kling | medium wide of a thatched village engulfed in flames at night, embers rising, silhouettes running through smoke, hand-held camera, cinematic | $0.50 | 5s |
| Ending shot (hero walks away) | kling | low wide shot of a cloaked figure walking away across a blasted plain at sunset, slow truck-back, golden hour, dust on the wind | $0.50 | 5s |
| Throwaway iteration / blocking | ltx | same prompt as above | $0.10 | 5s |
| Anime / stylized cutscene | minimax | anime-style young swordsman draws blade and steps forward, sakura petals, dramatic wind, Studio Ghibli soft painterly | $0.40 | 5s |
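
The prompt pattern and the 50-word budget are easy to check mechanically before spending on a generation. A minimal Python sketch; `build_prompt` is an illustrative helper, and counting words by whitespace is an assumption about what "50 words" means:

```python
def build_prompt(subject, action, camera_move, lighting, style_anchor, max_words=50):
    """Join the five pattern parts and reject prompts at or over the word budget."""
    prompt = ", ".join([f"{subject} {action}", camera_move, lighting, style_anchor])
    word_count = len(prompt.split())  # whitespace-separated words
    if word_count >= max_words:
        raise ValueError(f"prompt has {word_count} words; keep it under {max_words}")
    return prompt


print(build_prompt(
    subject="young witch with raven on shoulder",
    action="turns slowly toward camera",
    camera_move="slow dolly-in",
    lighting="candlelit interior",
    style_anchor="cinematic film still",
))
```

The helper only shapes the text; identity still comes from pairing the prompt with imageUrl.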

Bad prompts and why

| Bad | Why it fails |
|---|---|
| epic cutscene of the hero defeating the boss | No subject, no shot, no camera. Returns a generic action montage. |
| the hero walks into the throne room, kneels before the king, draws his sword and says I refuse to serve, then walks out | Five events in one prompt. Model picks one (badly) or tries all and renders mush. Split into three shots. |
| cinematic 4k masterpiece trending on artstation | Adjective slop. The model already knows "cinematic"; the rest is dead weight. |
| make the character look exactly like in the game | Words can't anchor identity. Use imageUrl. |
| the camera does a complex handheld weaving move through the crowd | Video models render simple camera moves (pan, tilt, dolly, truck) reliably and complex moves badly. Pick one verb. |

Anti-patterns

  • Generating a 10s shot when you should chain two 5s shots. Identity, hand consistency, and physics drift every additional second past ~6s. Two 5s shots cut together look better and cost the same.
  • Skipping the reference image. Generating four character cutscenes from text prompts alone produces four different-looking people. Always lock the look with summer_generate_image first, then pass the result as imageUrl on every subsequent video call.
  • Asking the video model to render dialogue without veo3. kling and ltx will animate mouths but the motion does not match any audio. Use veo3 if dialogue is on-camera, or keep the camera off the speaker.
  • Using kling for blocking iterations. Burn $0.10 on ltx to validate framing and prompt; only spend $0.50 once the composition lands.
  • Putting the cutscene on a VideoStreamPlayer without a fade. Cuts straight from gameplay to video are jarring. Always wrap in a ColorRect fade in/out.
  • Forgetting to import the file. summer_generate_video returns a fileUrl on Cloudinary; until you call summer_import_from_url, it isn't in res:// and the scene can't reference it.
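
Several of these anti-patterns can be caught with a quick check on the planned shot before any money is spent. The sketch below is an assumption-heavy illustration: `lint_shot` and the `on_camera_dialogue` and `is_blocking_pass` keys are hypothetical planning fields, not parameters of summer_generate_video:

```python
def lint_shot(shot):
    """Return a list of warnings for one planned shot (a plain dict)."""
    warnings = []
    if shot.get("duration", 5) > 5:
        warnings.append("long take: consider chaining two 5s shots")
    if not shot.get("imageUrl"):
        warnings.append("no reference image: identity will differ between shots")
    if shot.get("on_camera_dialogue") and shot.get("model") != "veo3":
        warnings.append("on-camera dialogue without veo3: mouth motion will not match audio")
    if shot.get("is_blocking_pass") and shot.get("model") != "ltx":
        warnings.append("blocking pass not on ltx: validate framing on ltx first")
    return warnings


planned = {"prompt": "witch warns the player", "model": "kling",
           "duration": 10, "on_camera_dialogue": True}
for warning in lint_shot(planned):
    print("-", warning)
```

The fade and import anti-patterns concern scene wiring rather than the generation call, so they are out of scope for a check like this.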

Edge cases

  • Vertical cutscene for a mobile target. Set aspectRatio="9:16". Reference image must also be 9:16: generate it with image_size: "portrait_16_9" or the framing will crop wrong on the video.
  • Cutscene must match an in-engine character precisely. No video model will hit pixel-perfect identity. Either (a) accept the artistic license, (b) render the cutscene in-engine with AnimationPlayer instead of generating it, or (c) generate a reference image from a screenshot of the in-engine character (image-to-image first, then image-to-video).
  • Dialogue is too long for a 5s shot. Either trim the line, or split: 5s of speaker, 5s of listener reaction (cheaper because the listener doesn't need synced lips).
  • Cutscene needs to play and the player isn't looking. Pause the gameplay timer, push a Control overlay with the VideoStreamPlayer, restore on finish.
  • Multi-shot continuity. Use the same reference image as imageUrl across all shots. If lighting differs by shot, generate one reference per lighting setup, not one per shot.
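
Keeping the reference still and the video in the same orientation is a lookup between the two option names used in this document. A minimal Python sketch; the mapping holds only the two pairs that appear above, and `reference_image_size` is an illustrative helper:

```python
# Pairs taken from this document: video aspectRatio -> image_size for the reference still.
IMAGE_SIZE_FOR_ASPECT = {
    "16:9": "landscape_16_9",
    "9:16": "portrait_16_9",
}


def reference_image_size(aspect_ratio):
    """Return the image_size that matches a video aspect ratio, or fail loudly."""
    try:
        return IMAGE_SIZE_FOR_ASPECT[aspect_ratio]
    except KeyError:
        raise ValueError(f"no known image_size for aspect ratio {aspect_ratio!r}") from None


print(reference_image_size("9:16"))
```

Failing on an unknown ratio is deliberate: guessing a size would reintroduce the cropping problem described in the first edge case.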

Fallback (no MCP)

If the Studio MCP server isn't running, the user can do all of this through the Studio web dashboard at the Summer Engine cloud console:
  1. Generate a reference image in the Image tab.
  2. Open the Video tab, paste the reference URL into the image-to-video field.
  3. Set model, duration, aspect ratio, and prompt as listed above.
  4. Download the .mp4 and drop it into assets/video/cinematics/ in the project, then re-import via the Godot editor.
Print the exact prompt + model + duration + aspect ratio so the user can paste it into the dashboard verbatim.
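
One way to produce that printout is a small formatter that lists the four settings in a fixed order. This is a sketch only; `dashboard_summary` is a hypothetical helper and the field labels are assumptions, since the dashboard's own labels are not described here:

```python
def dashboard_summary(prompt, model, duration, aspect_ratio):
    """Format the four settings the user needs to copy into the dashboard."""
    return "\n".join([
        f"Prompt: {prompt}",
        f"Model: {model}",
        f"Duration: {duration}s",
        f"Aspect ratio: {aspect_ratio}",
    ])


print(dashboard_summary(
    prompt="young witch turns slowly toward camera, candlelit interior, cinematic film still",
    model="kling",
    duration=5,
    aspect_ratio="16:9",
))
```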

Handoff

Once the cutscene is generated and wired:
Cutscene intro wired at cinematics/Intro.tscn with fade in/out and Video.finished chained to scene change. Next:
  • Add the dialogue track with audio/voice-line if you haven't yet.
  • Score the moment with audio/music-track; a cutscene without music feels like a placeholder.
  • For the boss reveal cutscene, run this skill again with the same reference image to keep the witch identity stable.
  • If you want a 5s "money shot" version for marketing, run video/trailer-shot against the same reference.

See also

  • video/trailer-shot: marketing footage, 5-10s, maximum visual punch.
  • video/animated-loop: seamless looping background clips.
  • audio/voice-line: TTS dialogue used inside cutscenes.
  • audio/music-track: score the cutscene.
  • 2d-assets/concept-art: generate the reference image if no character reference exists yet.
  • 2d-assets/character-portrait: produce a high-fidelity locked character portrait for use as imageUrl.
  • _shared/mcp-tools-reference.md: summer_generate_video parameter schema and error codes.