cinematic-cutscene
cinematic-cutscene: Locked-Look Story Beat in 5-10 Seconds
A cutscene is a non-interactive video clip the game plays back at a fixed moment — opening, story beat, character intro, ending. Quality is dominated by two things: continuity (the character on screen looks like the character in the game) and shot length (every clip past 10s drifts in identity, hands, and physics). This skill enforces both — generate a reference image first to lock the look, then image-to-video each shot, then chain shots in sequence rather than asking the model for one long take.
If the user wants 5 seconds of marketing footage, that's `video/trailer-shot`. If they want a 3-second seamless background loop, that's `video/animated-loop`. This skill is for narrative beats with a defined start, middle, and end, usually with dialogue or VO.
When to use
- "Generate the opening cinematic for the game."
- "I need a cutscene where the witch turns to the player and warns them about the curse."
- "Make a 10-second character intro for the boss."
- "Add an ending sequence after the player defeats the dragon."
- "Cinematic where the village burns and the protagonist runs."
- The user has a defined moment with story content, not a vibe loop, not a marketing splash.
When NOT to use
- "5-second slow-mo combat shot for the trailer": use `video/trailer-shot`.
- "Looping splash screen background": use `video/animated-loop`.
- "Animated logo backdrop on the title screen": use `video/animated-loop`.
- Real-time in-engine cinematic with the actual gameplay characters and camera: that's a Godot AnimationPlayer / Timeline job, not generated video. Route to `summer:scene-composition`.
- Voice-only narration over a static image: generate the dialogue with `summer_generate_audio` and use a static `TextureRect`, no video needed.
Steps
1. Read the soul file and any prior cutscenes
Read .summer/GameSoul.md
summer_search_assets(query="cutscene", filter={ kind: "video" })
summer_search_assets(query="<character name> reference", filter={ kind: "image" })
If a reference image of the character already exists from a prior `concept-art` or `character-portrait` run, reuse it as the `imageUrl`. That single decision is the difference between the character looking like themselves and looking like a stranger.
2. Plan the shot list
Cutscenes longer than 10 seconds are multiple shots, not one take. Ask the user to break it down:
> Want this as one 10s shot, or three shots (e.g. wide → close on face → reaction)? Each shot is 5-10s. I'll generate a reference image once, then image-to-video each shot off of it.
Default if the user is vague: 3 shots, 5s each, ~$1.50 total on kling. Confirm before spending.
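The default budget is shot count times per-shot price. A minimal Python sketch of that arithmetic follows, assuming the quoted prices apply per shot regardless of length (the skill only quotes them for roughly 5s clips). `plan_shots` and `estimate_cost` are hypothetical helper names, not part of any Summer API.

```python
from typing import List

# Approximate per-shot prices in USD, as quoted in this skill.
PRICE_PER_SHOT = {"ltx": 0.10, "kling": 0.50, "veo3": 1.00}

MAX_SHOT_SECONDS = 10  # the skill's upper bound for a single shot


def plan_shots(total_seconds: int, shot_seconds: int = 5) -> List[int]:
    """Split a cutscene into shot durations, none longer than MAX_SHOT_SECONDS."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 < shot_seconds <= MAX_SHOT_SECONDS:
        raise ValueError("shot_seconds must be between 1 and 10")
    full_shots, remainder = divmod(total_seconds, shot_seconds)
    shots = [shot_seconds] * full_shots
    if remainder:
        shots.append(remainder)  # a short trailing shot; trim or merge by hand
    return shots


def estimate_cost(shots: List[int], model: str = "kling") -> float:
    """Rough total, assuming the quoted price is charged once per shot."""
    return round(len(shots) * PRICE_PER_SHOT[model], 2)


default_plan = plan_shots(15)
print(default_plan, estimate_cost(default_plan))  # [5, 5, 5] 1.5
```

Quoting the resulting number to the user before generating anything is the point of the confirmation step.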
3. Lock the look — generate a reference image first
For any character or hero scene, generate a reference still with `summer_generate_image` before generating video. This still drives `imageUrl` on every subsequent shot, so the character is consistent across cuts.
summer_generate_image(
    prompt="<subject>, <setting>, cinematic lighting, film still, <art style from GameSoul.md>",
    model="nano-banana-2",
    options={ image_size: "landscape_16_9" }
)
Show the user the still and confirm it's the right look before video-ing it. Regenerating a $0.05 still beats regenerating a $0.50 video.
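The still-first argument can be checked with quick arithmetic. The sketch below compares iterating on the look with stills against iterating directly in video, assuming each attempt at the look costs one full generation at the quoted prices. `look_iteration_cost` is a hypothetical helper for illustration.

```python
STILL_COST = 0.05  # quoted price of one reference still, USD
VIDEO_COST = 0.50  # quoted price of one video shot, USD


def look_iteration_cost(attempts: int, lock_with_still: bool) -> float:
    """Cost of reaching an approved look after `attempts` tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if lock_with_still:
        # Iterate on cheap stills, then render the video once.
        return round(attempts * STILL_COST + VIDEO_COST, 2)
    # Every look attempt is a full video generation.
    return round(attempts * VIDEO_COST, 2)


print(look_iteration_cost(4, lock_with_still=True))   # 0.7
print(look_iteration_cost(4, lock_with_still=False))  # 2.0
```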
4. Pick the model
| Model | Cost | Speed | When |
|---|---|---|---|
| `ltx` | ~$0.10 | ~30s | Iteration, blocking shots, B-roll, throwaway tests |
| `kling` | ~$0.50 | 2-4 min | Hero shots, character cutscenes, anything the player will sit and watch |
| | ~$0.30 | 1-2 min | Same as kling when iteration speed matters more than the last 10% of quality |
| `veo3` | ~$1.00 | 3-5 min | Pitch decks, premium dialogue scenes with synced lip motion, short-form ad |
| | ~$0.40 | 2-3 min | Stylized / anime-leaning content; better at non-photoreal looks than kling |
Default policy: `ltx` first to validate the prompt and shot framing. If it lands the composition but quality is rough, escalate to `kling`. Only reach for `veo3` if dialogue lip-sync matters and the user has approved the cost.
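One reading of the default policy, written as a small decision function. This is a sketch under that reading, not an official rule set: `choose_model` and its parameter names are hypothetical, and it only covers the three models the policy names.

```python
def choose_model(composition_landed: bool,
                 needs_lip_sync: bool = False,
                 cost_approved: bool = False) -> str:
    """Pick a video model following the ltx -> kling -> veo3 escalation."""
    if not composition_landed:
        # Still validating prompt and framing: use the cheapest model.
        return "ltx"
    if needs_lip_sync and cost_approved:
        # Synced lips matter and the user has signed off on the price.
        return "veo3"
    # Composition is right; pay for quality.
    return "kling"


print(choose_model(composition_landed=False))  # ltx
print(choose_model(composition_landed=True))   # kling
```

Without explicit cost approval the function falls back to `kling` even when lip-sync is wanted, which mirrors the "confirm before spending" stance taken elsewhere in this skill.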
5. Generate each shot
summer_generate_video(
    prompt="<subject> does <action>, <camera move>, <lighting>, cinematic, 16mm film grain",
    model="kling",
    imageUrl="<reference image fileUrl from step 3>",
    duration=5,
    aspectRatio="16:9"
)
Returns `{ asset: { fileUrl } }`. Show the user the URL and ask:
> Shot 1 of 3 done. Land or regenerate? If land, I'll move to shot 2.
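For a multi-shot cutscene, every call should carry the same reference image. The sketch below builds the argument set for each `summer_generate_video` call from a shot list. It only prepares the arguments; how they are sent to the MCP tool is left out. The `Shot` class, `build_video_calls`, the example shots, and the placeholder URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Shot:
    description: str  # subject plus action
    camera_move: str
    lighting: str
    duration: int = 5


def build_video_calls(shots: List[Shot], reference_url: str,
                      model: str = "kling",
                      aspect_ratio: str = "16:9") -> List[Dict[str, object]]:
    """Return one argument dict per shot, all sharing the same imageUrl."""
    calls = []
    for shot in shots:
        prompt = (f"{shot.description}, {shot.camera_move}, "
                  f"{shot.lighting}, cinematic, 16mm film grain")
        calls.append({
            "prompt": prompt,
            "model": model,
            "imageUrl": reference_url,  # identical across shots to lock the look
            "duration": shot.duration,
            "aspectRatio": aspect_ratio,
        })
    return calls


# Hypothetical three-shot breakdown: wide, close on face, reaction.
shot_list = [
    Shot("a witch stands in a doorway", "slow dolly in", "candlelight"),
    Shot("the witch turns to face the camera", "static close-up", "candlelight"),
    Shot("the witch lowers her eyes", "slow tilt down", "candlelight"),
]
calls = build_video_calls(shot_list, "https://example.com/witch_reference.png")
print(len(calls), calls[0]["prompt"])
```

Generating the calls up front also makes it easy to show the user the full shot list before any money is spent.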
6. Generate dialogue audio (if the scene has dialogue)
Cutscene dialogue is TTS, not in the video model. The video model can render mouth motion that looks like talking, but the audio comes from `summer_generate_audio`. Generate it separately and the editor (Godot's AnimationPlayer or your `cinematics/` controller scene) syncs them.
summer_generate_audio(
    capability="text_to_speech",
    text="They'll come for you at dawn. Run while you can.",
    voiceId="<from audio bible — see audio/voice-line>"
)
Lip-sync caveat: if the video shows the character clearly mouthing words and the audio is a different cadence, viewers notice. Either (a) keep the camera off the character's face during dialogue, (b) accept the asynchrony for a stylized look, or (c) use `veo3` and prompt the dialogue text directly into the video prompt for synced motion.
7. Import and wire as a VideoStreamPlayer
summer_import_from_url(url="<fileUrl>", path="assets/video/cinematics/intro_shot_01.mp4")
Build a controller scene at `cinematics/Intro.tscn`:
summer_add_node(parent=".", type="Control", name="Intro")
summer_add_node(parent="./Intro", type="VideoStreamPlayer", name="Video")
summer_add_node(parent="./Intro", type="ColorRect", name="Fade")  # black, alpha 1.0 → 0.0; added after Video so it draws on top
summer_set_prop(path="./Intro/Video", property="stream", value="res://assets/video/cinematics/intro_shot_01.mp4")
summer_set_prop(path="./Intro/Video", property="autoplay", value=true)
summer_set_prop(path="./Intro/Video", property="expand", value=true)
summer_set_prop(path="./Intro/Fade", property="anchors_preset", value=15)
Attach a script that uses a `Tween` to fade the `ColorRect` from black to clear over 0.5s on `_ready`, then back to black when `Video.finished` fires, then `queue_free()`s the scene. For multi-shot cutscenes, queue the next `VideoStreamPlayer` in the `finished` signal handler.
For dialogue, add an `AudioStreamPlayer` sibling with the TTS clip and call `play()` in `_ready` after a small delay matching where the line lands in the video.
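Before writing the controller script, it can help to lay out when each event fires. The sketch below computes that schedule in Python. It assumes shots play back to back, that the fade back to black also takes 0.5s (the text only gives the fade-in length), and that dialogue delays are measured from the start of their shot. `build_timeline` and the event names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

FADE_SECONDS = 0.5  # fade length quoted in the step above


def build_timeline(shot_durations: List[float],
                   dialogue_offsets: Optional[Dict[int, float]] = None
                   ) -> List[Tuple[float, str]]:
    """Return (time_in_seconds, event_name) pairs in playback order.

    dialogue_offsets maps a zero-based shot index to the delay, within
    that shot, at which its TTS clip should start.
    """
    dialogue_offsets = dialogue_offsets or {}
    events = [(0.0, "fade_in")]
    clock = 0.0
    for index, duration in enumerate(shot_durations):
        events.append((clock, f"play_shot_{index + 1}"))
        if index in dialogue_offsets:
            cue = clock + dialogue_offsets[index]
            events.append((cue, f"play_dialogue_{index + 1}"))
        clock += duration
    events.append((clock, "fade_out"))  # last shot finished
    events.append((clock + FADE_SECONDS, "queue_free"))
    # sorted() is stable, so events at the same time keep insertion order.
    return sorted(events, key=lambda event: event[0])


for time, name in build_timeline([5, 5, 5], {1: 1.5}):
    print(f"{time:5.1f}s  {name}")
```

Each `play_shot_N` entry after the first corresponds to a `finished` handler starting the next video, and each `play_dialogue_N` entry gives the delay to wait before calling `play()` on the audio node.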
Reference card: prompts that work
Pattern: `<subject> + <action> + <camera move> + <lighting> + <stylistic anchor>`. Keep prompts under 50 words; over-prompting confuses the model. Always pair with `imageUrl` to lock the character.
| Goal | Model | Prompt | Cost | Duration |
|---|---|---|---|---|
| Opening establishing shot | `kling` | | $0.50 | 5s |
| Character intro (hero turns to camera) | `kling` | | $0.50 | 5s |
| Dialogue close-up (no synced lips) | `kling` | | $0.50 | 5s |
| Dialogue close-up (synced lips) | `veo3` | | $1.00 | 5s |
| Action beat (village burns) | `kling` | | $0.50 | 5s |
| Ending shot (hero walks away) | `kling` | | $0.50 | 5s |
| Throwaway iteration / blocking | `ltx` | same prompt as above | $0.10 | 5s |
| Anime / stylized cutscene | | | $0.40 | 5s |
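The five-slot pattern and the 50-word ceiling are easy to enforce mechanically. A minimal sketch follows; `build_prompt` is a hypothetical helper and the slot values are made up for illustration, not taken from the reference card.

```python
MAX_PROMPT_WORDS = 50  # the reference card says to stay under 50 words


def build_prompt(subject: str, action: str, camera_move: str,
                 lighting: str, stylistic_anchor: str) -> str:
    """Join the five pattern slots into one comma-separated prompt."""
    parts = [f"{subject} {action}", camera_move, lighting, stylistic_anchor]
    prompt = ", ".join(part.strip() for part in parts)
    word_count = len(prompt.split())
    if word_count >= MAX_PROMPT_WORDS:
        raise ValueError(
            f"prompt has {word_count} words; keep it under {MAX_PROMPT_WORDS}")
    return prompt


# Hypothetical slot values, for illustration only.
example = build_prompt(
    subject="an old witch",
    action="turns toward the camera",
    camera_move="slow dolly in",
    lighting="candlelit interior",
    stylistic_anchor="cinematic, 16mm film grain",
)
print(example)
```

Keeping the slots as separate arguments makes it obvious when one is missing, which is the most common way a prompt degrades into a generic montage.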
Bad prompts and why
| Bad | Why it fails |
|---|---|
| | No subject, no shot, no camera. Returns a generic action montage. |
| | Five events in one prompt. Model picks one (badly) or tries all and renders mush. Split into three shots. |
| | Adjective slop. The model already knows "cinematic"; the rest is dead weight. |
| | Words can't anchor identity. Use |
| | Video models render simple camera moves (pan, tilt, dolly, truck) reliably and complex moves badly. Pick one verb. |
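The "pick one verb" rule can be checked with a rough lint pass. The sketch below looks for the four simple camera moves named in the table and flags prompts that combine several. It is a keyword heuristic under stated assumptions (it would also match "truck" used as a noun), and `find_camera_moves` and `lint_prompt` are hypothetical names.

```python
import re
from typing import List

# The simple moves named in the table, with common verb endings.
CAMERA_MOVE_PATTERNS = {
    "pan": r"\bpan(s|ning)?\b",
    "tilt": r"\btilt(s|ing)?\b",
    "dolly": r"\bdolly(ing)?\b",
    "truck": r"\btruck(s|ing)?\b",
}


def find_camera_moves(prompt: str) -> List[str]:
    """Return the camera moves mentioned in the prompt, in table order."""
    text = prompt.lower()
    return [name for name, pattern in CAMERA_MOVE_PATTERNS.items()
            if re.search(pattern, text)]


def lint_prompt(prompt: str) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    moves = find_camera_moves(prompt)
    if len(moves) > 1:
        problems.append(
            "multiple camera moves (" + ", ".join(moves) + "); pick one")
    return problems


print(lint_prompt("witch turns to camera, slow dolly in, candlelight"))
print(lint_prompt("camera pans left, then tilts up over the village"))
```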
Anti-patterns
- Generating a 10s shot when you should chain two 5s shots. Identity, hand consistency, and physics drift every additional second past ~6s. Two 5s shots cut together look better and cost the same.
- Skipping the reference image. Generating four character cutscenes from text prompts alone produces four different-looking people. Always lock the look with `summer_generate_image` first, then `imageUrl` every subsequent video call.
- Asking the video model to render dialogue without `veo3`. `kling` and `ltx` will animate mouths but the motion does not match any audio. Use `veo3` if dialogue is on-camera, or keep the camera off the speaker.
- Using `kling` for blocking iterations. Burn $0.10 on `ltx` to validate framing and prompt; only spend $0.50 once the composition lands.
- Putting the cutscene on a `VideoStreamPlayer` without a fade. Cuts straight from gameplay to video are jarring. Always wrap in a `ColorRect` fade in/out.
- Forgetting to import the file. `summer_generate_video` returns a `fileUrl` on Cloudinary; until you call `summer_import_from_url`, it isn't in `res://` and the scene can't reference it.
Edge cases
- Vertical cutscene for a mobile target. Set `aspectRatio="9:16"`. Reference image must also be 9:16: generate it with `image_size: "portrait_16_9"` or the framing will crop wrong on the video.
- Cutscene must match an in-engine character precisely. No video model will hit pixel-perfect identity. Either (a) accept the artistic license, (b) render the cutscene in-engine with `AnimationPlayer` instead of generating it, or (c) generate a reference image from a screenshot of the in-engine character (image-to-image first, then image-to-video).
- Dialogue is too long for a 5s shot. Either trim the line, or split: 5s of speaker, 5s of listener reaction (cheaper because the listener doesn't need synced lips).
- Cutscene needs to play and the player isn't looking. Pause the gameplay timer, push a `Control` overlay with the `VideoStreamPlayer`, restore on finish.
- Multi-shot continuity. Use the same reference image as `imageUrl` across all shots. If lighting differs by shot, generate one reference per lighting setup, not one per shot.
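The vertical-cutscene case comes down to keeping the reference image and the video on the same aspect ratio. A small sketch of that pairing follows, using only the two `image_size` values that appear in this skill. `matched_settings` is a hypothetical helper, and other ratios are rejected because no other mapping is documented here.

```python
from typing import Dict

# Video aspect ratio -> matching image_size for the reference still.
IMAGE_SIZE_FOR_ASPECT = {
    "16:9": "landscape_16_9",
    "9:16": "portrait_16_9",
}


def matched_settings(aspect_ratio: str) -> Dict[str, Dict[str, str]]:
    """Return image and video settings that share one aspect ratio."""
    if aspect_ratio not in IMAGE_SIZE_FOR_ASPECT:
        raise ValueError(f"no known image_size for aspect ratio {aspect_ratio!r}")
    return {
        "image_options": {"image_size": IMAGE_SIZE_FOR_ASPECT[aspect_ratio]},
        "video_args": {"aspectRatio": aspect_ratio},
    }


print(matched_settings("9:16"))
```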
Fallback (no MCP)
If the Studio MCP server isn't running, the user can do all of this through the Studio web dashboard at the Summer Engine cloud console:
- Generate a reference image in the Image tab.
- Open the Video tab, paste the reference URL into the image-to-video field.
- Set model, duration, aspect ratio, and prompt as listed above.
- Download the `.mp4` and drop it into `assets/video/cinematics/` in the project, then re-import via the Godot editor.
Print the exact prompt + model + duration + aspect ratio so the user can paste it into the dashboard verbatim.
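The printed hand-off for the dashboard can be a plain four-line block. A minimal sketch follows; the field labels are assumptions about what is convenient to paste, not the dashboard's actual field names, and `dashboard_settings` is a hypothetical helper.

```python
def dashboard_settings(prompt: str, model: str,
                       duration: int, aspect_ratio: str) -> str:
    """Format the four settings the user must copy into the dashboard."""
    return "\n".join([
        f"Prompt: {prompt}",
        f"Model: {model}",
        f"Duration: {duration}s",
        f"Aspect ratio: {aspect_ratio}",
    ])


print(dashboard_settings(
    prompt="witch turns to camera, slow dolly in, candlelight, cinematic",
    model="kling",
    duration=5,
    aspect_ratio="16:9",
))
```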
Handoff
Once the cutscene is generated and wired:
> Cutscene `intro` wired at `cinematics/Intro.tscn` with fade in/out and `Video.finished` chained to scene change. Next:
>
> - Add the dialogue track with `audio/voice-line` if you haven't yet.
> - Score the moment with `audio/music-track`; a cutscene without music feels like a placeholder.
> - For the boss reveal cutscene, run this skill again with the same reference image to keep the witch identity stable.
> - If you want a 5s "money shot" version for marketing, run `video/trailer-shot` against the same reference.
See also
- `video/trailer-shot`: marketing footage, 5-10s, maximum visual punch.
- `video/animated-loop`: seamless looping background clips.
- `audio/voice-line`: TTS dialogue used inside cutscenes.
- `audio/music-track`: score the cutscene.
- `2d-assets/concept-art`: generate the reference image if no character reference exists yet.
- `2d-assets/character-portrait`: produce a high-fidelity locked character portrait for use as `imageUrl`.
- `_shared/mcp-tools-reference.md`: `summer_generate_video` parameter schema and error codes.