cinematic-cutscene — Locked-Look Story Beat in 5-10 Seconds

A cutscene is a non-interactive video clip the game plays back at a fixed moment — opening, story beat, character intro, ending. Quality is dominated by two things: continuity (the character on screen looks like the character in the game) and shot length (every clip past 10s drifts in identity, hands, and physics). This skill enforces both — generate a reference image first to lock the look, then image-to-video each shot, then chain shots in sequence rather than asking the model for one long take.

If the user wants 5 seconds of marketing footage, that's `video/trailer-shot`. If they want a 3-second seamless background loop, that's `video/animated-loop`. This skill is for narrative beats with a defined start, middle, and end, usually with dialogue or VO.

When to use

"Generate the opening cinematic for the game."
"I need a cutscene where the witch turns to the player and warns them about the curse."
"Make a 10-second character intro for the boss."
"Add an ending sequence after the player defeats the dragon."
"Cinematic where the village burns and the protagonist runs."
The user has a defined moment with story content — not a vibe loop, not a marketing splash.

When NOT to use

"5-second slow-mo combat shot for the trailer": use `video/trailer-shot`.
"Looping splash screen background": use `video/animated-loop`.
"Animated logo backdrop on the title screen": use `video/animated-loop`.
Real-time in-engine cinematic with the actual gameplay characters and camera: that's a Godot AnimationPlayer / Timeline job, not generated video. Route to `summer:scene-composition`.
Voice-only narration over a static image: generate the dialogue with `summer_generate_audio` and show the image in a static `TextureRect`. No video is needed.

Steps

1. Read the soul file and any prior cutscenes

```
Read .summer/GameSoul.md
summer_search_assets(query="cutscene", filter={ kind: "video" })
summer_search_assets(query="<character name> reference", filter={ kind: "image" })
```

If a reference image of the character already exists from a prior `concept-art` or `character-portrait` run, reuse it as the `imageUrl`. That single decision is the difference between the character looking like themselves and looking like a stranger.

2. Plan the shot list

Cutscenes longer than 10 seconds are multiple shots, not one take. Ask the user to break it down:

Want this as one 10s shot, or three shots (e.g. wide → close on face → reaction)? Each shot is 5-10s. I'll generate a reference image once, then image-to-video each shot off of it.

Default if the user is vague: 3 shots of 5s each, about $1.50 total on `kling` (3 × ~$0.50, plus ~$0.05 for the reference still). Confirm before spending.
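Below is a minimal Python sketch of that default plan and what it costs. `plan_shots` and `estimate_cost` are hypothetical helpers, not Summer tools. The prices are the rough figures from the model table in step 4. The assumption that cost scales linearly with clip length comes from the anti-pattern note that two 5s shots cost the same as one 10s shot.

```python
import math

# Rough per-5-second prices (USD) copied from the model table in step 4.
# Assumption: price scales linearly with clip length. Check live pricing.
COST_PER_5S = {"ltx": 0.10, "kling": 0.50, "kling-turbo": 0.30,
               "veo3": 1.00, "minimax": 0.40}
REFERENCE_STILL_COST = 0.05  # one reference still from step 3


def plan_shots(total_seconds: int, shot_seconds: int = 5) -> list[int]:
    """Split a cutscene into equal 5-10s shots instead of one long take.

    Rounds up, so the plan can run slightly longer than total_seconds.
    """
    if not 5 <= shot_seconds <= 10:
        raise ValueError("each shot should be 5-10 seconds")
    count = max(1, math.ceil(total_seconds / shot_seconds))
    return [shot_seconds] * count


def estimate_cost(shots: list[int], model: str = "kling") -> float:
    """Reference still plus every shot, rounded to cents."""
    video = sum(COST_PER_5S[model] * seconds / 5 for seconds in shots)
    return round(REFERENCE_STILL_COST + video, 2)


if __name__ == "__main__":
    shots = plan_shots(15)  # the "user is vague" default: 3 shots x 5s
    print(shots, estimate_cost(shots))
```

For the default request this gives three 5s shots at about $1.55: the ~$1.50 of video plus the still.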

3. Lock the look — generate a reference image first

For any character or hero scene, generate a reference still with `summer_generate_image` before generating video. This still is passed as `imageUrl` on every subsequent shot, so the character stays consistent across cuts.

```
summer_generate_image(
  prompt="<subject>, <setting>, cinematic lighting, film still, <art style from GameSoul.md>",
  model="nano-banana-2",
  options={ image_size: "landscape_16_9" }
)
```

Show the user the still and confirm it's the right look before animating it. Regenerating a $0.05 still beats regenerating a $0.50 video.
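The still and the shots must share a shape, or the video will crop the framing. The sketch below maps the video `aspectRatio` to an `image_size` preset and fills in the prompt template above. It assumes only the two presets that appear in this skill (`landscape_16_9` and `portrait_16_9`). Both helper names are hypothetical.

```python
# Only the two presets this skill itself uses; any other aspect ratio is
# rejected rather than guessed.
IMAGE_SIZE_FOR_ASPECT = {
    "16:9": "landscape_16_9",  # default cinematic framing
    "9:16": "portrait_16_9",   # vertical / mobile cutscenes
}


def reference_image_options(aspect_ratio: str) -> dict:
    """Options for the reference still so it matches the video's aspectRatio."""
    try:
        return {"image_size": IMAGE_SIZE_FOR_ASPECT[aspect_ratio]}
    except KeyError:
        raise ValueError(f"no known image_size preset for {aspect_ratio!r}") from None


def reference_prompt(subject: str, setting: str, art_style: str) -> str:
    """Fill the reference-still prompt template; art_style comes from GameSoul.md."""
    return f"{subject}, {setting}, cinematic lighting, film still, {art_style}"
```

Decide the aspect ratio once, then derive both the still's `image_size` and every shot's `aspectRatio` from it.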

对于任何包含角色或主角的场景，需先使用

summer_generate_image

生成参考静态图，再生成视频。这张静态图将作为后续所有镜头的

imageUrl

，确保角色在不同镜头中保持一致。

summer_generate_image(
  prompt="<主体>, <场景>, 电影级灯光, 电影静帧, <来自GameSoul.md的艺术风格>",
  model="nano-banana-2",
  options={ image_size: "landscape_16_9" }
)

向用户展示静态图并确认视觉风格符合要求后，再生成视频。重新生成一张0.05美元的静态图，远比重新生成0.5美元的视频更划算。

4. Pick the model

4. 选择模型

Model	Cost	Speed	When
`ltx`	~$0.10	~30s	Iteration, blocking shots, B-roll, throwaway tests
`kling`	~$0.50	2-4 min	Hero shots, character cutscenes, anything the player will sit and watch
`kling-turbo`	~$0.30	1-2 min	Same as kling when iteration speed matters more than the last 10% of quality
`veo3`	~$1.00	3-5 min	Pitch decks, premium dialogue scenes with synced lip motion, short-form ad
`minimax`	~$0.40	2-3 min	Stylized / anime-leaning content; better at non-photoreal looks than kling

Default policy: `ltx` first to validate the prompt and shot framing. If it lands the composition but quality is rough, escalate to `kling`. Only reach for `veo3` if dialogue lip-sync matters and the user has approved the cost.
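The default policy can be written as one hypothetical helper. The policy does not rank `veo3`, `minimax`, and `kling-turbo` against each other, so the order of those checks below is an assumption.

```python
def pick_model(stage: str, *, lip_sync_on_camera: bool = False,
               veo3_approved: bool = False, stylized: bool = False,
               prefer_speed: bool = False) -> str:
    """Default model policy: ltx to block, kling for the final shot."""
    if stage == "blocking":
        return "ltx"  # ~$0.10, ~30s: validate prompt and framing first
    if stage != "final":
        raise ValueError("stage must be 'blocking' or 'final'")
    if lip_sync_on_camera and veo3_approved:
        return "veo3"  # synced lips matter and the user approved ~$1.00/shot
    if stylized:
        return "minimax"  # anime-leaning / non-photoreal looks
    # kling-turbo trades the last bit of quality for faster iteration.
    return "kling-turbo" if prefer_speed else "kling"
```

Without explicit cost approval, on-camera dialogue still falls through to `kling`. This matches the rule that `veo3` needs the user's sign-off.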

5. Generate each shot

```
summer_generate_video(
  prompt="<subject does <action>, <camera move>, <lighting>, cinematic, 16mm film grain>",
  model="kling",
  imageUrl="<reference image fileUrl from step 3>",
  duration=5,
  aspectRatio="16:9"
)
```

Returns `{ asset: { fileUrl } }`. Show the user the URL and ask:

Shot 1 of 3 done. Land or regenerate? If land, I'll move to shot 2.
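Every shot reuses the same reference, so the per-shot calls differ only in prompt and duration. The sketch below builds those argument sets with keys that mirror the call above. It assumes your MCP client accepts them as keyword arguments; `ShotSpec` and `shot_requests` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShotSpec:
    subject_action: str  # "<subject does <action>"
    camera_move: str     # one simple move: pan, tilt, dolly, or truck
    lighting: str
    duration: int = 5


def shot_requests(reference_url: str, shots: list[ShotSpec],
                  model: str = "kling", aspect_ratio: str = "16:9") -> list[dict]:
    """One summer_generate_video argument set per shot, all sharing imageUrl."""
    requests = []
    for shot in shots:
        if not 5 <= shot.duration <= 10:
            raise ValueError("keep each shot between 5 and 10 seconds")
        prompt = (f"{shot.subject_action}, {shot.camera_move}, "
                  f"{shot.lighting}, cinematic, 16mm film grain")
        requests.append({
            "prompt": prompt,
            "model": model,
            "imageUrl": reference_url,  # same still for every shot = continuity
            "duration": shot.duration,
            "aspectRatio": aspect_ratio,
        })
    return requests
```

Send these one at a time. Wait for the land-or-regenerate answer before starting the next shot rather than firing them all at once.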

6. Generate dialogue audio (if the scene has dialogue)

Cutscene dialogue comes from TTS, not from the video model. The video model can render mouth motion that looks like talking, but the audio comes from `summer_generate_audio`. Generate it separately, then sync the two in the editor (Godot's AnimationPlayer or your `cinematics/` controller scene).

```
summer_generate_audio(
  capability="text_to_speech",
  text="They'll come for you at dawn. Run while you can.",
  voiceId="<from audio bible — see audio/voice-line>"
)
```

Lip-sync caveat: if the video shows the character clearly mouthing words and the audio has a different cadence, viewers notice. You can (a) keep the camera off the character's face during dialogue, (b) accept the asynchrony for a stylized look, or (c) use `veo3` and put the dialogue text directly into the video prompt for synced motion.
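Because audio and video are separate assets, each line needs an absolute start time across the whole multi-shot sequence. The hypothetical helper below turns "line X starts 0.5s into shot 2" into those times. You could then key them on an AnimationPlayer audio track or use them as `play()` delays. It assumes you know roughly where in each shot a line should land.

```python
from itertools import accumulate


def cue_times(shot_durations: list[float],
              cues: list[tuple[int, float, str]]) -> list[tuple[float, str]]:
    """Turn (shot index, offset within shot, clip name) into absolute start times."""
    # Each shot starts after the durations of all previous shots.
    shot_starts = [0.0, *accumulate(shot_durations)]
    timeline = []
    for shot_index, offset, clip in cues:
        if not 0 <= offset < shot_durations[shot_index]:
            raise ValueError(f"{clip}: offset {offset}s falls outside shot {shot_index}")
        timeline.append((shot_starts[shot_index] + offset, clip))
    return sorted(timeline)
```

For three 5s shots, a line cued 0.5s into the second shot starts at 5.5s.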

7. Import and wire as a VideoStreamPlayer

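```
summer_import_from_url(url="<fileUrl>", path="assets/video/cinematics/intro_shot_01.mp4")
```

Note that stock Godot 4's `VideoStreamPlayer` only decodes Ogg Theora (`.ogv`). If your engine build does not add MP4 support, transcode the clip to `.ogv` before wiring it up.

Build a controller scene at `cinematics/Intro.tscn`: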

```
summer_add_node(parent=".", type="Control", name="Intro")
summer_add_node(parent="./Intro", type="VideoStreamPlayer", name="Video")
summer_add_node(parent="./Intro", type="ColorRect", name="Fade")   # added after Video so it draws on top; black, alpha 1.0 → 0.0
summer_set_prop(path="./Intro", property="anchors_preset", value=15)   # full rect
summer_set_prop(path="./Intro/Video", property="stream", value="res://assets/video/cinematics/intro_shot_01.mp4")
summer_set_prop(path="./Intro/Video", property="autoplay", value=true)
summer_set_prop(path="./Intro/Video", property="expand", value=true)
summer_set_prop(path="./Intro/Video", property="anchors_preset", value=15)
summer_set_prop(path="./Intro/Fade", property="anchors_preset", value=15)
```

Attach a script that uses a `Tween` to do three things. In `_ready`, it fades the `ColorRect` from black to clear over 0.5s. When `Video.finished` fires, it fades back to black. It then calls `queue_free()` on the scene. For multi-shot cutscenes, queue the next `VideoStreamPlayer` in the `finished` signal handler.

For dialogue, add an `AudioStreamPlayer` sibling with the TTS clip. Call its `play()` from `_ready` after a small delay that matches where the line lands in the video.
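The real controller is a GDScript script. The fade-in there might be `create_tween().tween_property($Fade, "modulate:a", 0.0, 0.5)`, with a handler connected to `Video.finished`. The Python sketch below only models the ordering and the linear fade so the logic can be checked. The class and method names are hypothetical stand-ins for the Godot callbacks.

```python
FADE_SECONDS = 0.5  # fade length used by the controller scene


def fade_alpha(elapsed: float, fading_in: bool, duration: float = FADE_SECONDS) -> float:
    """Opacity of the black Fade ColorRect: 1.0 is fully black, 0.0 is clear."""
    progress = min(max(elapsed / duration, 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 - progress if fading_in else progress


class CutsceneSequence:
    """Order of events in the controller scene, one log entry per action."""

    def __init__(self, shot_paths: list[str]):
        if not shot_paths:
            raise ValueError("a cutscene needs at least one shot")
        self.shot_paths = shot_paths
        self.current = 0
        self.log: list[str] = []

    def ready(self) -> None:
        # _ready: fade in from black while the first shot starts playing.
        self.log += ["fade_in", f"play {self.shot_paths[0]}"]

    def on_video_finished(self) -> None:
        # Video.finished handler: chain the next shot, or end the cutscene.
        self.current += 1
        if self.current < len(self.shot_paths):
            self.log.append(f"play {self.shot_paths[self.current]}")
        else:
            # Fade back to black; free the scene only once that tween ends.
            self.log += ["fade_out", "queue_free"]
```

Only the last `finished` triggers the fade-out. Intermediate shots cut directly to the next clip.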

Reference card — prompts that work

Pattern: `<subject> + <action> + <camera move> + <lighting> + <stylistic anchor>`. Keep prompts under 50 words, because over-prompting confuses the model. Always pair the prompt with `imageUrl` to lock the character.

Goal	Model	Prompt	Cost	Duration
Opening establishing shot	`kling`	`wide shot of a fog-shrouded medieval village at dawn, slow dolly-in toward the church spire, warm low sun, cinematic, anamorphic lens flare`	$0.50	5s
Character intro (hero turns to camera)	`kling`	`young witch with raven on shoulder turns slowly toward camera, candlelit interior, shallow depth of field, cinematic film still`	$0.50	5s
Dialogue close-up (no synced lips)	`kling`	`close-up of a grizzled knight, eyes downcast then looking up, firelight on his face, dust motes, 16mm film grain`	$0.50	5s
Dialogue close-up (synced lips)	`veo3`	`close-up of the witch saying "they will come for you at dawn", candlelight, cinematic, shallow DOF`	$1.00	5s
Action beat (village burns)	`kling`	`medium wide of a thatched village engulfed in flames at night, embers rising, silhouettes running through smoke, hand-held camera, cinematic`	$0.50	5s
Ending shot (hero walks away)	`kling`	`low wide shot of a cloaked figure walking away across a blasted plain at sunset, slow truck-back, golden hour, dust on the wind`	$0.50	5s
Throwaway iteration / blocking	`ltx`	same prompt as above	$0.10	5s
Anime / stylized cutscene	`minimax`	`anime-style young swordsman draws blade and steps forward, sakura petals, dramatic wind, Studio Ghibli soft painterly`	$0.40	5s
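The pattern and the 50-word budget can be enforced mechanically. `build_prompt` below is a hypothetical helper. It joins subject and action with a space and the remaining slots with commas, as the prompts in the table do.

```python
MAX_PROMPT_WORDS = 50  # "keep prompts under 50 words"


def build_prompt(subject: str, action: str, camera_move: str,
                 lighting: str, style_anchor: str) -> str:
    """Assemble <subject> <action>, <camera move>, <lighting>, <style anchor>."""
    slots = [subject, action, camera_move, lighting, style_anchor]
    if any(not slot.strip() for slot in slots):
        raise ValueError("every slot in the pattern needs a value")
    subject, action, camera_move, lighting, style_anchor = (s.strip() for s in slots)
    prompt = f"{subject} {action}, {camera_move}, {lighting}, {style_anchor}"
    words = len(prompt.split())
    if words >= MAX_PROMPT_WORDS:
        raise ValueError(f"prompt is {words} words; trim it under {MAX_PROMPT_WORDS}")
    return prompt
```

Filling all five slots forces a camera move and a lighting choice, which the bad prompts below tend to omit.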

Bad prompts and why

Bad	Why it fails
`epic cutscene of the hero defeating the boss`	No subject, no shot, no camera. Returns a generic action montage.
`the hero walks into the throne room, kneels before the king, draws his sword and says I refuse to serve, then walks out`	Five events in one prompt. Model picks one (badly) or tries all and renders mush. Split into three shots.
`cinematic 4k masterpiece trending on artstation`	Adjective slop. The model already knows "cinematic"; the rest is dead weight.
`make the character look exactly like in the game`	Words can't anchor identity. Use `imageUrl` .
`the camera does a complex handheld weaving move through the crowd`	Video models render simple camera moves (pan, tilt, dolly, truck) reliably and complex moves badly. Pick one verb.
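Some of these failure patterns can be caught with plain string checks before any money is spent. The hypothetical linter below covers four of the five rows; the vague "epic cutscene" prompt is too open-ended to match by keyword. These are rough heuristics drawn from the table, not a model of how any video model parses prompts, and an empty result does not mean the prompt is good.

```python
import re

# Rough string heuristics distilled from the bad-prompt table.
FILLER_WORDS = {"4k", "8k", "masterpiece", "trending", "artstation"}
IDENTITY_PHRASES = ("exactly like", "same as in the game")
COMPLEX_CAMERA_WORDS = ("complex", "weaving")


def lint_prompt(prompt: str) -> list[str]:
    """Return human-readable warnings for a video prompt (empty list = none)."""
    text = prompt.lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    warnings = []
    filler = sorted(words & FILLER_WORDS)
    if filler:
        warnings.append(f"adjective slop ({', '.join(filler)}): cut it")
    if re.search(r"\bthen\b", text):
        warnings.append("several events chained with 'then': split into separate shots")
    if any(phrase in text for phrase in IDENTITY_PHRASES):
        warnings.append("identity described in words: pass a reference image as imageUrl")
    if any(word in words for word in COMPLEX_CAMERA_WORDS):
        warnings.append("complex camera move: pick one verb (pan, tilt, dolly, truck)")
    return warnings
```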

Anti-patterns

Generating a 10s shot when you should chain two 5s shots. Identity, hand consistency, and physics drift further with every second past ~6s. Two 5s shots cut together look better and cost the same.
Skipping the reference image. Generating four character cutscenes from text prompts alone produces four different-looking people. Always lock the look with `summer_generate_image` first, then pass that still as `imageUrl` on every subsequent video call.
Asking the video model to render dialogue without `veo3`. `kling` and `ltx` will animate mouths, but the motion does not match any audio. Use `veo3` if dialogue is on-camera, or keep the camera off the speaker.
Using `kling` for blocking iterations. Spend $0.10 on `ltx` to validate framing and prompt, and only spend $0.50 once the composition lands.
Putting the cutscene on a `VideoStreamPlayer` without a fade. Cutting straight from gameplay to video is jarring. Always wrap it in a `ColorRect` fade in/out.
Forgetting to import the file. `summer_generate_video` returns a `fileUrl` on Cloudinary. Until you call `summer_import_from_url`, the file isn't under `res://` and the scene can't reference it.
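Four of these anti-patterns can be read directly off a request before it is sent. The hypothetical `preflight` check below uses the same keys as the `summer_generate_video` call in step 5. The fade and import mistakes live in the scene, not the request, so it does not cover them. The 6-second threshold comes from the drift note above.

```python
def preflight(request: dict, *, stage: str, dialogue_on_camera: bool = False) -> list[str]:
    """Flag request-level anti-patterns; stage is "blocking" or "final"."""
    warnings = []
    model = request.get("model")
    if request.get("duration", 5) > 6:
        warnings.append("shot is over ~6s: chain two 5s shots instead")
    if not request.get("imageUrl"):
        warnings.append("no imageUrl: lock the look with a reference still first")
    if dialogue_on_camera and model != "veo3":
        warnings.append("on-camera dialogue without veo3: use veo3 or keep the camera off the speaker")
    if stage == "blocking" and model != "ltx":
        warnings.append(f"blocking pass on {model}: validate framing on ltx first")
    return warnings
```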

Edge cases

Vertical cutscene for a mobile target. Set `aspectRatio="9:16"`. The reference image must also be 9:16, so generate it with `image_size: "portrait_16_9"` (the 9:16 portrait preset). Otherwise the video will crop the framing wrong.
Cutscene must match an in-engine character precisely. No video model will hit pixel-perfect identity. You can (a) accept the artistic license, (b) render the cutscene in-engine with `AnimationPlayer` instead of generating it, or (c) generate a reference image from a screenshot of the in-engine character (image-to-image first, then image-to-video).
Dialogue is too long for a 5s shot. Either trim the line, or split it: 5s of speaker, then 5s of listener reaction. The reaction shot is cheaper because the listener doesn't need synced lips.
Cutscene needs to play and the player isn't looking. Pause the gameplay timer, push a `Control` overlay with the `VideoStreamPlayer`, and restore on finish.
Multi-shot continuity. Use the same reference image as `imageUrl` across all shots. If lighting differs by shot, generate one reference per lighting setup, not one per shot.
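To decide between trimming and splitting before generating anything, estimate the line's spoken length. The sketch below assumes roughly 150 words per minute (2.5 words/s). TTS voices vary, so the generated clip's actual length is the real measure. Both helpers are hypothetical.

```python
import math

WORDS_PER_SECOND = 2.5  # assumption: ~150 words per minute; voices vary


def speech_seconds(line: str, words_per_second: float = WORDS_PER_SECOND) -> float:
    """Rough spoken length of a dialogue line."""
    return len(line.split()) / words_per_second


def shots_for_line(line: str, shot_seconds: float = 5.0) -> int:
    """How many shots (speaker, then listener reactions) the line spans."""
    return max(1, math.ceil(speech_seconds(line) / shot_seconds))
```

The witch's 10-word warning from step 6 comes out at about 4s, which fits one 5s shot. A 20-word line needs a speaker shot plus a reaction shot.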

Fallback (no MCP)

If the Studio MCP server isn't running, the user can do all of this through the Studio web dashboard at the Summer Engine cloud console:

Generate a reference image in the Image tab.
Open the Video tab, paste the reference URL into the image-to-video field.
Set model, duration, aspect ratio, and prompt as listed above.
Download the `.mp4` and drop it into `assets/video/cinematics/` in the project, then re-import via the Godot editor.

Print the exact prompt + model + duration + aspect ratio so the user can paste it into the dashboard verbatim.
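A minimal formatter for that printout is sketched below. The field labels are assumptions, since the dashboard's actual field names are not documented here. Rename them to whatever the Video tab shows.

```python
def dashboard_instructions(prompt: str, model: str, duration: int,
                           aspect_ratio: str, reference_url: str) -> str:
    """Plain-text block the user can copy field by field into the dashboard."""
    lines = [
        f"Prompt: {prompt}",
        f"Model: {model}",
        f"Duration: {duration}s",
        f"Aspect ratio: {aspect_ratio}",
        f"Reference image URL: {reference_url}",  # image-to-video field
    ]
    return "\n".join(lines)
```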

Handoff

Once the cutscene is generated and wired:

Cutscene `intro` wired at `cinematics/Intro.tscn` with fade in/out and `Video.finished` chained to scene change. Next:
Add the dialogue track with `audio/voice-line` if you haven't yet.
Score the moment with `audio/music-track`, since a cutscene without music feels like a placeholder.
For the boss reveal cutscene, run this skill again with the same reference image to keep the witch identity stable.
If you want a 5s "money shot" version for marketing, run `video/trailer-shot` against the same reference.
