Prompting video models on Replicate

Distilled from Replicate's blog posts on prompting video models (2025-2026). Techniques are model-agnostic and focus on transferable principles. For model selection, pricing, and feature comparison, see the compare-models skill.

Scene description

A good video prompt is a scene description, not a caption. Write what happens, where, and how it looks.

Layer these elements into every prompt

  1. Subject: Who or what is in the scene (a person, animal, object, landscape).
  2. Context: Where the subject is (indoors, a city street, a forest, a spaceship corridor).
  3. Action: What the subject does (walks, turns, picks up a phone, runs).
  4. Style: The visual aesthetic (cinematic, animated, stop-motion, documentary).
  5. Camera: How the camera moves (dolly shot, tracking, static, handheld).
  6. Composition: How the shot is framed (wide shot, close-up, over-the-shoulder).
  7. Ambiance: Mood and lighting (warm tones, blue light, golden hour, overcast).
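One way to make sure no element is forgotten is to hold the seven layers in a small structure and render the prompt from it. The sketch below is illustrative only: the `Scene` class, its field names, and the sentence order are assumptions, not a format any particular model requires.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """One field per prompt element; all names here are hypothetical."""
    subject: str
    context: str
    action: str
    style: str
    camera: str
    composition: str
    ambiance: str

    def to_prompt(self) -> str:
        # Subject, action, and context form the opening sentence.
        opening = f"{self.subject} {self.action} {self.context}."
        # The remaining elements follow as short sentences, first letter upper-cased.
        details = [self.composition, self.camera, self.style, self.ambiance]
        return " ".join([opening] + [f"{d[0].upper()}{d[1:]}." for d in details])


scene = Scene(
    subject="A bearded man in a flannel shirt",
    context="down a rain-slicked city street",
    action="walks",
    style="cinematic",
    camera="slow tracking shot",
    composition="medium shot",
    ambiance="overcast light, muted tones",
)
print(scene.to_prompt())
```

Because every field is required, leaving out a layer fails at construction time instead of silently producing a thinner prompt.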

Be specific, not vague

Vague: "A car chase"
Specific: "A high-speed car chase on a rain-drenched highway at night. Two muscle cars weave through heavy traffic at 140mph, headlights slicing through the downpour. One car clips a semi-truck sending sparks showering across six lanes. Tires hydroplane on standing water. Neon highway signs blur overhead."

Overdescribe

Modern video models handle long, dense prompts well. Don't write "a man on the phone." Write "a desperate man in a weathered green trench coat picks up a rotary phone mounted on a gritty brick wall, bathed in the eerie glow of a green neon sign." Every concrete detail you add gives the model less room to improvise poorly.

Name subjects directly

Use descriptive phrases like "the woman in the red jacket" or "the bearded man in flannel." Avoid pronouns, which are ambiguous to video models just as they are to image models.

Camera and cinematography

Video models understand filmmaking language. Use it to direct the shot rather than hoping for good framing.

Shot types

Use standard shot terminology to control framing:
  • Wide/establishing shot: shows the full scene and environment
  • Medium shot: frames the subject from roughly the waist up
  • Close-up: fills the frame with the subject's face or a key object
  • Extreme close-up: isolates a detail (an eye, a hand gripping a handle, a drop of water)

Camera motion

Describe how the camera moves:
  • Static/tripod: locked-off, no movement
  • Pan: horizontal rotation left or right
  • Tilt: vertical rotation up or down
  • Dolly: camera physically moves toward or away from the subject
  • Tracking: camera moves alongside the subject
  • Crane: camera rises or descends vertically
  • Handheld: shaky, documentary-style movement
  • Drone/aerial: overhead or sweeping bird's-eye shots
  • Dolly zoom (Hitchcock/vertigo effect): the camera dollies while zooming in the opposite direction, so the background stretches or compresses while the subject stays the same size

Camera position

Specify the camera's height and angle:
  • Eye level: neutral, natural perspective
  • Low angle / worm's eye: looking up at the subject (makes subjects feel powerful or imposing)
  • High angle / bird's eye: looking down (makes subjects feel small or vulnerable)
  • Over-the-shoulder: frames one subject from behind another
  • POV / first-person: camera is the subject's eyes

Lens and focus language

  • Shallow depth of field: subject sharp, background blurred
  • Deep focus: everything sharp from foreground to background
  • Macro lens: extreme close-up with shallow focus
  • Wide-angle lens: exaggerated perspective, more environment visible
  • Tilt-shift: miniature effect, selective focus band

Escalation pattern

A natural progression for short clips is wide > medium > close-up > extreme close-up. This maps well onto 8-15 second clips and gives the model clear structure. For example:
  • 0-3s: wide establishing shot of the location
  • 3-7s: medium shot, the subject enters or acts
  • 7-12s: close-up on the key moment
  • 12-15s: extreme close-up on a detail (a hand, an eye, a drop of rain)
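The 15-second breakdown above can be rescaled to other clip lengths. The sketch below keeps the same proportions (cuts at 3, 7, and 12 seconds out of 15) and rounds to whole seconds. Proportional scaling is an assumption made for illustration; the function name is hypothetical.

```python
SHOTS = ["wide establishing shot", "medium shot", "close-up", "extreme close-up"]
# Cut points of the 15-second example, reused as proportions.
BOUNDARIES_AT_15S = [0, 3, 7, 12, 15]


def escalation_beats(duration_s: int) -> list[str]:
    """Scale the wide > medium > close-up > extreme close-up beats to a clip length."""
    marks = [round(duration_s * b / 15) for b in BOUNDARIES_AT_15S]
    return [
        f"{start}-{end}s: {shot}"
        for start, end, shot in zip(marks, marks[1:], SHOTS)
    ]


for line in escalation_beats(8):
    print(line)
```

For very short clips the rounded beats can collapse to a second or two each, at which point dropping a shot is probably better than squeezing in all four.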

Audio and dialogue

Many video models generate audio natively alongside the visuals. If you don't prompt for the audio you want, the model will guess, and it often guesses wrong.

Prompt all four audio layers

  1. Dialogue: What characters say, either exact words or described intent.
  2. Ambient sound: The background audio of the scene (rain on metal awnings, city traffic, forest birds).
  3. Sound effects: Specific sounds from actions (a door slamming, glass breaking, a sword being drawn).
  4. Music: Genre, mood, and instrumentation (a tense cinematic score, soft jazz piano, no music).
If you skip ambient audio, models may hallucinate inappropriate sounds. A common failure mode is adding a "live studio audience" laughing in the background. Prevent this by describing the soundscape explicitly: "sounds of distant bands, noisy crowd, ambient background of a busy festival field."
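A small guard can catch a forgotten audio layer before the prompt is sent. This is a minimal sketch: the layer keys and the `audio_block` helper are hypothetical names, and writing "No dialogue" or "No music" counts as prompting that layer.

```python
AUDIO_LAYERS = ("dialogue", "ambient", "sound_effects", "music")


def audio_block(**layers: str) -> str:
    """Join the four audio layers into one passage, refusing to skip any."""
    missing = [name for name in AUDIO_LAYERS if not layers.get(name)]
    if missing:
        raise ValueError(f"unprompted audio layers: {', '.join(missing)}")
    # Normalise each layer to end with exactly one full stop.
    return " ".join(f"{layers[name].rstrip('.')}." for name in AUDIO_LAYERS)


print(audio_block(
    dialogue="No dialogue",
    ambient="Sounds of distant bands and a noisy crowd on a busy festival field",
    sound_effects="A tent zipper opening",
    music="No music",
))
```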

Dialogue prompting

There are two approaches:
  • Explicit: "The man says: My name is Ben." This gives you exact control over the words.
  • Implicit: "The man introduces himself." This lets the model decide the phrasing.
Explicit dialogue should be short enough to fit the clip duration. Packing too much dialogue into an 8-second clip produces unnaturally fast speech. Too little dialogue can produce awkward silence or AI gibberish.
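A rough word budget makes the "fit the clip duration" rule checkable. The sketch below assumes a ceiling of about 2.5 words per second (roughly conversational pace) and a floor of 0.5 words per second. Both thresholds are assumptions to tune per model, not figures from the source.

```python
def check_dialogue(
    dialogue: str,
    clip_seconds: float,
    max_wps: float = 2.5,  # assumed upper speaking rate, words per second
    min_wps: float = 0.5,  # assumed lower bound before silence becomes awkward
) -> str:
    """Return 'too long', 'too short', or 'ok' for a line of explicit dialogue."""
    words = len(dialogue.split())
    if words > clip_seconds * max_wps:
        return "too long"
    if words < clip_seconds * min_wps:
        return "too short"
    return "ok"


print(check_dialogue("My name is Ben.", clip_seconds=8))
```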

Syntax that avoids subtitles

Many video models were trained on videos with baked-in subtitles and will add them to outputs. To prevent this:
  • Use a colon for dialogue: "She says: Hello there" rather than "She says 'Hello there'"
  • Add "(no subtitles)" to the prompt
  • If subtitles persist, repeat the instruction: "No subtitles. No subtitles!"
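These three rules are mechanical enough to apply in code. The helpers below are a sketch with hypothetical names: `say` rewrites quoted dialogue into the colon form, and `without_subtitles` appends the instruction, optionally with the repeated variant.

```python
def say(speaker: str, words: str) -> str:
    """Format dialogue with a colon and no quotation marks."""
    # Remove surrounding whitespace, then any straight or curly quotes.
    clean = words.strip().strip("\"'“”‘’")
    return f"{speaker} says: {clean}"


def without_subtitles(prompt: str, repeat: bool = False) -> str:
    """Append the no-subtitles instruction; repeat it if subtitles persist."""
    suffix = " (no subtitles)"
    if repeat:
        suffix += " No subtitles. No subtitles!"
    return prompt.rstrip() + suffix


line = say("The woman wearing pink", '"Hello there"')
print(without_subtitles(line))
```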

Pronunciation

If a model mispronounces a name or word, spell it phonetically in the prompt. For example, write "foh-fur" instead of "fofr" or "Shreedar" instead of "Shridhar."

Who says what

In multi-character scenes, the model can mix up who says what. Tie dialogue to distinctive visual descriptions: "The woman wearing pink says: ..." and "The man with glasses replies: ..."

Multi-shot and time-coded prompting

Some models support generating multiple shots within a single clip (up to ~15 seconds). You can direct each shot individually using time codes.

Time-coded format

Write timestamps directly into the prompt:
[0-4s]: Wide establishing shot, static camera, misty bamboo forest at dawn
[4-9s]: Medium shot, slow push-in, the fighter steps forward
[9-15s]: Close-up, orbit shot, the fighter strikes, slow motion
Each shot should specify:
  • Camera position and motion
  • Subject action
  • Lighting or mood shifts
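If shots are kept as data, the time-coded block can be generated and checked for gaps or overlaps before use. The sketch below reproduces the bracketed format shown above. The `Shot` type, the 15-second default cap, and the validation rules are assumptions for illustration.

```python
from typing import NamedTuple


class Shot(NamedTuple):
    start: int  # seconds
    end: int    # seconds
    description: str


def time_coded_prompt(shots: list[Shot], max_seconds: int = 15) -> str:
    """Render shots as '[start-ends]: description' lines, one per shot."""
    expected_start = 0
    for shot in shots:
        # Each shot must begin where the previous one ended and last > 0 s.
        if shot.start != expected_start or shot.end <= shot.start:
            raise ValueError(
                f"shot {shot.start}-{shot.end}s is not contiguous or has no duration"
            )
        expected_start = shot.end
    if expected_start > max_seconds:
        raise ValueError(f"total length {expected_start}s exceeds {max_seconds}s")
    return "\n".join(f"[{s.start}-{s.end}s]: {s.description}" for s in shots)


print(time_coded_prompt([
    Shot(0, 4, "Wide establishing shot, static camera, misty bamboo forest at dawn"),
    Shot(4, 9, "Medium shot, slow push-in, the fighter steps forward"),
    Shot(9, 15, "Close-up, orbit shot, the fighter strikes, slow motion"),
]))
```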

Transition language

Use explicit transition instructions between shots:
  • "Hard cut to..." for an abrupt switch
  • "Seamless morph into..." for a fluid transition
  • "Whip pan to..." for a fast, energetic cut
  • "Snap cut to..." for a jarring, dramatic shift
Without explicit transitions, the model improvises, which may or may not match your intent.

Example: multi-shot commercial

(0-3s) Macro shot of a luxury perfume bottle among scattered pink peonies,
       shallow depth of field, petals floating in warm afternoon light,
       soft ambient music.
(3-7s) Camera glides closer, a feminine hand enters frame from the right,
       fingers gently touch the glass bottle, the sound of silk rustling.
(7-12s) Hard cut to slow-motion spray, golden mist diffuses through the air,
        particles catching rim light against a dark background,
        the hiss of the atomizer.
(12-15s) Seamless pull-out to hero frame, product centered, volumetric
         lighting, minimal cream background, elegant silence.

Reference inputs

Many video models accept images, video clips, or audio files as reference inputs alongside a text prompt. This shifts the workflow from "prompting" to something closer to "directing."

Image-to-video

Feed a starting image and describe the motion. The model animates from that frame.
  • The input image becomes the first frame of the video
  • Describe what changes (action, camera movement), not the static scene the model can already see
  • Style preservation is a strength: animated styles, paintings, photographs, and color grading all carry through
  • For maximum style control, generate the starting image with a specialized image model first, then pass it to the video model

First and last frame interpolation

Some models accept both a starting and ending image. The model generates the transition between them. This is useful for:
  • Morphing between subjects (e.g. one animal transforming into another)
  • Before/after transformations (room makeover, seasonal change)
  • Controlled narrative arcs where you know the start and end state

Subject references

Some models accept reference images of characters, products, or objects and maintain their appearance in the generated video. This is useful for:
  • UGC-style product review videos (reference image of character + reference image of product)
  • Brand consistency across multiple video clips
  • Placing existing characters in new scenarios
When referencing input assets, many models use a bracket syntax like [Image1] or [Audio1] in the prompt to specify which reference maps to which role: "[Image2] is in the interior of [Image1]."
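Bracket references are easy to mistype, so it can help to verify that every tag in the prompt points at an asset that was actually supplied. The check below is a sketch under two assumptions: tags look exactly like `[ImageN]` or `[AudioN]`, and numbering starts at 1. Real models may differ on both, and the function name is hypothetical.

```python
import re


def unresolved_references(prompt: str, n_images: int = 0, n_audio: int = 0) -> list[str]:
    """Return bracket tags in the prompt that have no matching supplied asset."""
    available = {"Image": n_images, "Audio": n_audio}
    bad = []
    for kind, index in re.findall(r"\[(Image|Audio)(\d+)\]", prompt):
        # Assumed 1-based numbering: valid indices are 1..count.
        if not 1 <= int(index) <= available[kind]:
            bad.append(f"[{kind}{index}]")
    return bad


prompt = "[Image2] is in the interior of [Image1]."
print(unresolved_references(prompt, n_images=1))
```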

Audio-driven generation

Some models accept audio files and sync the generated video to the audio. The model can match:
  • Lip movements to speech
  • Cuts and motion to musical beats
  • Ambient rhythm to environmental sounds
When using audio references, it helps to also transcribe the audio content in the text prompt itself, and match the video duration to the audio length.

Multi-reference workflows

The most powerful results come from combining multiple reference types:
  • An image for character appearance
  • A video clip for motion style
  • An audio track for rhythm and pacing
  • A text prompt describing how everything fits together

Style control

Name the style explicitly

Video models understand style labels. Include them directly in your prompt:
  • "In the style of claymation"
  • "Pixar animation style"
  • "Anime"
  • "Stop-motion"
  • "8-bit retro"
  • "Graphic novel"
  • "Documentary footage"
  • "Origami"
  • "LEGO"
  • "Blueprint technical drawing"
Style labels affect not just the visual look but also how characters move and interact. A claymation style produces jerky, stop-motion movement. An anime style produces fluid, exaggerated motion.

Quality anchors

Phrases like "hyper-realistic, 8k" or "cinematic" push models toward their highest-fidelity output. Use them when you want photorealistic results.

Film and genre language

Reference specific genres or filmmaking styles for mood and tone:
  • "Michael Mann cinematography" (neon, night, urban)
  • "Wes Anderson" (symmetrical, pastel, quirky)
  • "Roger Deakins lighting" (naturalistic, precise)
  • "Blade Runner 2049 cinematography" (atmospheric, orange/teal)
  • "National Geographic documentary" (nature, steady, observational)

Use input images for style

Rather than describing a style verbally, generate an image with the exact aesthetic you want using an image model, then pass it to the video model. This gives you pixel-level control over the look. The video model preserves the style, color grading, and composition while adding motion.

Grain and texture

Adding "slightly grainy, film-like" or "VHS aesthetic" pushes output away from the too-clean AI look and makes videos feel more organic.

Character consistency

Repeat descriptions verbatim

When generating multiple clips with the same character, use identical character descriptions across prompts. Create a "character sheet" with exact wording:
"John, a man in his 40s with short brown hair, wearing a blue jacket and glasses, looking thoughtful"
Paste this description into every prompt where John appears. The more specific and unique the description, the more consistent the results.

What to specify

  • Physical appearance: age, hair, skin, build
  • Clothing: exact garments, colors, materials
  • Accessories: glasses, jewelry, hat
  • Expression or demeanor: thoughtful, cheerful, intense

Vary the scene, not the character

When placing a consistent character in different scenarios, change only the action, location, and camera work. Keep the character description word-for-word identical.
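Storing the character sheet once and interpolating it into each clip prompt guarantees the wording never drifts. The sketch below uses the John description from this section. The `clip_prompt` helper, its sentence layout, and the three scenes are illustrative assumptions.

```python
# The character sheet is defined once and never edited per clip.
JOHN = (
    "John, a man in his 40s with short brown hair, wearing a blue jacket "
    "and glasses, looking thoughtful"
)

# Only action, location, and camera work vary between clips (hypothetical scenes).
SCENES = [
    ("orders coffee", "in a crowded diner", "Static medium shot"),
    ("waits for a train", "on an empty platform at dusk", "Slow dolly-in"),
    ("reads a letter", "at a kitchen table", "Close-up, shallow depth of field"),
]


def clip_prompt(character: str, action: str, location: str, camera: str) -> str:
    """Combine the fixed character sheet with per-clip scene details."""
    return f"{character}, {action} {location}. {camera}."


prompts = [clip_prompt(JOHN, action, location, camera) for action, location, camera in SCENES]
for p in prompts:
    print(p)
```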

Reference images for identity

If the model supports subject reference images, use a clear photo of the character as input. This is more reliable than text descriptions alone, especially for maintaining facial features across clips.

Common pitfalls

  1. Not describing audio: If you skip audio prompting, models hallucinate ambient sounds. A common failure is adding inappropriate laughter or a "live studio audience." Always describe the soundscape.
  2. Too much dialogue for the clip length: An 8-second clip can hold roughly 2-3 short sentences. Packing in a paragraph produces unnaturally fast speech or truncated output.
  3. Too little dialogue for the clip length: If you only provide a few words for a long clip, the model fills silence with gibberish or awkward pauses. Match dialogue length to clip duration.
  4. Not specifying what to keep unchanged: When using reference images or editing, always state what should stay the same. Without explicit instructions, models may change anything.
  5. Expecting variation from identical prompts: Unlike image models, some video models produce very similar outputs for the same prompt, even with different seeds. If you want variety, change the prompt rather than just rerunning it.
  6. Not prompting camera motion: Without camera direction, you get either static shots or unpredictable movement. Describe the camera explicitly.
  7. Subtitle contamination: Many models were trained on videos with baked-in subtitles. Use colons for dialogue (not quotes), add "(no subtitles)", and repeat if necessary.
  8. Vague prompts for complex scenes: Modern video models handle long, detailed prompts. A prompt with 12+ specific requirements (camera moves, lighting, sound design, subject actions, environmental details) can work if each requirement is stated clearly. Don't undersell what you want.
  9. Ignoring aspect ratio and resolution: Most video models have specific resolutions they support (480p, 720p, 1080p). Check what the model supports and choose the right resolution for your use case. If you need vertical video and the model only outputs landscape, you may need to reframe with a separate tool.
  10. Forgetting that video models don't have internet access: No video model has live information. They work from training data. Don't expect them to know about current events or real-time information.

Sources

All techniques in this skill are sourced from Replicate's blog: