prompt-images

Prompting image models on Replicate

Distilled from Replicate's blog posts on prompting image models (2024-2026). Techniques are model-agnostic and focus on transferable principles. For model selection, pricing, and feature comparison, see the compare-models skill.

Writing prompts

Use natural language, not keyword lists

Write full sentences describing what you want. Modern image models parse grammar and context, so full sentences work far better than keyword-stuffed prompts.
Good: "A woman standing in a Tokyo alleyway at dusk, neon signs reflecting off wet pavement"
Bad: "woman, Tokyo, alleyway, dusk, neon, wet pavement"

Be specific and unambiguous

Name exact colors, materials, lighting setups, camera equipment, and spatial relationships. Vague terms like "make it better" or "artistic" give unpredictable results.
Good: "A brutalist concrete building reflected in a perfectly still puddle after rain. A single figure with a red umbrella walks along the edge, the only color in an otherwise monochrome scene. Overcast sky, flat diffused light, tilt-shift lens effect on the edges."
Bad: "Cool building with a person near it, rainy day"
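One way to keep prompts specific is to collect the details in named fields and join them into sentences. The sketch below is illustrative only: the `Scene` class and `describe` function are hypothetical helpers, not part of any Replicate API.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical container for the details a specific prompt should name."""
    subject: str
    setting: str
    lighting: str = ""
    camera: str = ""


def describe(scene: Scene) -> str:
    """Join the scene details into full sentences instead of a keyword list."""
    sentences = [f"{scene.subject} {scene.setting}."]
    if scene.lighting:
        sentences.append(f"{scene.lighting}.")
    if scene.camera:
        sentences.append(f"{scene.camera}.")
    return " ".join(sentences)


prompt = describe(Scene(
    subject="A single figure with a red umbrella",
    setting="walks along the edge of a still puddle reflecting a brutalist concrete building",
    lighting="Overcast sky, flat diffused light",
    camera="Tilt-shift lens effect on the edges",
))
print(prompt)
```

Empty fields are simply skipped, so the same helper works for a quick first draft and for a fully specified scene.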

Name subjects directly

Use descriptive phrases like "the woman with short black hair" or "the red car." Avoid pronouns, which are often too ambiguous for image models.

Use long, detailed prompts

Most modern models accept thousands of tokens. Long descriptive prompts with clear structure outperform short ones. A prompt with 12+ specific requirements (text on objects, labeled diagrams, color-coded elements, specific materials) can work if each requirement is stated clearly. But be aware: the longer and more complex the prompt, the more likely something will be missed.

Start simple, then iterate

Begin with basic changes. Test small edits first, then build on what works. Most editing models support iterative editing, so take advantage of that.

Photographic language

Modern image models understand camera and photography terminology deeply. Using this vocabulary gives you precise control over the look.

Camera and lens

  • Film stocks: Kodak Portra 800, Fuji Velvia 50, Ilford HP5
  • Lens characteristics: 50mm Summilux wide open, 85mm f/1.4, 24mm wide-angle
  • Depth of field: shallow (subject sharp, background blurred), deep (everything in focus)
  • Shooting techniques: golden hour, blue hour, long exposure, double exposure

Lighting setups

  • Rembrandt lighting: classic portrait lighting with a triangle of light on the cheek
  • Soft diffused studio lighting: crisp highlights and gentle shadows
  • Rim lighting / backlight: subject outlined with light from behind
  • Flat diffused light: overcast, even illumination, minimal shadows
  • Volumetric lighting: visible light beams, fog, haze

Composition

  • Rule of thirds, centered composition, symmetry
  • Wide shot, medium shot, close-up, macro
  • High angle, low angle, eye level, bird's-eye view
  • Tilt-shift for miniature effects
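The camera, lighting, and composition vocabulary above can be layered onto a base description. The following is a minimal sketch, assuming a hypothetical `with_photo_terms` helper that appends each chosen term as its own sentence.

```python
def with_photo_terms(base: str, *, framing: str = "", lens: str = "",
                     film: str = "", lighting: str = "") -> str:
    """Append photographic terms to a base description, one sentence each."""
    sentences = [base.rstrip(".") + "."]
    if framing:
        sentences.append(f"{framing}.")
    if lens and film:
        sentences.append(f"Shot with {lens} on {film}.")
    elif lens:
        sentences.append(f"Shot with {lens}.")
    elif film:
        sentences.append(f"Shot on {film}.")
    if lighting:
        sentences.append(f"{lighting}.")
    return " ".join(sentences)


portrait = with_photo_terms(
    "A portrait of the woman with short black hair",
    framing="Close-up at eye level, shallow depth of field",
    lens="an 85mm f/1.4 lens",
    film="Kodak Portra 800",
    lighting="Rembrandt lighting",
)
print(portrait)
```

Keeping each term in its own argument makes it easy to change one variable at a time, for example swapping only the film stock between generations.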

Text rendering

Rendering text in images is a common task. These techniques improve accuracy across models.
  • Wrap desired text in double quotation marks within the prompt: 'Design a poster with the title "BLUE NOTE SESSIONS" in bold condensed sans-serif'
  • Stick to readable fonts. Highly stylized text may not work as well.
  • When editing text in an existing image, use the pattern: "Change 'old text' to 'new text'"
  • Match text length when possible: big shifts in character count can change layout
  • Be explicit about preserving font style if it matters
  • For complex typography (posters, editorial layouts), look for models that treat text as part of the composition rather than stamping it on top
  • Some models can inpaint text: mask the text region, prompt with new text, and it matches the original font and style
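The quoting and "Change 'old' to 'new'" patterns above are easy to template. This sketch uses hypothetical helper names, and the 30% length-shift threshold is an arbitrary assumption chosen for illustration rather than a documented limit.

```python
def quote_text(text: str) -> str:
    """Wrap the text to render in double quotation marks."""
    return '"' + text.strip().strip('"') + '"'


def poster_prompt(title: str, font: str) -> str:
    """Build a generation prompt with the title quoted."""
    return f"Design a poster with the title {quote_text(title)} in {font}"


def text_edit_prompt(old: str, new: str, keep_font: bool = True,
                     max_length_shift: float = 0.3) -> tuple[str, list[str]]:
    """Build a text-edit prompt and warn when the character count shifts a lot."""
    warnings = []
    shift = abs(len(new) - len(old)) / max(len(old), 1)
    if shift > max_length_shift:  # threshold is an assumption
        warnings.append(
            f"text length changes by {shift:.0%}; the layout may change"
        )
    prompt = f"Change '{old}' to '{new}'"
    if keep_font:
        prompt += ", keeping the original font style"
    return prompt, warnings


print(poster_prompt("BLUE NOTE SESSIONS", "bold condensed sans-serif"))
print(text_edit_prompt("SALE", "EVERYTHING MUST GO"))
```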

Style transfer

  • Name the exact style: "impressionist painting," "1960s pop art," "Sumi-e ink wash"
  • Reference specific artists or movements for clearer guidance
  • If a style label doesn't work, describe its key traits: "visible brushstrokes, thick paint texture, rich color depth"
  • State what should stay the same: "keep the original composition"
  • When a style is hard to describe in words, some models support example-based editing: provide a before/after pair, then a third image. The model infers the transformation and applies it.
  • Some models accept style reference images: upload visuals capturing the color palette, texture, composition, and mood you want
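As a rough illustration of the text-only tips above, a style prompt can name the style, optionally spell out its key traits, and state what must stay the same. The `style_prompt` function below is a hypothetical helper, and "Restyle the image as" is just one possible wording.

```python
def style_prompt(style: str, traits: tuple[str, ...] = (),
                 keep: tuple[str, ...] = ("the original composition",)) -> str:
    """Name the style, optionally describe its traits, and say what to preserve."""
    prompt = f"Restyle the image as {style}"
    if traits:
        prompt += " with " + ", ".join(traits)
    prompt += "."
    if keep:
        prompt += " Keep " + " and ".join(keep) + "."
    return prompt


print(style_prompt(
    "an impressionist painting",
    traits=("visible brushstrokes", "thick paint texture", "rich color depth"),
))
```

If the style label alone gives weak results, adding the `traits` is the fallback the list above describes.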

Character consistency

Maintaining the same character across multiple generations is one of the hardest challenges in image generation.
  • Start with a clear reference description: "the woman with short black hair and green eyes wearing a navy blazer"
  • Say what's changing (setting, activity, style) and what should stay the same (face, expression, clothing)
  • Use reference images when the model supports them. Some models handle multiple reference images simultaneously for stronger consistency.
  • Break complex character changes into steps: change outfit first, then change scene
  • Generate synthetic training data: create many images of a character, pick the best ones, and use them for fine-tuning or as references
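Repeating the exact character description and splitting changes into steps can be scripted. This is a sketch under simple assumptions: `stepwise_edits` is a hypothetical helper, and each returned prompt would be sent as a separate edit in sequence.

```python
CHARACTER = "the woman with short black hair and green eyes"


def stepwise_edits(character: str, changes: list[str],
                   keep: tuple[str, ...] = ("face", "expression")) -> list[str]:
    """Return one prompt per change, each repeating the same description."""
    kept = " and ".join(keep)
    return [
        f"{change} for {character}, keeping the {kept} unchanged."
        for change in changes
    ]


steps = stepwise_edits(CHARACTER, [
    "Change the outfit to a red raincoat",       # step 1: outfit first
    "Change the scene to a rainy train platform",  # step 2: then the scene
])
for step in steps:
    print(step)
```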

Image editing

General principles

  • Specify what to keep: explicitly state what should remain unchanged. Use phrases like "keeping the pose and expression unchanged" or "maintain the original composition."
  • Choose verbs carefully: "transform" suggests a full rework. Use specific actions like "change the clothes to a blue jacket" or "replace the background with a beach."
  • Be precise about scope: "Change the background to a beach while keeping the person in the exact same position, maintain identical subject placement, camera angle, framing, and perspective. Only replace the environment around them."
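These principles can be checked mechanically before an edit prompt is sent. The sketch below is a hypothetical lint helper: the `BROAD_VERBS` set contains only the verb named above and could be extended to taste.

```python
BROAD_VERBS = {"transform"}  # taken from the text above; extend as needed


def lint_edit(action: str, keep: list[str]) -> list[str]:
    """Return a list of problems with an edit request (empty if none found)."""
    problems = []
    words = action.split()
    if words and words[0].lower() in BROAD_VERBS:
        problems.append(
            f'"{words[0]}" suggests a full rework; name the specific change'
        )
    if not keep:
        problems.append("nothing is marked as unchanged")
    return problems


def edit_prompt(action: str, keep: list[str]) -> str:
    """Combine the requested change with an explicit list of what to keep."""
    kept = " and ".join(keep)
    return f"{action.rstrip('.')}, keeping {kept} unchanged."


print(lint_edit("Transform the person into a Viking", []))
print(edit_prompt("Change the clothes to a blue jacket",
                  ["the pose", "the expression"]))
```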

Object removal

  • Describe what should fill the space left behind, not just what to remove
  • Some editing models handle removal cleanly; others leave structural artifacts. If one model struggles, try another.

Background editing

  • Describe the new background in detail: lighting, time of day, environment
  • Specify that the subject should remain in the exact same position with the same lighting

Perspective and angle changes

  • These are among the hardest edits. Not all models handle them well.
  • Some models restrict themselves to the initial composition and struggle with new angles

Inpainting and outpainting

  • For inpainting: mask the region to edit, then prompt with what should fill it
  • Some models have a "magic prompt" or auto-rewrite feature. When this is on, you can focus on describing just the edited region. When it's off, describe the whole scene.
  • Describing only the masked region makes the model emphasize the prompt more, which can produce better results for targeted edits
  • ControlNet-style conditioning (edge detection, depth maps) helps preserve structure during generation
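The choice between describing only the masked region and describing the whole scene can be expressed as a small switch. This is a sketch with a hypothetical `inpaint_prompt` helper; the `auto_rewrite` flag stands in for whatever "magic prompt" setting a given model exposes, and its real name varies by model.

```python
def inpaint_prompt(region: str, scene: str, auto_rewrite: bool) -> str:
    """Pick the prompt text for an inpainting request.

    auto_rewrite=True  -> describe only what fills the masked region.
    auto_rewrite=False -> describe the whole scene, including the region.
    """
    region = region.strip().rstrip(".")
    if auto_rewrite:
        return region[:1].upper() + region[1:] + "."
    return f"{scene.strip().rstrip('.')}, with {region}."


scene = "A wooden desk by a window in morning light"
region = "a steaming ceramic mug of coffee"
print(inpaint_prompt(region, scene, auto_rewrite=True))
print(inpaint_prompt(region, scene, auto_rewrite=False))
```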

Multi-image and storyboard generation

Some models can generate multiple related images in a single prompt.
  • Ask for "a series," "a set," or specify a grid layout (e.g., "2x2 storyboard grid")
  • Describe each panel individually with consistent character descriptions
  • Maintain consistent style and character continuity by repeating exact descriptions
  • Some models support example-based editing: show a before/after pair for one image, then apply the same transformation to others
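A storyboard prompt that repeats the exact character description in every panel might be assembled as follows. The `storyboard_prompt` function and the "Panel N:" layout are illustrative assumptions, not a format any particular model requires.

```python
def storyboard_prompt(character: str, style: str, panels: list[str],
                      grid: str = "2x2") -> str:
    """Build a grid prompt, repeating the same description in every panel."""
    rows, cols = (int(n) for n in grid.split("x"))
    if len(panels) != rows * cols:
        raise ValueError(f"a {grid} grid needs {rows * cols} panels")
    lines = [f"A {grid} storyboard grid in {style}."]
    for number, action in enumerate(panels, start=1):
        lines.append(f"Panel {number}: {character} {action}.")
    return "\n".join(lines)


character = "the woman with short black hair and green eyes wearing a navy blazer"
board = storyboard_prompt(
    character,
    "flat pastel illustration style",
    [
        "unlocks a bicycle outside an apartment building",
        "rides across a bridge at sunrise",
        "orders coffee at a street cart",
        "sits down at a desk in a bright office",
    ],
)
print(board)
```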

Product photography and commercial work

  • Specify materials precisely: "brushed steel," "matte aluminum," "kraft paper," "frosted glass"
  • Describe lighting setup: "soft diffused studio lighting, crisp highlights and gentle shadows"
  • For brand assets and icons, look for models that produce native SVG output (real editable vector files)
  • For layouts with branding and copy placement, look for models with strong typography and design composition

Fine-tuning and LoRAs

  • Use trigger words from your trained model in every prompt
  • When combining multiple LoRAs, balance their influence with scale parameters (typically 0.9-1.1)
  • Generate synthetic training data: generate many images, pick the best, retrain
  • Use consistent-character workflows to generate training data from a single reference image
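The trigger-word and scale advice can be enforced before a request is built. In this sketch, the `lora_input` helper, the `lora_scales` key, and the `TOK` trigger word are all hypothetical; real input names depend on the model, so check its schema. Clamping to 0.9-1.1 is a design choice based on the typical range quoted above, not a hard limit.

```python
def lora_input(prompt: str, trigger: str, scales: dict[str, float],
               low: float = 0.9, high: float = 1.1) -> dict:
    """Ensure the trigger word is present and keep LoRA scales in a typical range."""
    if trigger not in prompt:
        prompt = f"{trigger} {prompt}"
    clamped = {name: min(max(scale, low), high) for name, scale in scales.items()}
    # "lora_scales" is a placeholder key, not a documented parameter name.
    return {"prompt": prompt, "lora_scales": clamped}


request = lora_input(
    "a portrait in a forest",
    trigger="TOK",
    scales={"character": 1.0, "style": 1.4},
)
print(request)
```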

Common pitfalls

  1. Keyword-stuffed prompts: Modern models respond better to natural language sentences than comma-separated keyword lists. Write like you're describing a scene, not tagging a photo.
  2. Using "transform" when you want a small edit: "Transform the person into a Viking" may swap the entire identity. Use targeted language: "change her outfit to Viking armor, keeping her face and expression unchanged."
  3. Not specifying what to keep: When editing, always say what should stay the same. Without explicit instructions, models may change anything.
  4. Negative prompts on models not trained for them: Some models were not trained with negative prompts. Using them on these models introduces noise rather than removing unwanted elements. Check the model's documentation.
  5. Too-high guidance scale (CFG): If images look "burnt" with excessive contrast, lower the guidance scale. Each model has a recommended range.
  6. Expecting real-time knowledge: Most image models have no internet access. Some have strong world knowledge baked in from training data, but it's not live.
  7. Short prompts for complex scenes: Modern models accept thousands of tokens. For complex compositions with many specific requirements, use that capacity.
  8. Ignoring aspect ratio and resolution: Most models work best at specific resolutions (commonly ~1 megapixel). Going too large produces edge artifacts. Going too small produces harsh crops. Use the model's recommended aspect ratios and sizes.
  9. Wrong model for the task: Not every model is good at every task. Some excel at text rendering but struggle with object removal. Some are great at style transfer but poor at background editing. If a model struggles with a specific edit type, try a different one rather than fighting the prompt. See the compare-models skill for guidance.
  10. Not iterating: The best results come from iterative workflows. Make a small change, evaluate, refine, repeat. Don't try to get everything right in a single generation.
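The resolution advice in item 8 comes down to simple arithmetic: pick a width and height that match the aspect ratio and multiply to roughly one megapixel. The sketch below makes two assumptions that are not from the source: it treats 1 megapixel as 1024 x 1024 pixels, and it snaps each side to a multiple of 16. When a model documents its own supported sizes, prefer those.

```python
import math


def dimensions_for(aspect_w: int, aspect_h: int, megapixels: float = 1.0,
                   multiple: int = 16) -> tuple[int, int]:
    """Return (width, height) near the pixel budget for the given aspect ratio."""
    target_pixels = megapixels * 1024 * 1024  # assumption: 1 MP = 1024 x 1024

    def snap(value: float) -> int:
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    height = math.sqrt(target_pixels * aspect_h / aspect_w)
    width = height * aspect_w / aspect_h
    return snap(width), snap(height)


print(dimensions_for(1, 1))   # (1024, 1024)
print(dimensions_for(16, 9))  # (1360, 768)
print(dimensions_for(3, 2))   # (1248, 832)
```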

Sources
