tech-prompt-engineering
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseProduction Prompt Engineering
生产环境提示词工程
Overview
概述
This skill addresses the failure modes that appear ONLY in production LLM applications: prompt injection, output format drift, silent regression across model versions, instruction decay in long contexts, and hallucination under pressure. It is NOT a tutorial on few-shot or chain-of-thought — assume the agent already knows basic prompting techniques.
该技能仅针对生产环境LLM应用中出现的故障模式:提示词注入、输出格式漂移、跨模型版本的隐性退化、长上下文场景下的指令失效以及高压场景下的幻觉问题。这不是关于少样本(few-shot)或思维链(CoT)的教程——默认Agent已掌握基础提示词技巧。
When to Use
适用场景
Trigger conditions:
- A production LLM feature is misbehaving (inconsistent, unsafe, format-drifting)
- Designing a system prompt for a multi-tenant application
- Hardening prompts against injection or jailbreak attempts
- Diagnosing regression after a model version update
When NOT to use:
- Basic "how do I write a prompt" — the agent already knows few-shot, CoT, role-play
- One-off content generation (just write the prompt directly)
- RAG architecture design (use a RAG-specific skill)
触发条件:
- 生产环境中的LLM功能表现异常(输出不一致、不安全、格式漂移)
- 为多租户应用设计系统提示词
- 强化提示词以抵御注入或越狱攻击
- 诊断模型版本更新后的性能退化问题
不适用场景:
- 基础的「如何编写提示词」问题——默认Agent已掌握few-shot、CoT、角色扮演等技巧
- 一次性内容生成(直接编写提示词即可)
- RAG架构设计(使用RAG专属技能)
Framework
框架
IRON LAW: Treat User Input as Hostile by Default
In production, user input WILL be used to attempt prompt injection.
The only reliable defense is structural separation:
1. System prompt carries ALL rules and behavior (never trust user input to override)
2. User input is NEVER concatenated directly into instructions
3. Output is validated against an expected schema BEFORE being used downstream
A prompt that works in dev with clean input will fail in production with adversarial input.IRON LAW: Treat User Input as Hostile by Default
In production, user input WILL be used to attempt prompt injection.
The only reliable defense is structural separation:
1. System prompt carries ALL rules and behavior (never trust user input to override)
2. User input is NEVER concatenated directly into instructions
3. Output is validated against an expected schema BEFORE being used downstream
A prompt that works in dev with clean input will fail in production with adversarial input.Production Failure Modes
生产环境故障模式
| Failure Mode | Observable Symptom | Root Cause | Fix |
|---|---|---|---|
| Prompt injection | User input overrides system instructions | Instructions concatenated with untrusted input | Structural separation: use ChatML roles; validate outputs against schema; never use "ignore previous instructions" susceptible templates |
| Format drift | JSON response breaks 1/1000 calls | Model temperature > 0 + unconstrained output | Constrained decoding (JSON mode, grammar), schema validation + retry, lower temperature |
| Instruction decay | Rules followed early, ignored after N turns | Long context pushes system prompt out of attention | Reinforce critical rules in EACH user message; use model's native tool/system role; shorter contexts |
| Silent regression | Same prompt, worse output after model update | Provider updated model weights | Pin model version; maintain regression test suite; A/B test before rolling upgrades |
| Hallucination under pressure | Model invents facts when uncertain | No explicit "I don't know" escape hatch | Add "If uncertain, respond with {null}. Do not guess." + grounding constraint |
| Cross-model portability | Works on GPT-4, fails on Claude/Gemini | Model-specific prompt conventions | Test on all target models; avoid model-specific jailbreaks; use common-denominator patterns |
| 故障模式 | 可观测症状 | 根本原因 | 修复方案 |
|---|---|---|---|
| 提示词注入(Prompt injection) | 用户输入覆盖系统指令 | 指令与不可信输入直接拼接 | 结构化隔离:使用ChatML角色;基于Schema验证输出;绝不使用易受"忽略之前的指令"攻击的模板 |
| 格式漂移(Format drift) | JSON响应每1000次调用会出现1次异常 | 模型温度(temperature)>0 + 输出未受约束 | 约束解码(JSON模式、语法限制);基于Schema验证并重试;降低温度值 |
| 指令失效(Instruction decay) | 初期遵循规则,多轮对话后忽略规则 | 长上下文导致系统提示词超出注意力范围 | 在每一条用户消息中强化关键规则;使用模型原生的工具/系统角色;缩短上下文长度 |
| 隐性退化(Silent regression) | 相同提示词,模型更新后输出质量下降 | 服务商更新了模型权重 | 固定模型版本;维护退化测试套件;升级前进行A/B测试 |
| 高压场景幻觉(Hallucination under pressure) | 模型在不确定时编造事实 | 未设置明确的「我不知道」退出机制 | 添加「若不确定,返回{null}。请勿猜测。」+ 事实锚定约束 |
| 跨模型兼容性问题(Cross-model portability) | 在GPT-4上正常工作,在Claude/Gemini上失效 | 模型专属的提示词约定 | 在所有目标模型上测试;避免模型专属的越狱技巧;使用通用模式 |
Methodology
方法论
Phase 1: Reproduce the Failure
阶段1:复现故障
Collect: exact input, exact output, expected output, model + version, temperature. Reproduce in isolation (outside the app) to rule out application bugs.
Gate: Failure reproduces consistently in a minimal test case.
收集:精确输入、精确输出、预期输出、模型+版本、温度值。在隔离环境(应用外)复现,以排除应用本身的bug。
验收标准: 故障在最小测试用例中可稳定复现。
Phase 2: Classify the Failure Mode
阶段2:分类故障模式
Match against the table above. Most production failures fall into one of 6 categories. Don't guess — identify which mode applies.
Gate: Failure mode classified with evidence.
与上述表格匹配。大多数生产环境故障属于6类中的一种。请勿猜测——明确识别对应的故障模式。
验收标准: 故障模式已结合证据完成分类。
Phase 3: Apply the Targeted Fix
阶段3:应用针对性修复
Fix the SPECIFIC failure mode. Don't rewrite the whole prompt. Generic rewrites often introduce new failure modes.
Gate: Fix addresses root cause, not symptom.
修复特定的故障模式。不要重写整个提示词。通用改写往往会引入新的故障模式。
验收标准: 修复方案针对根本原因,而非仅解决表面症状。
Phase 4: Build a Regression Test
阶段4:构建退化测试
Add the failing case to a regression test suite. Run the suite before every prompt change or model version update.
Gate: Test suite catches the original failure AND any reintroduction.
将故障案例添加到退化测试套件中。在每次修改提示词或更新模型版本前运行该套件。
验收标准: 测试套件可捕获原始故障以及故障复发的情况。
Output Format
输出格式
markdown
undefinedmarkdown
undefinedPrompt Debug Report: {Feature Name}
提示词调试报告: {功能名称}
Failure Reproduction
故障复现
- Input: {exact input}
- Observed: {what happened}
- Expected: {what should have happened}
- Model: {name + version + temperature}
- 输入: {精确输入}
- 观测结果: {实际发生情况}
- 预期结果: {应发生情况}
- 模型: {名称 + 版本 + 温度值}
Failure Mode
故障模式
{One of: injection, format drift, instruction decay, silent regression, hallucination, cross-model}
{以下之一:注入、格式漂移、指令失效、隐性退化、幻觉、跨模型}
Root Cause
根本原因
{Specific mechanism, not generic "prompt was bad"}
{具体机制,而非笼统的「提示词质量差」}
Fix
修复方案
{Targeted change with before/after prompt diff}
{针对性修改,包含修改前后的提示词差异}
Regression Test
退化测试
{Test case added to prevent reintroduction}
undefined{添加的测试用例,防止故障复发}
undefinedGotchas
注意事项
- "Ignore previous instructions" is only the beginning: Modern injection uses role-play ("Pretend you are DAN..."), language switching, Unicode tricks, and encoded payloads. Defense requires input validation AND output validation, not just instruction phrasing.
- Temperature 0 is not deterministic across calls: Even at T=0, outputs can vary across API calls due to backend GPU non-determinism (batch effects). Don't rely on exact string equality in tests; use semantic or schema equality.
- Few-shot examples override your instructions: If your examples show 500-word responses and you say "be concise", the model follows the examples. Examples are STRONGER than instructions.
- System prompts are NOT absolute: Even with a system prompt, sufficiently adversarial user input can override behavior. System prompts are a strong hint, not a security boundary. For real security, use output validation and sandboxing.
- Provider model updates are silent: OpenAI's "gpt-4" alias changes weights without notice. Pin to dated versions (gpt-4-0613) for stability. Rerun regression tests after every update.
- Context window size ≠ effective context: A 128K context model may only attend well to the first 32K and last 4K. Put critical instructions at START and END, not in the middle ("lost in the middle" effect).
- 「忽略之前的指令」只是开始:现代注入攻击使用角色扮演(「假装你是DAN...」)、语言切换、Unicode技巧和编码 payload。防御需要输入验证+输出验证,而非仅依赖指令措辞。
- 温度值为0不代表跨调用完全确定:即使T=0,由于后端GPU的非确定性(批量效应),API调用间的输出仍可能存在差异。测试中不要依赖精确字符串匹配;使用语义或Schema匹配。
- 少样本示例的优先级高于指令:如果示例显示500字的回复,而你要求「简洁」,模型会遵循示例。示例的优先级远高于指令。
- 系统提示词并非绝对安全:即使有系统提示词,足够恶意的用户输入仍可能覆盖行为。系统提示词是强提示,而非安全边界。如需真正的安全性,使用输出验证和沙箱机制。
- 服务商的模型更新是静默的:OpenAI的「gpt-4」别名会在无通知的情况下更改权重。为了稳定性,固定到带日期的版本(如gpt-4-0613)。每次更新后重新运行退化测试。
- 上下文窗口大小≠有效上下文:128K上下文的模型可能仅能较好地关注前32K和最后4K内容。将关键指令放在开头和结尾,而非中间(「中间丢失」效应)。
References
参考资料
- For prompt injection attack patterns, see
references/injection-patterns.md - For regression testing frameworks, see
references/regression-testing.md - For cross-model prompt portability, see
references/cross-model-testing.md
- 关于提示词注入攻击模式,参见
references/injection-patterns.md - 关于退化测试框架,参见
references/regression-testing.md - 关于跨模型提示词兼容性,参见
references/cross-model-testing.md