prompt-engineer

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Prompt Engineer

提示词工程师

Overview

概述

This skill covers systematic design and iteration of prompts for large language models (LLMs). It applies proven techniques — zero-shot, few-shot learning, chain-of-thought reasoning, role prompting, structured output constraints, system prompt design, and multi-step prompt chaining — to improve accuracy, consistency, and reliability of LLM outputs. The skill is applicable to any LLM API (OpenAI, Anthropic, Gemini, Mistral, open-source models) and covers both single-turn and multi-turn conversation design, as well as production-grade prompt templates with variable injection.

本技能涵盖大语言模型（LLM）提示词的系统化设计与迭代。它应用经过验证的技术——零样本、少样本学习、思维链推理、角色提示、结构化输出约束、系统提示词设计以及多步骤提示词链——来提升LLM输出的准确性、一致性和可靠性。该技能适用于所有LLM API（OpenAI、Anthropic、Gemini、Mistral以及开源模型），涵盖单轮和多轮对话设计，以及支持变量注入的生产级提示词模板。

When to Use

适用场景

A prompt produces inconsistent, vague, or off-format outputs and needs iteration
Designing prompts that must return structured JSON, XML, Markdown, or CSV output
Building few-shot examples to guide classification, extraction, or transformation tasks
Creating system prompts that establish persona, tone, constraints, or output rules
Chaining multiple prompts together for complex multi-step reasoning tasks
Reducing hallucination by adding grounding instructions, citations requirements, or self-checks
Optimizing a prompt for a specific model (GPT-4, Claude, Llama, etc.) given its strengths
Converting a vague user request into a precise, production-ready prompt template

提示词输出不一致、模糊或格式不符，需要迭代优化
设计需返回结构化JSON、XML、Markdown或CSV格式的提示词
构建少样本示例以指导分类、提取或转换任务
创建用于设定角色、语气、约束或输出规则的系统提示词
将多个提示词链接起来完成复杂的多步骤推理任务
通过添加基础指令、引用要求或自我检查来减少幻觉输出
根据目标模型（GPT-4、Claude、Llama等）的优势优化提示词
将模糊的用户请求转换为精准的生产级提示词模板

When NOT to Use

不适用场景

Fine-tuning or training a model on new data (use model training skills)
Evaluating model quality across a benchmark suite (use eval-designer skill)
Writing application code that calls the LLM API (use a coding skill)
Comparing different LLMs for a use case (use model-comparator skill)
RAG pipeline design (retrieval-augmented generation requires its own architecture skill)

在新数据上微调或训练模型（请使用模型训练类技能）
跨基准套件评估模型质量（请使用eval-designer技能）
编写调用LLM API的应用代码（请使用编码类技能）
针对特定用例对比不同LLM（请使用model-comparator技能）
RAG管道设计（检索增强生成需要专用的架构类技能）

Quick Reference

快速参考

Task	Approach
Get consistent structured output	Add explicit format spec + JSON schema example in the prompt
Improve reasoning accuracy	Use chain-of-thought: "Think step by step before answering"
Classify text reliably	Provide 2–3 labeled few-shot examples per class
Set model persona and constraints	Write a detailed system prompt before the user turn
Handle long complex tasks	Break into a prompt chain with intermediate outputs
Reduce hallucinations	Instruct model to cite sources or say "I don't know" explicitly
Make outputs deterministic	Lower temperature + explicit format constraints

任务	方法
获取一致的结构化输出	在提示词中添加明确的格式规范+JSON schema示例
提升推理准确性	使用思维链：“在回答前逐步思考”
可靠地分类文本	为每个类别提供2-3个带标签的少样本示例
设定模型角色与约束	在用户轮次前编写详细的系统提示词
处理长复杂任务	将其拆分为带中间输出的提示词链
减少幻觉输出	指示模型引用来源或明确说“我不知道”
让输出具有确定性	降低温度参数+添加明确的格式约束

Instructions

操作步骤

Define the task precisely — Write a one-sentence task definition: what the model must do, what the input is, and what the output must look like. Vague tasks produce vague outputs. Example: "Extract all company names and their associated revenue figures from the following earnings call transcript and return them as a JSON array."
Choose the prompting technique — Select based on task complexity:
- Zero-shot: Simple tasks with clear instructions. No examples needed.
- Few-shot: Classification, formatting, or style tasks. Provide 2–5 labeled examples.
- Chain-of-thought (CoT): Math, logic, multi-step reasoning. Add "Think step by step."
- Role prompting: Tasks requiring expertise or persona. "You are a senior tax attorney…"
- Self-consistency: Run the same CoT prompt N times and majority-vote the answer.
- Prompt chaining: Decompose a complex task into sequential prompts where each feeds the next.
Write the system prompt — For APIs that support system prompts (OpenAI, Anthropic), put persistent instructions here: role, output format, constraints, what to do when uncertain. Keep it under 500 tokens unless the task genuinely requires more context.
Structure the user prompt — Use clear delimiters to separate instructions from input data. Use XML tags (
```
<document>
```
,
```
<query>
```
), triple backticks, or
```
---
```
separators. Place instructions before the data, not after.
Specify output format explicitly — Tell the model exactly what format to use. If JSON, provide the schema or a filled example. If Markdown, show the heading structure. If a list, show how items should be formatted. Include a negative example if there is a common wrong format to avoid.
Add few-shot examples — For classification or extraction tasks, include 2–5 examples in the prompt. Format them identically to the real input/output pair. Choose examples that cover edge cases and are representative of the real distribution.
Iterate and test — Test on at least 10 representative inputs. Track: did the model follow the format? Did it hallucinate? Was the reasoning correct? Identify failure patterns and add instructions or examples to address them.
Version and document the prompt — Save prompts in a template file with variable placeholders (
```
{{input}}
```
). Document what model version it was tested on, what temperature, and what the expected pass rate is.
Optimize for the target model — Different models respond differently: Claude prefers XML tags and explicit role instructions; GPT-4 responds well to numbered instructions; open-source models often need more explicit format constraints. Test the same prompt on the target model even if it worked on another.
Add safety and fallback instructions — Include: what to do if the input is out of scope, how to handle ambiguous inputs, whether to ask for clarification or make a best-effort attempt, and how to indicate low confidence.

精准定义任务 — 用一句话定义任务：模型必须做什么、输入是什么、输出格式是什么。模糊的任务会产生模糊的输出。示例：“从以下财报电话会议记录中提取所有公司名称及其相关收入数据，并以JSON数组形式返回。”
选择提示词技术 — 根据任务复杂度选择：
- 零样本（Zero-shot）：简单且指令清晰的任务，无需示例。
- 少样本（Few-shot）：分类、格式化或风格类任务，提供2-5个带标签的示例。
- 思维链（Chain-of-thought, CoT）：数学、逻辑、多步骤推理任务，添加“逐步思考”指令。
- 角色提示（Role prompting）：需要专业知识或特定角色的任务，比如“你是一名资深税务律师……”
- 自一致性（Self-consistency）：对同一个思维链提示词运行N次，取多数投票结果作为答案。
- 提示词链（Prompt chaining）：将复杂任务分解为连续的提示词，每个提示词的输出作为下一个的输入。
编写系统提示词 — 对于支持系统提示词的API（OpenAI、Anthropic），在此处添加持久化指令：角色、输出格式、约束、不确定时的处理方式。除非任务确实需要更多上下文，否则请控制在500token以内。
结构化用户提示词 — 使用清晰的分隔符区分指令和输入数据，比如XML标签（
```
<document>
```
、
```
<query>
```
）、三重反引号或
```
---
```
分隔符。将指令放在数据之前，而非之后。
明确指定输出格式 — 准确告知模型要使用的格式。如果是JSON，提供schema或填充好的示例；如果是Markdown，展示标题结构；如果是列表，展示项目格式。如果有常见的错误格式需要避免，可以添加负面示例。
添加少样本示例 — 对于分类或提取任务，在提示词中包含2-5个示例，格式要与真实输入输出对完全一致。选择能覆盖边缘情况且代表真实数据分布的示例。
迭代与测试 — 至少用10个有代表性的输入进行测试。跟踪：模型是否遵循格式？是否产生幻觉？推理是否正确？识别失败模式并添加指令或示例来解决问题。
版本化并记录提示词 — 将提示词保存到带变量占位符（
```
{{input}}
```
）的模板文件中。记录测试时使用的模型版本、温度参数以及预期通过率。
针对目标模型优化 — 不同模型的响应不同：Claude偏好XML标签和明确的角色指令；GPT-4对编号指令响应良好；开源模型通常需要更明确的格式约束。即使提示词在其他模型上有效，也要在目标模型上测试。
添加安全与回退指令 — 包含：输入超出范围时的处理方式、如何处理模糊输入、是请求澄清还是尽力尝试，以及如何表示低置信度。

Examples

示例

Example 1: Improving a Vague Prompt to Get Structured JSON Output

示例1：优化模糊提示词以获取结构化JSON输出

Input:

Original vague prompt: "Summarize this customer review and tell me if it's positive or negative."

Output:

Improved prompt (system + user):

SYSTEM:
You are a sentiment analysis assistant. You always respond with valid JSON only — no prose, no markdown fences. If you cannot determine sentiment, set "sentiment" to "unknown" and explain in "reason".

USER:
Analyze the following customer review. Return a JSON object with exactly these fields:
{
  "sentiment": "positive" | "negative" | "neutral" | "unknown",
  "confidence": 0.0–1.0,
  "key_themes": ["theme1", "theme2"],
  "summary": "one sentence summary",
  "reason": "brief explanation of sentiment classification"
}

Review:
<review>
{{customer_review_text}}
</review>

Why this works:

System prompt enforces JSON-only output at the model level
Explicit schema with types eliminates ambiguity
XML tag; the model from treating the review text as instructions
```
"unknown"
```
fallback; hallucinated confident answers
```
confidence
```
field enables downstream filtering of low-quality classifications

输入：

原始模糊提示词：“总结这条客户评价，并告诉我它是正面还是负面的。”

输出：

优化后的提示词（系统+用户）：

SYSTEM:
你是一名情感分析助手。你始终只返回有效的JSON——不返回散文、不返回Markdown围栏。如果无法确定情感，将"sentiment"设为"unknown"并在"reason"中说明。

USER:
分析以下客户评价。返回一个包含以下字段的JSON对象：
{
  "sentiment": "positive" | "negative" | "neutral" | "unknown",
  "confidence": 0.0–1.0,
  "key_themes": ["theme1", "theme2"],
  "summary": "一句话总结",
  "reason": "情感分类的简要说明"
}

评价：
<review>
{{customer_review_text}}
</review>

为何有效：

系统提示词在模型层面强制要求仅输出JSON
带类型的明确schema消除了歧义
XML标签分隔符防止模型将评价文本视为指令
"unknown"回退避免了产生虚假的高置信度答案
"confidence"字段支持下游过滤低质量分类结果

Example 2: Few-Shot Chain-of-Thought Prompt for Legal Clause Classification

示例2：用于法律条款分类的少样本思维链提示词