prompt-engineer

Prompt Engineer

Expert prompt engineer specializing in designing, optimizing, and evaluating prompts that maximize LLM performance across diverse use cases.

When to Use This Skill

  • Designing prompts for new LLM applications
  • Optimizing existing prompts for better accuracy or efficiency
  • Implementing chain-of-thought or few-shot learning
  • Creating system prompts with personas and guardrails
  • Building structured output schemas (JSON mode, function calling)
  • Developing prompt evaluation and testing frameworks
  • Debugging inconsistent or poor-quality LLM outputs
  • Migrating prompts between different models or providers

Core Workflow

  1. Understand requirements — Define task, success criteria, constraints, and edge cases
  2. Design initial prompt — Choose pattern (zero-shot, few-shot, CoT), write clear instructions
  3. Test and evaluate — Run diverse test cases, measure quality metrics
    • Validation checkpoint: If accuracy < 80% on the test set, identify failure patterns before iterating (e.g., ambiguous instructions, missing examples, edge case gaps)
  4. Iterate and optimize — Make one change at a time; refine based on failures, reduce tokens, improve reliability
  5. Document and deploy — Version prompts, document behavior, monitor production
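
The test-and-evaluate step and its 80% checkpoint can be sketched as a small harness. This is a minimal illustration, not a prescribed framework: `evaluate`, `fake_predict`, and the test cases are hypothetical names and data, and `fake_predict` stands in for a real model call.

```python
from typing import Callable

ACCURACY_THRESHOLD = 0.80  # the 80% validation checkpoint from step 3


def evaluate(predict: Callable[[str], str], test_cases: list[tuple[str, str]]):
    """Return (accuracy, failures) for a prompt-backed predictor."""
    if not test_cases:
        raise ValueError("test_cases must not be empty")
    failures = []
    for text, expected in test_cases:
        actual = predict(text).strip()
        if actual != expected:
            failures.append({"input": text, "expected": expected, "actual": actual})
    accuracy = 1 - len(failures) / len(test_cases)
    return accuracy, failures


def fake_predict(text: str) -> str:
    """Hypothetical stand-in for a model call; replace with your own client."""
    return "Positive" if "great" in text.lower() else "Negative"


cases = [
    ("Great battery life.", "Positive"),
    ("Broke after a week.", "Negative"),
    ("It arrived on time.", "Neutral"),  # class the stub never predicts
    ("", "Neutral"),                     # empty-input edge case
]

accuracy, failures = evaluate(fake_predict, cases)
if accuracy < ACCURACY_THRESHOLD:
    # Below the checkpoint: inspect failure patterns before changing the prompt.
    for failure in failures:
        print(failure)
```

Collecting the failing inputs, rather than only the score, is what makes it possible to spot patterns such as a missing class or an unhandled empty input before making the next single change.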

Reference Guide

Load detailed guidance based on context:

| Topic | Reference | Load When |
|-------|-----------|-----------|
| Prompt Patterns | references/prompt-patterns.md | Zero-shot, few-shot, chain-of-thought, ReAct |
| Optimization | references/prompt-optimization.md | Iterative refinement, A/B testing, token reduction |
| Evaluation | references/evaluation-frameworks.md | Metrics, test suites, automated evaluation |
| Structured Outputs | references/structured-outputs.md | JSON mode, function calling, schema design |
| System Prompts | references/system-prompts.md | Persona design, guardrails, context management |

Prompt Examples

Zero-shot vs. Few-shot

Zero-shot (baseline):
Classify the sentiment of the following review as Positive, Negative, or Neutral.

Review: {{review}}
Sentiment:

Few-shot (improved reliability):
Classify the sentiment of the following review as Positive, Negative, or Neutral.

Review: "The battery life is incredible, lasts all day."
Sentiment: Positive

Review: "Stopped working after two weeks. Very disappointed."
Sentiment: Negative

Review: "It arrived on time and matches the description."
Sentiment: Neutral

Review: {{review}}
Sentiment:
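
Both variants above can be produced from one builder, so that the instruction text stays identical and only the examples change between runs. The sketch below assumes a hypothetical `build_prompt` helper; the instruction and examples are copied from the prompts above.

```python
INSTRUCTION = (
    "Classify the sentiment of the following review as "
    "Positive, Negative, or Neutral."
)

EXAMPLES = [
    ("The battery life is incredible, lasts all day.", "Positive"),
    ("Stopped working after two weeks. Very disappointed.", "Negative"),
    ("It arrived on time and matches the description.", "Neutral"),
]


def build_prompt(review: str, examples=()) -> str:
    """Assemble a zero-shot prompt, or a few-shot one when examples are given."""
    blocks = [INSTRUCTION]
    for text, label in examples:
        blocks.append(f'Review: "{text}"\nSentiment: {label}')
    # The final block leaves the label open for the model to complete.
    blocks.append(f"Review: {review}\nSentiment:")
    return "\n\n".join(blocks)


zero_shot = build_prompt("Works fine, nothing special.")
few_shot = build_prompt("Works fine, nothing special.", EXAMPLES)
```

Keeping the examples in a list also makes it straightforward to swap in examples drawn from the target distribution without touching the instruction.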

Before/After Optimization

Before (vague, inconsistent outputs):
Summarize this document.

{{document}}

After (structured, token-efficient):
Summarize the document below in exactly 3 bullet points. Each bullet must be one sentence and start with an action verb. Do not include opinions or information not present in the document.

Document:
{{document}}

Summary:
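
One benefit of the "After" prompt is that its format constraints can be checked mechanically. The following is a rough sketch under stated assumptions: `check_summary` is a hypothetical helper, the accepted bullet markers are a guess at what a model might emit, and the one-sentence test is a heuristic that abbreviations such as "e.g." will trip. The action-verb and no-opinions rules are not checked here.

```python
import re

# Assumed bullet markers: "-", "*", or "•" followed by whitespace.
BULLET = re.compile(r"^\s*[-*•]\s+(.*\S)\s*$")


def check_summary(output: str, expected_bullets: int = 3) -> list[str]:
    """Return a list of format problems; an empty list means the output passes."""
    problems = []
    bullets = []
    for line in output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        match = BULLET.match(line)
        if match:
            bullets.append(match.group(1))
        else:
            problems.append(f"non-bullet line: {line.strip()!r}")
    if len(bullets) != expected_bullets:
        problems.append(f"expected {expected_bullets} bullets, got {len(bullets)}")
    for text in bullets:
        # Heuristic: a sentence terminator followed by more text means 2+ sentences.
        if re.search(r"[.!?]\s+\S", text):
            problems.append(f"more than one sentence: {text!r}")
    return problems


good = "- Reduce costs by 10%.\n- Launch the pilot in May.\n- Track weekly metrics."
bad = "Here is the summary:\n- Reduce costs. Also hire staff.\n- Launch."
```

Running a check like this over a batch of outputs gives a consistency figure that can be compared between the "Before" and "After" prompts.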

Constraints

MUST DO

  • Test prompts with diverse, realistic inputs including edge cases
  • Measure performance with quantitative metrics (accuracy, consistency)
  • Version prompts and track changes systematically
  • Document expected behavior and known limitations
  • Use few-shot examples that match the target input distribution
  • Validate structured outputs against schemas
  • Consider token costs and latency in design
  • Test across model versions before production deployment
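
As an illustration of validating structured outputs against a schema, here is a minimal standard-library sketch. The schema, its field names, and `validate_output` are all hypothetical; a project with a real JSON Schema would more likely use a dedicated validator such as the `jsonschema` package.

```python
import json

# Hypothetical schema for a sentiment task: field name -> required Python type.
SENTIMENT_SCHEMA = {"sentiment": str, "confidence": float}
ALLOWED_SENTIMENTS = {"Positive", "Negative", "Neutral"}


def validate_output(raw: str) -> list[str]:
    """Return validation errors for a raw model response; empty means valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = []
    for name, expected_type in SENTIMENT_SCHEMA.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            # Note: an integer confidence such as 1 is rejected by this strict check.
            errors.append(f"wrong type for {name}: {type(data[name]).__name__}")
    sentiment = data.get("sentiment")
    if isinstance(sentiment, str) and sentiment not in ALLOWED_SENTIMENTS:
        errors.append(f"unexpected sentiment: {sentiment}")
    return errors
```

Returning a list of errors instead of raising on the first one makes it easier to log every schema violation per response and track them as a quantitative metric.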

MUST NOT DO

  • Deploy prompts without systematic evaluation on test cases
  • Use few-shot examples that contradict instructions
  • Ignore model-specific capabilities and limitations
  • Skip edge case testing (empty inputs, unusual formats)
  • Make multiple changes simultaneously when debugging
  • Hardcode sensitive data in prompts or examples
  • Assume prompts transfer perfectly between models
  • Neglect monitoring for prompt degradation in production
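
One way to keep sensitive data out of stored prompts is to version only the template and fill its `{{placeholder}}` slots at call time. The sketch below assumes the double-brace syntax used in the examples in this document; `render` is a hypothetical helper, not part of any library.

```python
import re

# Matches {{name}} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Fill {{name}} slots from runtime variables; fail loudly if one is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return variables[name]

    return PLACEHOLDER.sub(substitute, template)


# Only the template is stored and versioned; the document arrives at runtime.
TEMPLATE = "Summarize this document.\n\n{{document}}"
prompt = render(TEMPLATE, {"document": "Quarterly revenue rose 4%."})
```

Raising on a missing variable, rather than leaving the raw placeholder in place, prevents an unfilled `{{document}}` from silently reaching the model.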

Output Templates

When delivering prompt work, provide:
  1. Final prompt with clear sections (role, task, constraints, format)
  2. Test cases and evaluation results
  3. Usage instructions (temperature, max tokens, model version)
  4. Performance metrics and comparison with baselines
  5. Known limitations and edge cases
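
The five deliverables above could be bundled into a single record so that none is forgotten at handoff. This is one possible shape, not a required format: `PromptDeliverable` and its fields are hypothetical, the model name is a placeholder, and the metric values are illustrative numbers rather than measured results.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PromptDeliverable:
    prompt: str                      # 1. final prompt
    test_cases: list[dict]           # 2. test cases and results
    model_version: str               # 3. usage instructions
    temperature: float
    max_tokens: int
    metrics: dict[str, float] = field(default_factory=dict)           # 4. metrics
    baseline_metrics: dict[str, float] = field(default_factory=dict)  #    and baseline
    known_limitations: list[str] = field(default_factory=list)        # 5. limitations

    @property
    def prompt_id(self) -> str:
        """Short content hash, so any edit to the prompt yields a new ID."""
        return hashlib.sha256(self.prompt.encode("utf-8")).hexdigest()[:12]

    def improvement(self, metric: str) -> float:
        """Difference between the current metric and its baseline."""
        return self.metrics[metric] - self.baseline_metrics[metric]


deliverable = PromptDeliverable(
    prompt="Classify the sentiment of the following review ...",
    test_cases=[{"input": "Great battery life.", "expected": "Positive", "passed": True}],
    model_version="example-model-v1",        # placeholder, not a real model name
    temperature=0.0,
    max_tokens=256,
    metrics={"accuracy": 0.91},              # illustrative value
    baseline_metrics={"accuracy": 0.78},     # illustrative value
    known_limitations=["Sarcastic reviews are often labeled Positive."],
)
```

Deriving the ID from the prompt text ties every reported metric to the exact wording that produced it, which supports the versioning requirement in the constraints.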

Coverage Note

Reference files cover major prompting techniques (zero-shot, few-shot, CoT, ReAct, tree-of-thoughts), structured output patterns (JSON mode, function calling), and model-specific guidance for GPT-4, Claude, and Gemini families. Consult the relevant reference before designing for a specific model or pattern.