MANDATORY — Context Gathering Protocol
Before applying any workflow guidance, gather context:
- Check for .maestro.md in the project root
  - If it exists → read it and use the workflow context within
  - If it doesn't exist → tell the user: "No workflow context found. Run {{command_prefix}}teach-maestro to set up project-specific context for better results."
- Minimum viable context (if no .maestro.md):
  - What AI model(s) are being used?
  - What is the workflow's primary task?
  - Are there existing prompts, tools, or agents to work with?
  - What are the quality/speed/cost priorities?
- DO NOT proceed without at least understanding the model, task, and priorities.
Maestro — AI Agent Workflow Mastery
This skill provides the foundational knowledge for designing, building, and maintaining
production-grade AI agent workflows. All Maestro commands build on these principles.
Core Principles
- Structure over improvisation — Workflows should be deliberate, not emergent
- Constraints are features — Explicit boundaries prevent failure modes
- Measure, don't assume — Every workflow needs evaluation, not just testing
- Appropriate complexity — Match the solution to the problem, not the ambition
- Graceful degradation — Every component should fail safely
1. Prompt Engineering
DO:
- Use structured prompts with clear sections (role, context, instructions, output format)
- Define output schemas explicitly (JSON schema, markdown template, typed response)
- Use few-shot examples for ambiguous tasks
- Chain-of-thought for multi-step reasoning
- Keep system prompts focused — one clear role per prompt
DON'T:
- Write wall-of-text prompts with no structure
- Assume the model understands implicit output format
- Use the same prompt for fundamentally different tasks
- Put conflicting instructions in the same prompt
- Rely on the model to "figure it out"
→ Consult prompt engineering reference for structure, patterns, and output schemas.
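The structure above can be sketched as a small prompt builder. This is a minimal illustration, not a Maestro API: the section names, the example schema, and all field names are assumptions chosen for the demo.

```python
import json

# Illustrative output schema: the model must return a summary plus a
# constrained risk rating, rather than free-form text.
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "risk": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["summary", "risk"],
}

def build_prompt(role: str, context: str, instructions: list[str]) -> str:
    """Assemble a prompt with one clearly delimited section per concern."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(instructions))
    return (
        f"# Role\n{role}\n\n"
        f"# Context\n{context}\n\n"
        f"# Instructions\n{steps}\n\n"
        f"# Output format\nRespond with JSON matching this schema:\n"
        f"{json.dumps(OUTPUT_SCHEMA, indent=2)}\n"
    )

prompt = build_prompt(
    role="You are a release-notes reviewer.",
    context="Changelog diff for v2.3.1.",
    instructions=["Summarize the change.", "Rate the rollout risk."],
)
```

Because the schema is explicit, a validator can reject malformed responses before they reach downstream steps.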
2. Context Management
DO:
- Budget context window usage (system prompt, examples, user input, tool results, output)
- Place critical information at the start AND end of context (attention gradient)
- Use retrieval (RAG) instead of stuffing full documents
- Maintain conversation state explicitly
- Summarize long histories instead of passing raw transcripts
DON'T:
- Dump entire codebases, databases, or documents into context
- Ignore context window limits until you hit them
- Assume the model pays equal attention to all context
- Pass irrelevant information "just in case"
- Rely on implicit memory across turns
→ Consult context management reference for window optimization and memory patterns.
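One way to make the budgeting explicit is a per-slot reservation table checked before every call. The slot names and token counts below are illustrative assumptions, not prescribed values.

```python
# Tokens reserved per context slot; sizes are illustrative only.
BUDGET = {
    "system": 1_000,
    "examples": 2_000,
    "history_summary": 1_500,
    "user_input": 2_000,
    "tool_results": 4_000,
    "output_reserve": 1_500,
}

def fits_budget(usage: dict[str, int], window: int = 16_000) -> bool:
    """Check each slot against its reservation and the whole window.

    A slot with no reservation gets zero budget, which forces every
    new context source to be budgeted deliberately.
    """
    over = [slot for slot, used in usage.items() if used > BUDGET.get(slot, 0)]
    return not over and sum(usage.values()) <= window

usage = {"system": 900, "examples": 1_800, "user_input": 1_200,
         "tool_results": 3_500, "history_summary": 1_000}
within = fits_budget(usage)
```

When a slot overflows, the fix is targeted (summarize history, trim tool results) instead of blindly truncating the whole context.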
3. Tool Orchestration
DO:
- Give tools clear, specific names and descriptions
- Define input/output schemas for every tool
- Handle tool errors gracefully (the tool WILL fail eventually)
- Keep tool sets focused — 3-7 tools per agent is ideal
- Make tools idempotent where possible
DON'T:
- Expose 30+ tools and hope the model picks the right one
- Use vague tool descriptions ("does stuff with data")
- Skip error handling in tool implementations
- Let tools have side effects without confirmation for destructive operations
- Create tools that overlap in functionality
→ Consult tool orchestration reference for selection heuristics and composition patterns.
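A tool entry following these rules might look like the sketch below: a specific name, an explicit input schema, and a wrapper that returns errors as structured data rather than raising into the agent loop. The tool itself and its backing store are stand-ins.

```python
# Illustrative tool spec: specific name, concrete description, explicit schema.
TOOL_SPEC = {
    "name": "get_order_status",
    "description": "Look up the fulfillment status of an order by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

_ORDERS = {"A-1001": "shipped"}  # stand-in backing store for the demo

def run_tool(args: dict) -> dict:
    """Validate input, run the tool, and surface failures as data."""
    if "order_id" not in args:
        return {"ok": False, "error": "missing required field: order_id"}
    try:
        # The real lookup WILL fail eventually; catch and report, don't crash.
        status = _ORDERS[args["order_id"]]
        return {"ok": True, "status": status}
    except KeyError:
        return {"ok": False, "error": f"unknown order: {args['order_id']}"}

good = run_tool({"order_id": "A-1001"})
bad = run_tool({"order_id": "B-9999"})
```

Returning errors as structured results lets the agent recover (retry, ask the user) instead of aborting the whole run.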
4. Agent Architecture
DO:
- Start with a single agent — add agents only when a single agent demonstrably fails
- Define clear boundaries and responsibilities for each agent
- Use structured handoff protocols between agents
- Implement supervisor patterns for multi-agent systems
- Design for observability — log agent decisions, not just outputs
DON'T:
- Build multi-agent systems for problems a single agent handles
- Create agents without clear boundaries (overlapping responsibilities = conflicts)
- Use unstructured communication between agents
- Skip the supervisor — autonomous agent swarms are unpredictable
- Assume agents will coordinate without explicit protocols
→ Consult agent architecture reference for topology patterns and delegation.
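A supervisor with a structured handoff record could be sketched as below. The agent names, routing table, and Handoff fields are hypothetical, chosen only to show the shape of the pattern.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Structured handoff record: what moved, between whom, and why."""
    task: str
    from_agent: str
    to_agent: str
    reason: str

# One declared owner per task kind; overlapping ownership is a design smell.
ROUTES = {"research": "researcher", "write": "writer"}

@dataclass
class Supervisor:
    log: list = field(default_factory=list)

    def route(self, task_kind: str, task: str) -> Handoff:
        """Pick the responsible agent; unknown kinds stay with the supervisor."""
        target = ROUTES.get(task_kind, "supervisor")
        handoff = Handoff(task, "supervisor", target,
                          f"declared owner of '{task_kind}' tasks")
        self.log.append(handoff)  # log the decision itself, not just outputs
        return handoff

sup = Supervisor()
h = sup.route("research", "Find three sources on context windows.")
```

The decision log is what makes the system observable: you can replay why each task went where it did.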
5. Feedback Loops
DO:
- Build evaluation into the workflow from day one
- Create golden test sets with known-good inputs and outputs
- Use automated evaluators for consistent quality scoring
- Track regression — compare new outputs against baselines
- Implement self-correction loops for critical outputs
DON'T:
- Ship without evaluation ("it seems to work" is not evaluation)
- Rely solely on human review at scale
- Use the same model to evaluate its own output without structure
- Skip regression testing when changing prompts or models
- Conflate "the model ran without errors" with "the output is correct"
→ Consult feedback loops reference for evaluation patterns and self-correction.
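A golden-set regression check can be as small as the sketch below. The scoring rule (exact match), the cases, and the fake workflow are stand-ins; real evaluators usually score with rubrics or similarity metrics.

```python
# Known-good inputs with expected outputs; contents are illustrative.
GOLDEN_SET = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def fake_workflow(text: str) -> str:
    """Stand-in for the real model call under evaluation."""
    return {"2+2": "4", "capital of France": "Paris"}.get(text, "")

def evaluate(workflow) -> float:
    """Fraction of golden cases the workflow gets exactly right."""
    hits = sum(workflow(case["input"]) == case["expected"]
               for case in GOLDEN_SET)
    return hits / len(GOLDEN_SET)

BASELINE = 1.0  # score of the last shipped version
score = evaluate(fake_workflow)
regressed = score < BASELINE  # gate prompt/model changes on this flag
```

Running this on every prompt or model change turns "it seems to work" into a number you can compare against the baseline.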
6. Knowledge Systems
DO:
- Choose retrieval strategy based on query type (semantic, keyword, hybrid)
- Chunk documents thoughtfully (semantic boundaries, not arbitrary token counts)
- Include source attribution in every retrieved result
- Test retrieval quality independently of generation quality
- Version your knowledge base — know what the model has access to
DON'T:
- Build RAG without testing retrieval quality first
- Use fixed chunk sizes for all document types
- Skip source attribution (hallucination without attribution is undetectable)
- Index everything without curation (garbage in = garbage out)
- Assume embedding similarity equals relevance
→ Consult knowledge systems reference for RAG, embeddings, and grounding.
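Chunking on semantic boundaries with attribution might look like this sketch, which splits on paragraph breaks instead of fixed token counts and tags every chunk with its source. The size cap and source path are illustrative assumptions.

```python
def chunk(doc: str, source: str, max_chars: int = 200) -> list:
    """Split on blank lines (paragraph boundaries), pack paragraphs into
    chunks up to max_chars, and attach source attribution to each chunk."""
    paragraphs = [p.strip() for p in doc.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # Start a new chunk rather than cut a paragraph mid-sentence.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append({"text": current, "source": source})
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append({"text": current, "source": source})
    return chunks

doc = "First paragraph about retrieval.\n\nSecond paragraph about grounding."
result = chunk(doc, source="docs/rag.md", max_chars=40)
```

Because every chunk carries its source, any generated claim can be traced back, which is what makes hallucination detectable.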
7. Guardrails & Safety
DO:
- Validate inputs before processing (schema validation, size limits)
- Filter outputs for sensitive content, PII, and policy violations
- Set hard cost ceilings (max tokens, max API calls, max spend per run)
- Implement circuit breakers for cascading failures
- Log everything for audit trails
DON'T:
- Deploy without input validation (prompt injection is real)
- Trust model output without verification for high-stakes decisions
- Run without cost controls (one runaway loop can cost thousands)
- Skip rate limiting on external API calls
- Assume the model will follow safety instructions 100% of the time
→ Consult guardrails reference for validation, sandboxing, and constraints.
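Two of these guardrails, input validation and a hard cost ceiling, can be sketched as below. All limits are illustrative; set yours from real traffic and budget.

```python
MAX_INPUT_CHARS = 4_000       # illustrative size limit
MAX_TOKENS_PER_RUN = 50_000   # illustrative hard spend ceiling

class CostCeilingExceeded(RuntimeError):
    pass

def validate_input(text: str) -> str:
    """Reject empty or oversized input before it ever reaches the model."""
    if not text.strip():
        raise ValueError("empty input")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} chars")
    return text

class CostMeter:
    """Accumulates token spend and trips once the hard ceiling is reached."""
    def __init__(self, ceiling: int = MAX_TOKENS_PER_RUN):
        self.ceiling, self.used = ceiling, 0

    def charge(self, tokens: int) -> None:
        self.used += tokens
        if self.used > self.ceiling:
            # Trip hard: a runaway loop stops here, not at the invoice.
            raise CostCeilingExceeded(f"{self.used} > {self.ceiling} tokens")

meter = CostMeter(ceiling=1_000)
meter.charge(600)      # within budget
tripped = False
try:
    meter.charge(600)  # cumulative 1_200 exceeds the 1_000 ceiling
except CostCeilingExceeded:
    tripped = True
```

The ceiling is enforced in code, not in the prompt, because the model cannot be assumed to follow safety instructions 100% of the time.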
The Workflow Slop Test
If any of these are true, the workflow needs work:
- Prompts are unstructured walls of text → run /refine
- No output schema defined — model decides the format → run /refine
- Context window used without budget — everything stuffed in → run /accelerate
- More than 10 tools exposed to a single agent → run /streamline
- No error handling — happy path only → run /fortify
- No evaluation — "it seems to work" → run /iterate
- Multi-agent system for a single-agent problem → run /temper
- No cost controls — unbounded token usage → run /guard
- Tools have vague one-line descriptions → run /calibrate
- No logging — can't debug production issues → run /fortify
Zero checked = production-ready. 3+ checked = workflow slop.
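The slop test reduces to a scored checklist; a sketch, with check names invented here and each mapped to its fixing command:

```python
# Each failing check maps to the command that addresses it.
CHECKS = {
    "unstructured_prompts": "/refine",
    "no_output_schema": "/refine",
    "no_context_budget": "/accelerate",
    "too_many_tools": "/streamline",
    "no_error_handling": "/fortify",
    "no_evaluation": "/iterate",
    "needless_multi_agent": "/temper",
    "no_cost_controls": "/guard",
    "vague_tool_descriptions": "/calibrate",
    "no_logging": "/fortify",
}

def slop_test(failing: set) -> dict:
    """Return the verdict and the commands to run for the failing checks."""
    fixes = [CHECKS[name] for name in sorted(failing & CHECKS.keys())]
    if not failing:
        verdict = "production-ready"
    elif len(failing) >= 3:
        verdict = "workflow slop"
    else:
        verdict = "needs work"
    return {"verdict": verdict, "run": fixes}

report = slop_test({"no_evaluation", "no_logging", "no_cost_controls"})
```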
Available Commands
Use these commands to apply specific aspects of workflow mastery:
{{available_commands}}