MANDATORY — Context Gathering Protocol
Before applying any workflow guidance, gather context:
- Check for .maestro.md in the project root
  - If it exists → read it and use the workflow context within
  - If it doesn't exist → tell the user: "No workflow context found. Run {{command_prefix}}teach-maestro to set up project-specific context for better results."
- Minimum viable context (if no .maestro.md):
  - What AI model(s) are being used?
  - What is the workflow's primary task?
  - Are there existing prompts, tools, or agents to work with?
  - What are the quality/speed/cost priorities?
- DO NOT proceed without at least understanding the model, task, and priorities.
Maestro — AI Agent Workflow Mastery
This skill provides the foundational knowledge for designing, building, and maintaining
production-grade AI agent workflows. All Maestro commands build on these principles.
Core Principles
- Structure over improvisation — Workflows should be deliberate, not emergent
- Constraints are features — Explicit boundaries prevent failure modes
- Measure, don't assume — Every workflow needs evaluation, not just testing
- Appropriate complexity — Match the solution to the problem, not the ambition
- Graceful degradation — Every component should fail safely
1. Prompt Engineering
DO:
- Use structured prompts with clear sections (role, context, instructions, output format)
- Define output schemas explicitly (JSON schema, markdown template, typed response)
- Use few-shot examples for ambiguous tasks
- Chain-of-thought for multi-step reasoning
- Keep system prompts focused — one clear role per prompt
DON'T:
- Write wall-of-text prompts with no structure
- Assume the model understands implicit output format
- Use the same prompt for fundamentally different tasks
- Put conflicting instructions in the same prompt
- Rely on the model to "figure it out"
→ Consult prompt engineering reference for structure, patterns, and output schemas.
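The structure above can be sketched as a small prompt builder. This is a minimal illustration, not a Maestro API: the section names, the example schema, and all field names are assumptions chosen for the demo.

```python
import json

# Illustrative output schema: the model must return a summary plus a
# constrained risk rating, rather than free-form text.
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "risk": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["summary", "risk"],
}

def build_prompt(role: str, context: str, instructions: list[str]) -> str:
    """Assemble a prompt with one clearly delimited section per concern."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(instructions))
    return (
        f"# Role\n{role}\n\n"
        f"# Context\n{context}\n\n"
        f"# Instructions\n{steps}\n\n"
        f"# Output format\nRespond with JSON matching this schema:\n"
        f"{json.dumps(OUTPUT_SCHEMA, indent=2)}\n"
    )

prompt = build_prompt(
    role="You are a release-notes reviewer.",
    context="Changelog diff for v2.3.1.",
    instructions=["Summarize the change.", "Rate the rollout risk."],
)
```

Because the schema is explicit, a validator can reject malformed responses before they reach downstream steps.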
2. Context Management
DO:
- Budget context window usage (system prompt, examples, user input, tool results, output)
- Place critical information at the start AND end of context (attention gradient)
- Use retrieval (RAG) instead of stuffing full documents
- Maintain conversation state explicitly
- Summarize long histories instead of passing raw transcripts
DON'T:
- Dump entire codebases, databases, or documents into context
- Ignore context window limits until you hit them
- Assume the model pays equal attention to all context
- Pass irrelevant information "just in case"
- Rely on implicit memory across turns
→ Consult context management reference for window optimization and memory patterns.
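One way to make the budgeting explicit is a per-slot reservation table checked before every call. The slot names and token counts below are illustrative assumptions, not prescribed values.

```python
# Tokens reserved per context slot; sizes are illustrative only.
BUDGET = {
    "system": 1_000,
    "examples": 2_000,
    "history_summary": 1_500,
    "user_input": 2_000,
    "tool_results": 4_000,
    "output_reserve": 1_500,
}

def fits_budget(usage: dict[str, int], window: int = 16_000) -> bool:
    """Check each slot against its reservation and the whole window.

    A slot with no reservation gets zero budget, which forces every
    new context source to be budgeted deliberately.
    """
    over = [slot for slot, used in usage.items() if used > BUDGET.get(slot, 0)]
    return not over and sum(usage.values()) <= window

usage = {"system": 900, "examples": 1_800, "user_input": 1_200,
         "tool_results": 3_500, "history_summary": 1_000}
within = fits_budget(usage)
```

When a slot overflows, the fix is targeted (summarize history, trim tool results) instead of blindly truncating the whole context.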
3. Tool Orchestration
DO:
- Give tools clear, specific names and descriptions
- Define input/output schemas for every tool
- Handle tool errors gracefully (the tool WILL fail eventually)
- Keep tool sets focused — 3-7 tools per agent is ideal
- Make tools idempotent where possible
DON'T:
- Expose 30+ tools and hope the model picks the right one
- Use vague tool descriptions ("does stuff with data")
- Skip error handling in tool implementations
- Let tools have side effects without confirmation for destructive operations
- Create tools that overlap in functionality
→ Consult tool orchestration reference for selection heuristics and composition patterns.
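A tool entry following these rules might look like the sketch below: a specific name, an explicit input schema, and a wrapper that returns errors as structured data rather than raising into the agent loop. The tool itself and its backing store are stand-ins.

```python
# Illustrative tool spec: specific name, concrete description, explicit schema.
TOOL_SPEC = {
    "name": "get_order_status",
    "description": "Look up the fulfillment status of an order by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

_ORDERS = {"A-1001": "shipped"}  # stand-in backing store for the demo

def run_tool(args: dict) -> dict:
    """Validate input, run the tool, and surface failures as data."""
    if "order_id" not in args:
        return {"ok": False, "error": "missing required field: order_id"}
    try:
        # The real lookup WILL fail eventually; catch and report, don't crash.
        status = _ORDERS[args["order_id"]]
        return {"ok": True, "status": status}
    except KeyError:
        return {"ok": False, "error": f"unknown order: {args['order_id']}"}

good = run_tool({"order_id": "A-1001"})
bad = run_tool({"order_id": "B-9999"})
```

Returning errors as structured results lets the agent recover (retry, ask the user) instead of aborting the whole run.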
4. Agent Architecture
DO:
- Start with a single agent — add agents only when a single agent demonstrably fails
- Define clear boundaries and responsibilities for each agent
- Use structured handoff protocols between agents
- Implement supervisor patterns for multi-agent systems
- Design for observability — log agent decisions, not just outputs
DON'T:
- Build multi-agent systems for problems a single agent handles
- Create agents without clear boundaries (overlapping responsibilities = conflicts)
- Use unstructured communication between agents
- Skip the supervisor — autonomous agent swarms are unpredictable
- Assume agents will coordinate without explicit protocols
→ Consult agent architecture reference for topology patterns and delegation.
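A supervisor with a structured handoff record could be sketched as below. The agent names, routing table, and Handoff fields are hypothetical, chosen only to show the shape of the pattern.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Structured handoff record: what moved, between whom, and why."""
    task: str
    from_agent: str
    to_agent: str
    reason: str

# One declared owner per task kind; overlapping ownership is a design smell.
ROUTES = {"research": "researcher", "write": "writer"}

@dataclass
class Supervisor:
    log: list = field(default_factory=list)

    def route(self, task_kind: str, task: str) -> Handoff:
        """Pick the responsible agent; unknown kinds stay with the supervisor."""
        target = ROUTES.get(task_kind, "supervisor")
        handoff = Handoff(task, "supervisor", target,
                          f"declared owner of '{task_kind}' tasks")
        self.log.append(handoff)  # log the decision itself, not just outputs
        return handoff

sup = Supervisor()
h = sup.route("research", "Find three sources on context windows.")
```

The decision log is what makes the system observable: you can replay why each task went where it did.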
5. Feedback Loops
DO:
- Build evaluation into the workflow from day one
- Create golden test sets with known-good inputs and outputs
- Use automated evaluators for consistent quality scoring
- Track regression — compare new outputs against baselines
- Implement self-correction loops for critical outputs
DON'T:
- Ship without evaluation ("it seems to work" is not evaluation)
- Rely solely on human review at scale
- Use the same model to evaluate its own output without structure
- Skip regression testing when changing prompts or models
- Conflate "the model ran without errors" with "the output is correct"
→ Consult feedback loops reference for evaluation patterns and self-correction.
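A golden-set regression check can be as small as the sketch below. The scoring rule (exact match), the cases, and the fake workflow are stand-ins; real evaluators usually score with rubrics or similarity metrics.

```python
# Known-good inputs with expected outputs; contents are illustrative.
GOLDEN_SET = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def fake_workflow(text: str) -> str:
    """Stand-in for the real model call under evaluation."""
    return {"2+2": "4", "capital of France": "Paris"}.get(text, "")

def evaluate(workflow) -> float:
    """Fraction of golden cases the workflow gets exactly right."""
    hits = sum(workflow(case["input"]) == case["expected"]
               for case in GOLDEN_SET)
    return hits / len(GOLDEN_SET)

BASELINE = 1.0  # score of the last shipped version
score = evaluate(fake_workflow)
regressed = score < BASELINE  # gate prompt/model changes on this flag
```

Running this on every prompt or model change turns "it seems to work" into a number you can compare against the baseline.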
6. Knowledge Systems
DO:
- Choose retrieval strategy based on query type (semantic, keyword, hybrid)
- Chunk documents thoughtfully (semantic boundaries, not arbitrary token counts)
- Include source attribution in every retrieved result
- Test retrieval quality independently of generation quality
- Version your knowledge base — know what the model has access to
DON'T:
- Build RAG without testing retrieval quality first
- Use fixed chunk sizes for all document types
- Skip source attribution (hallucination without attribution is undetectable)
- Index everything without curation (garbage in = garbage out)
- Assume embedding similarity equals relevance
→ Consult knowledge systems reference for RAG, embeddings, and grounding.
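Chunking on semantic boundaries with attribution might look like this sketch, which splits on paragraph breaks instead of fixed token counts and tags every chunk with its source. The size cap and source path are illustrative assumptions.

```python
def chunk(doc: str, source: str, max_chars: int = 200) -> list:
    """Split on blank lines (paragraph boundaries), pack paragraphs into
    chunks up to max_chars, and attach source attribution to each chunk."""
    paragraphs = [p.strip() for p in doc.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # Start a new chunk rather than cut a paragraph mid-sentence.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append({"text": current, "source": source})
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append({"text": current, "source": source})
    return chunks

doc = "First paragraph about retrieval.\n\nSecond paragraph about grounding."
result = chunk(doc, source="docs/rag.md", max_chars=40)
```

Because every chunk carries its source, any generated claim can be traced back, which is what makes hallucination detectable.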
7. Guardrails & Safety
DO:
- Validate inputs before processing (schema validation, size limits)
- Filter outputs for sensitive content, PII, and policy violations
- Set hard cost ceilings (max tokens, max API calls, max spend per run)
- Implement circuit breakers for cascading failures
- Log everything for audit trails
DON'T:
- Deploy without input validation (prompt injection is real)
- Trust model output without verification for high-stakes decisions
- Run without cost controls (one runaway loop can cost thousands)
- Skip rate limiting on external API calls
- Assume the model will follow safety instructions 100% of the time
→ Consult guardrails reference for validation, sandboxing, and constraints.
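Two of these guardrails, input validation and a hard cost ceiling, can be sketched as below. All limits are illustrative; set yours from real traffic and budget.

```python
MAX_INPUT_CHARS = 4_000       # illustrative size limit
MAX_TOKENS_PER_RUN = 50_000   # illustrative hard spend ceiling

class CostCeilingExceeded(RuntimeError):
    pass

def validate_input(text: str) -> str:
    """Reject empty or oversized input before it ever reaches the model."""
    if not text.strip():
        raise ValueError("empty input")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} chars")
    return text

class CostMeter:
    """Accumulates token spend and trips once the hard ceiling is reached."""
    def __init__(self, ceiling: int = MAX_TOKENS_PER_RUN):
        self.ceiling, self.used = ceiling, 0

    def charge(self, tokens: int) -> None:
        self.used += tokens
        if self.used > self.ceiling:
            # Trip hard: a runaway loop stops here, not at the invoice.
            raise CostCeilingExceeded(f"{self.used} > {self.ceiling} tokens")

meter = CostMeter(ceiling=1_000)
meter.charge(600)      # within budget
tripped = False
try:
    meter.charge(600)  # cumulative 1_200 exceeds the 1_000 ceiling
except CostCeilingExceeded:
    tripped = True
```

The ceiling is enforced in code, not in the prompt, because the model cannot be assumed to follow safety instructions 100% of the time.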
The Workflow Slop Test
If any of these are true, the workflow needs work:
- Prompts are unstructured walls of text → run /refine
- No output schema defined — model decides the format → run /refine
- Context window used without budget — everything stuffed in → run /accelerate
- More than 10 tools exposed to a single agent → run /streamline
- No error handling — happy path only → run /fortify
- No evaluation — "it seems to work" → run /iterate
- Multi-agent system for a single-agent problem → run /temper
- No cost controls — unbounded token usage → run /guard
- Tools have vague one-line descriptions → run /calibrate
- No logging — can't debug production issues → run /fortify
Zero checked = production-ready. 3+ checked = workflow slop.
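The slop test reduces to a scored checklist; a sketch, with check names invented here and each mapped to its fixing command:

```python
# Each failing check maps to the command that addresses it.
CHECKS = {
    "unstructured_prompts": "/refine",
    "no_output_schema": "/refine",
    "no_context_budget": "/accelerate",
    "too_many_tools": "/streamline",
    "no_error_handling": "/fortify",
    "no_evaluation": "/iterate",
    "needless_multi_agent": "/temper",
    "no_cost_controls": "/guard",
    "vague_tool_descriptions": "/calibrate",
    "no_logging": "/fortify",
}

def slop_test(failing: set) -> dict:
    """Return the verdict and the commands to run for the failing checks."""
    fixes = [CHECKS[name] for name in sorted(failing & CHECKS.keys())]
    if not failing:
        verdict = "production-ready"
    elif len(failing) >= 3:
        verdict = "workflow slop"
    else:
        verdict = "needs work"
    return {"verdict": verdict, "run": fixes}

report = slop_test({"no_evaluation", "no_logging", "no_cost_controls"})
```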
Available Commands
Use these commands to apply specific aspects of workflow mastery:
{{available_commands}}