MANDATORY — Context Gathering Protocol

Before applying any workflow guidance, gather context:
  1. Check for `.maestro.md` in the project root
    • If it exists → read it and use the workflow context within
    • If it doesn't exist → tell the user: "No workflow context found. Run `{{command_prefix}}teach-maestro` to set up project-specific context for better results."
  2. Minimum viable context (if no `.maestro.md`):
    • What AI model(s) are being used?
    • What is the workflow's primary task?
    • Are there existing prompts, tools, or agents to work with?
    • What are the quality/speed/cost priorities?
  3. DO NOT proceed without at least understanding the model, task, and priorities.

Maestro — AI Agent Workflow Mastery

This skill provides the foundational knowledge for designing, building, and maintaining production-grade AI agent workflows. All Maestro commands build on these principles.

Core Principles

  1. Structure over improvisation — Workflows should be deliberate, not emergent
  2. Constraints are features — Explicit boundaries prevent failure modes
  3. Measure, don't assume — Every workflow needs evaluation, not just testing
  4. Appropriate complexity — Match the solution to the problem, not the ambition
  5. Graceful degradation — Every component should fail safely

1. Prompt Engineering

DO:
  • Use structured prompts with clear sections (role, context, instructions, output format)
  • Define output schemas explicitly (JSON schema, markdown template, typed response)
  • Use few-shot examples for ambiguous tasks
  • Chain-of-thought for multi-step reasoning
  • Keep system prompts focused — one clear role per prompt
DON'T:
  • Write wall-of-text prompts with no structure
  • Assume the model understands implicit output format
  • Use the same prompt for fundamentally different tasks
  • Put conflicting instructions in the same prompt
  • Rely on the model to "figure it out"
Consult prompt engineering reference for structure, patterns, and output schemas.
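The structured-prompt guidance above can be sketched as a small builder. This is a minimal illustration, not a Maestro API; the section headings, the classifier task, and the schema shape are invented for the example:

```python
import json

def build_prompt(role, context, instructions, output_schema, examples=None):
    """Assemble a prompt with clear sections and an explicit output schema."""
    parts = [
        f"# Role\n{role}",
        f"# Context\n{context}",
        f"# Instructions\n{instructions}",
        "# Output format\nRespond with JSON matching this schema:\n"
        + json.dumps(output_schema, indent=2),
    ]
    if examples:  # few-shot examples help on ambiguous tasks
        shots = "\n\n".join(
            f"Input: {inp}\nOutput: {json.dumps(out)}" for inp, out in examples
        )
        parts.insert(3, f"# Examples\n{shots}")  # examples go before output format
    return "\n\n".join(parts)

prompt = build_prompt(
    role="You are a support-ticket classifier.",
    context="Tickets come from the billing queue.",
    instructions="Classify the ticket and extract the product name.",
    output_schema={"category": "string", "product": "string"},
    examples=[("Refund for Acme Pro?", {"category": "refund", "product": "Acme Pro"})],
)
```

Keeping the schema in the prompt as literal JSON makes the expected format explicit rather than implied.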

2. Context Management

DO:
  • Budget context window usage (system prompt, examples, user input, tool results, output)
  • Place critical information at the start AND end of context (attention gradient)
  • Use retrieval (RAG) instead of stuffing full documents
  • Maintain conversation state explicitly
  • Summarize long histories instead of passing raw transcripts
DON'T:
  • Dump entire codebases, databases, or documents into context
  • Ignore context window limits until you hit them
  • Assume the model pays equal attention to all context
  • Pass irrelevant information "just in case"
  • Rely on implicit memory across turns
Consult context management reference for window optimization and memory patterns.
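Context budgeting can be made explicit with a small accounting object. A sketch only: the four-characters-per-token heuristic stands in for the model's real tokenizer, and the window sizes are invented:

```python
from dataclasses import dataclass

def rough_tokens(text: str) -> int:
    # Crude heuristic (~4 chars per token); use the model's tokenizer in practice.
    return max(1, len(text) // 4)

@dataclass
class ContextBudget:
    window: int           # total context window, in tokens
    reserved_output: int  # tokens held back for the model's response
    used: int = 0

    def remaining(self) -> int:
        return self.window - self.reserved_output - self.used

    def try_add(self, text: str) -> bool:
        """Add a chunk only if it fits; callers fall back to retrieval or summaries."""
        cost = rough_tokens(text)
        if cost > self.remaining():
            return False
        self.used += cost
        return True

budget = ContextBudget(window=8000, reserved_output=1000)
budget.try_add("system prompt " * 100)
```

A caller that sees `try_add` return `False` should summarize or retrieve instead of stuffing the full text in.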

3. Tool Orchestration

DO:
  • Give tools clear, specific names and descriptions
  • Define input/output schemas for every tool
  • Handle tool errors gracefully (the tool WILL fail eventually)
  • Keep tool sets focused — 3-7 tools per agent is ideal
  • Make tools idempotent where possible
DON'T:
  • Expose 30+ tools and hope the model picks the right one
  • Use vague tool descriptions ("does stuff with data")
  • Skip error handling in tool implementations
  • Let tools have side effects without confirmation for destructive operations
  • Create tools that overlap in functionality
Consult tool orchestration reference for selection heuristics and composition patterns.
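One way to realize the schema-and-error-handling advice is a small registry whose dispatcher returns structured errors instead of crashing the agent loop. The tool name, schema format, and in-memory datastore here are illustrative:

```python
TOOLS = {}

def register_tool(name, description, input_schema):
    """Register a tool with a clear name, description, and input schema."""
    def wrap(fn):
        TOOLS[name] = {"fn": fn, "description": description, "schema": input_schema}
        return fn
    return wrap

@register_tool(
    name="lookup_order",
    description="Fetch one order by its ID and return status and total.",
    input_schema={"order_id": "string"},
)
def lookup_order(order_id: str) -> dict:
    orders = {"A-1": {"status": "shipped", "total": 42.0}}  # stand-in datastore
    if order_id not in orders:
        raise KeyError(order_id)
    return orders[order_id]

def call_tool(name, **kwargs):
    """Run a tool; return a structured error instead of raising into the agent."""
    try:
        return {"ok": True, "result": TOOLS[name]["fn"](**kwargs)}
    except Exception as exc:  # the tool WILL fail eventually
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

The `{"ok": False, ...}` shape gives the model something it can reason about and retry, rather than an opaque traceback.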

4. Agent Architecture

DO:
  • Start with a single agent — add agents only when a single agent demonstrably fails
  • Define clear boundaries and responsibilities for each agent
  • Use structured handoff protocols between agents
  • Implement supervisor patterns for multi-agent systems
  • Design for observability — log agent decisions, not just outputs
DON'T:
  • Build multi-agent systems for problems a single agent handles
  • Create agents without clear boundaries (overlapping responsibilities = conflicts)
  • Use unstructured communication between agents
  • Skip the supervisor — autonomous agent swarms are unpredictable
  • Assume agents will coordinate without explicit protocols
Consult agent architecture reference for topology patterns and delegation.
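A structured handoff plus a supervisor can be sketched as below. The agent names, the handoff fields, and the keyword-based routing rule are toy placeholders for real routing logic:

```python
from dataclasses import dataclass

@dataclass
class Handoff:
    from_agent: str
    to_agent: str
    task: str
    context: dict  # everything the next agent needs, passed explicitly

class Supervisor:
    def __init__(self, workers):
        self.workers = workers  # name -> callable(Handoff) -> result
        self.log = []           # observability: record decisions, not just outputs

    def route(self, task: str, context: dict):
        # Toy routing rule; a real supervisor would classify the task properly.
        name = "researcher" if "find" in task else "writer"
        handoff = Handoff("supervisor", name, task, context)
        self.log.append(("route", name, task))
        return self.workers[name](handoff)

sup = Supervisor({
    "researcher": lambda h: f"notes on {h.task}",
    "writer": lambda h: f"draft for {h.task}",
})
```

The decision log is what makes the system debuggable: you can replay why a task went to a given agent.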

5. Feedback Loops

DO:
  • Build evaluation into the workflow from day one
  • Create golden test sets with known-good inputs and outputs
  • Use automated evaluators for consistent quality scoring
  • Track regression — compare new outputs against baselines
  • Implement self-correction loops for critical outputs
DON'T:
  • Ship without evaluation ("it seems to work" is not evaluation)
  • Rely solely on human review at scale
  • Use the same model to evaluate its own output without structure
  • Skip regression testing when changing prompts or models
  • Conflate "the model ran without errors" with "the output is correct"
Consult feedback loops reference for evaluation patterns and self-correction.
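A golden test set and regression check might look like this sketch. The cases, the exact-match scoring, and the stubbed workflow are illustrative; real evaluators usually allow partial credit or rubric scoring:

```python
# Known-good input/output pairs; curate these from real, reviewed cases.
GOLDEN_SET = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def evaluate(workflow, golden_set):
    """Return the fraction of golden cases the workflow answers exactly."""
    hits = sum(workflow(case["input"]) == case["expected"] for case in golden_set)
    return hits / len(golden_set)

def check_regression(score, baseline, tolerance=0.0):
    """Compare a new score against the recorded baseline before shipping."""
    return score + tolerance >= baseline

# Stub workflow standing in for a real model call.
stub = {"2+2": "4", "capital of France": "Paris"}.get
score = evaluate(lambda q: stub(q, ""), GOLDEN_SET)
```

Run this on every prompt or model change; a drop below baseline blocks the change.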

6. Knowledge Systems

DO:
  • Choose retrieval strategy based on query type (semantic, keyword, hybrid)
  • Chunk documents thoughtfully (semantic boundaries, not arbitrary token counts)
  • Include source attribution in every retrieved result
  • Test retrieval quality independently of generation quality
  • Version your knowledge base — know what the model has access to
DON'T:
  • Build RAG without testing retrieval quality first
  • Use fixed chunk sizes for all document types
  • Skip source attribution (hallucination without attribution is undetectable)
  • Index everything without curation (garbage in = garbage out)
  • Assume embedding similarity equals relevance
Consult knowledge systems reference for RAG, embeddings, and grounding.
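The retrieval guidance can be illustrated with a tiny retriever that chunks on paragraph boundaries and attaches attribution to every hit. Plain keyword overlap stands in here for embedding or hybrid search:

```python
def chunk_by_paragraph(doc_id, text):
    """Chunk on semantic (paragraph) boundaries, tagging each chunk with its source."""
    return [
        {"source": doc_id, "chunk": i, "text": p.strip()}
        for i, p in enumerate(text.split("\n\n")) if p.strip()
    ]

def retrieve(query, chunks, k=2):
    """Rank chunks by keyword overlap; every hit carries source attribution."""
    terms = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(terms & set(c["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

chunks = chunk_by_paragraph("handbook.md", "Refunds take 5 days.\n\nShipping is free.")
hits = retrieve("how long do refunds take", chunks, k=1)
```

Because `source` and `chunk` travel with each result, the generation step can cite its evidence, and retrieval quality can be tested without running the model at all.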

7. Guardrails & Safety

DO:
  • Validate inputs before processing (schema validation, size limits)
  • Filter outputs for sensitive content, PII, and policy violations
  • Set hard cost ceilings (max tokens, max API calls, max spend per run)
  • Implement circuit breakers for cascading failures
  • Log everything for audit trails
DON'T:
  • Deploy without input validation (prompt injection is real)
  • Trust model output without verification for high-stakes decisions
  • Run without cost controls (one runaway loop can cost thousands)
  • Skip rate limiting on external API calls
  • Assume the model will follow safety instructions 100% of the time
Consult guardrails reference for validation, sandboxing, and constraints.
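Hard cost ceilings and a circuit breaker can be sketched as two small guards. The limits and threshold values below are placeholders to be set from real pricing and SLOs:

```python
class CostGuard:
    """Hard ceilings on tokens and API calls per run; exceeding either aborts."""
    def __init__(self, max_tokens, max_calls):
        self.max_tokens, self.max_calls = max_tokens, max_calls
        self.tokens = self.calls = 0

    def charge(self, tokens: int):
        self.calls += 1
        self.tokens += tokens
        if self.calls > self.max_calls or self.tokens > self.max_tokens:
            raise RuntimeError("cost ceiling hit, aborting run")

class CircuitBreaker:
    """Trip after N consecutive failures to stop hammering a failing dependency."""
    def __init__(self, threshold: int):
        self.threshold, self.failures = threshold, 0

    def record(self, ok: bool):
        self.failures = 0 if ok else self.failures + 1

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold
```

Wrap every model and external API call in both guards, and log each trip for the audit trail.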

The Workflow Slop Test

If any of these are true, the workflow needs work:
  • Prompts are unstructured walls of text → run `/refine`
  • No output schema defined — model decides the format → run `/refine`
  • Context window used without budget — everything stuffed in → run `/accelerate`
  • More than 10 tools exposed to a single agent → run `/streamline`
  • No error handling — happy path only → run `/fortify`
  • No evaluation — "it seems to work" → run `/iterate`
  • Multi-agent system for a single-agent problem → run `/temper`
  • No cost controls — unbounded token usage → run `/guard`
  • Tools have vague one-line descriptions → run `/calibrate`
  • No logging — can't debug production issues → run `/fortify`
Zero checked = production-ready. 3+ checked = workflow slop.

Available Commands

Use these commands to apply specific aspects of workflow mastery:
{{available_commands}}