workflow-designer

Workflow and State Designer

Purpose

Use this skill to act as an expert Process Architect and Workflow Designer. The agent transforms functional requirements, architecture artifacts, or business process descriptions into precise, state-driven workflow models, including finite state machines (FSM), orchestration pipelines, saga coordinators, and human-in-the-loop (HITL) checkpoints.
The agent's work produces design artifacts that are implementation-ready for workflow engines, agent runtimes, orchestration platforms, or state management systems.
This skill is domain-generic. It must work for any distributed system, AI agent orchestration, SaaS onboarding flow, financial transaction pipeline, compliance process, or asynchronous business workflow without embedding project-specific assumptions.

When to Use

Use this skill when the user asks to:
  • Model a workflow as a finite state machine (FSM) with explicit states, transitions, guards, and side effects.
  • Design an orchestration pipeline (sequential, parallel fan-out/fan-in, or centralized coordinator pattern).
  • Define saga coordinators with compensating transactions for distributed transaction rollback.
  • Design human-in-the-loop (HITL) checkpoints for approval, audit, or conditional routing.
  • Create resilience strategies for workflow steps: retry policies, timeouts, circuit breakers, bulkheads, and fallbacks.
  • Define observability requirements: metrics, alerts, traces, and failure signals per state or step.
  • Map complex asynchronous processes with clear state ownership and recovery paths.
  • Translate agentic AI workflows (multi-agent, tool-calling, context-passing) into structured state machines.
  • Design idempotent workflow steps that safely tolerate interrupts and retries.
Do not use this skill for product strategy, detailed API design, source-code implementation, or low-level data modeling. Keep the output at workflow architecture and state design level.

Core Operating Rules

  1. One step, one responsibility. Each task or state in the workflow must do exactly one thing and declare its input schema, output schema, and failure modes.
  2. Every transition is explicit. Never assume an implicit transition. For every state, document: trigger event, guard conditions, and side effects (tasks, API calls, persistence).
  3. Idempotency by design. Every step that modifies state or calls external systems must be designed to be safely re-enterable. Identify where idempotency keys, deduplication, or optimistic locking apply.
  4. Document the sad path exhaustively. For every state, define what happens on timeout, service unavailability, rejected authorization, payload validation failure, or agent failure.
  5. State is durably owned. Every workflow entity must have a single owner (service, database, or runtime) that persists its state across worker failures.
  6. HITL is a first-class state. Human-in-the-loop checkpoints are decision states with explicit inputs, expected response schemas, and timeout handling.
  7. Transitions must be deterministic. Given the same state and event, the workflow must always reach the same next state unless a guard explicitly allows branching.
  8. Never hide observability. Every state and transition should emit observable signals (events, metrics, or traces) sufficient to reconstruct the workflow's execution path.
  9. Use neutral placeholders. When technology, owner, or platform is unknown, use generic terms such as `orchestration runtime`, `workflow state store`, `approved provider`, or `TBD`.
  10. Separate orchestration from choreography. State explicitly whether the workflow uses a central orchestrator or event-driven choreography, and why the chosen approach fits the use case.
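The idempotency rule (rule 3) can be illustrated with a minimal sketch. It assumes an in-memory dictionary standing in for a durable deduplication table; `IdempotentExecutor`, `charge_card`, and the key format are hypothetical names chosen for illustration only.

```python
from typing import Callable, Dict, List


class IdempotentExecutor:
    """Runs a side effect at most once per idempotency key (in-memory sketch)."""

    def __init__(self) -> None:
        # Stand-in for a durable deduplication table in the workflow state store.
        self._results: Dict[str, str] = {}

    def run(self, key: str, action: Callable[[], str]) -> str:
        # A retried step with the same key gets the stored result back
        # instead of executing the side effect a second time.
        if key in self._results:
            return self._results[key]
        result = action()
        self._results[key] = result
        return result


calls: List[str] = []


def charge_card() -> str:
    # Hypothetical external call; records each real execution.
    calls.append("charge")
    return "charge-accepted"


executor = IdempotentExecutor()
first = executor.run("order-42:charge", charge_card)
second = executor.run("order-42:charge", charge_card)  # simulated retry
```

In a real design the lookup and the write would need to be atomic in the state store; this sketch ignores concurrency.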

State Machine Fundamentals

What Constitutes a State

A state is:
  • A stable, named condition of the workflow entity.
  • The place where the workflow waits for the next trigger event.
  • The context that determines which transitions are allowed.
  • A label on the entity record in the workflow state store.

State Classification

| Type | Description | Examples |
| --- | --- | --- |
| Initial | State when the workflow entity is created. | `INITIALIZED`, `PENDING`, `CREATED` |
| Intermediate | State during normal processing. | `VALIDATING`, `PROCESSING`, `AWAITING_APPROVAL` |
| Waiting | State that pauses for external input. | `AWAITING_HITL`, `AWAITING_WEBHOOK`, `AWAITING_TIMER` |
| Terminal (success) | Normal successful completion. | `COMPLETED`, `CONFIRMED`, `APPROVED` |
| Terminal (failure) | Workflow ended in a known failure state. | `FAILED`, `REJECTED`, `CANCELLED` |
| Terminal (compensated) | Workflow rolled back via saga compensations. | `COMPENSATED`, `ROLLBACK_COMPLETE` |

Transition Anatomy

Every transition must define:
```text
[Source State] --[Event trigger]-->
  if [Guard condition(s)]
  then
    do [Side effect / Task list]
    → [Destination State]
  else
    do [Else side effect if guard fails]
    → [Alternate State or self-loop]
```
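The transition anatomy above could be encoded roughly as follows. This is a sketch under stated assumptions, not a prescribed implementation: the `Transition` dataclass, the `fire` function, and the task names are hypothetical, and side effects are represented only by their names rather than executed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]


@dataclass(frozen=True)
class Transition:
    source: str
    event: str
    guard: Callable[[Context], bool]
    side_effects: Tuple[str, ...]  # names of tasks to run when the guard passes
    destination: str
    else_destination: str          # alternate state (or self-loop) when it fails


def fire(t: Transition, state: str, event: str, ctx: Context) -> Tuple[str, List[str]]:
    """Return (next_state, task_names) for one transition attempt."""
    if state != t.source or event != t.event:
        # No implicit transitions: anything undeclared is an error.
        raise ValueError(f"no explicit transition for {state!r} on {event!r}")
    if t.guard(ctx):
        return t.destination, list(t.side_effects)
    return t.else_destination, ["log_guard_rejection"]


validated = Transition(
    source="VALIDATING",
    event="validation_finished",
    guard=lambda ctx: ctx.get("errors") == [],
    side_effects=("persist_state", "emit_domain_event"),
    destination="PROCESSING",
    else_destination="FAILED",
)

ok_state, ok_tasks = fire(validated, "VALIDATING", "validation_finished", {"errors": []})
bad_state, bad_tasks = fire(
    validated, "VALIDATING", "validation_finished", {"errors": ["missing id"]}
)
```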

Guard Conditions

Guards are Boolean expressions evaluated before a transition fires. They must be:
  • Deterministic: same state + event → same outcome.
  • Observable: guards that evaluate external data should emit a log or metric.
  • Finite: no infinite loops due to cyclic guard evaluations without state change.
  • Mutually exclusive: outgoing transitions from one state with the same trigger must have non-overlapping guards.
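One way to spot-check mutual exclusivity is to evaluate all guards for the same trigger against sample contexts and flag any sample where the number of matching guards is not exactly one. The helper names below are hypothetical, and sampling can only find counterexamples; it does not prove exclusivity.

```python
from typing import Callable, Dict, List

Guard = Callable[[Dict[str, float]], bool]


def matching_guards(guards: Dict[str, Guard], ctx: Dict[str, float]) -> List[str]:
    """Return the names of all guards that evaluate to True for this context."""
    return [name for name, guard in guards.items() if guard(ctx)]


def ambiguous_samples(
    guards: Dict[str, Guard], samples: List[Dict[str, float]]
) -> List[Dict[str, float]]:
    """Return samples for which zero or more than one guard is True."""
    return [ctx for ctx in samples if len(matching_guards(guards, ctx)) != 1]


good = {
    "auto_approve": lambda ctx: ctx["score"] >= 0.7,
    "manual_review": lambda ctx: ctx["score"] < 0.7,
}
bad = {
    "auto_approve": lambda ctx: ctx["score"] >= 0.7,
    "manual_review": lambda ctx: ctx["score"] <= 0.7,  # overlaps at exactly 0.7
}

samples = [{"score": s} for s in (0.0, 0.69, 0.7, 0.71, 1.0)]
good_conflicts = ambiguous_samples(good, samples)
bad_conflicts = ambiguous_samples(bad, samples)
```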

Side Effects (Tasks)

Side effects are actions executed during a transition. They include:
  • Persisting updated entity state to the workflow state store.
  • Calling external APIs (payment, notification, AI model, tool).
  • Emitting domain events to a message broker.
  • Starting a child workflow or spawning parallel tasks.
  • Sending a notification to a user or operator.
  • Triggering a compensating transaction (in saga workflows).

Orchestration Patterns

Sequential Pipeline

Steps execute one after another. Each step completes before the next begins.
```
Step1 → Step2 → Step3 → Step4
```
Use when: steps are interdependent, order matters, and results are needed in sequence.

Parallel Fan-Out / Fan-In

A trigger fires multiple steps simultaneously; the workflow waits for all results before proceeding.
```
      ┌→ Step2A ─┐
Step1 ├→ Step2B ─┤→ Aggregator → Step3
      └→ Step2C ─┘
```
Use when: independent analysis, parallel AI agent calls, multi-domain validation, or data enrichment runs concurrently.
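A fan-out/fan-in step might look like the following sketch, which uses Python's `asyncio.gather` to start the branches concurrently and wait for all of them. The `run_branch` coroutine and its delays are placeholders for real independent steps; a workflow engine would typically provide its own primitive for this.

```python
import asyncio
from typing import Dict, List


async def run_branch(name: str, delay: float) -> Dict[str, str]:
    # Stand-in for an independent step such as an agent call or a validation.
    await asyncio.sleep(delay)
    return {"branch": name, "status": "ok"}


async def fan_out_fan_in() -> List[Dict[str, str]]:
    # Fan-out: all branches start concurrently.
    # Fan-in: gather() waits for every branch and returns results
    # in argument order, regardless of which branch finishes first.
    return await asyncio.gather(
        run_branch("Step2A", 0.03),
        run_branch("Step2B", 0.01),
        run_branch("Step2C", 0.02),
    )


results = asyncio.run(fan_out_fan_in())
branch_names = [r["branch"] for r in results]
```

By default `gather` raises the first exception it encounters, so a design that must tolerate partial failure would need a per-branch failure policy on top of this.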

Centralized Orchestrator

A single coordinator holds the workflow state and tells participants what to do.
```
Orchestrator (holds state) → calls Participant A
                           → calls Participant B
                           → handles compensation on failure
```
Use when: a single process owns the end-to-end outcome, needs to coordinate compensation, and must track progress centrally.

Event-Driven Choreography

Each participant listens for domain events and independently triggers subsequent actions.
```
Participant A emits "OrderPlaced" → Participant B listens and triggers "ReserveCredit"
Participant B emits "CreditReserved" → Participant C listens and triggers "ShipOrder"
```
Use when: participants are loosely coupled, want autonomy, and the business process tolerates eventual consistency.
Use centralized orchestration when: compensation logic is complex, end-to-end visibility is required, or one participant needs to drive the overall transaction.

Saga Pattern (Distributed Transaction Compensation)

When to Use

Use the saga pattern when a business transaction spans multiple services or systems and no single distributed transaction can guarantee atomicity.

Saga Structure

A saga is a sequence of local transactions, one per step. Each step:
  1. Executes its operation.
  2. Publishes a success or failure event.
  3. On failure, triggers the compensating transactions of all previously completed steps, in reverse order.

Compensating Transaction Rules

| Property | Requirement |
| --- | --- |
| Semantic reversibility | The compensation logically undoes the original operation (e.g., a refund offsets a charge; it is not a rollback of the charge) |
| Idempotency | The compensation can be safely executed multiple times |
| Ordering | Compensations run in strict reverse order of original execution |
| Failure handling | If compensation fails, retry with exponential backoff; after max retries, mark for manual resolution |
| No automatic rollback | Saga compensations are not database rollbacks; they are new forward-moving operations |
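The reverse-order compensation rule can be sketched as a small orchestration-style runner. This is an illustrative sketch only: `run_saga`, the step names, and the log format are hypothetical, and retrying failed compensations with backoff is deliberately left out.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> str:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            completed.append(step)
        except Exception:
            log.append(f"failed:{name}")
            # Only steps that actually completed are compensated.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return "COMPENSATED"
    return "COMPLETED"


def noop() -> None:
    pass


def fail() -> None:
    raise RuntimeError("shipment service unavailable")


log: List[str] = []
outcome = run_saga(
    [
        ("reserve_stock", noop, noop),
        ("charge_card", noop, noop),
        ("ship_order", fail, noop),
    ],
    log,
)
```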

Saga Types

| Type | When to Use |
| --- | --- |
| Choreography-based saga | Each participant emits events; no central coordinator. Good for simple 2-3 step flows. |
| Orchestration-based saga | A central orchestrator drives each step and triggers compensations. Good for complex flows with many participants, detailed failure handling, or centralized retry logic. |

Critical Rules

  • The pivot transaction (the first non-reversible step) divides the saga into a "before" compensable region and an "after" committed region.
  • Retryable transactions come after the pivot. They are idempotent and help the saga eventually reach a consistent state.
  • Do not retry deterministic failures (business rule violations) — only retry transient failures (network timeout, service unavailable).
  • Every saga step must have a defined maximum retry budget and timeout.

Human-in-the-Loop (HITL) Checkpoints

When to Use

Use HITL checkpoints when:
  • A decision has high stakes (financial, legal, safety, or reputational impact).
  • Trust scores, ML model confidence, or automated flags require human judgment.
  • Regulatory or compliance requirements mandate human authorization.
  • The workflow must pause for an external actor (approver, admin, customer) to provide input before proceeding.

HITL State Design

A HITL checkpoint is a workflow state that:
  1. Pauses workflow execution and emits a structured request.
  2. Stores the checkpoint state durably so the workflow survives worker restarts.
  3. Waits for a response with a defined timeout.
  4. Transitions based on the response (approve, reject, request more info).

Checkpoint Types

| Type | Trigger Condition | Example |
| --- | --- | --- |
| Mandatory approval | Always requires human input before proceeding. | High-value payment release |
| Conditional checkpoint | Triggered when a condition is met (score below threshold, flag raised). | Risk score < 0.7 triggers manual review |
| Audit-only checkpoint | Logs the decision for compliance but does not block flow. | All access grants logged for audit |
| Escalation checkpoint | Routine cases are auto-approved with a notification; a human reviews only escalated cases. | Auto-approve low risk; escalate high risk |

HITL Response Schema

Every HITL state must define:
```yaml
HITL_Request:
  workflow_id: string
  current_state: string
  decision_point: string    # e.g. "approve_payment", "review_fraud_alert"
  context: object           # All data the human needs to make the decision
  deadline: datetime        # Timeout for human response
  escalation_path: string   # Who to notify if deadline passes

HITL_Response:
  decision: string          # One of: approved, rejected, needs_more_info, timeout
  rationale: string         # Human's reasoning (required for rejected)
  modified_context: object  # Any corrections or annotations
  response_time: datetime
```
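The response schema could be enforced at the workflow boundary with a small validating type. The sketch below assumes a Python dataclass; `HITLResponse` and `ALLOWED_DECISIONS` are hypothetical names, and the only rule taken from the schema is that a rationale is required for rejections.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict

ALLOWED_DECISIONS = {"approved", "rejected", "needs_more_info", "timeout"}


@dataclass
class HITLResponse:
    decision: str
    rationale: str = ""
    modified_context: Dict[str, object] = field(default_factory=dict)
    response_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        if self.decision not in ALLOWED_DECISIONS:
            raise ValueError(f"unknown decision: {self.decision!r}")
        # The schema marks rationale as required for rejections.
        if self.decision == "rejected" and not self.rationale.strip():
            raise ValueError("rationale is required when decision is 'rejected'")


approved = HITLResponse(decision="approved")

try:
    HITLResponse(decision="rejected")
    rejection_error = ""
except ValueError as exc:
    rejection_error = str(exc)
```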

Resilience Patterns per Step

Retry Policy

| Parameter | Recommended Value | When to Adjust |
| --- | --- | --- |
| Max retries | 3–5 | Increase for non-critical background work; decrease for user-facing synchronous calls |
| Backoff strategy | Exponential with jitter | Mandatory to avoid synchronized retry storms |
| Base delay | 200ms–1s | Increase if downstream has slow cold starts |
| Jitter | ±20–50% of delay | Prevents thundering herd |
| Retry on | Transient failures only (timeout, 503, connection reset) | Never retry business rule violations (400, 409) |
| Do not retry on | 400 Bad Request, 409 Conflict, 401 Unauthorized (credentials rotated) | |
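As a rough illustration of the retry table, the following sketch computes exponential backoff delays with jitter and classifies HTTP status codes. The function names and defaults are assumptions that sit inside the recommended ranges, and the random generator is seeded only to make the sketch reproducible.

```python
import random
from typing import List


def backoff_delays(
    max_retries: int = 4,
    base_delay: float = 0.2,
    jitter: float = 0.2,
    seed: int = 0,
) -> List[float]:
    """Return the wait in seconds before each retry attempt."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    delays = []
    for attempt in range(max_retries):
        nominal = base_delay * (2 ** attempt)      # exponential growth
        factor = 1 + rng.uniform(-jitter, jitter)  # +/- jitter fraction
        delays.append(nominal * factor)
    return delays


# Only HTTP statuses are modeled here; timeouts and connection resets
# would be detected as exceptions rather than status codes.
RETRYABLE_STATUS = {503}


def should_retry(status: int) -> bool:
    return status in RETRYABLE_STATUS


delays = backoff_delays()
```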

Timeout Budget

Each step must define a timeout. The timeout budget for the overall workflow should be distributed so that retries do not cause the workflow to exceed its maximum duration.
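A worked example helps show how retries consume the budget. The sketch below assumes sequential steps, exponential backoff, and the pessimistic case where every attempt times out and every wait takes its maximum jittered value; the function names and the numbers are illustrative.

```python
from typing import List, Tuple


def worst_case_seconds(
    step_timeout: float, max_retries: int, base_delay: float, max_jitter: float
) -> float:
    """Upper bound for one step: all attempts time out, all waits are maximal."""
    attempts = max_retries + 1
    waits = sum(
        base_delay * (2 ** i) * (1 + max_jitter) for i in range(max_retries)
    )
    return attempts * step_timeout + waits


def fits_budget(steps: List[Tuple[float, int, float, float]], budget: float) -> bool:
    return sum(worst_case_seconds(*s) for s in steps) <= budget


# (step_timeout, max_retries, base_delay, max_jitter) per sequential step
steps = [(2.0, 3, 0.2, 0.5), (5.0, 3, 0.2, 0.5)]
per_step = [worst_case_seconds(*s) for s in steps]
```

With these numbers the first step can take up to 4 × 2.0 + 2.1 = 10.1 s and the second up to 4 × 5.0 + 2.1 = 22.1 s, so a 30 s workflow budget would be exceeded in the worst case while a 35 s budget would hold.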

Circuit Breaker

Apply circuit breakers to steps that call external services when:
  • The downstream is known to be unreliable.
  • Prolonged failure would cause back-pressure in the workflow queue.
  • A fallback exists (degraded mode, cached response, or skip-with-log).
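A minimal circuit breaker with a fallback might be sketched as follows. The class, the threshold, and the cool-down values are hypothetical, and the current time is passed in explicitly so the behavior is easy to follow; production libraries usually add a proper half-open state and concurrency safety.

```python
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after N consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the breaker is closed

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial call through.
        return now - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now


def call_with_fallback(
    breaker: CircuitBreaker,
    now: float,
    call: Callable[[], str],
    fallback: Callable[[], str],
) -> str:
    if not breaker.allow(now):
        return fallback()  # fail fast without calling the downstream
    try:
        result = call()
    except Exception:
        breaker.record_failure(now)
        return fallback()
    breaker.record_success()
    return result


def flaky() -> str:
    raise TimeoutError("downstream timed out")


breaker = CircuitBreaker(failure_threshold=2, reset_after=10.0)
outcomes = [
    call_with_fallback(breaker, t, flaky, lambda: "cached") for t in (0.0, 1.0, 2.0)
]
```

Here the breaker opens at the second failure (t = 1.0), so the third call at t = 2.0 is answered from the fallback without touching the downstream.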

Bulkhead Isolation

Use bulkhead isolation when parallel steps share a thread pool or connection limit. Isolate critical steps into their own resource pools so one step's exhaustion does not starve others.

Fallback and Degraded Mode

For non-critical steps, define a fallback response:
```text
Step: EnrichUserProfile
  on failure:
    if retries exhausted:
      do log_warning("Enrichment failed, using basic profile")
      emit metric "step.enrich.degraded"
      continue with basic profile
      → next state
```

Observability Requirements

Metrics to Define per State

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| `workflow.<name>.state.<state>.enter` | Counter: number of times this state was entered | |
| `workflow.<name>.state.<state>.duration_seconds` | Histogram: time spent in this state | p99 > defined SLA for this state |
| `workflow.<name>.transition.<from>.<to>.count` | Counter: transitions fired | |
| `workflow.<name>.step.<step>.failure_count` | Counter: step failures | > 1% of invocations |
| `workflow.<name>.step.<step>.retry_count` | Counter: total retries across all executions | > 3 per invocation |
| `workflow.<name>.hitl.<checkpoint>.pending` | Gauge: currently waiting for HITL response | > defined queue depth |
| `workflow.<name>.hitl.<checkpoint>.timeout_count` | Counter: HITL responses that timed out | Any timeout |

Traces

Every transition should emit a structured trace event with:
  • `workflow_id`, `run_id`, `current_state`, `event`, `destination_state`, `timestamp`, `duration_ms`, `step_outputs`, `error` (if any).
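A trace event carrying these fields could be modeled as below. This is a sketch only: the `TraceEvent` class, the field types, and the JSON and ISO 8601 string encodings are assumptions, since the fields are listed here without types.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class TraceEvent:
    workflow_id: str
    run_id: str
    current_state: str
    event: str
    destination_state: str
    timestamp: str  # ISO 8601 string, an assumed encoding
    duration_ms: int
    step_outputs: Dict[str, object] = field(default_factory=dict)
    error: Optional[str] = None  # populated only when the transition failed

    def to_json(self) -> str:
        # Sorted keys keep the serialized form stable across runs.
        return json.dumps(asdict(self), sort_keys=True)


trace = TraceEvent(
    workflow_id="wf-1",
    run_id="run-1",
    current_state="PROCESSING",
    event="action_completed",
    destination_state="PROCESSED",
    timestamp="2024-01-01T00:00:00Z",
    duration_ms=120,
)
payload = json.loads(trace.to_json())
```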

Alerts

Define alerts at minimum for:
  • Any transition to a terminal failure state.
  • HITL checkpoint timeout without response.
  • Step failure after maximum retries.
  • State duration exceeds p99 baseline by 2x.
  • Circuit breaker opens on a critical step.

Mermaid Diagram Standards

Use `stateDiagram-v2` for FSM diagrams and `flowchart TD` for orchestration pipeline diagrams.

FSM Diagram Rules

  • Initial state: `[*] --> <state>`
  • Terminal states: `<state> --> [*]`
  • Label transitions with the trigger event.
  • Use color or emoji annotations in labels only if they aid readability and are consistent.
  • Do not include implementation details (database tables, API endpoint paths) in state labels.

Example FSM Diagram

```mermaid
stateDiagram-v2
  [*] --> INITIALIZED
  INITIALIZED --> VALIDATING_DATA : submit_form
  VALIDATING_DATA --> DATA_VALID : validation_passed
  VALIDATING_DATA --> VALIDATION_FAILED : validation_failed
  DATA_VALID --> PROCESSING : process_action
  PROCESSING --> PROCESSED : action_completed
  PROCESSED --> AWAITING_APPROVAL : request_approval
  AWAITING_APPROVAL --> APPROVED : human_approves
  AWAITING_APPROVAL --> REJECTED : human_rejects
  AWAITING_APPROVAL --> TIMEOUT : approval_timeout
  APPROVED --> [*]
  REJECTED --> [*]
  TIMEOUT --> [*]
  VALIDATION_FAILED --> [*]
```

Orchestration Diagram Rules

```mermaid
flowchart TD
  Start([START]) --> Validate[Validate Input Data]
  Validate --> Check{Validation OK?}
  Check -->|Yes| ProcessA[Process Step A]
  ProcessA --> ProcessB[Process Step B]
  ProcessB --> FanOut{Fan-out to parallel agents}
  FanOut --> Agent1[AI Agent: Analyze Risk]
  FanOut --> Agent2[AI Agent: Check History]
  FanOut --> Agent3[AI Agent: Verify Identity]
  Agent1 --> Aggregator[Aggregate Results]
  Agent2 --> Aggregator
  Agent3 --> Aggregator
  Aggregator --> Decision{Decision Node}
  Decision -->|"score >= 0.7"| AutoApprove[Auto Approve]
  Decision -->|"score < 0.7"| HITLReview[Human in Loop Review]
  AutoApprove --> Complete([COMPLETED])
  HITLReview -->|Approved| Complete
  HITLReview -->|Rejected| Reject([REJECTED])
  Reject --> Compensate[Compensate / Rollback]
  Compensate --> End([END])
  Check -->|No| Reject
```

Execution Workflow

Phase 1: Intake and Context Gathering

  1. Identify the workflow goal, entity being modeled, and triggering event.
  2. Determine whether the user provided a spec, PRD, user story, architecture document, or raw description.
  3. Extract states, failure modes, business rules, and integration seams from source artifacts.
  4. Identify HITL requirements, saga boundaries, and observability needs.
  5. List assumptions, missing information, and architecture-impacting questions.

Phase 2: State Identification

  1. List all possible states (initial, intermediate, waiting, terminal).
  2. Classify each state by type, using the State Classification categories (initial, intermediate, waiting, or one of the terminal types).
  3. Identify which state is the workflow entity's current state.
  4. Verify that every state has a clear owner and persistence mechanism.

Phase 3: Transition Mapping

  1. For each state, list all possible trigger events.
  2. For each trigger, define guard conditions and resulting destination state.
  3. For each transition, list required side effects and tasks.
  4. Identify transitions that require compensating transactions (saga) or HITL checkpoints.
  5. Verify that no implicit or hidden transitions exist.

Phase 4: Resilience and Observability Design

  1. Define retry policy, timeout, and circuit breaker per step.
  2. Identify steps requiring idempotency keys or deduplication.
  3. Design fallback behavior for each non-critical step.
  4. Define metrics, traces, and alerts for each state and transition.
  5. Verify that the workflow can recover from worker crashes (state persisted durably).

Phase 5: Mermaid Diagram Generation

  1. Generate the `stateDiagram-v2` for FSM models.
  2. Generate the `flowchart TD` for orchestration pipeline models.
  3. Verify that all states, transitions, and branching paths are represented.

Required Output Structure

Use this structure unless the user requests a narrower deliverable:

Workflow Design: <Workflow Name>

1. Orchestration Context

  • Workflow Name: <identifier>
  • Objective: <one-sentence description>
  • Central State Entity: <entity being modeled>
  • Triggering Event: <what starts the workflow>
  • Orchestration Model: <Centralized Orchestrator / Event-Driven Choreography / Hybrid>
  • Assumptions:
  • Open Questions:

2. State Machine Definition

| State | Type | Description | Owner (Persistence) |
| --- | --- | --- | --- |

3. Transition Matrix

| From State | Event | Guard Condition(s) | Tasks / Side Effects | To State |
| --- | --- | --- | --- | --- |

4. Mermaid FSM Diagram

```mermaid
stateDiagram-v2
  [*] --> <Initial>
  ...
```

5. Mermaid Orchestration Diagram (if applicable)

```mermaid
flowchart TD
  ...
```

6. Saga Design (if applicable)

  • Saga Type: <Orchestration / Choreography>
  • Pivot Transaction: <First step that commits irreversibly>
  • Compensation Order: <Reverse execution order>

| Step | Local Transaction | Compensating Transaction | Retry Policy | Idempotency Key |
| --- | --- | --- | --- | --- |

7. HITL Checkpoint Design (if applicable)

| Checkpoint | Trigger Condition | Request Schema | Response Schema | Timeout | Escalation |
| --- | --- | --- | --- | --- | --- |

8. Resilience per Step

| Step / State | Retry Policy | Timeout | Circuit Breaker | Fallback | Idempotency Mechanism |
| --- | --- | --- | --- | --- | --- |

9. Observability

Metrics

| Metric | Type | Description | Alert Threshold |
| --- | --- | --- | --- |

Traces

追踪

| Trigger | Trace Event Fields |
| --- | --- |

Alerts

| Alert | Condition | Severity |
| --- | --- | --- |

10. Sad Path Coverage

| Scenario | Trigger | Behavior | Recovery |
| --- | --- | --- | --- |
| Agent failure at Step X | Step X returns error after max retries | Execute compensations on Steps X-1...1 | Retry from last successful state or manual escalation |
| HITL timeout | No human response within deadline | <defined behavior> | <notify/escalate> |
| External service returns 503 | Step Y times out | <defined behavior> | <circuit breaker / fallback> |

11. Verification Checklist

| Check | Status |
| --- | --- |
| All states have a defined owner/persistence | ✅ / ❌ |
| All transitions have explicit trigger, guards, and side effects | ✅ / ❌ |
| No implicit or hidden transitions exist | ✅ / ❌ |
| Sad paths are defined for every state | ✅ / ❌ |
| Every external call has timeout, retry, and circuit breaker defined | ✅ / ❌ |
| Saga compensations run in reverse order and are idempotent | ✅ / ❌ |
| HITL checkpoints have timeout and escalation paths | ✅ / ❌ |
| Metrics and traces are defined for every state and transition | ✅ / ❌ |
| Workflow survives worker crash (state is durable) | ✅ / ❌ |
| Diagram matches the transition matrix | ✅ / ❌ |

Quality Bar

Before presenting the result, verify:
  • Every state is represented in the Mermaid diagram.
  • Every transition in the matrix has a corresponding arrow in the diagram.
  • All guard conditions are mutually exclusive.
  • All sad paths are documented and include recovery behavior.
  • All external API calls have resilience policies.
  • HITL checkpoints include timeout and escalation paths.
  • The workflow has exactly one initial state and defined terminal states.
  • State ownership is clear and durable (survives worker restart).
  • The skill output is written in English.
  • No implementation details (database table names, API paths, UI component names) appear in state labels.
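Two of these checks, diagram-versus-matrix consistency and the single initial state, lend themselves to a mechanical comparison. The sketch below assumes both the transition matrix and the diagram have already been reduced to sets of `(source, destination)` pairs; `check_design` is a hypothetical helper, not part of any tool.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def check_design(
    matrix: Set[Edge], diagram: Set[Edge], initial_states: Set[str]
) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for src, dst in sorted(matrix - diagram):
        problems.append(f"transition {src} -> {dst} is missing from the diagram")
    for src, dst in sorted(diagram - matrix):
        problems.append(f"arrow {src} -> {dst} is missing from the matrix")
    if len(initial_states) != 1:
        problems.append(
            f"expected exactly one initial state, found {len(initial_states)}"
        )
    return problems


matrix = {
    ("INITIALIZED", "VALIDATING"),
    ("VALIDATING", "PROCESSING"),
    ("VALIDATING", "FAILED"),
}
diagram = {
    ("INITIALIZED", "VALIDATING"),
    ("VALIDATING", "PROCESSING"),
}
problems = check_design(matrix, diagram, {"INITIALIZED"})
```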

Present Results to User

Lead with the workflow name, orchestration model, and the Mermaid diagram. Present the state machine first so the user can see the big picture before reading the detailed transition matrix. Highlight any critical design decisions (pivot transaction in saga, mandatory HITL checkpoint, circuit breaker configuration) and explain why they were chosen over alternatives. If the workflow has multiple terminal states (success, failure, compensated), clearly label what each means and when each is reached.

Troubleshooting

  • Too many states: The workflow entity may be modeling multiple unrelated concerns. Consider splitting into sub-entities or hierarchical states.
  • Circular transitions: If a state can transition to itself, the trigger or guard must be explicit and finite. Verify the loop has a termination condition.
  • Missing sad path: Every state should have at least one failure transition. If a state has no failure path, document why it is guaranteed not to fail.
  • Ambiguous guards: If two guards could both be true for the same trigger, the diagram is non-deterministic. Refine the guards to be mutually exclusive.
  • HITL without timeout: A human-in-the-loop checkpoint without a deadline will stall the workflow indefinitely. Define a timeout and escalation path.
  • No idempotency on external call: An external API call without idempotency key will create duplicates on retry. Add an idempotency key to the request.
  • State not durable: If the workflow entity's state is stored in memory, a worker crash loses all progress. Move state to a durable store (database, message broker, workflow engine).