Dogfood Skill

VCS Provider

This skill uses VCS operations through Exarchos MCP actions (`create_issue`, etc.). These actions automatically detect and route to the correct VCS provider (GitHub, GitLab, Azure DevOps). No `gh`, `glab`, or `az` commands are needed; the MCP server handles provider dispatch.

Overview

Retrospective analysis of Exarchos MCP tool usage. Uses the MCP server's own self-service capabilities as the primary diagnostic instrument — describe APIs, views, playbooks, and runbooks turned inward to diagnose failures.

Three distinct failure modes require different fixes — code changes, documentation updates, or skill instruction improvements. Mixing them wastes effort.

Platform-Agnosticity

Per `docs/designs/2026-03-09-platform-agnosticity.md`: the MCP server is the self-sufficient, platform-agnostic core. The debug trace relies entirely on MCP tools, not conversation introspection, so it works for any MCP client. Conversation scanning is supplementary.

Diagnostic self-service tools: `describe(topology)` for HSM verification, `describe(playbook)` for adherence checks, `describe(eventTypes, emissionGuide)` for event schema/catalog comparison, `describe(actions)` for schema/gate metadata, `runbook(phase)` for step conformance, and the `pipeline`, `convergence`, and `telemetry` views for health metrics.

Triggers

Activate this skill when:

User runs `/dogfood`
User asks "what went wrong this session" or "review the failures"
User wants to triage errors from a workflow run
End of a workflow session to capture learnings

Process

Step 1: Debug Trace via MCP Self-Service

Query the MCP server's own self-service capabilities to build a ground-truth diagnostic picture. This is the primary investigation method — it uses the same tools any MCP client has access to.

1a. Identify Active Workflows

Use `exarchos_view` with `action: "pipeline"` to get an aggregated view of active workflows with their phases and task counts. If `$ARGUMENTS` specifies a workflow or feature ID, scope to that workflow. Otherwise, inspect all non-terminal workflows.

1b. Inspect Workflow State and Topology

For each relevant workflow:

Read state: `exarchos_workflow get` to retrieve current phase, tasks, reviews, gate results.
Read topology: `exarchos_workflow describe(topology: "<workflowType>")` to get the HSM definition. Compare the agent's phase transition attempts against valid transitions. Invalid transition attempts = documentation issue (skill prescribed wrong path) or user error.
Check guard prerequisites: For `workflow.guard-failed` events, look up the guard in the topology to understand unmet preconditions.
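
The topology comparison above can be sketched as a small check. The `Topology` shape used here (a map from each phase to the phases reachable in one transition) is an assumption for illustration, not the actual `describe(topology)` output format, and the phase names are made-up example data.

```typescript
// Hypothetical, simplified topology: phase -> phases reachable in one transition.
// The real describe(topology) output may be shaped differently.
type Topology = Record<string, string[]>;

interface TransitionAttempt {
  from: string;
  to: string;
}

// Returns the attempted transitions that the topology does not allow.
function findInvalidTransitions(
  topology: Topology,
  attempts: TransitionAttempt[],
): TransitionAttempt[] {
  return attempts.filter(({ from, to }) => !(topology[from] ?? []).includes(to));
}

// Example data (invented for this sketch): a three-phase workflow.
const exampleTopology: Topology = {
  plan: ["delegate"],
  delegate: ["review"],
  review: [],
};

const invalidTransitions = findInvalidTransitions(exampleTopology, [
  { from: "plan", to: "delegate" }, // allowed
  { from: "plan", to: "review" }, // skips delegate, so it is flagged
]);
console.log(invalidTransitions);
```

Each flagged attempt still needs the manual follow-up described above: check whether the skill docs prescribed that path before deciding between documentation issue and user error.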

1c. Playbook Adherence Check

Use `exarchos_workflow describe(playbook: "<workflowType>")` to retrieve phase playbooks. For each phase executed, compare the playbook's `tools`, `events`, `transitionCriteria`, `guardPrerequisites`, `humanCheckpoint`, and `compactGuidance` against what the agent actually did and what skill docs prescribe.

Playbook violations are diagnostic gold:

Agent deviated and skill docs told it to → documentation issue (skill contradicts playbook)
Agent deviated and skill docs agree with playbook → user error
Playbook is wrong (prescribes invalid tools/events) → code bug
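
The three rules above amount to a small decision function. The sketch below is one way to write it down; the `PlaybookDeviation` field names are hypothetical, and checking the "playbook is wrong" case first is an assumption about precedence that the rules themselves do not state.

```typescript
type Bucket = "code_bug" | "documentation_issue" | "user_error";

// Hypothetical record of one observed deviation from a phase playbook.
interface PlaybookDeviation {
  // True when the playbook itself prescribes invalid tools or events.
  playbookIsWrong: boolean;
  // True when the skill docs told the agent to do what it did.
  skillDocsPrescribedDeviation: boolean;
}

function classifyDeviation(d: PlaybookDeviation): Bucket {
  // Assumption: a defective playbook outranks any docs/agent disagreement.
  if (d.playbookIsWrong) return "code_bug";
  // Skill docs contradict the playbook.
  if (d.skillDocsPrescribedDeviation) return "documentation_issue";
  // Docs and playbook agree, yet the agent deviated.
  return "user_error";
}

console.log(
  classifyDeviation({ playbookIsWrong: false, skillDocsPrescribedDeviation: true }),
);
```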

1d. Event Log Analysis

Use `exarchos_event query(stream)` on the workflow's event stream. Look for:

Rejected events: absent from log despite agent attempts (corroborate with conversation errors)
Missing events: compare against the playbook `events` field and `exarchos_event describe(emissionGuide: true)`. Missing model-emitted events = documentation gap or user error.
Sequence anomalies: wrong order, duplicates, or timeline gaps
Schema mismatches: use `describe(eventTypes: [...])` to get authoritative JSON Schema. Compare actual payloads against schema for semantically wrong fields.
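
Two of the checks above, missing events and timeline gaps, are mechanical enough to sketch. The `LoggedEvent` shape and its `sequence` field are assumptions for illustration; the actual event payload returned by `exarchos_event query` may use different names.

```typescript
// Hypothetical minimal view of one event from the stream.
interface LoggedEvent {
  type: string;
  sequence: number; // assumed monotonically increasing position in the stream
}

// Event types the playbook expects that never appear in the log.
function findMissingEventTypes(expected: string[], log: LoggedEvent[]): string[] {
  const seen = new Set(log.map((e) => e.type));
  return expected.filter((t) => !seen.has(t));
}

// Sequence numbers skipped between the first and last logged event.
function findSequenceGaps(log: LoggedEvent[]): number[] {
  const sorted = [...log].sort((a, b) => a.sequence - b.sequence);
  const gaps: number[] = [];
  for (let i = 1; i < sorted.length; i++) {
    for (let s = sorted[i - 1].sequence + 1; s < sorted[i].sequence; s++) {
      gaps.push(s);
    }
  }
  return gaps;
}

// Example data (invented for this sketch).
const exampleLog: LoggedEvent[] = [
  { type: "team.spawned", sequence: 1 },
  { type: "task.updated", sequence: 3 },
];

const missingTypes = findMissingEventTypes(
  ["team.spawned", "team.task.assigned"],
  exampleLog,
);
const sequenceGaps = findSequenceGaps(exampleLog);
console.log(missingTypes, sequenceGaps);
```

A gap or a missing type is only a lead: rejected events leave no trace in the log, so corroborate with conversation errors where the platform allows it.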

1e. Orchestrate Action and Gate Analysis

Schema verification: `exarchos_orchestrate describe(actions: [...])` for authoritative schemas. Compare agent's parameters against schema to detect stale skill docs or improvisation.
Gate metadata: Describe output includes `{ blocking, dimension, autoEmits }`. Check: did the agent treat blocking/non-blocking correctly? Did expected auto-emissions fire?
Gate convergence: `exarchos_view convergence` for per-dimension (D1-D5) pass rates. Low convergence suggests systemic gate issues.
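
As a rough illustration of the convergence check, the sketch below lists dimensions whose pass rate falls under a cutoff. It assumes pass rates arrive as fractions between 0 and 1 keyed by dimension, and the 0.8 threshold is an arbitrary placeholder; neither is taken from the actual `convergence` view.

```typescript
type Dimension = "D1" | "D2" | "D3" | "D4" | "D5";

const DIMENSIONS: Dimension[] = ["D1", "D2", "D3", "D4", "D5"];

// Returns the dimensions whose pass rate is below the threshold.
// Both the 0..1 scale and the default threshold are assumptions.
function lowConvergenceDimensions(
  passRates: Record<Dimension, number>,
  threshold = 0.8,
): Dimension[] {
  return DIMENSIONS.filter((d) => passRates[d] < threshold);
}

// Example data (invented for this sketch).
const lowDimensions = lowConvergenceDimensions({
  D1: 1.0,
  D2: 0.95,
  D3: 0.4,
  D4: 0.9,
  D5: 0.75,
});
console.log(lowDimensions);
```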

1f. Runbook Conformance Check

Use `exarchos_orchestrate runbook(phase)` to retrieve relevant runbooks. Check: step ordering, decision branch correctness (steps with `decide` fields), `onFail` directive adherence (`stop`, `continue`, `retry`), and `templateVars` completeness.

1g. Telemetry Review

Use `exarchos_view telemetry` for per-tool performance. Flag: high error rates (systemic issues), high invocation counts (retry loops), and tools never invoked that the playbook prescribes.
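
The three flags above can be computed from per-tool counters. This is a minimal sketch: the `ToolTelemetry` field names are hypothetical rather than the real telemetry view schema, and both thresholds are arbitrary placeholders to tune per project.

```typescript
// Hypothetical per-tool counters; the real telemetry view may differ.
interface ToolTelemetry {
  tool: string;
  invocations: number;
  errors: number;
}

interface TelemetryFlags {
  highErrorRate: string[];
  possibleRetryLoops: string[];
  neverInvoked: string[];
}

function flagTelemetry(
  rows: ToolTelemetry[],
  prescribedTools: string[],
  maxErrorRate = 0.2, // placeholder threshold
  maxInvocations = 50, // placeholder threshold
): TelemetryFlags {
  const invoked = new Set(
    rows.filter((r) => r.invocations > 0).map((r) => r.tool),
  );
  return {
    highErrorRate: rows
      .filter((r) => r.invocations > 0 && r.errors / r.invocations > maxErrorRate)
      .map((r) => r.tool),
    possibleRetryLoops: rows
      .filter((r) => r.invocations > maxInvocations)
      .map((r) => r.tool),
    // Tools the playbook prescribes that never show up with a call.
    neverInvoked: prescribedTools.filter((t) => !invoked.has(t)),
  };
}

// Example data (invented for this sketch).
const telemetryFlags = flagTelemetry(
  [
    { tool: "exarchos_workflow", invocations: 10, errors: 5 },
    { tool: "exarchos_event", invocations: 120, errors: 2 },
  ],
  ["exarchos_workflow", "exarchos_event", "exarchos_orchestrate"],
);
console.log(telemetryFlags);
```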

Step 2: Scan Session for Failed Tool Calls

Supplement the debug trace with client-side context — review conversation for failed Exarchos tool calls.

Note: Platform-dependent step (requires conversation history). Skip on platforms without introspection; the debug trace is self-sufficient.

Target tools: `exarchos_workflow`, `exarchos_event`, `exarchos_orchestrate`, `exarchos_view`, `exarchos_sync`

Error signals: `INVALID_INPUT`, `VALIDATION_ERROR`, `BATCH_APPEND_FAILED`, Zod failures (`invalid_type`, `invalid_enum_value`, `unrecognized_keys`), `ENOENT`, `CLAIM_FAILED`, `SEQUENCE_CONFLICT`, CAS exhaustion, retry sequences, successful-after-retry calls.
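
On platforms that expose conversation history, the scan can start with a plain substring match over tool outputs. The sketch below assumes a hypothetical `ToolCallRecord` shape for a captured call. It covers only the literal error codes listed above; CAS exhaustion, retry sequences, and successful-after-retry calls need pattern-level review and are not detected here.

```typescript
const TARGET_TOOLS = new Set<string>([
  "exarchos_workflow",
  "exarchos_event",
  "exarchos_orchestrate",
  "exarchos_view",
  "exarchos_sync",
]);

const ERROR_SIGNALS: string[] = [
  "INVALID_INPUT",
  "VALIDATION_ERROR",
  "BATCH_APPEND_FAILED",
  "invalid_type",
  "invalid_enum_value",
  "unrecognized_keys",
  "ENOENT",
  "CLAIM_FAILED",
  "SEQUENCE_CONFLICT",
];

// Hypothetical record of one tool call taken from conversation history.
interface ToolCallRecord {
  tool: string;
  output: string;
}

interface SignalHit {
  tool: string;
  signals: string[];
}

// Keeps only Exarchos tool calls whose output contains a known error code.
function scanForSignals(calls: ToolCallRecord[]): SignalHit[] {
  return calls
    .filter((c) => TARGET_TOOLS.has(c.tool))
    .map((c) => ({
      tool: c.tool,
      signals: ERROR_SIGNALS.filter((s) => c.output.includes(s)),
    }))
    .filter((hit) => hit.signals.length > 0);
}

// Example data (invented for this sketch).
const signalHits = scanForSignals([
  { tool: "exarchos_workflow", output: "INVALID_INPUT: invalid_type at tasks[0].branch" },
  { tool: "other_tool", output: "ENOENT" }, // out of scope: not an Exarchos tool
  { tool: "exarchos_view", output: "ok" },
]);
console.log(signalHits);
```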

Step 3: Diagnose Each Failure

Merge debug trace and conversation scan findings. For each failure, document:

What was attempted: action, parameters, intent
What went wrong: error message and validation path
Server-side evidence: event log, state, describe output, views
Authoritative reference: the self-service query providing ground truth (playbook, topology, schema, runbook)
Root cause: per `references/root-cause-patterns.md`
Fix category: code, docs, or user behavior

Flag discrepancies only visible via server-side inspection as trace-only findings.

Step 4: Categorize into Buckets

Assign each failure to exactly one root cause bucket:

Bucket 1: Code Bug

The MCP server, event store, or workflow engine has a defect.

Signals: Schema rejects valid input (confirmed via `describe`), CAS failures with no concurrent writers, gate over-enforcement, identical-parameter retry succeeds (race condition), state corruption, topology/engine mismatch, auto-emission failure.

Action: File bug issue with reproduction steps, expected vs actual, and suggested fix.

Bucket 2: Documentation Issue

Skill docs are wrong, incomplete, or out of sync with the MCP server's self-service output.

Signals: Skill payload doesn't match `describe` schema, skill/playbook divergence, skill documents nonexistent topology paths, missing event types (compare emission guide), retry-based field discovery, runbook/skill contradictions, compactGuidance drift.

Action: File docs issue with file:line, the discrepancy, and correct information from `describe` output.

Bucket 3: User Error

The agent misused a tool in a way both docs and `describe` output correctly describe.

Signals: Format mismatch (confirmed by `describe` + docs agreement), invalid sequence (topology confirms), missing context both skill and playbook prescribe, runbook deviation without justification.

Action: Note for skill improvement if errors are frequent.

Step 5: Generate Report

Produce the report using the template from `references/report-template.md`. Include:

Summary counts per bucket
Debug trace summary (workflows inspected, events reviewed, describe queries issued, views consulted)
Each failure with full diagnosis (including authoritative self-service references)
Trace-only findings section (issues only visible via server-side inspection)
Playbook/runbook adherence summary
Actionable next steps (draft issue bodies for bugs/docs issues)

Step 6: Offer to File Issues

For findings in the Code Bug and Documentation Issue buckets, offer to create GitHub issues:

```typescript
exarchos_orchestrate({ action: "create_issue", title: "<type>: <summary>", body: "<issue body>", labels: ["bug"] })
```

Only file issues with user confirmation — present the draft first.
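
One way to enforce the confirmation rule is to make the call arguments impossible to build without it. In this sketch, `IssueDraft` mirrors the `issue_draft` object in the output format, while `toCreateIssueArgs` and its `confirmed` flag are hypothetical helpers, not part of the Exarchos API.

```typescript
interface IssueDraft {
  title: string;
  body: string;
  labels: string[];
}

// Arguments in the shape passed to exarchos_orchestrate for create_issue.
interface CreateIssueArgs extends IssueDraft {
  action: "create_issue";
}

// Returns call arguments only when the user has confirmed the draft;
// otherwise returns null so nothing can be filed by accident.
function toCreateIssueArgs(
  draft: IssueDraft,
  confirmed: boolean,
): CreateIssueArgs | null {
  if (!confirmed) return null;
  return { action: "create_issue", ...draft };
}

// Example draft (invented for this sketch).
const exampleDraft: IssueDraft = {
  title: "bug: workflow task schema rejects null branch",
  body: "...",
  labels: ["bug"],
};

console.log(toCreateIssueArgs(exampleDraft, false)); // null: draft not yet confirmed
console.log(toCreateIssueArgs(exampleDraft, true));
```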

Required Output Format

```json
{
  "session_summary": {
    "total_tool_calls": 0,
    "failed_tool_calls": 0,
    "failure_rate": "0%",
    "debug_trace": {
      "workflows_inspected": 0,
      "events_reviewed": 0,
      "describe_queries": 0,
      "views_consulted": [],
      "trace_only_findings": 0
    }
  },
  "playbook_adherence": {
    "phases_checked": 0,
    "violations": [
      {
        "phase": "delegate",
        "field": "events",
        "expected": "team.spawned, team.task.assigned",
        "actual": "none emitted",
        "bucket": "documentation_issue"
      }
    ]
  },
  "runbook_conformance": {
    "runbooks_checked": 0,
    "deviations": []
  },
  "buckets": {
    "code_bug": [],
    "documentation_issue": [],
    "user_error": []
  },
  "findings": [
    {
      "id": 1,
      "bucket": "code_bug | documentation_issue | user_error",
      "tool": "exarchos_workflow",
      "action": "set",
      "error": "INVALID_INPUT: ...",
      "root_cause": "Schema rejects null branch on pending tasks",
      "trace_evidence": "describe(actions: ['set']) shows branch as required string; event log confirms no task.updated event",
      "authoritative_ref": "exarchos_workflow describe(actions: ['set']) → TaskSchema",
      "severity": "HIGH | MEDIUM | LOW",
      "suggested_fix": "Accept nullable branch in TaskSchema",
      "issue_draft": {
        "title": "bug: workflow task schema rejects null branch",
        "labels": ["bug"],
        "body": "..."
      }
    }
  ],
  "trace_only_findings": [
    {
      "id": "T1",
      "description": "State drift: agent assumed phase was 'delegate' but server shows 'plan'",
      "evidence": "exarchos_workflow get shows phase=plan; topology confirms plan→delegate requires planReviewComplete guard",
      "authoritative_ref": "exarchos_workflow describe(topology: 'feature') → guards",
      "bucket": "documentation_issue",
      "suggested_fix": "Skill should instruct agent to verify phase via get before proceeding"
    }
  ]
}
```
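
The summary fields can be derived from the findings rather than filled in by hand. The sketch below makes two assumptions the format does not spell out: `failure_rate` is a whole-number percentage, and each `buckets` array holds the ids of the findings assigned to it.

```typescript
type Bucket = "code_bug" | "documentation_issue" | "user_error";

// Only the fields this sketch needs from a finding.
interface FindingRef {
  id: number;
  bucket: Bucket;
}

// Assumption: failure_rate is rounded to a whole percent.
function formatFailureRate(failed: number, total: number): string {
  if (total === 0) return "0%";
  return `${Math.round((failed * 100) / total)}%`;
}

// Assumption: each bucket lists the ids of its findings.
function groupIdsByBucket(findings: FindingRef[]): Record<Bucket, number[]> {
  const buckets: Record<Bucket, number[]> = {
    code_bug: [],
    documentation_issue: [],
    user_error: [],
  };
  for (const f of findings) {
    buckets[f.bucket].push(f.id);
  }
  return buckets;
}

// Example data (invented for this sketch).
const exampleRate = formatFailureRate(2, 40);
const exampleBuckets = groupIdsByBucket([
  { id: 1, bucket: "code_bug" },
  { id: 2, bucket: "user_error" },
  { id: 3, bucket: "code_bug" },
]);
console.log(exampleRate, exampleBuckets);
```

Because each finding carries exactly one `bucket` value, the grouping also enforces the one-bucket-per-failure rule from Step 4.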

Anti-Patterns

| Don't | Do Instead |
| --- | --- |
| Skip the debug trace and only scan conversation | Always query MCP self-service tools first; conversation scan is supplementary |
| Guess what the schema expects | Use `describe` to get authoritative schemas; they are the source of truth |
| Assess playbook adherence from memory | Query `describe(playbook)` to get the actual prescribed tools, events, and criteria |
| Assume the topology without checking | Query `describe(topology)` to get valid transitions, guards, and effects |
| Blame the user when skill docs contradict the playbook | If skill docs diverge from playbook/describe output, it's a documentation issue |
| File duplicate issues | Check existing open/closed issues before drafting |
| Categorize retries as separate failures | Group retry sequences as a single finding |
| Ignore successful-after-retry calls | These reveal friction even though they eventually worked |
| Include non-Exarchos failures | Scope strictly to the 5 Exarchos tools; other MCP failures are out of scope |
| Report conversation-only findings without trace corroboration | Cross-reference every finding with server-side state when possible |
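
The retry-grouping rule in the table can be sketched as a single pass over failed calls in chronological order. Treating consecutive failures with the same tool, action, and error text as one retry sequence is an assumption of this sketch; the `FailedCall` shape is hypothetical.

```typescript
// Hypothetical record of one failed call, in chronological order.
interface FailedCall {
  tool: string;
  action: string;
  error: string;
}

interface GroupedFinding extends FailedCall {
  attempts: number;
}

// Collapses consecutive identical failures into a single finding.
function groupRetries(calls: FailedCall[]): GroupedFinding[] {
  const grouped: GroupedFinding[] = [];
  for (const call of calls) {
    const last = grouped[grouped.length - 1];
    if (
      last !== undefined &&
      last.tool === call.tool &&
      last.action === call.action &&
      last.error === call.error
    ) {
      last.attempts += 1; // same failure repeated: count it as a retry
    } else {
      grouped.push({ ...call, attempts: 1 });
    }
  }
  return grouped;
}

// Example data (invented for this sketch).
const groupedFindings = groupRetries([
  { tool: "exarchos_workflow", action: "set", error: "INVALID_INPUT" },
  { tool: "exarchos_workflow", action: "set", error: "INVALID_INPUT" },
  { tool: "exarchos_event", action: "append", error: "SEQUENCE_CONFLICT" },
]);
console.log(groupedFindings);
```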
