dogfood


Dogfood Skill

VCS Provider

This skill uses VCS operations through Exarchos MCP actions (`create_issue`, etc.). These actions automatically detect and route to the correct VCS provider (GitHub, GitLab, Azure DevOps). No `gh`/`glab`/`az` commands are needed: the MCP server handles provider dispatch.

Overview

Retrospective analysis of Exarchos MCP tool usage. Uses the MCP server's own self-service capabilities as the primary diagnostic instrument — describe APIs, views, playbooks, and runbooks turned inward to diagnose failures.
Three distinct failure modes require different fixes — code changes, documentation updates, or skill instruction improvements. Mixing them wastes effort.

Platform-Agnosticity

Per `docs/designs/2026-03-09-platform-agnosticity.md`: the MCP server is the self-sufficient, platform-agnostic core. The debug trace relies entirely on MCP tools, not conversation introspection, so it works for any MCP client. Conversation scanning is supplementary.
Diagnostic self-service tools: `describe(topology)` for HSM verification, `describe(playbook)` for adherence checks, `describe(eventTypes, emissionGuide)` for event schema/catalog comparison, `describe(actions)` for schema/gate metadata, `runbook(phase)` for step conformance, and `pipeline`/`convergence`/`telemetry` views for health metrics.

Triggers

Activate this skill when:
  • User runs `/dogfood`
  • User asks "what went wrong this session" or "review the failures"
  • User wants to triage errors from a workflow run
  • End of a workflow session to capture learnings

Process

Step 1: Debug Trace via MCP Self-Service

Query the MCP server's own self-service capabilities to build a ground-truth diagnostic picture. This is the primary investigation method — it uses the same tools any MCP client has access to.

1a. Identify Active Workflows

Use `exarchos_view` with `action: "pipeline"` to get an aggregated view of active workflows with their phases and task counts.
If `$ARGUMENTS` specifies a workflow or feature ID, scope to that workflow. Otherwise, inspect all non-terminal workflows.

1b. Inspect Workflow State and Topology

For each relevant workflow:
  1. Read state: `exarchos_workflow get` to retrieve current phase, tasks, reviews, and gate results.
  2. Read topology: `exarchos_workflow describe(topology: "<workflowType>")` to get the HSM definition. Compare the agent's phase transition attempts against valid transitions. An invalid transition attempt indicates a documentation issue (the skill prescribed a wrong path) or user error.
  3. Check guard prerequisites: for `workflow.guard-failed` events, look up the guard in the topology to understand unmet preconditions.
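The topology comparison in item 2 can be sketched as a small check. The `Topology`, `Transition`, and `Attempt` shapes here are assumptions made for illustration, and the `review` phase is hypothetical; the actual `describe(topology)` output may be structured differently.

```typescript
// Hypothetical shapes: the real describe(topology) output may differ.
interface Transition { from: string; to: string; guard?: string }
interface Topology { transitions: Transition[] }
interface Attempt { from: string; to: string }

// Return every attempted transition that the topology does not define.
function findInvalidTransitions(topology: Topology, attempts: Attempt[]): Attempt[] {
  const valid = new Set(topology.transitions.map((t) => `${t.from}->${t.to}`));
  return attempts.filter((a) => !valid.has(`${a.from}->${a.to}`));
}

const topology: Topology = {
  transitions: [
    { from: "plan", to: "delegate", guard: "planReviewComplete" },
    { from: "delegate", to: "review" }, // "review" is an illustrative phase name
  ],
};

const invalid = findInvalidTransitions(topology, [
  { from: "plan", to: "delegate" }, // defined in the topology
  { from: "plan", to: "review" },   // skips a phase, so it is flagged
]);
console.log(invalid);
```

Each flagged attempt still needs the skill docs checked before it can be attributed to a documentation issue or to user error.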

1c. Playbook Adherence Check

Use `exarchos_workflow describe(playbook: "<workflowType>")` to retrieve phase playbooks. For each phase executed, compare the playbook's `tools`, `events`, `transitionCriteria`, `guardPrerequisites`, `humanCheckpoint`, and `compactGuidance` against what the agent actually did and what the skill docs prescribe.
Playbook violations are diagnostic gold:
  • Agent deviated and skill docs told it to → documentation issue (skill contradicts playbook)
  • Agent deviated and skill docs agree with playbook → user error
  • Playbook is wrong (prescribes invalid tools/events) → code bug
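The three outcomes above amount to a small decision rule, sketched below. The `Deviation` fields are hypothetical names, and checking the playbook's validity first is an assumption about precedence that the text does not state; the bucket names match those used in the required output format.

```typescript
type Bucket = "code_bug" | "documentation_issue" | "user_error";

// Hypothetical inputs describing one observed deviation from a playbook.
interface Deviation {
  playbookIsValid: boolean;        // playbook prescribes only valid tools/events
  skillDocsMatchPlaybook: boolean; // skill docs agree with the playbook
}

// Assumed precedence: a wrong playbook outranks a docs mismatch.
function classifyDeviation(d: Deviation): Bucket {
  if (!d.playbookIsValid) return "code_bug";
  if (!d.skillDocsMatchPlaybook) return "documentation_issue";
  return "user_error";
}

console.log(classifyDeviation({ playbookIsValid: true, skillDocsMatchPlaybook: false }));
```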

1d. Event Log Analysis

Use `exarchos_event query(stream)` on the workflow's event stream. Look for:
  • Rejected events: absent from the log despite agent attempts (corroborate with conversation errors)
  • Missing events: compare against the playbook `events` field and `exarchos_event describe(emissionGuide: true)`. Missing model-emitted events indicate a documentation gap or user error.
  • Sequence anomalies: wrong order, duplicates, or timeline gaps
  • Schema mismatches: use `describe(eventTypes: [...])` to get the authoritative JSON Schema. Compare actual payloads against the schema for semantically wrong fields.
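Two of these checks (missing events and sequence anomalies) can be sketched over an already-fetched log. The `LoggedEvent` shape and its `sequence` field are assumptions for illustration, not the server's actual event record. Rejected events never reach the log, so they cannot be detected this way.

```typescript
// Hypothetical event shape; real event records likely carry more fields.
interface LoggedEvent { type: string; sequence: number }
interface EventReport { missing: string[]; duplicates: number[]; outOfOrder: number[] }

function analyzeEvents(expectedTypes: string[], log: LoggedEvent[]): EventReport {
  const seenTypes = new Set(log.map((e) => e.type));
  const missing = expectedTypes.filter((t) => !seenTypes.has(t));

  const seenSequences = new Set<number>();
  const duplicates: number[] = [];
  const outOfOrder: number[] = [];
  let highest = -Infinity;
  for (const e of log) {
    if (seenSequences.has(e.sequence)) duplicates.push(e.sequence);
    else if (e.sequence < highest) outOfOrder.push(e.sequence);
    seenSequences.add(e.sequence);
    highest = Math.max(highest, e.sequence);
  }
  return { missing, duplicates, outOfOrder };
}

const report = analyzeEvents(
  ["team.spawned", "team.task.assigned"], // expected per the playbook's events field
  [
    { type: "team.spawned", sequence: 1 },
    { type: "task.updated", sequence: 3 },
    { type: "task.updated", sequence: 2 }, // arrives after 3: out of order
    { type: "task.updated", sequence: 3 }, // repeats 3: duplicate
  ],
);
console.log(report);
```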

1e. Orchestrate Action and Gate Analysis

  1. Schema verification: `exarchos_orchestrate describe(actions: [...])` for authoritative schemas. Compare the agent's parameters against the schema to detect stale skill docs or improvisation.
  2. Gate metadata: describe output includes `{ blocking, dimension, autoEmits }`. Check: did the agent treat blocking/non-blocking gates correctly? Did expected auto-emissions fire?
  3. Gate convergence: `exarchos_view convergence` for per-dimension (D1-D5) pass rates. Low convergence suggests systemic gate issues.
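As a rough illustration of the convergence check, the sketch below flags dimensions whose pass rate falls under a cutoff. The `ConvergenceView` shape (a fraction per dimension) and the 0.8 threshold are both assumptions; the source does not define what counts as "low".

```typescript
// Hypothetical shape: pass rate per dimension as a fraction in [0, 1].
type ConvergenceView = Record<string, number>;

// The 0.8 default is an illustrative cutoff, not a documented value.
function lowConvergenceDimensions(view: ConvergenceView, threshold = 0.8): string[] {
  return Object.entries(view)
    .filter(([, rate]) => rate < threshold)
    .map(([dimension]) => dimension)
    .sort();
}

const flaggedDimensions = lowConvergenceDimensions({ D1: 0.95, D2: 0.4, D3: 0.9, D4: 0.75, D5: 1 });
console.log(flaggedDimensions);
```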

1f. Runbook Conformance Check

Use `exarchos_orchestrate runbook(phase)` to retrieve relevant runbooks. Check: step ordering, decision branch correctness (steps with `decide` fields), `onFail` directive adherence (`stop`/`continue`/`retry`), and `templateVars` completeness.
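A minimal sketch of the ordering and `onFail` checks follows. The `RunbookStep` and `ExecutedStep` shapes and the step names are hypothetical; only the `onFail` values come from the text. Decision branches and `templateVars` are not covered here.

```typescript
type OnFail = "stop" | "continue" | "retry";
// Hypothetical shapes for a runbook step and an observed execution.
interface RunbookStep { id: string; onFail: OnFail }
interface ExecutedStep { id: string; ok: boolean }

function findRunbookDeviations(runbook: RunbookStep[], executed: ExecutedStep[]): string[] {
  const byId = new Map<string, { index: number; onFail: OnFail }>();
  runbook.forEach((step, index) => byId.set(step.id, { index, onFail: step.onFail }));

  const deviations: string[] = [];
  let furthest = -1; // highest runbook position reached so far
  executed.forEach((run, i) => {
    const step = byId.get(run.id);
    if (step === undefined) {
      deviations.push(`${run.id}: not in runbook`);
      return;
    }
    if (step.index < furthest) deviations.push(`${run.id}: executed out of order`);
    furthest = Math.max(furthest, step.index);
    const agentContinued = i < executed.length - 1;
    if (!run.ok && step.onFail === "stop" && agentContinued) {
      deviations.push(`${run.id}: continued after failure despite onFail=stop`);
    }
  });
  return deviations;
}

const deviations = findRunbookDeviations(
  [
    { id: "lint", onFail: "continue" },
    { id: "test", onFail: "stop" },
    { id: "merge", onFail: "stop" },
  ],
  [
    { id: "lint", ok: true },
    { id: "test", ok: false }, // failed, yet the agent went on to merge
    { id: "merge", ok: true },
  ],
);
console.log(deviations);
```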

1g. Telemetry Review

Use `exarchos_view telemetry` for per-tool performance. Flag: high error rates (systemic issues), high invocation counts (retry loops), and tools the playbook prescribes that were never invoked.
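The three flags can be sketched as a filter over per-tool counters. The `ToolStats` shape and both thresholds are assumptions chosen for illustration; the telemetry view's real fields and sensible cutoffs may differ.

```typescript
// Hypothetical per-tool counters; the telemetry view may expose other fields.
interface ToolStats { tool: string; invocations: number; errors: number }

// Both thresholds are illustrative defaults, not documented values.
function flagTelemetry(
  stats: ToolStats[],
  prescribedTools: string[],
  maxErrorRate = 0.2,
  maxInvocations = 50,
): string[] {
  const flags: string[] = [];
  const invocationsByTool = new Map<string, number>();
  for (const s of stats) {
    invocationsByTool.set(s.tool, s.invocations);
    if (s.invocations > 0 && s.errors / s.invocations > maxErrorRate) {
      flags.push(`${s.tool}: high error rate`);
    }
    if (s.invocations > maxInvocations) flags.push(`${s.tool}: high invocation count`);
  }
  for (const tool of prescribedTools) {
    if ((invocationsByTool.get(tool) ?? 0) === 0) flags.push(`${tool}: prescribed but never invoked`);
  }
  return flags;
}

const telemetryFlags = flagTelemetry(
  [
    { tool: "exarchos_workflow", invocations: 10, errors: 5 },
    { tool: "exarchos_event", invocations: 120, errors: 2 },
  ],
  ["exarchos_workflow", "exarchos_orchestrate"],
);
console.log(telemetryFlags);
```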

Step 2: Scan Session for Failed Tool Calls

Supplement the debug trace with client-side context — review conversation for failed Exarchos tool calls.
Note: Platform-dependent step (requires conversation history). Skip on platforms without introspection; the debug trace is self-sufficient.
Target tools: `exarchos_workflow`, `exarchos_event`, `exarchos_orchestrate`, `exarchos_view`, `exarchos_sync`
Error signals: `INVALID_INPUT`, `VALIDATION_ERROR`, `BATCH_APPEND_FAILED`, Zod failures (`invalid_type`, `invalid_enum_value`, `unrecognized_keys`), `ENOENT`, `CLAIM_FAILED`, `SEQUENCE_CONFLICT`, CAS exhaustion, retry sequences, successful-after-retry calls.
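Retry sequences and successful-after-retry calls are easier to report once consecutive attempts are grouped. The sketch below assumes a simplified `ToolCall` record extracted from the conversation; the record shape and the `append` action name are hypothetical.

```typescript
// Hypothetical record of one tool call as seen in the conversation.
interface ToolCall { tool: string; action: string; ok: boolean }
interface RetryFinding { tool: string; action: string; attempts: number; eventuallySucceeded: boolean }

// Collapse consecutive calls to the same tool/action into one finding,
// starting at the first failure and ending at the first success.
function groupRetries(calls: ToolCall[]): RetryFinding[] {
  const findings: RetryFinding[] = [];
  let open: RetryFinding | null = null; // failure run still awaiting a success
  for (const call of calls) {
    if (open !== null && open.tool === call.tool && open.action === call.action) {
      open.attempts += 1;
      if (call.ok) {
        open.eventuallySucceeded = true;
        open = null;
      }
    } else if (!call.ok) {
      open = { tool: call.tool, action: call.action, attempts: 1, eventuallySucceeded: false };
      findings.push(open);
    } else {
      open = null; // a clean success is not a finding
    }
  }
  return findings;
}

const retryFindings = groupRetries([
  { tool: "exarchos_workflow", action: "set", ok: false },
  { tool: "exarchos_workflow", action: "set", ok: false },
  { tool: "exarchos_workflow", action: "set", ok: true },
  { tool: "exarchos_event", action: "append", ok: true },
  { tool: "exarchos_event", action: "append", ok: false },
]);
console.log(retryFindings);
```

Grouping this way keeps a three-attempt retry as a single finding while still recording that it eventually succeeded.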

Step 3: Diagnose Each Failure

Merge debug trace and conversation scan findings. For each failure, document:
  1. What was attempted — action, parameters, intent
  2. What went wrong — error message and validation path
  3. Server-side evidence — event log, state, describe output, views
  4. Authoritative reference — the self-service query providing ground truth (playbook, topology, schema, runbook)
  5. Root cause: per `references/root-cause-patterns.md`
  6. Fix category — code, docs, or user behavior
Flag discrepancies only visible via server-side inspection as trace-only findings.

Step 4: Categorize into Buckets

Assign each failure to exactly one root cause bucket:

Bucket 1: Code Bug

The MCP server, event store, or workflow engine has a defect.
Signals: Schema rejects valid input (confirmed via `describe`), CAS failures with no concurrent writers, gate over-enforcement, identical-parameter retry succeeds (race condition), state corruption, topology/engine mismatch, auto-emission failure.
Action: File a bug issue with reproduction steps, expected vs. actual behavior, and a suggested fix.

Bucket 2: Documentation Issue

Skill docs are wrong, incomplete, or out of sync with the MCP server's self-service output.
Signals: Skill payload doesn't match the `describe` schema, skill/playbook divergence, skill documents nonexistent topology paths, missing event types (compare emission guide), retry-based field discovery, runbook/skill contradictions, `compactGuidance` drift.
Action: File a docs issue with file:line, the discrepancy, and the correct information from `describe` output.

Bucket 3: User Error

The agent misused a tool even though both the skill docs and `describe` output describe its use correctly.
Signals: Format mismatch (confirmed by `describe` + docs agreement), invalid sequence (topology confirms), missing context that both skill and playbook prescribe, runbook deviation without justification.
Action: Note for skill improvement if errors are frequent.

Step 5: Generate Report

Produce the report using the template from `references/report-template.md`. Include:
  • Summary counts per bucket
  • Debug trace summary (workflows inspected, events reviewed, describe queries issued, views consulted)
  • Each failure with full diagnosis (including authoritative self-service references)
  • Trace-only findings section (issues only visible via server-side inspection)
  • Playbook/runbook adherence summary
  • Actionable next steps (draft issue bodies for bugs/docs issues)
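The summary counts can be derived mechanically from the categorized findings. In the sketch below, the field names follow the required output format; storing finding IDs in each bucket array and rounding the failure rate to a whole percent are assumptions, since the format only shows empty arrays and `"0%"`.

```typescript
type Bucket = "code_bug" | "documentation_issue" | "user_error";
interface CategorizedFinding { id: number; bucket: Bucket }

// Assumption: each bucket array holds the IDs of the findings assigned to it.
function summarize(totalToolCalls: number, failedToolCalls: number, findings: CategorizedFinding[]) {
  const buckets: Record<Bucket, number[]> = { code_bug: [], documentation_issue: [], user_error: [] };
  for (const f of findings) buckets[f.bucket].push(f.id);
  const rate = totalToolCalls === 0 ? 0 : Math.round((failedToolCalls / totalToolCalls) * 100);
  return {
    total_tool_calls: totalToolCalls,
    failed_tool_calls: failedToolCalls,
    failure_rate: `${rate}%`,
    buckets,
  };
}

const summary = summarize(40, 6, [
  { id: 1, bucket: "code_bug" },
  { id: 2, bucket: "documentation_issue" },
  { id: 3, bucket: "documentation_issue" },
]);
console.log(summary);
```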

Step 6: Offer to File Issues

For findings in the Code Bug and Documentation Issue buckets, offer to create issues through the detected VCS provider:
```typescript
exarchos_orchestrate({ action: "create_issue", title: "<type>: <summary>", body: "<issue body>", labels: ["bug"] })
```
Only file issues with user confirmation — present the draft first.
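One way to enforce the confirmation rule is to build the call arguments only after the user has approved the draft. This is a sketch: `buildCreateIssueArgs` and its types are hypothetical helpers, and only the argument shape is taken from the example call above.

```typescript
interface IssueDraft { title: string; body: string; labels: string[] }
interface CreateIssueArgs extends IssueDraft { action: "create_issue" }

// Hypothetical helper: returns null until the user has confirmed the draft,
// so an unconfirmed draft can never reach the create_issue action.
function buildCreateIssueArgs(draft: IssueDraft, userConfirmed: boolean): CreateIssueArgs | null {
  if (!userConfirmed) return null;
  return { action: "create_issue", ...draft };
}

const draft: IssueDraft = {
  title: "bug: workflow task schema rejects null branch",
  body: "...",
  labels: ["bug"],
};
console.log(buildCreateIssueArgs(draft, false)); // draft shown, nothing filed
console.log(buildCreateIssueArgs(draft, true));
```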

Required Output Format

```json
{
  "session_summary": {
    "total_tool_calls": 0,
    "failed_tool_calls": 0,
    "failure_rate": "0%",
    "debug_trace": {
      "workflows_inspected": 0,
      "events_reviewed": 0,
      "describe_queries": 0,
      "views_consulted": [],
      "trace_only_findings": 0
    }
  },
  "playbook_adherence": {
    "phases_checked": 0,
    "violations": [
      {
        "phase": "delegate",
        "field": "events",
        "expected": "team.spawned, team.task.assigned",
        "actual": "none emitted",
        "bucket": "documentation_issue"
      }
    ]
  },
  "runbook_conformance": {
    "runbooks_checked": 0,
    "deviations": []
  },
  "buckets": {
    "code_bug": [],
    "documentation_issue": [],
    "user_error": []
  },
  "findings": [
    {
      "id": 1,
      "bucket": "code_bug | documentation_issue | user_error",
      "tool": "exarchos_workflow",
      "action": "set",
      "error": "INVALID_INPUT: ...",
      "root_cause": "Schema rejects null branch on pending tasks",
      "trace_evidence": "describe(actions: ['set']) shows branch as required string; event log confirms no task.updated event",
      "authoritative_ref": "exarchos_workflow describe(actions: ['set']) → TaskSchema",
      "severity": "HIGH | MEDIUM | LOW",
      "suggested_fix": "Accept nullable branch in TaskSchema",
      "issue_draft": {
        "title": "bug: workflow task schema rejects null branch",
        "labels": ["bug"],
        "body": "..."
      }
    }
  ],
  "trace_only_findings": [
    {
      "id": "T1",
      "description": "State drift: agent assumed phase was 'delegate' but server shows 'plan'",
      "evidence": "exarchos_workflow get shows phase=plan; topology confirms plan→delegate requires planReviewComplete guard",
      "authoritative_ref": "exarchos_workflow describe(topology: 'feature') → guards",
      "bucket": "documentation_issue",
      "suggested_fix": "Skill should instruct agent to verify phase via get before proceeding"
    }
  ]
}
```

Anti-Patterns

| Don't | Do Instead |
|---|---|
| Skip the debug trace and only scan conversation | Always query MCP self-service tools first; conversation scan is supplementary |
| Guess what the schema expects | Use `describe` to get authoritative schemas; they are the source of truth |
| Assess playbook adherence from memory | Query `describe(playbook)` to get the actual prescribed tools, events, and criteria |
| Assume the topology without checking | Query `describe(topology)` to get valid transitions, guards, and effects |
| Blame the user when skill docs contradict the playbook | If skill docs diverge from playbook/describe output, it's a documentation issue |
| File duplicate issues | Check existing open/closed issues before drafting |
| Categorize retries as separate failures | Group retry sequences as a single finding |
| Ignore successful-after-retry calls | These reveal friction even though they eventually worked |
| Include non-Exarchos failures | Scope strictly to the 5 Exarchos tools; other MCP failures are out of scope |
| Report conversation-only findings without trace corroboration | Cross-reference every finding with server-side state when possible |