ghost

Overview

Generate a ghost package (spec + tests + install prompt) from an existing repo.
Preserve behavior, not prose:
  • tests.yaml is the behavior contract (operation cases and/or scenarios)
  • source tests and/or captured traces are the primary evidence
  • code/docs/examples only fill gaps (never contradict evidence)
The output is language-agnostic so it can be implemented in any target language or harness.
Scenario testing frame (for agentic systems / tool loops):
  • Given an initial world state + tool surface + user goal
  • When the agent runs under realistic constraints and noise
  • Then it reaches an acceptable outcome without violating invariants (safety, security, cost, latency, policy)

Fit / limitations

This approach works best when the system’s behavior can be expressed as deterministic data:
  • pure-ish operations (input -> output or error)
  • a runnable test suite covering the public API
It also works for agentic systems when behavior can be expressed as controlled, replayable scenarios:
  • a tool sandbox (stubs/record-replay/simulator)
  • machine-checkable oracles (state assertions + trace invariants)
  • a deterministic debug mode plus a production-like reliability mode (pass rates)
It gets harder (but is still possible) when the contract depends on time, randomness, IO, concurrency, global state, or platform details. In those cases, make assumptions explicit in SPEC.md + VERIFY.md, and normalize nondeterminism into explicit inputs/outputs.
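A minimal sketch of that normalization, assuming a hypothetical upstream operation that stamps a receipt with the current time and a random token. The function names and fields are illustrative and not taken from any upstream repo. The explicit variant takes the clock reading and the random value as ordinary inputs, so a tests.yaml case can pin both. Passing the random value itself rather than a seed is a choice made for this sketch, since seeded generators rarely agree across languages.

```python
import random
from datetime import datetime, timezone


def make_receipt_implicit(amount_cents):
    """Hypothetical upstream shape: hidden clock and hidden randomness."""
    return {
        "amount_cents": amount_cents,
        "issued_at": datetime.now(timezone.utc).isoformat(),
        "token": f"{random.getrandbits(32):08x}",
    }


def make_receipt_explicit(amount_cents, issued_at, nonce):
    """Contract shape: the timestamp and the random value are plain inputs."""
    return {
        "amount_cents": amount_cents,
        "issued_at": issued_at,
        "token": f"{nonce:08x}",
    }


# A functional-layout case, written the way the parsed YAML would look.
case = {
    "name": "receipt uses supplied timestamp and nonce",
    "input": {
        "amount_cents": 1250,
        "issued_at": "2024-01-02T03:04:05+00:00",
        "nonce": 0x1A2B3C4D,
    },
    "output": {
        "amount_cents": 1250,
        "issued_at": "2024-01-02T03:04:05+00:00",
        "token": "1a2b3c4d",
    },
}

actual = make_receipt_explicit(**case["input"])
print(actual == case["output"])
```

The implicit variant cannot be pinned by a data-only case, which is why the contract should describe the explicit shape and leave the wiring of real clocks and entropy to each implementation.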

Hard rules (MUST / MUST NOT)

  • MUST treat upstream tests (and for agentic systems: captured traces/eval runs) as authoritative; if docs/examples disagree, prefer evidence and record the discrepancy.
  • MUST normalize nondeterminism in the environment/tool surface into explicit inputs/outputs (no implicit "now", random seeds, locale surprises, unordered iteration).
  • MUST make model/agent stochasticity explicit and test it as reliability: gate on pass rates + invariant-violation-free runs (not exact-text goldens).
  • MUST keep the ghost repo language-agnostic: ship no implementation code, adapter runner, or build tooling.
  • MUST paraphrase upstream docs; do not copy text verbatim.
  • MUST preserve upstream license files verbatim as LICENSE*.
  • MUST produce a verification signal and document it in VERIFY.md (adapter runner preferred; sampling fallback allowed).
  • MUST document provenance and regeneration in VERIFY.md (upstream repo + revision, how artifacts were produced, and how to rerun verification).
  • MUST choose a tests.yaml contract shape that matches the system style (functional API vs protocol/CLI vs scenario) and keep it consistent across SPEC.md, INSTALL.md, and VERIFY.md.
  • MUST document the tests.yaml harness schema when it is non-trivial (callbacks, mutation steps, warnings, multi-step protocol setup, etc.).
    • Recommended artifact: TESTS_SCHEMA.md.
    • INSTALL.md MUST reference it when present.
  • MUST minimize skip cases; only skip when deterministic setup is currently infeasible, and record why.
  • MUST assert stable machine-interface fields explicitly (required keys, lengths/counts, and state effects), not only loose partial matches.
  • MUST treat human-readable warning/error messages as unstable unless tests prove they are part of the public contract.
    • Prefer structured fields (codes) or substring assertions for message checks.
  • MUST capture cross-operation state transitions when behavior depends on prior calls (for example session, instance, history, or tool-loop continuity).
  • MUST include executable end-to-end loop coverage for each primary stateful workflow (for example create -> act -> persist -> follow-up) with explicit pre/post state assertions.
  • MUST treat a stateful workflow as incomplete if only isolated operation cases exist; add scenario coverage in tests.yaml and verification proof before calling extraction done.
  • MUST include trace-level invariants for agentic scenarios (for example permission boundaries, confirmation-before-side-effects, injection resistance, budget/step limits).
  • MUST prefer oracles that score behavior via state + trace (tool calls, side effects) over brittle final-text matching.
  • MUST produce a machine-checkable evidence bundle under verification/evidence/ and fail extraction unless it passes uv run --with pyyaml -- python scripts/verify_evidence.py --bundle <ghost-repo>/verification/evidence.
  • MUST keep verification/evidence/inventory.json synchronized with tests.yaml: public_operations must match non-workflow operation ids and primary_workflows must match workflow/scenario ids (coverage_mode defaults to exhaustive; when sampled, include sampled_case_ids).
  • MUST ensure every required case id appears in traceability.csv and has at least one baseline (mutated=false) pass row in adapter_results.jsonl (all tests.yaml cases for exhaustive; inventory.json.sampled_case_ids for sampled).
  • MUST enforce fail-closed verification thresholds: 100% mapped public operations, 100% mapped primary workflows, and 100% mapped required case ids (all tests for exhaustive; sampled ids for sampled), plus mutation sensitivity and independent regeneration parity passes.
  • MUST declare verification coverage mode in VERIFY.md: default exhaustive; sampled is allowed only when full adapter execution is infeasible and must list sampled case ids plus rationale (including inventory.json.sampled_case_ids).
  • MUST define and enforce conformance profiles in generated artifacts: Core Conformance, Extension Conformance, and Real Integration Profile.
  • MUST include Conformance Profile, Validation Matrix, and Definition of Done sections in SPEC.md.
  • MUST include Summary, Regenerate, Validation Matrix, Traceability Matrix, Mutation Sensitivity, Regeneration Parity, and Limitations sections in VERIFY.md.
  • MUST include typed failure classes for extraction/verification failures (for example missing artifacts, parse failures, and contract mismatches).
  • MUST require stateful/scenario ghost specs to include lifecycle structure sections in SPEC.md: State Model, Transition Triggers, Recovery/Idempotency, and Reference Algorithm.
  • MUST run the evidence verifier in strict mode by default; legacy bypass is break-glass only (--legacy-allow --legacy-reason "<rationale>") and never default.
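The required-case-id rule above can be illustrated with a small check. This is a sketch only and is not the logic of scripts/verify_evidence.py, whose internals are not shown here. The field names case_id, status, mutated, and run_id come from the rules above; the CSV column headers and the sample rows are assumptions made for the example.

```python
import csv
import io
import json


def missing_evidence(required_case_ids, traceability_csv, adapter_results_jsonl):
    """Report required case ids that lack a traceability row or a baseline pass."""
    reader = csv.DictReader(io.StringIO(traceability_csv))
    traced = {row["case_id"] for row in reader}

    baseline_pass = set()
    for line in adapter_results_jsonl.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between JSON records
        row = json.loads(line)
        if row["status"] == "pass" and row["mutated"] is False:
            baseline_pass.add(row["case_id"])

    required = set(required_case_ids)
    return {
        "untraced": sorted(required - traced),
        "no_baseline_pass": sorted(required - baseline_pass),
    }


# Assumed column names; only case_id is relied on by the check.
TRACEABILITY = (
    "target,case_id,proof_artifact,run_id\n"
    "parse,parse.basic,adapter_results.jsonl,run-1\n"
    "parse,parse.empty,adapter_results.jsonl,run-1\n"
)
RESULTS = (
    '{"run_id": "run-1", "case_id": "parse.basic", "status": "pass", "mutated": false}\n'
    '{"run_id": "run-2", "case_id": "parse.empty", "status": "fail", "mutated": true}\n'
)

report = missing_evidence(
    ["parse.basic", "parse.empty", "parse.unicode"], TRACEABILITY, RESULTS
)
print(report)
```

A fail-closed gate would treat any non-empty list in the report as a failed extraction. In exhaustive mode the required ids are all tests.yaml cases; in sampled mode they are the ids listed in inventory.json.sampled_case_ids.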

Conformance profiles (required)

  • Core Conformance:
    • deterministic contract extraction requirements that every ghost package must satisfy
    • strict evidence gates and fail-closed verification
  • Extension Conformance:
    • optional behaviors implemented by an extraction for stronger fidelity or ergonomics
    • must be explicitly labeled as optional and tested if claimed
  • Real Integration Profile:
    • environment-dependent checks that validate production-like behavior
    • may be skipped only with explicit rationale in VERIFY.md
Profile usage rules:
  • SPEC.md and VERIFY.md must state which profile each validation requirement belongs to.
  • Validation Matrix and Definition of Done must align with the selected profile labels.
  • Stateful/scenario workflows must include lifecycle sections regardless of profile.

Inputs

  • Source repo path (git working tree)
  • Output repo name/location (default: sibling directory <repo-name>-ghost)
  • Upstream identity + revision (remote URL if available; tag/commit SHA)
  • Public surface if ambiguous:
    • library: functions/classes/modules
    • agentic system: tool names/schemas, permissions, and side-effect boundaries
  • Source language/runtime + how to run upstream tests
  • Any required runtime assumptions (timezone, locale, units, encoding)
For scenario-heavy (agentic) extractions, also collect:
  • scenario catalog (top user goals + failure modes)
  • tool error/latency behaviors (timeouts, 500s, malformed payloads)
  • explicit invariants (security, safety, cost, latency, policy)

Conventions

Operation ids

tests.yaml organizes cases by operation ids (stable identifiers for public API entries). Use a naming scheme that survives translation across languages:
  • foo (top-level function)
  • module.foo (namespaced function)
  • Class#method (instance method)
  • Class.method (static/class method)
Avoid language-specific spellings in ids (e.g., avoid snake_case vs camelCase wars). Prefer the canonical name used by the source library’s docs.
For agentic scenario suites, operation ids SHOULD match tool names as the agent sees them (e.g. orders.lookup, tickets.create).

Scenario ids

When using scenario testing, keep scenario ids stable and descriptive:
  • refund.create_ticket_with_guardrails
  • calendar.reschedule_with_rate_limit
  • security.prompt_injection_from_tool_output

Case ids

Every executable case SHOULD carry a stable case_id and use it as the primary key across evidence artifacts.
  • Prefer <operation-id>.<behavior> for operation cases.
  • For single-case workflow/scenario targets, reusing the workflow/scenario id as case_id is acceptable.
  • traceability.csv and adapter_results.jsonl MUST use the same case_id tokens.

Contract shape

Pick one schema and stay consistent:
  • Functional API layout: operation ids at top-level with {name,input,output|error} cases.
  • Protocol/CLI layout: top-level meta + operations, where operation ids live under operations and cases include command/state assertions.
  • Scenario layout (agentic systems): top-level meta + scenarios, where scenario ids live under scenarios and each scenario defines environment + tools + goal + oracles.
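The three layouts can be pictured as parsed data. The sketch below writes each one as the Python dict a YAML parser might produce and adds a small helper that tells them apart by top-level keys. The operation names, version strings, and the keys inside each case beyond those named above (for example argv and the oracle fields) are invented for illustration, and the precedence used by detect_layout is an assumption of this sketch.

```python
FUNCTIONAL = {
    "version": "1.4.2",
    "slugify": [
        {"name": "lowercases ascii", "input": {"text": "Hello World"}, "output": "hello-world"},
        {"name": "rejects empty input", "input": {"text": ""}, "error": True},
    ],
}

PROTOCOL = {
    "meta": {"version": 1, "source_version": "git:abc1234"},
    "operations": {
        "init": [
            {
                "case_id": "init.creates_config",
                "name": "init creates a config file",
                "input": {"argv": ["init"]},
                "exit_code": 0,
            }
        ]
    },
}

SCENARIO = {
    "meta": {"version": 1, "source_version": "git:abc1234"},
    "scenarios": {
        "refund.create_ticket_with_guardrails": {
            "environment": {"orders": [{"id": "A1", "status": "delivered"}]},
            "tools": ["orders.lookup", "tickets.create"],
            "goal": "Open a refund ticket for order A1",
            "oracles": [{"state": "tickets[0].order_id", "equals": "A1"}],
        }
    },
}


def detect_layout(doc):
    """Classify a parsed tests.yaml document by its top-level keys."""
    if "scenarios" in doc:
        return "scenario"
    if "operations" in doc:
        return "protocol"
    return "functional"


for doc in (FUNCTIONAL, PROTOCOL, SCENARIO):
    print(detect_layout(doc))
```

A helper like this is mainly useful as a consistency check: SPEC.md, INSTALL.md, and VERIFY.md should all describe the layout that the file actually uses.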

tests.yaml version

tests.yaml MUST include a source version identifier that ties cases to upstream evidence.
  • If the upstream library has a release version (SemVer/tag), use it.
  • Otherwise, use an immutable source revision identifier (e.g., git:<short-sha> or git describe).
  • Functional layout: use top-level version.
  • Protocol/CLI layout: keep meta.version for test schema version and include meta.source_version for upstream evidence version.
  • Scenario layout: keep meta.version for schema version and include meta.source_version for upstream evidence version.
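As a rough illustration of those placement rules, the helper below reads the source version from a parsed tests.yaml document. It is a sketch under two assumptions that go beyond the text: a top-level meta mapping is taken to signal the protocol/CLI or scenario layout, and the identifier is required to be a non-empty string, which also guards against an unquoted value such as 1.10 being parsed as a number.

```python
def source_version(doc):
    """Return the upstream source version recorded in a parsed tests.yaml."""
    meta = doc.get("meta")
    if isinstance(meta, dict):
        # Protocol/CLI and scenario layouts: meta.version is the schema
        # version, so the upstream identifier lives in meta.source_version.
        value = meta.get("source_version")
    else:
        # Functional layout: the top-level version field is the identifier.
        value = doc.get("version")
    if not isinstance(value, str) or not value.strip():
        raise ValueError("tests.yaml is missing a source version identifier")
    return value


print(source_version({"version": "2.3.1", "foo": []}))
print(source_version({"meta": {"version": 1, "source_version": "git:abc1234"}, "operations": {}}))
```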

Workflow (tests-first)

0) Define scope and contract

  • Write a one-line problem statement naming the upstream repo/revision and target ghost output path.
  • Choose one tests.yaml layout (functional, protocol/CLI, or scenario) and keep it consistent across SPEC.md, INSTALL.md, and VERIFY.md.
  • Set success criteria: deterministic cases for every public operation, executable loop coverage for primary stateful workflows, and a recorded verification signal in VERIFY.md.
For agentic systems, define success criteria as:
  • critical scenarios expressed in a controlled tool sandbox
  • hard oracles + trace-level invariants (no critical violations)
  • reliability gates (pass rate thresholds) for production-like runs
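A trace-level invariant can be expressed as a plain function over a recorded run. The sketch below checks two of the invariants mentioned earlier, confirmation-before-side-effects and a step limit. The event shape (type, tool) and the rule that each side effect needs its own confirmation are assumptions of this example, not a prescribed trace format.

```python
def check_trace(trace, side_effect_tools, max_steps):
    """Return a list of invariant violations for one recorded run."""
    violations = []
    if len(trace) > max_steps:
        violations.append(f"step budget exceeded: {len(trace)} > {max_steps}")

    confirmed = False
    for index, event in enumerate(trace):
        if event["type"] == "user_confirmation":
            confirmed = True
        elif event["type"] == "tool_call" and event["tool"] in side_effect_tools:
            if not confirmed:
                violations.append(
                    f"step {index}: {event['tool']} called without confirmation"
                )
            confirmed = False  # a confirmation covers one side effect only
    return violations


good_trace = [
    {"type": "tool_call", "tool": "orders.lookup"},
    {"type": "user_confirmation"},
    {"type": "tool_call", "tool": "tickets.create"},
]
bad_trace = [
    {"type": "tool_call", "tool": "orders.lookup"},
    {"type": "tool_call", "tool": "tickets.create"},
]

print(check_trace(good_trace, {"tickets.create"}, max_steps=10))
print(check_trace(bad_trace, {"tickets.create"}, max_steps=10))
```

Because the check reads only tool calls and confirmations, it stays valid when the agent's final wording changes between runs.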

1) Scope the source

  • Locate the test suite(s), examples, and primary docs (README, API docs, docs site).
  • Identify the public API and map each public operation to an operation id.
  • Use export/visibility cues to confirm what’s public:
    • JS/TS: package entrypoints + exports/re-exports
    • Python: top-level module + __all__
    • Rust: pub items re-exported from lib.rs
    • Zig: build.zig module graph (root_source_file, addModule, pub usingnamespace) is source of truth; defaults are often src/root.zig (library) and src/main.zig (exe) but repos vary; treat C ABI export as public only if documented
    • C/C++: installed public headers + exported symbols; include macros/constants only if documented as API
    • Go: exported identifiers (Capitalized)
    • Java/C#: public types/members in the target package/namespace
    • Other: use the language’s visibility/export mechanism + published package entrypoints
  • Confirm which functions/classes are in scope:
    • public API + tests covering it
    • exclude internal helpers unless tests prove they are part of the contract
  • Identify primary user-facing workflows (especially stateful loops) and map each workflow to required operation sequences and state boundaries.
For agentic systems:
  • Identify the tool surface (names, schemas, permissions, rate limits).
  • Identify the environment/state (what changes when tools are called).
  • Identify invariants (safety/security/cost/latency/policy) that must hold across the full trace.
  • Build a coverage matrix (functional, robustness, safety/security/abuse, cost/latency).
  • Decide the output directory as a new sibling repo unless the user overrides.

2) Harvest behavior evidence

  • Extract test cases and expected outputs (or scenario traces); treat evidence as authoritative.
  • When tests are silent, read code/docs to infer behavior and record the inference.
  • Note all boundary values, rounding rules, encoding rules, and error cases.
  • If the API promises "copy"/"detached" behavior, harvest mutation-isolation evidence (including nested structure mutation, not just top-level fields).
  • For stateful APIs, harvest continuity evidence across steps (persisted ids, history chains, context/tool carry-forward, and reset semantics).
  • Normalize environment assumptions:
    • eliminate dependency on current time (use explicit timestamps)
    • force timezone/locale rules if relevant
    • remove nondeterminism (random seeds, unordered iteration)
For scenario suites, also harvest:
  • realistic tool failures (timeouts/500s/malformed JSON/partial results) and backoff/retry behavior
  • prompt-injection-like tool outputs and required refusal/ignore behavior
  • stop conditions (max steps, budget) and graceful halts
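One way to harvest failure and retry behavior deterministically is a scripted tool stub that replays a fixed sequence of outcomes, including simulated failures. The sketch below is illustrative: ScriptedTool and call_with_retry are hypothetical names, and the retry policy stands in for whatever the upstream agent actually does. No real sleeping or network access is involved, so the run is replayable.

```python
class ScriptedTool:
    """Replays a fixed list of outcomes; an outcome may be an exception."""

    def __init__(self, name, script):
        self.name = name
        self._script = list(script)
        self.calls = []  # recorded arguments, usable as trace evidence

    def __call__(self, **arguments):
        self.calls.append(arguments)
        if not self._script:
            raise AssertionError(f"{self.name}: script exhausted")
        outcome = self._script.pop(0)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome


def call_with_retry(tool, max_attempts, **arguments):
    """Hypothetical policy under test: retry on timeouts, up to a limit."""
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(**arguments)
        except TimeoutError:
            if attempt == max_attempts:
                raise


lookup = ScriptedTool(
    "orders.lookup",
    [TimeoutError("simulated timeout"), {"order_id": "A1", "status": "shipped"}],
)
result = call_with_retry(lookup, max_attempts=3, order_id="A1")
print(result, len(lookup.calls))
```

The recorded calls list doubles as evidence for continuity and stop-condition assertions, since it shows exactly how many attempts were made and with which arguments.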

3) Write SPEC.md (strict, language-agnostic)

  • Include Conformance Profile, Validation Matrix, and Definition of Done sections.
  • Describe types abstractly (number/string/object/timestamp/bytes/etc.).
  • For bytes/buffers, define a canonical encoding (hex or base64) and use it consistently in tests.yaml.
  • Define normalization rules (e.g., timestamp parsing, string trimming, unicode, case folding).
  • Specify error behavior precisely (conditions), but keep the mechanism language-idiomatic.
  • Include typed failure classes in the spec surface (machine-checkable failure names/codes where possible).
  • Specify every public operation with inputs, outputs, rules, and edge cases.
  • When an operation yields both a "prepared" value and a "persisted delta" (or similar), define the delta derivation mechanically (slice/filter/identity rules) and test it.
  • Specify cross-operation invariants for primary workflows (state transitions, required ordering, and continuity guarantees).
  • For scenarios, specify:
    • state model and transition triggers
    • recovery/idempotency behavior
    • reference algorithm overview (language-agnostic)
    • environment state model and reset semantics
    • tool surface contracts (schemas, permissions, rate limits)
    • invariants as explicit, testable rules (trace-level)
  • Paraphrase source docs; do not copy text verbatim.
  • Use references/templates.md for structure.
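For the canonical byte encoding mentioned above, a spec might pick hex and pin down its exact form. The sketch below assumes lowercase hex with no separators or whitespace; that convention is a choice made for the example, and base64 would serve equally well as long as it is applied consistently.

```python
def encode_bytes(value: bytes) -> str:
    """Encode bytes as lowercase hex for use in tests.yaml."""
    return value.hex()


def decode_bytes(text: str) -> bytes:
    """Decode canonical hex, rejecting forms the assumed convention forbids."""
    if text != text.lower() or any(ch.isspace() for ch in text):
        raise ValueError("canonical hex must be lowercase without whitespace")
    return bytes.fromhex(text)


encoded = encode_bytes(b"\x00\xffhi")
print(encoded)
print(decode_bytes(encoded) == b"\x00\xffhi")
```

Rejecting non-canonical spellings on decode keeps two harnesses in different languages from silently accepting different inputs for the same case.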

4) Generate tests.yaml (exhaustive)

  • Convert each source test into a YAML case under its operation id.
  • Include the source version identifier (version or meta.source_version).
  • Schema is intentionally strict and portable; choose the contract shape from Conventions:
    • Functional layout:
      • each case has name, input, and a stable case_id (recommended)
      • each case has exactly one of output or error: true
    • Protocol/CLI layout:
      • top-level meta + operations
      • each case has case_id, name, input, and deterministic expected outcomes (for example exit_code, machine-readable stdout assertions, and state assertions)
    • keep to a portable YAML subset (no anchors/tags/binary) so it is easy to parse in many languages
    • quote ambiguous scalars (yes, no, on, off, null) to avoid parser disagreements
  • Normalize inputs to deterministic values (avoid "now"; use explicit timestamps).
  • Keep or improve coverage across all public operations and failure modes.
  • Add scenario cases for primary stateful workflows so the contract proves end-to-end loop behavior, not only per-operation correctness.
  • For agentic systems, prefer the scenario layout and define each scenario as:
    • initial state (what the agent knows + world state)
    • tool sandbox (stubs/record-replay/simulator) and permissions
    • dynamics (how the world responds to tool calls, including failures/delays)
    • success criteria (final state and/or required tool side effects)
    • oracles (hard assertions + trace invariants; optional rubric judge)
  • Prefer exact/value-complete assertions for stable output fields; use partial assertions only when fields are intentionally volatile.
  • If assertions use path lookups, define path resolver semantics in TESTS_SCHEMA.md (root object, dot segments, [index] arrays, and "missing path fails assertion").
  • For warning/error message checks, prefer substring assertions unless the exact wording is itself part of the upstream contract.
  • If tests.yaml includes harness directives beyond basic {name,input,output|error} (e.g. callbacks by label, mutation steps, warning sinks, setup scripts), document them in TESTS_SCHEMA.md.
  • Keep skip rare; every skip must include a concrete reason and be accounted for in VERIFY.md.
  • If the source returns floats, prefer defining stable rounding/formatting rules so output is exact.
  • Follow the format in references/templates.md.
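The path resolver semantics listed above (root object, dot segments, [index] arrays, missing path fails the assertion) are small enough to sketch. The grammar below is one possible reading and is an assumption of this example: keys may not contain dots or brackets, indexes are non-negative integers, and an empty path returns the root.

```python
import re

# One token is either an optional dot followed by a key, or a bracketed index.
_TOKEN = re.compile(r"\.?([^.\[\]]+)|\[(\d+)\]")


def resolve_path(root, path):
    """Resolve a path like 'items[1].id'; raise LookupError if it is missing."""
    current = root
    position = 0
    while position < len(path):
        match = _TOKEN.match(path, position)
        if match is None:
            raise ValueError(f"malformed path: {path!r}")
        key, index = match.group(1), match.group(2)
        if key is not None:
            if not isinstance(current, dict) or key not in current:
                raise LookupError(f"missing path: {path!r}")
            current = current[key]
        else:
            if not isinstance(current, list) or int(index) >= len(current):
                raise LookupError(f"missing path: {path!r}")
            current = current[int(index)]
        position = match.end()
    return current


def assert_path(root, path, expected):
    """A missing path fails the assertion instead of raising."""
    try:
        return resolve_path(root, path) == expected
    except LookupError:
        return False


payload = {"items": [{"id": "a"}, {"id": "b"}], "count": 2}
print(resolve_path(payload, "items[1].id"))
print(assert_path(payload, "items[2].id", "b"))
```

Writing these rules down in TESTS_SCHEMA.md matters because harness authors in other languages would otherwise have to guess how missing keys, out-of-range indexes, and malformed paths behave.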

5) Add INSTALL.md + README.md + VERIFY.md + LICENSE*

  • INSTALL.md: a short prompt for implementing the library in any language, referencing SPEC.md and tests.yaml.
  • README.md: explain what the ghost library is, list operations, and describe the included files.
  • TESTS_SCHEMA.md (when needed): define the tests.yaml harness schema and any callback catalogs or side-effect capture requirements.
  • VERIFY.md: describe provenance + how the ghost artifacts were produced and verified against the source library (adapter-first; sampling fallback).
    • include Summary, Regenerate, Validation Matrix, Traceability Matrix, Mutation Sensitivity, Regeneration Parity, and Limitations sections
    • include upstream repo identity + exact revision (tag or commit)
    • include the exact commands used to produce each artifact (or a single deterministic regeneration recipe)
    • include the exact commands used to run verification and the resulting pass/skip counts
    • include any environment normalization assumptions
    • include a summary of verification/evidence/ and the verifier command/result
    • if legacy verifier bypass is used, include explicit break-glass rationale and follow-up remediation plan
  • LICENSE*: preserve the upstream repo’s license files verbatim.
    • copy common files like LICENSE, LICENSE.md, COPYING*
    • if no license file exists upstream, include a LICENSE file stating that no upstream license was found

6) Verify fidelity (must do)

  • Ensure tests.yaml parses and case counts match or exceed the source tests covering the public API.
  • Ensure every operation id has at least one executable (non-skip) case unless infeasible, and list any exceptions in VERIFY.md.
  • Preferred: create a temporary adapter runner in the source language to run tests.yaml against the upstream system (library or agent).
    • if the source language has weak YAML tooling, parse YAML externally and dispatch into the library via a tiny CLI/FFI shim
    • assert expected outcomes match exactly (outputs/errors for functional layout; exit/status/payload/state assertions for protocol layout)
    • for stateful workflows, execute end-to-end loop scenarios and assert continuity/persistence effects across steps
    • delete the adapter afterward; do not ship it in the ghost repo
    • summarize how to run it (and results) in
      VERIFY.md
  • Build a fail-closed evidence bundle in verification/evidence/:
    • inventory.json (public operations + primary workflows, including reset requirements; optional coverage_mode, and sampled_case_ids when coverage_mode=sampled)
    • traceability.csv (operation/workflow -> case ids -> proof artifact -> adapter run id)
    • workflow_loops.json (loop cases + continuity assertions + reset assertions when required)
    • adapter_results.jsonl (case-level results with run_id, case_id, status, and mutation marker)
    • mutation_check.json (required mutation count + detected failures + pass/fail)
    • parity.json (independent regeneration parity verdict + diff count)
  • Verify the bundle with uv run --with pyyaml -- python scripts/verify_evidence.py --bundle <ghost-repo>/verification/evidence; a non-zero exit means extraction is incomplete.
  • Strict mode is the default and is fail-closed. Use --legacy-allow --legacy-reason "<rationale>" only for explicit manual break-glass migrations.
  • For stochastic agentic systems:
    • run scenarios in two modes:
      • deterministic debug mode (stable tool outputs; fixed seed when possible)
      • production-like mode (real sampling settings)
    • run each critical scenario N times and record pass rate + cost/latency distributions
    • release gates: no critical invariant violations and pass rate meets threshold
  • If a full adapter is infeasible:
    • run a representative sample across all operation ids (typical + boundary + error)
    • document the limitation clearly in VERIFY.md
  • Use references/verification.md for a checklist and a VERIFY.md template.
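
The "at least one executable case per operation id" check can be sketched in a few lines. This is only an illustration: the case fields (id, op, skip) and the flat list layout are assumptions, not the tests.yaml schema defined in references/templates.md, and the cases are shown already parsed rather than loaded from YAML.

```python
from collections import defaultdict


def operations_without_executable_case(cases, operation_ids):
    """Return the operation ids that have no non-skipped case, sorted."""
    executable = defaultdict(int)
    for case in cases:
        # A case counts as executable when it carries no truthy "skip" marker.
        if not case.get("skip"):
            executable[case["op"]] += 1
    return sorted(op for op in operation_ids if executable[op] == 0)


# Hypothetical parsed cases; the field names are assumptions for illustration.
cases = [
    {"id": "parse-basic", "op": "parse", "input": "1", "output": 1},
    {"id": "parse-bad", "op": "parse", "input": "x", "error": True},
    {"id": "format-tz", "op": "format", "skip": "needs platform tz data"},
]

missing = operations_without_executable_case(cases, ["parse", "format"])
print(missing)
```

Any operation id returned here would either need a new executable case or an entry in the exceptions list in VERIFY.md.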

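For the stochastic case, the release gate described above (no critical invariant violations, pass rate at or above a threshold) might be computed roughly as follows. The RunResult fields, the summary keys, and the 0.8 threshold are hypothetical; the source does not prescribe a data shape or a threshold value.

```python
import statistics
from dataclasses import dataclass


@dataclass
class RunResult:
    """One run of a critical scenario (hypothetical record shape)."""
    passed: bool
    critical_violation: bool
    latency_s: float
    cost_usd: float


def release_gate(runs, min_pass_rate):
    """Summarize N runs of one scenario and apply the two release gates."""
    if not runs:
        raise ValueError("at least one run is required")
    pass_rate = sum(r.passed for r in runs) / len(runs)
    violations = sum(r.critical_violation for r in runs)
    return {
        "pass_rate": pass_rate,
        "critical_violations": violations,
        "median_latency_s": statistics.median(r.latency_s for r in runs),
        "max_latency_s": max(r.latency_s for r in runs),
        "total_cost_usd": sum(r.cost_usd for r in runs),
        # Gate: zero critical violations AND pass rate meets the threshold.
        "release_ok": violations == 0 and pass_rate >= min_pass_rate,
    }


runs = [RunResult(True, False, 2.0, 0.01) for _ in range(4)]
runs.append(RunResult(False, False, 5.0, 0.03))
summary = release_gate(runs, min_pass_rate=0.8)
print(summary["pass_rate"], summary["release_ok"])
```

Recording the full summary per scenario in VERIFY.md keeps the cost and latency distributions visible alongside the pass/fail verdict.
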
Reproducibility and regen policy

  • The ghost repo must be reproducible: a future developer should be able to point at the upstream revision and rerun the extraction + verification.
  • Do not add regeneration scripts as tracked files unless the user explicitly asks; put the recipe in VERIFY.md instead.

Output

Produce only these artifacts in the ghost repo:
  • README.md
  • SPEC.md
  • tests.yaml
  • TESTS_SCHEMA.md (optional; include when tests.yaml has non-trivial harness semantics)
  • INSTALL.md
  • VERIFY.md
  • verification/evidence/inventory.json
  • verification/evidence/traceability.csv
  • verification/evidence/workflow_loops.json
  • verification/evidence/adapter_results.jsonl
  • verification/evidence/mutation_check.json
  • verification/evidence/parity.json
  • verification/evidence/structure_contract.json (optional, recommended for explicit structure policy)
  • LICENSE* (copied from upstream)
  • .gitignore (optional, minimal)
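
The artifact list above works as an allowlist, so it can be checked mechanically. The sketch below is not part of the ghost tooling; the function names are hypothetical, and it assumes repo-relative POSIX paths with LICENSE* matching only top-level files.

```python
REQUIRED = [
    "README.md", "SPEC.md", "tests.yaml", "INSTALL.md", "VERIFY.md",
    "verification/evidence/inventory.json",
    "verification/evidence/traceability.csv",
    "verification/evidence/workflow_loops.json",
    "verification/evidence/adapter_results.jsonl",
    "verification/evidence/mutation_check.json",
    "verification/evidence/parity.json",
    "LICENSE*",
]
OPTIONAL = [
    "TESTS_SCHEMA.md",
    "verification/evidence/structure_contract.json",
    ".gitignore",
]


def matches(path, pattern):
    """Exact match, or a top-level prefix match for patterns ending in '*'."""
    if pattern.endswith("*"):
        return "/" not in path and path.startswith(pattern[:-1])
    return path == pattern


def check_artifacts(paths):
    """Return (missing required patterns, files outside the allowlist)."""
    paths = sorted(paths)
    missing = [pat for pat in REQUIRED
               if not any(matches(p, pat) for p in paths)]
    unexpected = [p for p in paths
                  if not any(matches(p, pat) for pat in REQUIRED + OPTIONAL)]
    return missing, unexpected


# Hypothetical repo listing: everything required, plus one stray source file.
repo_files = [p for p in REQUIRED if p != "LICENSE*"] + ["LICENSE.md", "src/main.py"]
missing, unexpected = check_artifacts(repo_files)
print(missing, unexpected)
```

A non-empty result in either list means the ghost repo does not match the output contract.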

Notes

  • Prefer precision over verbosity; rules should be unambiguous and testable.
  • Keep the ghost repo free of implementation code and packaging scaffolding.

Zig notes

  • Running upstream tests: prefer zig build test (if build.zig defines tests); otherwise zig test path/to/file.zig for the library root and any test entrypoints.
  • Operation ids for methods: treat a first parameter named self of type T / *T as an instance method (T#method); otherwise use T.method.
  • comptime parameters: record allowed values in SPEC.md, and represent them as ordinary fields in tests.yaml inputs.
  • Allocators/buffers: if the API takes std.mem.Allocator or caller-provided buffers, specify ownership and mutation rules; assume allocations succeed unless tests cover OOM.
  • Errors:
    • Functional layout: keep tests.yaml strict (error: true only); in a Zig adapter, treat "any error return" as a passing error case and rely on SPEC.md for exact conditions.
    • Protocol/CLI layout: prefer explicit machine-readable error payload assertions plus exit codes.
  • YAML tooling: Zig stdlib has JSON but not YAML; for adapters/implementations it’s fine to convert tests.yaml to JSON (or JSONL) as an intermediate and have a Zig runner parse it via std.json.
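
The operation id rule for Zig methods can be written as a small helper, shown here in Python as it might appear in extraction tooling. The function name and the (name, type) tuple are hypothetical. The sketch follows the rule literally (T or *T only) and makes no claim about other receiver forms.

```python
def operation_id(type_name, method_name, first_param=None):
    """Derive an operation id from a Zig declaration.

    first_param is a (name, type) tuple for the first parameter,
    or None when the function takes no parameters.
    """
    if first_param is not None:
        name, param_type = first_param
        receiver = param_type.replace(" ", "")
        # Instance method: first parameter is `self` of type T or *T.
        if name == "self" and receiver in (type_name, "*" + type_name):
            return f"{type_name}#{method_name}"
    # Everything else is treated as a namespaced function on T.
    return f"{type_name}.{method_name}"


print(operation_id("Buffer", "append", ("self", "*Buffer")))
print(operation_id("Buffer", "init", ("allocator", "std.mem.Allocator")))
```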

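The YAML-to-JSONL intermediate mentioned in the YAML tooling note could be produced by a short script outside Zig. This sketch assumes the cases are already parsed into a list of dicts (for example with PyYAML's yaml.safe_load) and that one case per line suits the Zig runner. The flat list layout is an assumption, not the documented tests.yaml format.

```python
import json


def cases_to_jsonl(cases):
    """Serialize parsed cases as JSONL: one JSON object per line."""
    # sort_keys keeps the output stable across regenerations.
    return "".join(
        json.dumps(case, sort_keys=True, ensure_ascii=False) + "\n"
        for case in cases
    )


# In real use the list would come from the parsed tests.yaml, e.g.:
#   cases = yaml.safe_load(open("tests.yaml"))   # requires PyYAML
cases = [
    {"op": "parse", "input": "1", "output": 1},
    {"op": "parse", "input": "x", "error": True},
]
print(cases_to_jsonl(cases), end="")
```

Each line can then be parsed on the Zig side with std.json, one case at a time.
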
Resources

  • references/templates.md (artifact outlines and YAML format)
  • references/verification.md (verification checklist + VERIFY.md template)