agent-output-audit
Agent Output Audit
Independent verification of AI-implemented work. The skill that asks: "Did the implementing agent actually do what `task_NN.md` says it did?" rather than "Would a real user succeed at this product?" (that's `qa-execution`).
Required Reading Router
Match your task to the row. Read the listed files in full before producing output. They are not appendices — they are load-bearing. Inline content in this SKILL.md is a pointer, not a substitute.
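| Task | MUST read |
|---|---|
| Discovering install/lint/test/build/start commands (Step 1) | `references/project-signals.md` |
| Deciding E2E support and classifying coverage (Step 1) | `references/e2e-coverage.md` |
| Building the audit scope checklist (Step 2) | `references/checklist.md` |
| Holding independent-evaluator stance on AI tasks (Step 3) | `references/independent-evaluator-protocol.md` |
| Scanning test diffs for AI hygiene red flags (Step 4) | `references/ai-implementation-audit.md` |
| Diagnosing a test that passed on retry without a code change | `references/flaky-triage.md` |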
Reference Index
- `references/project-signals.md`: Heuristics for picking install/lint/test/build/start commands across ecosystems when the repo lacks an umbrella gate.
- `references/e2e-coverage.md`: Taxonomy for `existing-e2e` / `needs-e2e` / `manual-only` / `blocked` and how to detect harness support.
- `references/checklist.md`: Audit checklist by category (contract discovery, baseline, task audit, AI hygiene, flaky detection, quality gates).
- `references/ai-implementation-audit.md`: Red Flag scanners (RF-1..RF-6), Requirement→Test mapping, verdict matrix for completed tasks.
- `references/independent-evaluator-protocol.md`: What counts (and doesn't count) as evidence; transcript classification (`genuine-failure` / `grader-bug` / `ambiguous-task` / `bypass-exploit`).
- `references/flaky-triage.md`: Taxonomy, diagnosis protocol, and quarantine workflow for retry-passes-without-code-change failures.
Required Inputs
- `audit-output-path` (optional): Directory where audit artifacts (bugs, audit report, evidence) are stored. When provided, create the directory if it does not exist and use it for all audit outputs. When omitted, fall back to repository conventions or `/tmp/agent-output-audit-<slug>`.
Procedures
Step 1: Discover the Repository Verification Contract
- Read root instructions, repository docs, and CI/build files before running commands.
- Execute `python3 scripts/discover-project-contract.py --root .` to surface candidate install, verify, build, test, lint, start commands, and E2E signals.
- STOP. Read `references/project-signals.md` in full before picking commands when discovery surfaces more than one plausible gate or the repo mixes ecosystems.
- STOP. Read `references/e2e-coverage.md` in full before classifying any flow.
- Prefer repository-defined umbrella commands such as `make verify`, `just verify`, or CI entrypoints over language-default commands.
- Resolve the audit artifact directory. If the user provided an `audit-output-path` argument, use it. Otherwise use repository conventions, falling back to `/tmp/agent-output-audit-<slug>`. Create the `audit/` subdirectory; store all bugs and reports under `<audit-output-path>/audit/`.
- Detect Compozy mode. If `.compozy/tasks/<slug>/` exists, record the slug and switch into Compozy-aware audit:
  - Read `state.yaml` (read-only; never write to it, because `scripts/update-state.py` owns mutation per the cy-codex-loop contract).
  - Read `_techspec.md` (deliverable source of truth) and `_tasks.md` (task roster) when present.
  - List every `task_NN.md` and capture its frontmatter `status:` value (allowed: `pending`, `in_progress`, `completed`). When `task_NN.md` frontmatter disagrees with `state.yaml`, treat frontmatter as the source of truth.
  - Note the canonical memory slot `.compozy/tasks/<slug>/memory/qa-execution.md`; Step 5 writes audit notes there before any status flips.
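The directory resolution and Compozy detection in Step 1 can be sketched roughly as follows. This is a minimal illustration and not the skill's own script: the function names `resolve_audit_dir` and `detect_compozy_slugs` are hypothetical, and the "repository conventions" fallback is not modelled, so the sketch goes straight from the user argument to the `/tmp` default.

```python
from pathlib import Path
from typing import Optional


def resolve_audit_dir(audit_output_path: Optional[str], slug: str) -> Path:
    """Return <audit-output-path>/audit, creating it when missing.

    Assumption: repository conventions are skipped here; a real run would
    check them before falling back to /tmp/agent-output-audit-<slug>.
    """
    if audit_output_path:
        base = Path(audit_output_path)
    else:
        base = Path("/tmp") / f"agent-output-audit-{slug}"
    audit_dir = base / "audit"
    audit_dir.mkdir(parents=True, exist_ok=True)
    return audit_dir


def detect_compozy_slugs(root: Path) -> list[str]:
    """List every slug that has a .compozy/tasks/<slug>/ directory under root."""
    tasks_root = root / ".compozy" / "tasks"
    if not tasks_root.is_dir():
        return []  # not a Compozy repository, run the generic audit
    return sorted(p.name for p in tasks_root.iterdir() if p.is_dir())
```

An empty list from `detect_compozy_slugs` would mean the Compozy-aware sub-steps are skipped; nothing in this sketch writes to `state.yaml`.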
Step 2: Run the Baseline Verification Gate
- Install dependencies with the repository-preferred command.
- Run the canonical verification gate once before any audit work. Execute in fastest-first order: lint and type-check, then build, then unit tests, then integration tests.
- If the E2E command is separate from the umbrella gate, decide whether to run it now or after runtime prerequisites are ready, then record that plan explicitly.
- If the baseline fails, read the first failing output carefully and determine whether it is pre-existing or introduced by current work before moving on.
4a. Flaky-failure protocol. When a baseline command fails, before classifying as pre-existing or new, run the failing test in isolation 3-5 times on the same SHA. If it passes at least once without code changes, classify as `flaky-suspect`, record in `audit-report.md` under `SUITE HEALTH SNAPSHOT` (test name, attempts, retry outcome, suspected category), and do NOT promote to PASS via retry. STOP. Read `references/flaky-triage.md` in full before assigning a suspected category or proposing a quarantine.
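The retry rule in 4a can be sketched as a small classifier. This is an illustration under stated assumptions: `RetryOutcome`, `classify_failure`, and `command_runner` are hypothetical names, and the label `consistent-failure` is invented here for the case where no retry passes (the source only names `flaky-suspect`).

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RetryOutcome:
    test_name: str
    attempts: int
    passes: int
    classification: str  # "flaky-suspect" or the hypothetical "consistent-failure"


def command_runner(cmd: List[str]) -> Callable[[], bool]:
    """Wrap a test command so each call runs it once and reports success."""
    return lambda: subprocess.run(cmd, capture_output=True).returncode == 0


def classify_failure(test_name: str, run_once: Callable[[], bool], attempts: int = 5) -> RetryOutcome:
    """Re-run a failing test in isolation on the same SHA and classify it."""
    if not 3 <= attempts <= 5:
        raise ValueError("the protocol asks for 3-5 isolated runs")
    passes = sum(1 for _ in range(attempts) if run_once())
    # One pass without a code change is enough to suspect flakiness.
    # The result is never promoted to PASS here; it only feeds the report.
    classification = "flaky-suspect" if passes >= 1 else "consistent-failure"
    return RetryOutcome(test_name, attempts, passes, classification)
```

The returned record carries the fields the Suite Health Snapshot asks for (test name, attempts, retry outcome); the suspected category is left out on purpose because it should only be assigned after reading `references/flaky-triage.md`.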
Step 3: Audit Task Implementations (Compozy mode and any AI-implemented tasks)
Skip this step only when no task, phase, PRD, tech spec, or implementation-plan artifacts exist.
- STOP. Read `references/independent-evaluator-protocol.md` in full before forming any task verdict. Tripwire summary: never accept the implementing agent's transcript, success message, or memory note as evidence. In Compozy mode, read the implementing agent's `.compozy/tasks/<slug>/memory/<phase>.md` artifacts and classify anomalies (`genuine-failure` / `grader-bug` / `ambiguous-task` / `bypass-exploit`) in the `Errors / Corrections` section of `memory/qa-execution.md` before judging the task.
- Read each `task_NN.md` and its body. Summarize each task into a Task Implementation Matrix (column names mirror cy-codex-loop frontmatter):
  - `task_path` (e.g., `.compozy/tasks/<slug>/task_07.md`)
  - `declared_status`: literal frontmatter `status:` value
  - `title`, `type`, `complexity`, `dependencies`: mirrored from frontmatter
  - `techspec_deliverable`: linked section in `_techspec.md` when present
  - Requirements, subtasks, checklist items, success criteria, dependent files
  - `implementation_evidence`: files, modules, routes, commands, migrations, seeds, tests
  - `verification_evidence`: commands executed, exit codes, output summaries
  - `qa_verdict`: `PASS` | `PARTIAL` | `FAIL` | `REOPEN` | `BLOCKED` (distinct from `declared_status`)
  - `ai_audit_findings`: red flag IDs that fired in Step 4 with verdict
  - `action`: `none` | `fixed` | `reopened-frontmatter` | `BUG-NNN.md filed`
  - `linked_bugs`: BUG IDs
- Do not treat a task `declared_status`, checked checkbox, memory note, or prior agent summary as proof. Verify every completed or claimed-complete task against actual files, public behavior, automated tests, and acceptance criteria.
- Classify each task with `qa_verdict`:
  - `PASS`: every material requirement and success criterion has implementation and fresh verification evidence.
  - `PARTIAL`: implementation exists but one or more non-critical requirements, tests, or evidence are missing.
  - `FAIL`: claimed behavior does not work or a critical requirement is absent.
  - `REOPEN`: the source `task_NN.md` has `status: completed` in frontmatter but the QA verdict is `PARTIAL` or `FAIL`.
  - `BLOCKED`: audit cannot continue because a concrete prerequisite is missing.
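The verdict rules above can be read as a small decision function. The sketch below is one possible reading, not the skill's definition: `TaskRow` holds only a few of the matrix columns, the boolean fields `critical_missing` and `noncritical_missing` are hypothetical summaries of the evidence review, and collapsing `PARTIAL`/`FAIL` into `REOPEN` for tasks declared `completed` is an assumption about how the single `qa_verdict` column is filled.

```python
from dataclasses import dataclass


@dataclass
class TaskRow:
    task_path: str
    declared_status: str        # literal frontmatter value: pending | in_progress | completed
    critical_missing: bool      # hypothetical: a critical requirement is absent or broken
    noncritical_missing: bool   # hypothetical: non-critical requirement, test, or evidence missing
    blocked_on: str = ""        # concrete missing prerequisite, empty when none


def qa_verdict(row: TaskRow) -> str:
    """Derive qa_verdict from fresh evidence, never from declared_status alone."""
    if row.blocked_on:
        return "BLOCKED"
    if row.critical_missing:
        base = "FAIL"
    elif row.noncritical_missing:
        base = "PARTIAL"
    else:
        base = "PASS"
    # A task that claims completion but fails the audit goes back to the queue.
    if row.declared_status == "completed" and base in ("PARTIAL", "FAIL"):
        return "REOPEN"
    return base
```

Keeping `declared_status` as an input rather than an output reflects the rule that the declared value is a claim to be checked, not evidence.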
Step 4: AI Test-Hygiene Scan (RF-1..RF-6)
- STOP. Read `references/ai-implementation-audit.md` in full before scanning the test diff of any task with `declared_status: completed`. That file owns the Red Flag scanners (RF-1..RF-6), the Requirement→Test mapping rules, and the verdict matrix.
- Run the scans against the diff since the task baseline (`git log --follow <test_file>`, `git diff <baseline_sha>..HEAD`).
- Emit verdict `FAIL` automatically when scanners detect:
  - Weakened assertions on P0/P1 Success Criterion (RF-2).
  - `.skip` / `.only` / `xit` / `t.Skip` inserted in the diff (RF-1).
  - Mocks inserted in tests whose corresponding TC declared `External Dependencies` as Integration/E2E (RF-3).
  - Snapshot drift on P0/P1 with no requirement-change justification (RF-4).
- Record findings in the Task Implementation Matrix `ai_audit_findings` column and in the per-task block of `audit-report.md`.
- Apply the Requirement → Test mapping table from `references/ai-implementation-audit.md`. For every Success Criterion in `task_NN.md` (frontmatter or body) and every linked bullet in `_techspec.md`, find the corresponding test by name, reference, or assertion content. Mark each criterion `covers` / `weak` / `missing`. A checked item or `status: completed` without a `covers` row is an audit failure.
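As an illustration of the RF-1 check, the sketch below scans the text of a unified diff (for example the output of `git diff <baseline_sha>..HEAD`) for skip markers on added lines. It is not the scanner defined in `references/ai-implementation-audit.md`: the function name and the regular expression are assumptions, and the pattern covers only the four markers named above.

```python
import re
from typing import List, Tuple

# Hypothetical pattern covering the markers listed for RF-1.
SKIP_MARKERS = re.compile(r"\.skip\b|\.only\b|\bxit\(|\bt\.Skip\(")


def scan_added_skip_markers(diff_text: str) -> List[Tuple[str, str]]:
    """Return (file, added line) pairs where a skip marker was inserted."""
    findings: List[Tuple[str, str]] = []
    current_file = ""
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header of the new side, e.g. "+++ b/cart.test.ts".
            current_file = line[4:].removeprefix("b/")
            continue
        # Only added lines count; removed and context lines are ignored.
        if line.startswith("+") and SKIP_MARKERS.search(line):
            findings.append((current_file, line[1:].strip()))
    return findings
```

Any non-empty result would be recorded under `ai_audit_findings` as RF-1; deciding the verdict still belongs to the verdict matrix in the reference file.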
Step 5: Reopen, File Bugs, Write Memory
- Mark tasks that are declared `completed` but found incomplete as `REOPEN` in the matrix.
- In Compozy mode, write audit notes to `.compozy/tasks/<slug>/memory/qa-execution.md` using the canonical sections required by cy-codex-loop: `Objective Snapshot`, `Important Decisions`, `Learnings`, `Files / Surfaces`, `Errors / Corrections`, `Ready for Next Run`. This file must be written before any `task_NN.md` frontmatter is flipped (memory-precedes-status invariant).
- Edit the offending `task_NN.md` frontmatter `status:` back to `pending` (or `in_progress` if salvageable). Never write to `state.yaml`: cy-codex-loop's `update-state.py` owns mutation, and frontmatter wins because the next iteration reconciles from it.
- File `BUG-<num>.md` under `<audit-output-path>/audit/issues/` using `assets/issue-template.md`. Include:
  - The task path under `Reopens task:`.
  - The failed Success Criterion under `Summary:`.
  - The original strict assertion (when RF-2 fired) under `Root cause:`.
  - The red flag ID and verdict under `Automation Follow-up:` notes.
  - The transcript anomaly classification (when applicable) under `Related:`.
- When the missing work is a bounded root-cause fix inside the audit scope, you may implement it, add regression coverage, and rerun the task proof. Otherwise reopen the task; do not silently pass it.
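The memory-precedes-status invariant can be enforced mechanically. The following is a minimal sketch, assuming `task_NN.md` uses YAML frontmatter delimited by `---` lines with a single-line `status:` key; the helper name `reopen_task` is hypothetical, and the function touches only the task file, never `state.yaml`.

```python
import re
from pathlib import Path

# Matches a single-line "status: <value>" entry inside the frontmatter.
STATUS_LINE = re.compile(r"^status:[ \t]*\S+[ \t]*$", re.MULTILINE)


def reopen_task(task_file: Path, memory_file: Path, new_status: str = "pending") -> None:
    """Flip frontmatter status back, but only after audit notes exist."""
    if new_status not in ("pending", "in_progress"):
        raise ValueError("a reopened task goes back to pending or in_progress")
    # Invariant: memory/qa-execution.md must be written before any status flip.
    if not memory_file.is_file() or not memory_file.read_text(encoding="utf-8").strip():
        raise RuntimeError("write memory/qa-execution.md before flipping status")
    text = task_file.read_text(encoding="utf-8")
    # Assumed layout: "---", frontmatter, "---", body. Only the frontmatter is edited.
    parts = text.split("---", 2)
    if len(parts) < 3 or parts[0].strip():
        raise ValueError("no YAML frontmatter found")
    frontmatter, count = STATUS_LINE.subn(f"status: {new_status}", parts[1], count=1)
    if count == 0:
        raise ValueError("frontmatter has no status: line")
    task_file.write_text("---" + frontmatter + "---" + parts[2], encoding="utf-8")
```

Raising before any write means a missing memory note leaves the task file untouched, which keeps the ordering visible to whoever runs the audit next.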
Step 6: Quality Gates Verdict
- Re-run the canonical verification gate from scratch after the last code change made during the audit.
- Compile the Quality Gates section of `audit-report.md`. Each gate is `PASS` / `FAIL` / `N/A`:
  - Flaky rate <2% in canonical suite.
  - Zero `FAIL` from AI test-hygiene audit on P0/P1 tasks.
  - Zero `Critical` / `High` issues open.
  - Coverage delta ≥ 0 against baseline (no regression).
  - Zero unresolved `flaky-suspect` on P0 flows.
- A `FAIL` on any gate blocks an unconditional PASS verdict for the run.
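The gate list can be evaluated with a few comparisons. The sketch below is illustrative only: `RunMetrics`, the gate key names, and the convention that an unmeasured metric (`None`) yields `N/A` are assumptions, not part of the skill.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RunMetrics:
    flaky_rate: Optional[float]       # fraction of flaky tests, None when not measured
    hygiene_fails_p0_p1: int          # FAIL verdicts from the Step 4 scan on P0/P1 tasks
    open_critical_high: int           # open Critical or High issues
    coverage_delta: Optional[float]   # percentage points vs baseline, None without coverage tooling
    unresolved_flaky_p0: int          # unresolved flaky-suspect entries on P0 flows


def _gate(ok: Optional[bool]) -> str:
    return "N/A" if ok is None else ("PASS" if ok else "FAIL")


def evaluate_gates(m: RunMetrics) -> Dict[str, str]:
    """Map each quality gate to PASS, FAIL, or N/A."""
    return {
        "flaky_rate_below_2pct": _gate(None if m.flaky_rate is None else m.flaky_rate < 0.02),
        "no_hygiene_fail_p0_p1": _gate(m.hygiene_fails_p0_p1 == 0),
        "no_critical_high_open": _gate(m.open_critical_high == 0),
        "coverage_no_regression": _gate(None if m.coverage_delta is None else m.coverage_delta >= 0),
        "no_unresolved_flaky_p0": _gate(m.unresolved_flaky_p0 == 0),
    }


def run_allows_unconditional_pass(gates: Dict[str, str]) -> bool:
    """A single FAIL blocks an unconditional PASS for the run."""
    return "FAIL" not in gates.values()
```

Treating `N/A` as non-blocking is a design choice of this sketch; a stricter audit could require every gate to be measured.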
Step 7: Write the Audit Report
- Summarize the audit using `assets/audit-report-template.md` and write the report to `<audit-output-path>/audit/audit-report.md`.
- Mandatory sections:
  - Claim / Command / Exit code / Verdict per command executed in Step 2 and Step 6.
  - AUTOMATED COVERAGE: support detected, harness, canonical command, required flows with classification, specs added or updated.
  - TASK IMPLEMENTATION AUDIT: Compozy slug, plan sources, matrix totals, per-task verdicts, reopened/fixed/blocked tasks, links to bugs.
  - SUITE HEALTH SNAPSHOT: flaky rate, flaky events list, mutation score (when harness exists), coverage delta vs baseline, blocked count, manual-only count, AI audit findings count.
  - QUALITY GATES: PASS/FAIL/N/A per gate.
  - ISSUES FILED: total, by severity, with `Reopens task:` annotations.
- When running in a Compozy slug, the final `audit-report.md` PASS feeds cy-codex-loop's `verify.last_status=PASS` precondition for Phase E. Do not call `update-state.py`; cy-codex-loop owns that mutation.
- Report blocked scenarios, missing credentials, or environment gaps with the exact command or prerequisite that stopped execution.
Error Handling
- If command discovery returns multiple plausible gates, prefer the broadest repository-defined command and explain the tie-breaker.
- If E2E support signals are weak or contradictory, prefer explicit config files and runnable commands before claiming the repository supports E2E.
- If no canonical verify command exists, read `references/project-signals.md`, choose the broadest safe install, lint, test, and build commands for the detected ecosystem, and state that assumption explicitly.
- If a required live dependency is unavailable, validate every local boundary that does not require the missing dependency and report the blocked live validation separately.
- If a failure appears unrelated to the audited tasks, prove that with a clean reproduction before excluding it from the audit scope.
- If the repository has an E2E harness but credentials, runtime services, or test data prevent execution, keep the affected flow classified as `blocked` and report the exact prerequisite that is missing.
- If `task_NN.md` files are marked `status: completed` but contain unchecked subtasks, missing deliverables, or unverified criteria, do not call the audit a pass. Write `memory/qa-execution.md` first, then edit frontmatter `status:` back to `pending` or `in_progress`, and file `BUG-<num>.md` per Step 5. Never write to `state.yaml`.
- If a test fails and passes on retry without a code change, do not promote to PASS. Register as `flaky-suspect` per `references/flaky-triage.md`, record the event in the Suite Health Snapshot, and treat any unresolved `flaky-suspect` on a P0 flow as a blocker for the final verdict.
- If the AI test-hygiene scan (Step 4) detects weakened assertions, skipped tests, or mocks hiding integration in a task with `declared_status: completed`, do not call the audit a pass. Apply the verdict matrix in `references/ai-implementation-audit.md`, file `BUG-<num>.md` with Type `Functional`, and flip frontmatter `status:` per Step 5.
Companion: qa-execution
- Run `agent-output-audit` to certify that `task_NN.md status: completed` reflects real work.
- Run `qa-execution` to certify that the product, taken as a whole, is acceptable to end users.
A Compozy slug typically wants both: audit the task implementations, then exercise the resulting product through user-flow QA. They share no output directory, no bug taxonomy, and no procedures — keep them separate.