ecc-tools-cost-audit

ECC Tools Cost Audit

Use this skill when the user suspects the ECC Tools GitHub App is burning cost, over-creating PRs, bypassing usage limits, or routing free users into premium analysis paths.
This is a focused operator workflow for the sibling ECC-Tools repo. It is not a generic billing skill and it is not a repo-wide code review pass.

Skill Stack

Pull these ECC-native skills into the workflow when relevant:
  • autonomous-loops for bounded multi-step audits that cross webhooks, queues, billing, and retries
  • agentic-engineering for breaking the request path into discrete, provable units
  • customer-billing-ops when repo behavior and customer-impact math must be separated cleanly
  • search-first before inventing helpers or re-implementing repo-local utilities
  • security-review when auth, usage gates, entitlements, or secrets are touched
  • verification-loop for proving rerun safety and exact post-fix state
  • tdd-workflow when the fix needs regression coverage in the worker, router, or billing paths

When To Use

  • the user mentions ECC Tools burn rate, PR recursion, over-created PRs, usage-limit bypass, or premium-model leakage
  • the task is in the sibling ECC-Tools repo and depends on webhook handlers, queue workers, usage reservation, PR creation logic, or paid-gate enforcement
  • a customer report says the app created too many PRs, billed incorrectly, or analyzed code without producing a usable result

Scope Guardrails

  • work in the sibling ECC-Tools repo, not in everything-claude-code
  • start read-only unless the user clearly asked for a fix
  • do not mutate unrelated billing, checkout, or UI flows while tracing analysis burn
  • treat app-generated branches and app-generated PRs as red-flag recursion paths until proved otherwise
  • separate three things explicitly:
    • repo-side burn root cause
    • customer-facing billing impact
    • product or entitlement gaps that need backlog follow-up

Workflow

1. Freeze repo scope

  • switch into the sibling ECC-Tools repo
  • check branch and local diff first
  • identify the exact surface under audit:
    • webhook router
    • queue producer
    • queue consumer
    • PR creation path
    • usage reservation / billing path
    • model routing path

2. Trace ingress before theorizing

  • inspect src/index.* or the main entrypoint first
  • map every enqueue path before suggesting a fix
  • confirm which GitHub events share a queue type
  • confirm whether push, pull_request, synchronize, comment, or manual re-run events can converge on the same expensive path
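As a rough illustration of the ingress map described above, the sketch below records which triggers feed which queue job type and reports the job types that more than one trigger converges on. The table contents and the job type names ("analysis", "onboarding") are assumptions for illustration only; in a real audit the table is filled in by reading the webhook router.

```python
from collections import defaultdict

# Hypothetical ingress map: (GitHub event, action) -> queue job type.
# These entries are placeholders, not the actual ECC-Tools routing.
ENQUEUE_PATHS = {
    ("push", None): "analysis",
    ("pull_request", "opened"): "analysis",
    ("pull_request", "synchronize"): "analysis",
    ("issue_comment", "created"): "analysis",
    ("installation", "created"): "onboarding",
}


def converging_triggers(paths):
    """Group triggers by job type and keep job types fed by more than one trigger."""
    by_job = defaultdict(list)
    for trigger, job_type in paths.items():
        by_job[job_type].append(trigger)
    return {job: triggers for job, triggers in by_job.items() if len(triggers) > 1}


if __name__ == "__main__":
    for job, triggers in converging_triggers(ENQUEUE_PATHS).items():
        print(job, "is fed by", len(triggers), "triggers:", triggers)
```

Any job type that shows up in the result is a candidate for the shared expensive path and is worth tracing into the worker next.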

3. Trace the worker and side effects

  • inspect the queue consumer or scheduled worker that handles analysis
  • confirm whether a queued analysis always ends in:
    • PR creation
    • branch creation
    • file updates
    • premium model calls
    • usage increments
  • if analysis can spend tokens and then fail before output is persisted, classify it as burn-with-broken-output

4. Audit the high-signal burn paths

PR multiplication

  • inspect PR helpers and branch naming
  • check dedupe, synchronize-event handling, and existing-PR reuse
  • if app-generated branches can re-enter analysis, treat that as a priority-0 recursion risk
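One possible shape for a recursion guard is sketched below. It assumes the app names its branches with a fixed prefix and acts under a bot login; both constants are hypothetical and should be replaced with whatever the PR helpers in the repo actually use.

```python
# Hypothetical conventions; confirm the real values in the PR helper code.
APP_BRANCH_PREFIX = "ecc-tools/"
APP_BOT_LOGIN = "ecc-tools[bot]"


def should_analyze(branch: str, sender_login: str) -> bool:
    """Return False for events that originate from the app's own output."""
    if branch.startswith(APP_BRANCH_PREFIX):
        return False  # app-generated branch: analyzing it would recurse
    if sender_login == APP_BOT_LOGIN:
        return False  # event was caused by the app itself
    return True


def reuse_or_create(open_pr_heads: set, head_branch: str) -> str:
    """Prefer updating an existing open PR for the same head branch over creating a new one."""
    return "update_existing" if head_branch in open_pr_heads else "create"
```

Placing a check like this in the webhook router, before enqueue, means a recursive event costs nothing beyond the webhook delivery itself.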

Quota bypass

  • inspect where quota is checked versus where usage is reserved or incremented
  • if quota is checked before enqueue but usage is charged only inside the worker, treat concurrent front-door passes as a real race
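The race above goes away when the check and the reservation happen in one atomic step at the front door. The sketch below uses an in-memory ledger with a lock as a stand-in; the class and function names are hypothetical, and a real store would need an equivalent atomic operation (for example a conditional update) rather than a process-local lock.

```python
import threading


class UsageLedger:
    """In-memory stand-in for a usage store (illustrative only)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0
        self._lock = threading.Lock()

    def try_reserve(self) -> bool:
        """Check quota and reserve one unit in a single atomic step."""
        with self._lock:
            if self.used >= self.limit:
                return False
            self.used += 1
            return True

    def release(self) -> None:
        """Give a unit back when a job is dropped before any spend."""
        with self._lock:
            if self.used > 0:
                self.used -= 1


def handle_request(ledger: UsageLedger, enqueue) -> str:
    """Reserve before enqueue so concurrent requests cannot all pass the gate."""
    if not ledger.try_reserve():
        return "rejected"
    enqueue()
    return "queued"
```

With a limit of 3 and ten concurrent requests, exactly three are queued and the rest are rejected, which is the property a regression test for this path should pin down.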

Premium-model leakage

  • inspect model selection, tier branching, and provider routing
  • verify whether free or capped users can still hit premium analyzers when premium keys are present
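A minimal sketch of entitlement-first routing follows. The tier names ("free", "paid") and analyzer labels ("basic", "premium") are assumptions, not the repo's actual values. The point it illustrates is that a configured premium key should only enable the premium path, never grant it.

```python
def select_analyzer(tier: str, quota_remaining: int, premium_key_present: bool) -> str:
    """Decide from entitlement first; key presence alone never selects premium."""
    entitled = tier == "paid" and quota_remaining > 0
    if entitled and premium_key_present:
        return "premium"
    # Free users, capped users, and deployments without a key all land here.
    return "basic"
```

The leak to look for in the real code is the inverse ordering, where the branch is taken on key presence and the tier is never consulted.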

Retry burn

  • inspect retry loops, duplicate queue jobs, and deterministic failure reruns
  • if the same non-transient error can spend analysis repeatedly, fix that before quality improvements
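One way to stop deterministic failures from re-spending analysis is to classify errors and retry only the transient class. The exception names and the retry wrapper below are hypothetical; the actual queue may express the same idea through its own retry or acknowledgement settings.

```python
class TransientError(Exception):
    """Failure that may succeed on retry, such as a timeout or rate limit."""


class DeterministicError(Exception):
    """Failure that will repeat on every attempt, such as invalid input."""


def run_with_retries(job, max_attempts: int = 3):
    """Run job, retrying only transient failures. Returns (result, attempts)."""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        try:
            return job(), attempts
        except TransientError:
            continue  # worth another attempt
        except DeterministicError:
            break  # retrying would only spend again
    return None, attempts
```

A job that raises DeterministicError is attempted once and then dropped, so the cost of a non-transient failure is bounded to a single analysis.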

5. Fix in burn order

If the user asked for code changes, prioritize fixes in this order:
  1. stop automatic PR multiplication
  2. stop quota bypass
  3. stop premium leakage
  4. stop duplicate-job fanout and pointless retries
  5. close rerun/update safety gaps
Keep the pass bounded to one to three direct fixes unless the same root cause clearly spans multiple files.
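The ordering and the bound above can be applied mechanically to a list of audit findings. This is a small sketch with hypothetical finding kinds and file paths; it only shows the ranking and the cap of three fixes.

```python
# Hypothetical labels for the five fix classes, in burn order.
BURN_ORDER = [
    "pr_multiplication",
    "quota_bypass",
    "premium_leakage",
    "retry_fanout",
    "rerun_safety",
]


def plan_fixes(findings, max_fixes: int = 3):
    """Rank findings by burn order and keep at most max_fixes of them."""
    ranked = sorted(findings, key=lambda f: BURN_ORDER.index(f["kind"]))
    return ranked[:max_fixes]


if __name__ == "__main__":
    findings = [
        {"kind": "retry_fanout", "where": "worker (placeholder path)"},
        {"kind": "quota_bypass", "where": "router (placeholder path)"},
        {"kind": "pr_multiplication", "where": "PR helper (placeholder path)"},
        {"kind": "rerun_safety", "where": "worker (placeholder path)"},
    ]
    for fix in plan_fixes(findings):
        print(fix["kind"], "->", fix["where"])
```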

6. Verify with the smallest proving steps

  • rerun only the targeted tests or integration slices that cover the changed path
  • verify whether the burn path is now:
    • blocked
    • deduped
    • downgraded to cheaper analysis
    • or rejected early
  • state the final status exactly:
    • changed locally
    • verified locally
    • pushed
    • deployed
    • still blocked

High-Signal Failure Patterns

1. One queue type for all triggers

If pushes, PR syncs, and manual audits all enqueue the same job and the worker always creates a PR, analysis equals PR spam.

2. Post-enqueue usage reservation

If usage is checked at the front door but only incremented in the worker, concurrent requests can all pass the gate and exceed quota.

3. Free tier on premium path

If free queued jobs can still route into Anthropic or another premium provider when keys exist, that is real spend leakage even if the user never sees the premium result.

4. App-generated branches re-enter the webhook

If pull_request.synchronize, branch pushes, or comment-triggered runs fire on app-owned branches, the app can recursively analyze its own output.

5. Expensive work before persistence safety

If the system can spend tokens and then fail on PR creation, file update, or branch collision, it is burning cost without shipping value.
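A common mitigation is to run the cheap persistence checks before the expensive call. The sketch below passes the checks and the analysis in as callables; every name and job field in it is a hypothetical stand-in for whatever the worker actually uses.

```python
def run_analysis(job, branch_exists, can_write, analyze, persist) -> str:
    """Run cheap preflight checks first so a doomed job never spends tokens."""
    if branch_exists(job["target_branch"]):
        return "rejected_early: branch collision"
    if not can_write(job["repo"]):
        return "rejected_early: no write access"
    result = analyze(job)  # the expensive step, reached only after preflight passes
    persist(job, result)
    return "completed"
```

Preflight cannot rule out every late failure (a branch can appear between the check and the write), so this narrows the burn-with-broken-output window rather than closing it.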

Pitfalls

  • do not begin with broad repo wandering; settle webhook -> queue -> worker first
  • do not mix customer billing inference with code-backed product truth
  • do not fix lower-value quality issues before the highest-burn path is contained
  • do not claim burn is fixed until the narrow proving step was rerun
  • do not push or deploy unless the user asked
  • do not touch unrelated repo-local changes if they are already in progress

Verification

  • root causes cite exact file paths and code areas
  • fixes are ordered by burn impact, not code neatness
  • proving commands are named
  • final status distinguishes local change, verification, push, and deployment