autoresearch

Claude Autoresearch — Autonomous Goal-directed Iteration

Inspired by Karpathy's autoresearch. Applies constraint-driven autonomous iteration to ANY work — not just ML research.
Core idea: You are an autonomous agent. Modify → Verify → Keep/Discard → Repeat.

Subcommands

| Subcommand | Purpose |
|---|---|
| /autoresearch | Run the autonomous loop (default) |
| /autoresearch:plan | Interactive wizard to build Scope, Metric, Direction & Verify from a Goal |

/autoresearch:plan — Goal → Configuration Wizard

Converts a plain-language goal into a validated, ready-to-execute autoresearch configuration.
Load references/plan-workflow.md for the full protocol.
Quick summary:
  1. Capture Goal — ask what the user wants to improve (or accept inline text)
  2. Analyze Context — scan codebase for tooling, test runners, build scripts
  3. Define Scope — suggest file globs, validate they resolve to real files
  4. Define Metric — suggest mechanical metrics, validate they output a number
  5. Define Direction — higher or lower is better
  6. Define Verify — construct the shell command, dry-run it, confirm it works
  7. Confirm & Launch — present the complete config, offer to launch immediately
Critical gates:
  • Metric MUST be mechanical (outputs a parseable number, not subjective)
  • Verify command MUST pass a dry run on the current codebase before accepting
  • Scope MUST resolve to ≥1 file
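
The three gates above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (parse_metric, scope_resolves, dry_run) are hypothetical, and "take the last number in the output" is one possible parsing rule, not necessarily what references/plan-workflow.md prescribes.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional, Sequence, Tuple

# Hypothetical helpers; the wizard's actual checks may differ.
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def parse_metric(output: str) -> Optional[float]:
    """Gate 1: return the last number in the output, or None if there is none."""
    matches = NUMBER.findall(output)
    return float(matches[-1]) if matches else None


def dry_run(command: str, timeout_s: float = 300.0) -> Tuple[bool, str]:
    """Gate 2: run the verify command once; a pass means exit code 0."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout_s
    )
    return result.returncode == 0, result.stdout


def scope_resolves(root: Path, patterns: Sequence[str]) -> bool:
    """Gate 3: True if the globs match at least one regular file under root."""
    return any(p.is_file() for pattern in patterns for p in root.glob(pattern))
```

A configuration would be accepted only if scope_resolves returns True, dry_run reports success, and parse_metric returns a number for the dry-run output.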
Usage:
/autoresearch:plan
Goal: Make the API respond faster

/autoresearch:plan Increase test coverage to 95%

/autoresearch:plan Reduce bundle size below 200KB
After the wizard completes, the user gets a ready-to-paste /autoresearch invocation, or can launch it directly.

When to Activate

  • User invokes /autoresearch or /ug:autoresearch → run the loop
  • User invokes /autoresearch:plan → run the planning wizard
  • User says "help me set up autoresearch", "plan an autoresearch run" → run the planning wizard
  • User says "work autonomously", "iterate until done", "keep improving", "run overnight" → run the loop
  • Any task requiring repeated iteration cycles with measurable outcomes → run the loop

Optional: Controlled Loop Count

By default, autoresearch loops until manually interrupted. Users can optionally cap the number of iterations with Claude Code's built-in /loop command.
Requires: Claude Code v1.0.32+ (the /loop command was introduced in this version)

Usage

Unlimited (default):
/autoresearch
Goal: Increase test coverage to 90%
Bounded (N iterations):
/loop 25 /autoresearch
Goal: Increase test coverage to 90%
This chains /autoresearch with /loop 25, running at most 25 iteration cycles. After 25 iterations, or earlier if the goal is reached, Claude stops and prints a final summary.

When to Use Bounded Loops

| Scenario | Recommendation |
|---|---|
| Run overnight, review in morning | Unlimited (default) |
| Quick 30-min improvement session | /loop 10 /autoresearch |
| Targeted fix with known scope | /loop 5 /autoresearch |
| Exploratory: see if approach works | /loop 15 /autoresearch |
| CI/CD pipeline integration | /loop N /autoresearch (set N based on time budget) |

Behavior with Loop Count

When a loop count is specified:
  • Claude runs up to N iterations through the autoresearch loop
  • After iteration N, Claude prints a final summary: baseline → current best, plus the number of keeps, discards, and crashes
  • If the goal is achieved before N iterations, Claude prints early completion and stops
  • All other rules (atomic changes, mechanical verification, auto-rollback) still apply
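
As a rough illustration of the final summary described above, the sketch below condenses a list of iteration records into one line. The record shape, a (status, metric) tuple with the baseline first, and the output wording are assumptions made for this example, not the skill's actual format.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical record shape: (status, metric); rows[0] is the baseline.
Row = Tuple[str, Optional[float]]


def summarize(rows: List[Row], higher_is_better: bool = True) -> str:
    """Build a one-line summary: baseline, current best, and outcome counts."""
    baseline = rows[0][1]
    # Only the baseline and kept changes can define the current best.
    kept = [m for status, m in rows if status in ("baseline", "keep")]
    best = max(kept) if higher_is_better else min(kept)
    counts = Counter(status for status, _ in rows[1:])
    return (
        f"baseline {baseline} -> best {best} | "
        f"keep={counts['keep']} discard={counts['discard']} crash={counts['crash']}"
    )
```

Crashed iterations carry no metric here, so they only contribute to the counts.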

Setup Phase (Do Once)

  1. Read all in-scope files for full context before any modification
  2. Define the goal — What does "better" mean? Extract or ask for a mechanical metric:
    • Code: tests pass, build succeeds, performance benchmark improves
    • Content: word count target hit, SEO score improves, readability score
    • Design: lighthouse score, accessibility audit passes
    • If no metric exists → define one with user, or use simplest proxy (e.g. "compiles without errors")
  3. Define scope constraints — Which files can you modify? Which are read-only?
  4. Create a results log — Track every iteration (see references/results-logging.md)
  5. Establish baseline — Run verification on current state. Record as iteration #0
  6. Confirm and go — Show user the setup, get confirmation, then BEGIN THE LOOP
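
Steps 4 and 5 could look roughly like the following sketch, which writes a tab-separated log and records the baseline as iteration 0. The column names and the "baseline" status label are hypothetical; the real layout is defined in references/results-logging.md, which is not reproduced here.

```python
import csv
import io
from typing import TextIO

# Hypothetical column layout, chosen only for this example.
FIELDS = ["iteration", "metric", "status", "description"]


def _writer(stream: TextIO) -> csv.DictWriter:
    return csv.DictWriter(
        stream, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )


def start_log(stream: TextIO, baseline_metric: float) -> None:
    """Write the header and record the unmodified state as iteration 0."""
    writer = _writer(stream)
    writer.writeheader()
    writer.writerow({"iteration": 0, "metric": baseline_metric,
                     "status": "baseline", "description": "unmodified state"})


def append_result(stream: TextIO, iteration: int, metric: float,
                  status: str, description: str) -> None:
    """Append one iteration's outcome (keep, discard, or crash)."""
    _writer(stream).writerow({"iteration": iteration, "metric": metric,
                              "status": status, "description": description})


# Usage: an in-memory log; a real run would open a file in append mode.
log = io.StringIO()
start_log(log, 71.4)
append_result(log, 1, 73.0, "keep", "add tests for parser")
```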

The Loop

Read references/autonomous-loop-protocol.md for full protocol details.
LOOP (FOREVER or N times):
  1. Review: Read current state + git history + results log
  2. Ideate: Pick next change based on goal, past results, what hasn't been tried
  3. Modify: Make ONE focused change to in-scope files
  4. Commit: Git commit the change (before verification)
  5. Verify: Run the mechanical metric (tests, build, benchmark, etc.)
  6. Decide:
     - IMPROVED → Keep commit, log "keep", advance
     - SAME/WORSE → Git revert, log "discard"
     - CRASHED → Try to fix (max 3 attempts), else log "crash" and move on
  7. Log: Record result in results log
  8. Repeat: Go to step 1.
     - If unbounded: NEVER STOP. NEVER ASK "should I continue?"
     - If bounded (N): Stop after N iterations, print final summary
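
The bounded form of the loop above can be sketched as a small Python function. This is a simplified illustration, not the protocol itself: the modify, commit, verify, and revert steps are injected as hypothetical callables instead of real git and shell calls, and a crash is logged and reverted immediately without the up-to-three repair attempts.

```python
from typing import Callable, List, Optional, Tuple

LogEntry = Tuple[int, str, Optional[float]]


def run_loop(
    baseline: float,
    apply_change: Callable[[int], None],  # steps 2-4: ideate, modify, commit
    verify: Callable[[], float],          # step 5: returns the metric, may raise
    revert: Callable[[], None],           # undo the last committed change
    iterations: int,
    higher_is_better: bool = True,
) -> List[LogEntry]:
    """Run a bounded keep/discard loop and return the results log."""
    best = baseline
    log: List[LogEntry] = [(0, "baseline", baseline)]
    for i in range(1, iterations + 1):
        apply_change(i)
        try:
            metric = verify()
        except Exception:
            # Simplification: no repair attempts, just roll back and move on.
            revert()
            log.append((i, "crash", None))
            continue
        improved = metric > best if higher_is_better else metric < best
        if improved:
            best = metric
            log.append((i, "keep", metric))
        else:
            revert()
            log.append((i, "discard", metric))
    return log
```

Passing the steps in as callables keeps the sketch testable without a repository; in a real run they would wrap the git commit, the verify command, and git revert.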

Critical Rules

  1. Loop until done — Unbounded: loop until interrupted. Bounded: loop N times then summarize.
  2. Read before write — Always understand full context before modifying
  3. One change per iteration — Atomic changes. If it breaks, you know exactly why
  4. Mechanical verification only — No subjective "looks good". Use metrics
  5. Automatic rollback — Failed changes revert instantly. No debates
  6. Simplicity wins — Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
  7. Git is memory — Every kept change committed. Agent reads history to learn patterns
  8. When stuck, think harder — Re-read files, re-read goal, combine near-misses, try radical changes. Don't ask for help unless truly blocked by missing access/permissions
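
Rule 6 can be made mechanical in several ways; the sketch below is one hedged possibility. The function name and both thresholds (min_gain, max_added_loc) are assumptions invented for this example, since the rule does not define what counts as a "tiny" improvement or "ugly" complexity.

```python
def decide(
    old_metric: float,
    new_metric: float,
    loc_delta: int,               # lines added (positive) or removed (negative)
    higher_is_better: bool = True,
    min_gain: float = 0.5,        # hypothetical: gains below this are "tiny"
    max_added_loc: int = 50,      # hypothetical: growth above this is "complex"
) -> str:
    """Return "keep" or "discard" for one iteration, favoring simpler code."""
    gain = new_metric - old_metric if higher_is_better else old_metric - new_metric
    if gain < 0:
        return "discard"          # worse result: always roll back
    if gain == 0:
        # Equal results: keep only if the change removed code.
        return "keep" if loc_delta < 0 else "discard"
    if gain < min_gain and loc_delta > max_added_loc:
        return "discard"          # tiny improvement bought with a lot of code
    return "keep"
```

For example, decide(80.0, 80.0, -12) keeps a change that leaves the metric unchanged but removes 12 lines, while decide(80.0, 80.1, 200) discards a 0.1-point gain that costs 200 added lines.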

Principles Reference

See references/core-principles.md for the 7 generalizable principles from autoresearch.

Adapting to Different Domains

| Domain | Metric | Scope | Verify Command |
|---|---|---|---|
| Backend code | Tests pass + coverage % | src/**/*.ts | npm test |
| Frontend UI | Lighthouse score | src/components/** | npx lighthouse |
| ML training | val_bpb / loss | train.py | uv run train.py |
| Blog/content | Word count + readability | content/*.md | Custom script |
| Performance | Benchmark time (ms) | Target files | npm run bench |
| Refactoring | Tests pass + LOC reduced | Target module | npm test && wc -l |
Adapt the loop to your domain. The PRINCIPLES are universal; the METRICS are domain-specific.