autoresearch

Claude Autoresearch — Autonomous Goal-directed Iteration

Inspired by Karpathy's autoresearch. Applies constraint-driven autonomous iteration to ANY work — not just ML research.

Core idea: You are an autonomous agent. Modify → Verify → Keep/Discard → Repeat.

Subcommands

Subcommand	Purpose
`/autoresearch`	Run the autonomous loop (default)
`/autoresearch:plan`	Interactive wizard to build Scope, Metric, Direction & Verify from a Goal

/autoresearch:plan — Goal → Configuration Wizard

Converts a plain-language goal into a validated, ready-to-execute autoresearch configuration.

Load `references/plan-workflow.md` for the full protocol.

Quick summary:

1. Capture Goal: ask what the user wants to improve (or accept inline text)
2. Analyze Context: scan the codebase for tooling, test runners, and build scripts
3. Define Scope: suggest file globs and validate that they resolve to real files
4. Define Metric: suggest mechanical metrics and validate that they output a number
5. Define Direction: decide whether higher or lower is better
6. Define Verify: construct the shell command, dry-run it, and confirm it works
7. Confirm & Launch: present the complete config and offer to launch immediately

Critical gates:

- Metric MUST be mechanical (outputs a parseable number, not a subjective judgment)
- Verify command MUST pass a dry run on the current codebase before it is accepted
- Scope MUST resolve to ≥1 file
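
The three gates above could be checked mechanically along the following lines. This is a minimal Python sketch, not the wizard's actual implementation: the function names, the convention of taking the last number printed on stdout as the metric, and the timeout value are all assumptions made for illustration.

```python
import glob
import re
import subprocess
from typing import Optional, Sequence

# Assumption: the metric is the last number the verify command prints on stdout.
NUMBER_RE = re.compile(r"[-+]?\d+(?:\.\d+)?")


def parse_metric(output: str) -> Optional[float]:
    """Gate 1: return the last number found in the output, or None if there is none."""
    matches = NUMBER_RE.findall(output)
    return float(matches[-1]) if matches else None


def scope_resolves(patterns: Sequence[str]) -> bool:
    """Gate 3: True if at least one glob pattern matches at least one existing path."""
    return any(glob.glob(pattern, recursive=True) for pattern in patterns)


def dry_run(verify_cmd: str, timeout_s: int = 600) -> Optional[float]:
    """Gate 2: run the verify command once and return its metric.

    Returns None if the command exits non-zero or prints no number.
    Raises subprocess.TimeoutExpired if the command runs past timeout_s.
    """
    proc = subprocess.run(
        verify_cmd, shell=True, capture_output=True, text=True, timeout=timeout_s
    )
    if proc.returncode != 0:
        return None
    return parse_metric(proc.stdout)
```

A configuration would be accepted only when `scope_resolves(...)` is true and `dry_run(...)` returns a number for the current codebase.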

Usage:

/autoresearch:plan
Goal: Make the API respond faster

/autoresearch:plan Increase test coverage to 95%

/autoresearch:plan Reduce bundle size below 200KB

After the wizard completes, the user gets a ready-to-paste `/autoresearch` invocation, or can launch it directly.

When to Activate

- User invokes `/autoresearch` or `/ug:autoresearch` → run the loop
- User invokes `/autoresearch:plan` → run the planning wizard
- User says "help me set up autoresearch" or "plan an autoresearch run" → run the planning wizard
- User says "work autonomously", "iterate until done", "keep improving", or "run overnight" → run the loop
- Any task requiring repeated iteration cycles with measurable outcomes → run the loop

Optional: Controlled Loop Count

By default, autoresearch loops until it is manually interrupted. Users can optionally cap the number of iterations with Claude Code's built-in `/loop` command.

Requires: Claude Code v1.0.32+ (the `/loop` command was introduced in this version)

Usage

Unlimited (default):

/autoresearch
Goal: Increase test coverage to 90%

Bounded (N iterations):

/loop 25 /autoresearch
Goal: Increase test coverage to 90%

This chains `/autoresearch` with `/loop 25`, running 25 iteration cycles (fewer if the goal is reached first). When the run ends, Claude stops and prints a final summary.

When to Use Bounded Loops

Scenario	Recommendation
Run overnight, review in morning	Unlimited (default)
Quick 30-min improvement session	`/loop 10 /autoresearch`
Targeted fix with known scope	`/loop 5 /autoresearch`
Exploratory — see if approach works	`/loop 15 /autoresearch`
CI/CD pipeline integration	`/loop N /autoresearch` (set N based on time budget)

Behavior with Loop Count

When a loop count is specified:

- Claude runs N iterations of the autoresearch loop, unless the goal is reached first
- After iteration N, Claude prints a final summary: baseline → current best, plus counts of keeps, discards, and crashes
- If the goal is achieved before N iterations, Claude reports early completion and stops
- All other rules (atomic changes, mechanical verification, auto-rollback) still apply
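
As an illustration of the final summary, the sketch below tallies per-iteration outcomes. It is a hypothetical Python helper: the function name and the exact output format are assumptions, and only the reported quantities (baseline, current best, keeps, discards, crashes) come from the description above.

```python
from collections import Counter
from typing import Iterable


def final_summary(baseline: float, best: float, statuses: Iterable[str]) -> str:
    """Build a one-line summary from the per-iteration statuses.

    Each status is assumed to be one of "keep", "discard", or "crash".
    """
    counts = Counter(statuses)
    return (
        f"baseline {baseline} -> best {best} | "
        f"keeps={counts['keep']} discards={counts['discard']} crashes={counts['crash']}"
    )


print(final_summary(80.0, 85.0, ["keep", "discard", "crash", "keep"]))
```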

Setup Phase (Do Once)

1. Read all in-scope files for full context before any modification
2. Define the goal: what does "better" mean? Extract or ask for a mechanical metric:
   - Code: tests pass, build succeeds, performance benchmark improves
   - Content: word count target hit, SEO score improves, readability score improves
   - Design: Lighthouse score improves, accessibility audit passes
   - If no metric exists → define one with the user, or use the simplest proxy (e.g. "compiles without errors")
3. Define scope constraints: which files can you modify? Which are read-only?
4. Create a results log: track every iteration (see `references/results-logging.md`)
5. Establish baseline: run verification on the current state and record it as iteration #0
6. Confirm and go: show the user the setup, get confirmation, then BEGIN THE LOOP
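
The results log and the iteration #0 baseline might look like the sketch below. The real log format is defined in `references/results-logging.md`, which is not reproduced here, so the tab-separated layout, the column names, and both helper functions are assumptions for illustration only.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Hypothetical columns; the actual schema lives in references/results-logging.md.
FIELDS = ["iteration", "metric", "status", "description"]


def append_result(
    log_path: str, iteration: int, metric: float, status: str, description: str
) -> None:
    """Append one row to a tab-separated log, writing the header on first use."""
    path = Path(log_path)
    write_header = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        if write_header:
            writer.writeheader()
        writer.writerow(
            {
                "iteration": iteration,
                "metric": metric,
                "status": status,
                "description": description,
            }
        )


def read_results(log_path: str) -> List[Dict[str, str]]:
    """Read the whole log back as a list of rows (all values are strings)."""
    with Path(log_path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))
```

Recording the baseline would then be a single call such as `append_result("results.tsv", 0, 80.0, "baseline", "initial state")`, made before the first modification.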

The Loop

Read `references/autonomous-loop-protocol.md` for full protocol details.

LOOP (FOREVER or N times):
  1. Review: Read current state + git history + results log
  2. Ideate: Pick next change based on goal, past results, what hasn't been tried
  3. Modify: Make ONE focused change to in-scope files
  4. Commit: Git commit the change (before verification)
  5. Verify: Run the mechanical metric (tests, build, benchmark, etc.)
  6. Decide:
     - IMPROVED → Keep commit, log "keep", advance
     - SAME/WORSE → Git revert, log "discard"
     - CRASHED → Try to fix (max 3 attempts), else log "crash" and move on
  7. Log: Record result in results log
  8. Repeat: Go to step 1.
     - If unbounded: NEVER STOP. NEVER ASK "should I continue?"
     - If bounded (N): Stop after N iterations, print final summary
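
The control flow of the loop above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the protocol from `references/autonomous-loop-protocol.md`: the callables `propose_and_commit`, `verify`, and `revert` are hypothetical stand-ins for steps 1 to 5 and the Git rollback, a crash is represented by `verify` returning `None`, and the "try to fix, max 3 attempts" branch is omitted.

```python
from typing import Callable, List, Optional, Tuple


def decide(best: float, new: Optional[float], higher_is_better: bool) -> str:
    """Step 6: classify an iteration as "keep", "discard", or "crash"."""
    if new is None:
        return "crash"
    improved = new > best if higher_is_better else new < best
    return "keep" if improved else "discard"


def run_loop(
    baseline: float,
    propose_and_commit: Callable[[int], None],
    verify: Callable[[], Optional[float]],
    revert: Callable[[], None],
    higher_is_better: bool = True,
    max_iterations: Optional[int] = None,
) -> Tuple[float, List[Tuple[int, Optional[float], str]]]:
    """Run the loop N times, or forever when max_iterations is None."""
    best = baseline
    log: List[Tuple[int, Optional[float], str]] = []
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        propose_and_commit(iteration)  # steps 1-4: review, ideate, modify, commit
        metric = verify()  # step 5: run the mechanical metric
        status = decide(best, metric, higher_is_better)
        if status == "keep":
            best = metric
        else:
            # Assumption: a crash that is not fixed is rolled back like a discard.
            revert()
        log.append((iteration, metric, status))  # step 7
    return best, log
```

Passing the steps in as callables keeps the decision logic testable without touching a real repository; an equal metric counts as SAME and is discarded, matching step 6.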

Critical Rules

1. Loop until done: unbounded runs loop until interrupted; bounded runs loop N times, then summarize
2. Read before write: always understand the full context before modifying
3. One change per iteration: atomic changes. If it breaks, you know exactly why
4. Mechanical verification only: no subjective "looks good". Use metrics
5. Automatic rollback: failed changes revert instantly. No debates
6. Simplicity wins: equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
7. Git is memory: every kept change is committed. The agent reads history to learn patterns
8. When stuck, think harder: re-read files, re-read the goal, combine near-misses, try radical changes. Don't ask for help unless truly blocked by missing access or permissions

Principles Reference

See `references/core-principles.md` for the 7 generalizable principles from autoresearch.

Adapting to Different Domains

Domain	Metric	Scope	Verify Command
Backend code	Tests pass + coverage %	`src/**/*.ts`	`npm test`
Frontend UI	Lighthouse score	`src/components/**`	`npx lighthouse`
ML training	val_bpb / loss	`train.py`	`uv run train.py`
Blog/content	Word count + readability	`content/*.md`	Custom script
Performance	Benchmark time (ms)	Target files	`npm run bench`
Refactoring	Tests pass + LOC reduced	Target module	`npm test && wc -l`

Adapt the loop to your domain. The PRINCIPLES are universal; the METRICS are domain-specific.
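
One way to picture a row of the table is as a small configuration record. The sketch below is hypothetical: the field names mirror the Goal, Scope, Metric, Direction, and Verify items produced by the planning wizard, but the class itself, the `problems` method, and the `"higher"`/`"lower"` strings are assumptions, not the skill's actual configuration format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AutoresearchConfig:
    """Hypothetical container for one autoresearch setup."""

    goal: str
    scope: Tuple[str, ...]  # file globs the agent may modify
    metric: str  # human-readable name of the number being tracked
    direction: str  # "higher" or "lower" is better
    verify: str  # shell command that prints the metric

    def problems(self) -> List[str]:
        """Return a list of structural problems; empty means the config is well-formed."""
        issues: List[str] = []
        if not self.scope:
            issues.append("scope must contain at least one glob")
        if self.direction not in ("higher", "lower"):
            issues.append("direction must be 'higher' or 'lower'")
        if not self.verify.strip():
            issues.append("verify command must not be empty")
        return issues


backend = AutoresearchConfig(
    goal="Increase test coverage to 90%",
    scope=("src/**/*.ts",),
    metric="coverage %",
    direction="higher",
    verify="npm test",
)
print(backend.problems())
```

Only the values change between domains; the shape of the record stays the same, which matches the point that the principles are universal while the metrics are domain-specific.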
