codex-autoresearch-loop
Codex Autoresearch
Skill by ara.so — Daily 2026 Skills collection.
Codex Autoresearch is a Codex skill that runs an autonomous modify→verify→keep/revert loop on your codebase. You describe a measurable goal in one sentence; Codex confirms the plan, then iterates unattended — every improvement stacks in git, every failure reverts automatically — until interrupted or a cap is reached. Inspired by Karpathy's autoresearch concept, generalized beyond ML training to any software metric.
Installation
Option A (manual copy into your project):
```bash
git clone https://github.com/leo-lilinxiao/codex-autoresearch.git
cp -r codex-autoresearch your-project/.agents/skills/codex-autoresearch
```
Option B (Codex skill installer):
```text
$skill-installer install https://github.com/leo-lilinxiao/codex-autoresearch
```
The skill lives at `.agents/skills/codex-autoresearch/` inside your project. No config file is required before first use.
How to Activate
Open Codex in your project directory and prefix your goal with `$codex-autoresearch`:
```text
$codex-autoresearch
I want to get rid of all `any` types in my TypeScript code
```
Codex will:
- Scan the repo and infer scope, metric, verify command, and guard command.
- Present a confirmation summary; reply `go` (or correct anything).
- Run the loop unattended until you interrupt it or the goal is met.
You never write config. Codex infers everything.
Confirmation Flow
Before the loop starts, Codex always shows what it found and asks you to confirm. Example exchange:
```text
Codex: I found 47 `any` occurrences across src/**/*.ts.
Confirmed:
- Target: eliminate `any` types in src/**/*.ts
- Metric: `any` count (current: 47), direction: lower
- Verify: grep + tsc --noEmit as guard
Need to confirm:
- Run until all gone, or cap at N iterations?
Reply "go" to start, or tell me what to change.
You: Go, run overnight.
Codex: Starting — baseline: 47. Iterating until interrupted.
```
Up to five confirmation rounds are possible. After that, Codex proceeds.
The Loop (internals)
```text
PHASE 0: Probe environment (CPU/GPU/RAM/toolchains), check for session resume
PHASE 1: Read context + lessons file from prior run (if any)
LOOP (forever or N times):
  1. Review current state, git history, results log, lessons
  2. Pick ONE hypothesis (apply perspectives, filter by environment)
     -- or N hypotheses if parallel mode is active
  3. Make ONE atomic change
  4. git commit (before verification)
  5. Run verify command → did the target metric improve?
     Run guard command → did anything else break?
  6. Improved → keep (extract lesson)
     Worse → approved rollback strategy (git revert)
     Crashed → fix or skip
  7. Log the result to results log
  8. Health check (disk, git, verify health)
  9. If 3+ discards → REFINE; 5+ → PIVOT; 2 PIVOTs → web search
  10. Repeat. Never stop. Never ask.
```
The loop runs unbounded unless you say `Iterations: N` during confirmation.
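The keep/revert core of the loop can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: the names `run_loop`, `measure`, and `guard` are hypothetical, the "codebase" is a list of strings, and discarding a copy stands in for the `git commit` / `git revert` steps the skill actually uses.

```python
from typing import Callable, List, Optional, Tuple

State = List[str]  # hypothetical stand-in for the codebase


def run_loop(
    state: State,
    changes: List[Callable[[State], State]],
    measure: Callable[[State], float],
    guard: Callable[[State], bool],
    direction: str = "lower",
    max_iters: Optional[int] = None,
) -> Tuple[State, float, List[Tuple[int, str, float]]]:
    """Apply one change per iteration; keep it only if the metric improves
    and the guard still passes, otherwise leave the state untouched."""
    best = measure(state)  # baseline before the first iteration
    log: List[Tuple[int, str, float]] = []
    for i, change in enumerate(changes, start=1):
        if max_iters is not None and i > max_iters:
            break
        trial = change(list(state))  # one atomic change, made on a copy
        value = measure(trial)
        improved = value < best if direction == "lower" else value > best
        if improved and guard(trial):
            state, best = trial, value  # keep: this becomes the new baseline
            log.append((i, "kept", value))
        else:
            log.append((i, "reverted", value))  # drop the copy (simulated revert)
    return state, best, log
```

Passing `max_iters` plays the role of `Iterations: N`; leaving it as `None` runs through every proposed change.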
Dual-Gate Verification
Two commands serve distinct purposes:
| Gate | Purpose | Failure means |
|---|---|---|
| Verify | Did the target metric improve? | Change discarded, reverted |
| Guard | Did anything else break? | Change reworked (up to 2 attempts), then reverted |
Guard files are never modified by the loop.
Example verify + guard pair for a Python coverage run (the trailing `tr` strips the `%` sign so the output is a bare number):
```text
Verify: pytest --cov=src --cov-report=term 2>&1 | grep TOTAL | awk '{print $NF}' | tr -d '%'
Guard: python -m mypy src --ignore-missing-imports
```
Example for TypeScript type cleanup (`-w` matches `any` as a whole word, so identifiers such as `many` are not counted):
```text
Verify: grep -rw "any" src --include="*.ts" | wc -l
Guard: npx tsc --noEmit
```
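One way to picture how the two gates combine is the sketch below. It is not the skill's actual implementation: the function names, the order in which the gates run, and the `reworks_used` counter are assumptions. Only the outcomes (discard, rework up to 2 attempts, then revert) come from the table above.

```python
import subprocess
from typing import Tuple

MAX_REWORKS = 2  # from the table: "reworked (up to 2 attempts), then reverted"


def run_verify(cmd: str) -> float:
    """Run the verify command in a shell and parse its stdout as one number."""
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
    return float(out.strip())


def run_guard(cmd: str) -> bool:
    """Treat the guard as passing when the command exits with status 0."""
    return subprocess.run(cmd, shell=True, capture_output=True).returncode == 0


def gate(baseline: float, verify_cmd: str, guard_cmd: str,
         direction: str = "lower", reworks_used: int = 0) -> Tuple[str, float]:
    """Return (decision, metric), where decision is one of
    'keep', 'discard', 'rework', or 'revert'."""
    value = run_verify(verify_cmd)
    improved = value < baseline if direction == "lower" else value > baseline
    if not improved:
        return "discard", value  # verify gate failed
    if run_guard(guard_cmd):
        return "keep", value  # both gates passed
    # Guard gate failed: rework while attempts remain, then give up.
    return ("rework" if reworks_used < MAX_REWORKS else "revert"), value
```

For example, `gate(47, 'grep -rw "any" src --include="*.ts" | wc -l', "npx tsc --noEmit")` would apply this logic to the TypeScript pair shown above.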
Modes
Codex maps your sentence to one of seven modes automatically; you never need to pick a mode explicitly.
`loop`: iterate toward a measurable target (default)
```text
$codex-autoresearch
Improve test coverage in src/ to at least 80%
```
```text
$codex-autoresearch
Reduce bundle size — it's currently 2.3 MB, get it under 1 MB
```
`plan`: turn a vague goal into a validated loop config
```text
$codex-autoresearch
I want to make our API faster but I don't know where to start
```
Codex will interview you (p95 latency vs throughput? which endpoint?) and produce a ready-to-run loop config.
`fix`: repair errors until the count reaches zero
```text
$codex-autoresearch
pytest is failing, 12 tests broken after the refactor — fix them all
```
`debug`: evidence-driven root-cause hunting
```text
$codex-autoresearch
Our API returns 503 randomly under load, no idea why
```
Each iteration tests one falsifiable hypothesis. Codex presents evidence, not guesses.
`security`: read-only STRIDE + OWASP audit
```text
$codex-autoresearch
Is this code secure?
```
`ship`: readiness verification and release gating
```text
$codex-autoresearch
Ship it
```
`exec`: one-shot execution with no loop
```text
$codex-autoresearch
Run the benchmark suite and summarize results
```
Inline Configuration (optional)
You can override defaults inline during the confirmation step; no file edits are needed:
| Phrase | Effect |
|---|---|
| `Iterations: 20` | Cap the loop at 20 iterations |
| `Parallel: 3` | Test 3 hypotheses concurrently per round |
| `Guard: <command>` | Override the inferred guard command |
| `Verify: <command>` | Override the inferred verify command |
| `Scope: <subdirectory>` | Restrict changes to a subdirectory |
Example during confirmation:
```text
You: Go. Iterations: 30, Guard: npm test, Scope: src/api/
```
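To make the shape of these inline overrides concrete, here is a rough sketch of how a reply could be split into key/value pairs. Codex interprets the reply itself, so this parser is purely illustrative: `parse_overrides` and the `KEYS` tuple are hypothetical names, and the key list is taken from the phrases used in this document.

```python
import re
from typing import Dict

# Keys that appear in this document's examples; the list is an assumption.
KEYS = ("Iterations", "Parallel", "Guard", "Verify", "Scope")
KEY_PATTERN = re.compile(r"\b(" + "|".join(KEYS) + r"):\s*")


def parse_overrides(reply: str) -> Dict[str, str]:
    """Extract `Key: value` overrides from a free-form confirmation reply.

    Each value runs from its key up to the next recognized key (or the end
    of the reply), with a trailing comma and whitespace removed.
    """
    matches = list(KEY_PATTERN.finditer(reply))
    overrides: Dict[str, str] = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(reply)
        value = reply[current.end():end].strip().rstrip(",").strip()
        overrides[current.group(1)] = value
    return overrides


print(parse_overrides("Go. Iterations: 30, Guard: npm test, Scope: src/api/"))
```

Values are kept as strings; a caller would convert `Iterations` or `Parallel` to integers as needed.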
Cross-Run Learning
At the end of each iteration Codex writes a structured lesson to `.agents/skills/codex-autoresearch/lessons.md`:
```text
Iteration 7 — KEPT
Hypothesis: replace explicit `any` with inferred generic in src/utils/mapper.ts
Change: added <T extends Record<string, unknown>> to mapKeys()
Result: any count 31 → 29
Lesson: Generic constraints on utility functions eliminate clusters of `any` downstream.
```
On session resume Codex reads this file first, so each new run benefits from prior runs.
To resume an interrupted run:
```text
$codex-autoresearch
Resume
```
Codex re-reads the lessons file, checks git state, re-establishes the baseline, and continues.
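If you want to inspect accumulated lessons yourself, a small parser along these lines may help. It assumes every entry follows the layout of the sample above (an `Iteration N <separator> STATUS` header followed by `Hypothesis`, `Change`, `Result`, and `Lesson` lines); the real file may differ, and `parse_lessons` is a hypothetical helper rather than part of the skill.

```python
import re
from typing import Dict, List

# Header such as "Iteration 7 - KEPT"; any single non-space token is
# accepted as the separator between the number and the status.
HEADER = re.compile(r"^Iteration (\d+)\s+\S+\s+(\w+)$")
FIELD = re.compile(r"^(Hypothesis|Change|Result|Lesson):\s*(.*)$")


def parse_lessons(text: str) -> List[Dict[str, str]]:
    """Turn lesson-file text into a list of dicts, one per iteration."""
    entries: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            entries.append({"iteration": header.group(1),
                            "status": header.group(2)})
            continue
        field = FIELD.match(line)
        if field and entries:
            # Attach the field to the most recent iteration header.
            entries[-1][field.group(1).lower()] = field.group(2)
    return entries
```

Filtering the result for entries whose `status` is `KEPT` gives a quick list of the changes that survived.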
Parallel Experiments
Request parallel mode during confirmation or at any time:
```text
You: Go, parallel 4
```
Codex runs four hypotheses concurrently, keeps the best result, and discards the rest. This is useful when the hypothesis space is large.
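The "keep the best, discard the rest" selection can be sketched as follows. This is an illustration only: `best_of_parallel` is a hypothetical name, each hypothesis is modeled as a callable that returns the metric it achieved, and how Codex actually isolates concurrent changes is not described in this document.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple


def best_of_parallel(
    hypotheses: Dict[str, Callable[[], float]],
    baseline: float,
    direction: str = "lower",
) -> Optional[Tuple[str, float]]:
    """Evaluate all hypotheses concurrently and return the best
    (name, metric) pair, or None if none of them beats the baseline."""
    if not hypotheses:
        return None
    with ThreadPoolExecutor(max_workers=len(hypotheses)) as pool:
        futures = {name: pool.submit(fn) for name, fn in hypotheses.items()}
        results = {name: fut.result() for name, fut in futures.items()}
    pick = min if direction == "lower" else max
    name = pick(results, key=results.get)
    value = results[name]
    improved = value < baseline if direction == "lower" else value > baseline
    return (name, value) if improved else None
```

Returning `None` when nothing beats the baseline corresponds to discarding the whole round.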
Pivot Protocol
If the loop stalls, escalation happens automatically:
| Consecutive discards | Action |
|---|---|
| 3 | REFINE — narrow hypothesis, try smaller atomic changes |
| 5 | PIVOT — change strategy entirely |
| 2 PIVOTs | Web search — Codex fetches external references to unstick itself |
You are never asked for permission during escalation. The loop continues.
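The escalation thresholds can be read as a small decision function. The sketch below is one interpretation, not the skill's code: it assumes web search is triggered when a further PIVOT would be due after two earlier PIVOTs, and it does not model when the counters reset, which this document does not specify. The function and parameter names are hypothetical.

```python
def escalation(consecutive_discards: int, pivots_so_far: int) -> str:
    """Map the stall counters to an action: 'continue', 'refine',
    'pivot', or 'web_search'."""
    if consecutive_discards >= 5:
        # A pivot is due; after two earlier pivots, search the web instead.
        return "web_search" if pivots_so_far >= 2 else "pivot"
    if consecutive_discards >= 3:
        return "refine"  # narrow the hypothesis, try smaller changes
    return "continue"


for discards, pivots in [(2, 0), (3, 0), (5, 0), (5, 2)]:
    print(discards, pivots, escalation(discards, pivots))
```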
Real Code Examples
Example 1: TypeScript `any` elimination (Python verify script)
If you want a custom verify script instead of a one-liner:
```python
# scripts/count_any.py
import subprocess
import sys

# Count the lines in src/ TypeScript files that contain the whole word `any`.
result = subprocess.run(
    ["grep", "-r", "--include=*.ts", r"\bany\b", "src/"],
    capture_output=True, text=True
)
count = len(result.stdout.strip().splitlines())
print(count)
sys.exit(0)  # always exit 0; the number is what matters
```
Tell Codex during confirmation:
```text
Verify: python scripts/count_any.py
Guard: npx tsc --noEmit
```
Example 2: pytest coverage loop (Python)
```python
# scripts/coverage_pct.py
import re
import subprocess
import sys

# Run pytest with coverage. subprocess.run() is used instead of
# check_output() so that failing tests (non-zero exit) do not raise
# before the TOTAL row is parsed.
proc = subprocess.run(
    ["pytest", "--cov=src", "--cov-report=term", "-q"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
)
match = re.search(r"TOTAL\s+\d+\s+\d+\s+(\d+)%", proc.stdout)
# Print the total coverage percentage, or 0 if no TOTAL row was found.
print(int(match.group(1)) if match else 0)
sys.exit(0)
```
```text
$codex-autoresearch
Improve test coverage — target 85%
Verify: python scripts/coverage_pct.py
Guard: python -m mypy src
Direction: higher
Target: 85
Iterations: 50
```
Example 3: bundle size loop (Node.js project)
```bash
#!/usr/bin/env bash
# scripts/bundle_size.sh
# Build quietly (discarding all build output), then print the bundle size in KB.
npm run build --silent >/dev/null 2>&1
du -k dist/bundle.js | awk '{print $1}'
```
```text
$codex-autoresearch
Reduce our JS bundle size, currently ~2300 KB, target under 900 KB
Verify: bash scripts/bundle_size.sh
Guard: npm test
Direction: lower
Target: 900
```
Example 4: lint warning count (any language)
```bash
#!/usr/bin/env bash
# scripts/lint_count.sh
# Sum the messages ESLint reports across all files.
npx eslint src/ --format json 2>/dev/null \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print(sum(len(f['messages']) for f in d))"
```
```text
$codex-autoresearch
Get our ESLint warning count to zero
Verify: bash scripts/lint_count.sh
Direction: lower
Target: 0
```
Unattended Runs
For overnight or long runs, ensure Codex CLI approval settings do not interrupt `git commit` or `git revert` commands. The simplest option is to run in a disposable or sandboxed repo clone:
```bash
git clone . /tmp/autoresearch-sandbox
cd /tmp/autoresearch-sandbox
# launch Codex here with full permissions
```
Results accumulate in git history. Pull the winning commits back to your main repo when done:
```bash
# in your main repo
git fetch /tmp/autoresearch-sandbox main
git cherry-pick <winning-commit-sha>
```
---
Session Artifacts
| File | Contents |
|---|---|
| `lessons.md` | Structured lessons from every iteration |
| Results log | Full per-iteration log (metric value, kept/reverted, elapsed) |
| `session.json` | Current session state for resume |
These files persist across Codex sessions. Delete them to start fresh.
Troubleshooting
Loop reverts every change:
- Verify command may be returning a non-numeric value. Test it manually: `bash -c "<your verify command>"` should print a single number.
- Metric direction may be wrong. Confirm `Direction: lower` or `Direction: higher` during setup.
Guard fires on unrelated files:
- Narrow scope: `Scope: src/specific-module/`
- Or tell Codex explicitly: `Do not touch tests/` during confirmation.
Session resume picks up wrong baseline:
- Delete `session.json` to force a fresh baseline: `rm .agents/skills/codex-autoresearch/session.json`
Parallel mode produces merge conflicts:
- Codex handles this internally via the pivot protocol, but if it gets stuck, reduce parallelism: `Parallel: 2`
Codex asks questions mid-loop:
- This means a guard crash produced ambiguous output. Pre-empt it by specifying `Guard: <command> || true` if guard failures should be non-fatal, or by giving Codex fuller sandbox permissions so it can run git commands freely.
Loop hits PIVOT but makes no progress:
- Supply a seed hypothesis during confirmation: `Hint: try tree-shaking unused imports first`
- Or run `plan` mode first to produce a richer hypothesis list before switching to `loop`.
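For the first troubleshooting item, the "single number" requirement can be checked mechanically. The helper below is a hypothetical sketch (`parse_metric` is not part of the skill): it accepts output only when, after trimming whitespace, it is exactly one integer or decimal number.

```python
import re
from typing import Optional

# One optionally signed integer or decimal, nothing else.
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


def parse_metric(output: str) -> Optional[float]:
    """Return the metric as a float, or None if the verify output is not
    a single bare number (for example '85%' or two lines of output)."""
    text = output.strip()
    return float(text) if NUMBER.fullmatch(text) else None


print(parse_metric("47\n"))  # a clean count
print(parse_metric("85%"))   # rejected: trailing percent sign
```

Pasting the captured output of your verify command into `parse_metric` shows quickly whether stray text, units, or extra lines are the reason every change is being reverted.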
Quick Reference
```text
# Start a loop
$codex-autoresearch
<your goal in one sentence>

# Resume interrupted run
$codex-autoresearch
Resume

# Bounded run
$codex-autoresearch
<goal> — Iterations: 25

# Parallel hypotheses
$codex-autoresearch
<goal> — Parallel: 4

# Force a mode
$codex-autoresearch fix
pytest has 8 failures, repair them

# Read-only audit
$codex-autoresearch security
Audit src/api/ for injection vulnerabilities
```