# ARIS — Autonomous ML Research In Sleep

Skill by ara.so — Daily 2026 Skills collection

ARIS (Auto-Research-In-Sleep) turns Claude Code into an autonomous ML research engine. It chains idea discovery → cross-model review loops → paper writing → compiled PDF into hands-off overnight pipelines. Claude Code drives execution while an external model (Codex/GPT-5.4, GLM, DeepSeek, Kimi, etc.) acts as adversarial reviewer — breaking self-play blind spots that single-model review cannot escape.
## What It Does

| Workflow | Trigger | What Runs |
|---|---|---|
| Idea Discovery | `/idea-discovery "topic"` | Literature survey → 8–12 ideas → novelty check → pilot GPU runs → ranked report |
| Auto Review Loop | `/auto-review-loop` | 4-round review/fix cycle, score tracked per round (e.g. 5/10 → 7.5/10) |
| Paper Writing | `/paper-writing "REPORT.md"` | Narrative → outline → figures → LaTeX → PDF → 2-round auto-improvement |
| Full Pipeline | `/research-pipeline "direction"` | Chains all three end-to-end from a single prompt |
## Installation

```bash
# 1. Clone and install skills
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cp -r Auto-claude-code-research-in-sleep/skills/* ~/.claude/skills/

# 2. Install Codex MCP (cross-model reviewer)
npm install -g @openai/codex
codex setup                                        # set model to gpt-5.4 when prompted
claude mcp add codex -s user -- codex mcp-server

# 3. Verify MCP is connected
claude mcp list                                    # should show "codex" in the list
```
## Codex Model Configuration

The reviewer model is read from `~/.codex/config.toml`, not from skill files. Edit it directly if needed:

```toml
# ~/.codex/config.toml
model = "gpt-5.4"          # recommended — most rigorous reviewer
# Alternatives:
# model = "gpt-5.3-codex"
# model = "gpt-5.2-codex"
# model = "o3"
```
## Core Workflows
### Workflow 1 — Idea Discovery

```
/idea-discovery "factorized gap in discrete diffusion language models"
```

Be specific — "NLP" produces weak ideas; "factorized gap in discrete diffusion LMs" targets a real research gap.

What runs:
- Multi-source literature search (arXiv, Scholar, Zotero, Obsidian, local PDFs)
- Claude brainstorms 8–12 candidate ideas
- Codex reviewer cross-checks novelty against the literature
- Pilot GPU experiments on top candidates
- Ranked idea report saved to `idea_discovery_report.md`
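As an illustration of the kind of request the literature step makes, here is a minimal query-URL builder for the public arXiv export API. This is a sketch, not ARIS's own search code; the endpoint and parameters are the standard arXiv API ones.

```python
from urllib.parse import urlencode

def arxiv_query_url(topic: str, max_results: int = 10) -> str:
    """Build a query URL for the public arXiv export API (illustrative sketch)."""
    params = {
        "search_query": f"all:{topic}",  # search all fields for the topic
        "start": 0,
        "max_results": max_results,
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)
```

The returned URL can be fetched with any HTTP client; the API responds with an Atom XML feed of matching papers.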
### Workflow 2 — Auto Review Loop

```
/auto-review-loop
```

Run from a directory containing your paper draft or experiment results.

What runs:
- Claude submits the current work to the Codex reviewer
- Codex returns a structured critique with a score out of 10
- Claude implements fixes (experiments, writing, ablations)
- Repeat up to 4 rounds or until the score threshold is met
- Score curve saved to `docs/auto_review_score_curve.png`
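The loop's stop condition (score threshold reached, or round budget spent) can be sketched as a small helper. The JSON shape below is an assumption modeled on the `review_scores.json` file mentioned later in this README; the actual format is not documented here.

```python
import json

def review_converged(scores_json: str, threshold: float = 7.5, max_rounds: int = 4) -> bool:
    """Decide whether the review loop should stop.

    Assumes a hypothetical structure like
    {"rounds": [{"round": 1, "score": 5.0}, {"round": 2, "score": 6.5}]}.
    Stops when the latest score meets the threshold or the round budget is spent.
    """
    rounds = json.loads(scores_json).get("rounds", [])
    if not rounds:
        return False  # nothing reviewed yet
    return rounds[-1]["score"] >= threshold or len(rounds) >= max_rounds
```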
### Workflow 3 — Paper Writing

```
/paper-writing "NARRATIVE_REPORT.md"
```

Point it at a narrative markdown file describing your findings.

What runs:
- Outline generation (sections, figures, tables)
- Figure generation from experiment results
- LaTeX source assembly
- `pdflatex` compilation
- 2-round auto-review-and-improve cycle
- Final PDF + anti-hallucination BibTeX (fetched from DBLP/CrossRef)
### Full Pipeline

```
/research-pipeline "your research direction"
```

Chains Workflows 1 → 2 → 3 from a single prompt. Wake up to a scored, compiled paper.
## Inline Configuration Overrides

Append `— key: value` pairs to any command:

```
/research-pipeline "topic" — AUTO_PROCEED: false
/research-pipeline "topic" — human checkpoint: true
/research-pipeline "topic" — arxiv download: true
/research-pipeline "topic" — AUTO_PROCEED: false, human checkpoint: true
```

| Parameter | Default | Effect |
|---|---|---|
| `AUTO_PROCEED` | `true` | Advance between pipeline stages without waiting for approval |
| `human checkpoint` | `false` | Pause at stage boundaries for a human go/no-go |
| `arxiv download` | `false` | Download full arXiv PDFs during the literature survey |
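How such trailing overrides might be split off a command can be sketched in a few lines. This is illustrative only; ARIS's own tokenizer is not shown in this README, and the function name is ours.

```python
def parse_overrides(command: str) -> dict[str, str]:
    """Split the trailing '— key: value, key: value' overrides off a command string."""
    _, sep, tail = command.partition("—")
    if not sep:
        return {}  # no override clause present
    overrides = {}
    for pair in tail.split(","):
        key, colon, value = pair.partition(":")
        if colon:
            overrides[key.strip()] = value.strip()
    return overrides
```

Note the sketch assumes the em-dash appears only once, after the quoted topic.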
## Alternative Model Combinations

No Claude or OpenAI API required — swap in any OpenAI-compatible endpoint via the bundled `llm-chat` MCP server:

```bash
# Install the bundled llm-chat MCP server
cd Auto-claude-code-research-in-sleep/mcp-servers/llm-chat
pip install -r requirements.txt

# Configure your provider
export LLM_CHAT_BASE_URL="https://open.bigmodel.cn/api/paas/v4"   # GLM-4 example
export LLM_CHAT_API_KEY="your-key"
export LLM_CHAT_MODEL="glm-4-plus"

# Add to Claude Code
claude mcp add llm-chat -s user -- python server.py
```
**Tested reviewer models:**

| Provider | Model | Notes |
|---|---|---|
| OpenAI | `gpt-5.4` | Recommended — most rigorous |
| Zhipu AI | `glm-4-plus` | Strong Chinese-language papers |
| MiniMax | `abab6.5s-chat` | Fast, cost-effective |
| Moonshot | `moonshot-v1-128k` | Kimi — long-context papers |
| DeepSeek | `deepseek-chat` | Code-heavy experiments |
| 01.ai | `yi-large` | LongCat — long context |
## Anti-Hallucination Citations

BibTeX is fetched from real databases by default — no manual flag needed. The pattern used internally by `skills/paper-writing/citation_fetcher.py`:

```python
import requests

def fetch_bibtex_dblp(title: str) -> str | None:
    """Fetch real BibTeX from DBLP by paper title."""
    resp = requests.get(
        "https://dblp.org/search/publ/api",
        params={"q": title, "format": "json", "h": 1},
    )
    hits = resp.json().get("result", {}).get("hits", {}).get("hit", [])
    if not hits:
        return None
    key = hits[0]["info"].get("key", "")
    bib_resp = requests.get(f"https://dblp.org/rec/{key}.bib")
    return bib_resp.text if bib_resp.ok else None

def fetch_bibtex_crossref(doi: str) -> str | None:
    """Fallback: fetch BibTeX from CrossRef by DOI."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}/transform/application/x-bibtex"
    )
    return resp.text if resp.ok else None
```

Disable with `— DBLP_BIBTEX: false` if working fully offline.
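On top of two fetchers like those above, the natural composition is a fallback chain: try DBLP by title, then CrossRef by DOI. A sketch with injectable fetcher callables (our own framing, so it can be exercised without network access):

```python
from typing import Callable, Optional

def fetch_bibtex(
    title: str,
    doi: Optional[str],
    dblp_fetch: Callable[[str], Optional[str]],
    crossref_fetch: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try DBLP first; fall back to CrossRef when a DOI is available."""
    bib = dblp_fetch(title)
    if bib is None and doi:
        bib = crossref_fetch(doi)
    return bib
```

In practice the two real fetchers would be passed in directly; stubs make the chain testable offline.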
## Optional Integrations
### Zotero

Install the Zotero Better BibTeX plugin, then:

```bash
export ZOTERO_API_KEY="your-zotero-web-api-key"
export ZOTERO_LIBRARY_ID="your-library-id"
export ZOTERO_LIBRARY_TYPE="user"   # or "group"
```

Literature search will query your Zotero library before hitting arXiv.

### Obsidian

```bash
export OBSIDIAN_VAULT_PATH="/path/to/your/vault"
```

The skill will search markdown notes in the vault for related work before external queries.
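At its simplest, the vault lookup is a keyword scan over markdown files. A minimal sketch of that idea (ARIS's actual matching is likely more sophisticated; the function name is ours):

```python
from pathlib import Path

def search_vault(vault_path: str, keyword: str) -> list[str]:
    """Return relative paths of markdown notes mentioning keyword (case-insensitive)."""
    root = Path(vault_path)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*.md")  # recurse through the whole vault
        if keyword.lower() in p.read_text(errors="ignore").lower()
    )
```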
### Feishu / Lark Notifications

```bash
export FEISHU_WEBHOOK_URL="https://open.feishu.cn/open-apis/bot/v2/hook/your-token"
export FEISHU_MODE="push"   # off | push | interactive
```

| Mode | Behaviour |
|---|---|
| `off` | No notifications |
| `push` | One-way alerts: review scores, experiment completions, checkpoints |
| `interactive` | Mobile approval buttons at human checkpoints |
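In push mode, each alert is a JSON POST to the webhook. The payload shape below follows Feishu's documented custom-bot text-message format; the helper name is ours:

```python
import json

def feishu_text_payload(text: str) -> str:
    """Serialize a Feishu custom-bot text message (msg_type "text")."""
    return json.dumps({"msg_type": "text", "content": {"text": text}}, ensure_ascii=False)
```

A push is then just `requests.post(webhook_url, data=feishu_text_payload("Review round 2: 7.5/10"), headers={"Content-Type": "application/json"})`.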
## Directory Layout After a Pipeline Run

```
your-project/
├── idea_discovery_report.md     # ranked ideas with novelty scores
├── NARRATIVE_REPORT.md          # auto-generated findings narrative
├── paper/
│   ├── main.tex                 # assembled LaTeX
│   ├── main.pdf                 # compiled output
│   ├── figures/                 # auto-generated plots
│   └── references.bib           # real BibTeX from DBLP/CrossRef
├── experiments/
│   ├── pilot_runs/              # idea-discovery GPU pilots
│   └── review_round_*/          # per-round experiment results
└── docs/
    └── auto_review_score_curve.png
```
## Python Integration Pattern
Trigger ARIS workflows programmatically from a Python script (e.g. a cron job or CI step):

```python
import subprocess
import json
from pathlib import Path

def run_aris_pipeline(
    research_direction: str,
    output_dir: str = ".",
    auto_proceed: bool = True,
    human_checkpoint: bool = False,
    arxiv_download: bool = False,
) -> dict:
    """
    Launch the ARIS full pipeline via the Claude Code CLI.
    Returns the parsed score progression from the review curve JSON.
    """
    overrides = ", ".join([
        f"AUTO_PROCEED: {str(auto_proceed).lower()}",
        f"human checkpoint: {str(human_checkpoint).lower()}",
        f"arxiv download: {str(arxiv_download).lower()}",
    ])
    command = f'/research-pipeline "{research_direction}" — {overrides}'
    result = subprocess.run(
        ["claude", "--print", command],
        cwd=output_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"ARIS pipeline failed:\n{result.stderr}")

    # Parse score progression if available
    score_json = Path(output_dir) / "docs" / "review_scores.json"
    if score_json.exists():
        return json.loads(score_json.read_text())
    return {"stdout": result.stdout}
```
### Example: nightly research job
```python
if __name__ == "__main__":
    scores = run_aris_pipeline(
        research_direction="token-level uncertainty calibration in autoregressive LMs",
        output_dir="./nightly_research",
        auto_proceed=True,
        human_checkpoint=False,
    )
    print(f"Final review score: {scores.get('rounds', [{}])[-1].get('score')}/10")
```
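To make this genuinely overnight, the script can be driven by cron. An illustrative crontab entry (all paths and the script name are placeholders):

```shell
# crontab -e — launch the nightly ARIS job at 01:30; log output for morning review
30 1 * * * cd /path/to/nightly_research && /usr/bin/python3 nightly_job.py >> aris_nightly.log 2>&1
```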
## Skill Composition

ARIS ships 20 composable sub-skills. Chain them manually for custom workflows:
undefinedLiterature only
仅文献调研
/literature-survey "topic"
/literature-survey "主题"
Brainstorm without pilot experiments
仅brainstorm创意,不做试点实验
/idea-brainstorm "topic" — pilot experiments: false
/idea-brainstorm "主题" — pilot experiments: false
Single review round (no loop)
单轮审查(无循环)
/single-review "path/to/draft.md"
/single-review "path/to/draft.md"
Proof-writing (community skill)
证明撰写(社区Skill)
/proof-writer "theorem statement"
/proof-writer "定理陈述"
Write paper from existing narrative, skip review
基于现有叙事撰写论文,跳过审查
/paper-writing "NARRATIVE.md" — auto-review: false
undefined/paper-writing "NARRATIVE.md" — auto-review: false
## Troubleshooting
**Codex MCP not found**

```bash
claude mcp list      # verify "codex" appears
codex setup          # re-run setup if missing
claude mcp remove codex && \
claude mcp add codex -s user -- codex mcp-server   # re-add
```

**Skills not loading in Claude Code**

```bash
ls ~/.claude/skills/   # verify files copied
# Each skill must be a directory with SKILL.md inside
ls ~/.claude/skills/auto-review-loop/SKILL.md
```

**`pdflatex` not found during paper writing**

```bash
# macOS
brew install --cask mactex-no-gui
# Ubuntu/Debian
sudo apt install texlive-full
```

Then retry — the skill auto-detects `pdflatex` on PATH.

**Reviewer returns empty critique**

Check `~/.codex/config.toml` — ensure `model` is set and your API key is valid:

```bash
codex "say hello"   # quick smoke test outside Claude Code
```

**GLM/DeepSeek reviewer not triggering**

Verify the `llm-chat` MCP server is listed:

```bash
claude mcp list            # should show "llm-chat"
echo $LLM_CHAT_BASE_URL    # must be set in the shell that launches claude
```

**Score not improving after 4 rounds**

- Add `— human checkpoint: true` and inspect each round's critique file in `experiments/review_round_*/`
- Consider switching the reviewer model — a different architecture surfaces different weaknesses
- Lower-level issues (bad data, flawed baseline) need manual intervention before another loop
## Community Skills

| Skill | Description |
|---|---|
| `/proof-writer` | Rigorous theorem proof drafting with anti-hallucination citations |

Add your own skill: create `skills/your-skill-name/SKILL.md` and open a PR.

## Cross-Model Review — Why It Works
```
Claude Code (executor)            Codex / external LLM (reviewer)
─────────────────────             ───────────────────────────────
Fast, fluid code execution   ←→   Deliberate, rigorous critique
Broad context retention           Adversarial probing of blind spots
Narrative generation              Structural weakness detection
```

Single-model self-review falls into local minima — the same pattern-matching that generated the work also evaluates it. Cross-model review is adversarial: the reviewer actively probes weaknesses the executor didn't anticipate. The 1→2 model jump produces the largest quality gain; adding more reviewers yields diminishing returns.
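The executor/reviewer split above is, structurally, a loop with two model roles. A minimal sketch with pluggable callables standing in for Claude Code and the external reviewer (our own framing, not ARIS internals):

```python
from typing import Callable, List, Tuple

def cross_model_loop(
    draft: str,
    fix: Callable[[str, str], str],              # executor: apply critique to draft
    review: Callable[[str], Tuple[float, str]],  # reviewer: (score, critique)
    max_rounds: int = 4,
    threshold: float = 7.5,
) -> Tuple[str, List[float]]:
    """Alternate review and fix until the score threshold or round budget is hit."""
    scores: List[float] = []
    for _ in range(max_rounds):
        score, critique = review(draft)
        scores.append(score)
        if score >= threshold:
            break  # reviewer is satisfied
        draft = fix(draft, critique)
    return draft, scores
```

Because both roles are plain callables, the same loop works with any executor/reviewer pairing from the model table above.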