# Full Research Pipeline: Idea → Experiments → Submission
End-to-end autonomous research workflow for: $ARGUMENTS
## Constants
- `AUTO_PROCEED = true` — When `true`, Gate 1 auto-selects the top-ranked idea (highest pilot signal + novelty confirmed) and continues to implementation. When `false`, always waits for explicit user confirmation before proceeding.
- `ARXIV_DOWNLOAD = false` — When `true`, `/research-lit` downloads the top relevant arXiv PDFs during the literature survey. When `false` (default), it only fetches metadata via the arXiv API. Passed through to `/idea-discovery` → `/research-lit`.
- `HUMAN_CHECKPOINT = false` — When `true`, the auto-review loop (Stage 4) pauses after each round's review to let you see the score and provide custom modification instructions before fixes are implemented. When `false` (default), loops run fully autonomously. Passed through to `/auto-review-loop`.

💡 Override via argument, e.g., `/research-pipeline "topic" — AUTO_PROCEED: false, human checkpoint: true`
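One way the override syntax above could be parsed is sketched below. This is a hypothetical helper, not part of the skill itself: the three flag names come from the Constants section, everything else is illustrative.

```python
# Hypothetical sketch: parse overrides like "AUTO_PROCEED: false" from the
# trailing part of the skill argument. Defaults mirror the Constants section.
DEFAULTS = {"AUTO_PROCEED": True, "ARXIV_DOWNLOAD": False, "HUMAN_CHECKPOINT": False}

def parse_overrides(arg: str) -> dict:
    config = dict(DEFAULTS)
    for part in arg.split(","):
        if ":" in part:
            key, _, value = part.partition(":")
            key = key.strip().upper().replace(" ", "_")  # "human checkpoint" -> HUMAN_CHECKPOINT
            if key in config:
                config[key] = value.strip().lower() == "true"
    return config

print(parse_overrides("AUTO_PROCEED: false, human checkpoint: true"))
```

Unknown keys are ignored rather than rejected, so a typo silently falls back to the default; a stricter variant could raise instead.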
## Overview
This skill chains the entire research lifecycle into a single pipeline:

```
/idea-discovery → implement → /run-experiment → /auto-review-loop → submission-ready
├── Workflow 1 ──┤             ├────────── Workflow 2 ──────────────┤
```

It orchestrates two major workflows plus the implementation bridge between them.
## Pipeline
### Stage 1: Idea Discovery (Workflow 1)
Invoke the idea discovery pipeline:

```
/idea-discovery "$ARGUMENTS"
```

This internally runs: `/research-lit` → `/idea-creator` → `/novelty-check` → `/research-review`

Output: `IDEA_REPORT.md` with ranked, validated, pilot-tested ideas.

🚦 Gate 1 — Human Checkpoint:

After `IDEA_REPORT.md` is generated, pause and present the top ideas to the user:

```
📋 Idea Discovery complete. Top ideas:
1. [Idea 1 title] — Pilot: POSITIVE (+X%), Novelty: CONFIRMED
2. [Idea 2 title] — Pilot: WEAK POSITIVE (+Y%), Novelty: CONFIRMED
3. [Idea 3 title] — Pilot: NEGATIVE, eliminated
Recommended: Idea 1. Shall I proceed with implementation?
```

If AUTO_PROCEED=false: Wait for user confirmation before continuing. The user may:

- Approve an idea → proceed to Stage 2.
- Pick a different idea → proceed with their choice.
- Request changes (e.g., "combine Idea 1 and 3", "focus more on X") → update the idea prompt with user feedback, re-run `/idea-discovery` with refined constraints, and present again.
- Reject all ideas → collect feedback on what's missing, re-run Stage 1 with an adjusted research direction. Repeat until the user commits to an idea.
- Stop here → save current state to `IDEA_REPORT.md` for future reference.

If AUTO_PROCEED=true: Present the top ideas and wait 10 seconds for user input. If no response, auto-select the #1 ranked idea (highest pilot signal + novelty confirmed) and proceed to Stage 2. Log: `"AUTO_PROCEED: selected Idea 1 — [title]"`.

⚠️ This gate waits for user confirmation when AUTO_PROCEED=false. When `true`, it auto-selects the top idea after presenting results. The rest of the pipeline (Stages 2-4) is expensive (GPU time + multiple review rounds), so set `AUTO_PROCEED=false` if you want to manually choose which idea to pursue.
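The Gate 1 decision logic can be sketched as follows. This is a hypothetical illustration of the control flow only; the function and idea names are made up, and the real 10-second wait is left out:

```python
# Hypothetical sketch of the Gate 1 decision: an explicit reply always wins;
# on timeout, AUTO_PROCEED decides whether to fall back to the top-ranked idea.
def gate_one(ranked_ideas, reply=None, auto_proceed=True):
    if reply is not None:
        return reply                       # user picked an idea explicitly
    if auto_proceed:                       # 10 s elapsed with no input
        chosen = ranked_ideas[0]           # highest pilot signal + novelty confirmed
        print(f"AUTO_PROCEED: selected Idea 1 — {chosen}")
        return chosen
    return None                            # AUTO_PROCEED=false: keep waiting

gate_one(["Idea A (illustrative)", "Idea B (illustrative)"])
```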
### Stage 2: Implementation
Once the user confirms which idea to pursue:

1. Read the idea details from `IDEA_REPORT.md` (hypothesis, experimental design, pilot code).
2. Implement the full experiment:
   - Extend pilot code to full scale (multi-seed, full dataset, proper baselines)
   - Add proper evaluation metrics and logging (wandb if configured)
   - Write clean, reproducible experiment scripts
   - Follow existing codebase conventions
3. Code review: Before deploying, do a self-review:
   - Are all hyperparameters configurable via argparse?
   - Is the random seed fixed and controllable?
   - Are results saved to JSON/CSV for later analysis?
   - Is there proper logging for debugging?
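A minimal script skeleton that passes this self-review checklist might look like the sketch below: argparse-configurable hyperparameters, a fixed and controllable seed, and JSON results for later analysis. The training step is a stand-in, not a real experiment.

```python
# Sketch of an experiment entry point satisfying the Stage 2 checklist.
import argparse
import json
import random

def run(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=0)   # fixed, controllable seed
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--out", default="results.json")
    args = parser.parse_args(argv)

    random.seed(args.seed)              # reproducible across reruns
    metric = random.random()            # stand-in for the real training run
    results = {"seed": args.seed, "lr": args.lr, "metric": metric}
    with open(args.out, "w") as f:
        json.dump(results, f)           # saved for later analysis
    return results

run(["--seed", "0"])
```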
### Stage 3: Deploy Experiments (Workflow 2 — Part 1)
Deploy the full-scale experiments:

```
/run-experiment [experiment command]
```

What this does:

- Check GPU availability on configured servers
- Sync code to the remote server
- Launch experiments in screen sessions with the proper CUDA_VISIBLE_DEVICES
- Verify experiments started successfully

Monitor progress:

```
/monitor-experiment [server]
```

Wait for experiments to complete, then collect results.
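The launch step can be pictured with the sketch below: a hypothetical helper that builds the kind of command `/run-experiment` might run remotely, pinning GPUs via CUDA_VISIBLE_DEVICES and wrapping the job in a detached screen session so it survives SSH disconnects. The function name and arguments are illustrative, not the skill's actual interface.

```python
# Hypothetical sketch of a remote launch command: pin GPUs, detach via screen.
def launch_command(gpu_ids, session, cmd):
    visible = ",".join(str(g) for g in gpu_ids)
    return f"screen -dmS {session} bash -c 'CUDA_VISIBLE_DEVICES={visible} {cmd}'"

print(launch_command([0, 1], "exp_main", "python train.py --seed 0"))
# → screen -dmS exp_main bash -c 'CUDA_VISIBLE_DEVICES=0,1 python train.py --seed 0'
```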
### Stage 4: Auto Review Loop (Workflow 2 — Part 2)
Once initial results are in, start the autonomous improvement loop:

```
/auto-review-loop "$ARGUMENTS — [chosen idea title]"
```

What this does (up to 4 rounds):

- GPT-5.4 xhigh reviews the work (score, weaknesses, minimum fixes)
- Claude Code implements fixes (code changes, new experiments, reframing)
- Deploy fixes, collect new results
- Re-review → repeat until score ≥ 6/10 or 4 rounds reached

Output: `AUTO_REVIEW.md` with full review history and final assessment.
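The loop's stopping behavior (score at least 6/10, or at most 4 rounds) can be sketched as below. The review and fix callables here are toy stand-ins for the real reviewer and Claude Code; only the control flow mirrors the description above.

```python
# Hypothetical sketch of the /auto-review-loop control flow.
def auto_review_loop(review, implement_fixes, max_rounds=4, target=6):
    history = []
    for round_no in range(1, max_rounds + 1):
        score, weaknesses = review()
        history.append((round_no, score))
        if score >= target:
            break                        # good enough: stop early
        implement_fixes(weaknesses)      # code changes, new experiments, reframing
    return history

scores = iter([4, 5, 6])                 # toy reviewer: each fix gains one point
print(auto_review_loop(lambda: (next(scores), ["tighten ablations"]),
                       lambda weaknesses: None))
# → [(1, 4), (2, 5), (3, 6)]
```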
### Stage 5: Final Summary
After the auto-review loop completes, write a final status report:

```markdown
# Research Pipeline Report

Direction: $ARGUMENTS
Chosen Idea: [title]
Date: [start] → [end]
Pipeline: idea-discovery → implement → run-experiment → auto-review-loop

## Journey Summary

- Ideas generated: X → filtered to Y → piloted Z → chose 1
- Implementation: [brief description of what was built]
- Experiments: [number of GPU experiments, total compute time]
- Review rounds: N/4, final score: X/10

## Final Status

- [ ] Ready for submission / [ ] Needs manual follow-up

## Remaining TODOs (if any)

- [items flagged by reviewer that weren't addressed]

## Files Changed

- [list of key files created/modified]
```
## Key Rules
- Large file handling: If the Write tool fails due to file size, immediately retry using Bash (`cat << 'EOF' > file`) to write in chunks. Do NOT ask the user for permission — just do it silently.
- Human checkpoint after Stage 1 is controlled by AUTO_PROCEED. When `false`, do not proceed without user confirmation. When `true`, auto-select the top idea after presenting results.
- Stages 2-4 can run autonomously once the user confirms the idea. This is the "sleep and wake up to results" part.
- If Stage 4 ends at round 4 without a positive assessment, stop and report remaining issues. Do not loop forever.
- Budget awareness: Track total GPU-hours across the pipeline. Flag if approaching user-defined limits.
- Documentation: Every stage updates its own output file. The full history should be self-contained.
- Fail gracefully: If any stage fails (no good ideas, experiments crash, review loop stuck), report clearly and suggest alternatives rather than forcing forward.
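The budget-awareness rule could be implemented with a small tracker like the hypothetical sketch below: sum the GPU-hours recorded per stage and flag once usage reaches a fraction of the user's limit. The 80% warning threshold and the stage names are illustrative assumptions.

```python
# Hypothetical sketch of GPU-hour budget tracking for the pipeline.
def check_budget(stage_hours, limit_hours, warn_fraction=0.8):
    total = sum(stage_hours.values())
    return total, total >= warn_fraction * limit_hours

total, flagged = check_budget({"pilot": 2.0, "full_run": 14.0}, limit_hours=20.0)
print(total, flagged)    # flags at 16 of 20 GPU-hours, i.e. the 80% mark
```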
## Typical Timeline
| Stage | Duration | Can sleep? |
|---|---|---|
| 1. Idea Discovery | 30-60 min | Yes if AUTO_PROCEED=true |
| 2. Implementation | 15-60 min | Yes (autonomous after Gate 1) |
| 3. Deploy | 5 min + experiment time | Yes ✅ |
| 4. Auto Review | 1-4 hours (depends on experiments) | Yes ✅ |
Sweet spot: Run Stages 1-2 in the evening, launch Stages 3-4 before bed, and wake up to a reviewed paper.