Loading...
Loading...
Compare original and translation side by side
/research-lit/novelty-check/research-review/research-lit/novelty-check/research-reviewgpt-5.4gpt-5.4o3gpt-4o💡 Override via argument, e.g.,./idea-creator "topic" — pilot budget: 4h per idea, 20h total
gpt-5.4gpt-5.4o3gpt-4o💡 可通过参数覆盖配置,例如:。/idea-creator "topic" — pilot budget: 4h per idea, 20h total
papers/literature/papers/literature/mcp__codex__codex:
model: REVIEWER_MODEL
config: {"model_reasoning_effort": "xhigh"}
prompt: |
You are a senior ML researcher brainstorming research ideas.
Research direction: [user's direction]
Here is the current landscape:
[paste landscape map from Phase 1]
Key gaps identified:
[paste gaps from Phase 1]
Generate 8-12 concrete research ideas. For each idea:
1. One-sentence summary
2. Core hypothesis (what you expect to find and why)
3. Minimum viable experiment (what's the cheapest way to test this?)
4. Expected contribution type: empirical finding / new method / theoretical result / diagnostic
5. Risk level: LOW (likely works) / MEDIUM (50-50) / HIGH (speculative)
6. Estimated effort: days / weeks / months
Prioritize ideas that are:
- Testable with moderate compute (8x RTX 3090 or less)
- Likely to produce a clear positive OR negative result (both are publishable)
- Not "apply X to Y" unless the application reveals genuinely surprising insights
- Differentiated from the 10-15 papers above
Be creative but grounded. A great idea is one where the answer matters regardless of which way it goes.mcp__codex__codex:
model: REVIEWER_MODEL
config: {"model_reasoning_effort": "xhigh"}
prompt: |
You are a senior ML researcher brainstorming research ideas.
Research direction: [user's direction]
Here is the current landscape:
[paste landscape map from Phase 1]
Key gaps identified:
[paste gaps from Phase 1]
Generate 8-12 concrete research ideas. For each idea:
1. One-sentence summary
2. Core hypothesis (what you expect to find and why)
3. Minimum viable experiment (what's the cheapest way to test this?)
4. Expected contribution type: empirical finding / new method / theoretical result / diagnostic
5. Risk level: LOW (likely works) / MEDIUM (50-50) / HIGH (speculative)
6. Estimated effort: days / weeks / months
Prioritize ideas that are:
- Testable with moderate compute (8x RTX 3090 or less)
- Likely to produce a clear positive OR negative result (both are publishable)
- Not "apply X to Y" unless the application reveals genuinely surprising insights
- Differentiated from the 10-15 papers above
Be creative but grounded. A great idea is one where the answer matters regardless of which way it goes./novelty-check/novelty-check/novelty-checkmcp__codex__codex-replyHere are our top ideas after filtering:
[paste surviving ideas with novelty check results]
For each, play devil's advocate:
- What's the strongest objection a reviewer would raise?
- What's the most likely failure mode?
- How would you rank these for a top venue submission?
- Which 2-3 would you actually work on?/novelty-checkmcp__codex__codex-replyHere are our top ideas after filtering:
[paste surviving ideas with novelty check results]
For each, play devil's advocate:
- What's the strongest objection a reviewer would raise?
- What's the most likely failure mode?
- How would you rank these for a top venue submission?
- Which 2-3 would you actually work on?/run-experimentGPU 0: Pilot for Idea 1
GPU 1: Pilot for Idea 2
GPU 2: Pilot for Idea 3run_in_background: true/monitor-experiment/run-experimentGPU 0: Pilot for Idea 1
GPU 1: Pilot for Idea 2
GPU 2: Pilot for Idea 3run_in_background: true/monitor-experimentIDEA_REPORT.mdundefinedIDEA_REPORT.mdundefined| Idea | Reason eliminated |
|---|---|
| ... | Already done by [paper] |
| ... | Requires > 1 week GPU time |
| ... | Result wouldn't be interesting either way |
| Idea | Reason eliminated |
|---|---|
| ... | Already done by [paper] |
| ... | Requires > 1 week GPU time |
| ... | Result wouldn't be interesting either way |
| Idea | GPU | Time | Key Metric | Signal |
|---|---|---|---|---|
| Idea 1 | GPU 0 | 45 min | +2.3% CE | POSITIVE |
| Idea 2 | GPU 1 | 30 min | -0.1% CE | NEGATIVE |
| Idea 3 | GPU 2 | 1.5 hr | +0.8% CE | WEAK POSITIVE |
| Idea | GPU | Time | Key Metric | Signal |
|---|---|---|---|---|
| Idea 1 | GPU 0 | 45 min | +2.3% CE | POSITIVE |
| Idea 2 | GPU 1 | 30 min | -0.1% CE | NEGATIVE |
| Idea 3 | GPU 2 | 1.5 hr | +0.8% CE | WEAK POSITIVE |
undefinedundefined/idea-creator "direction" → ranked ideas
/novelty-check "top idea" → deep novelty verification (already done in Phase 4, but user can re-run)
/research-review "top idea" → external critical feedback
implement → write code
/run-experiment → deploy to GPU
/auto-review-loop → iterate until submission-ready/idea-creator "direction" → 排序后的想法
/novelty-check "top idea" → 深度新颖性验证(已在阶段4完成,但用户可重新运行)
/research-review "top idea" → 外部批判性反馈
implement → 编写代码
/run-experiment → 部署至GPU
/auto-review-loop → 迭代直至可提交