paper-notes

Paper Notes

Produce consistent, searchable paper notes that later steps (claims, visuals, writing) can reliably synthesize.
Notes are not prose: keep them as bullets and short fields, not narrative paragraphs.

Role cards (prompt-level guidance)

  • Close Reader
    • Mission: extract what is specific and checkable (setup, method, metrics, limits).
    • Do: name concrete tasks/benchmarks and what the paper actually measures.
    • Avoid: generic summary boilerplate that could fit any paper.
  • Results Recorder
    • Mission: capture evaluation anchors that later writing needs.
    • Do: record task + metric + constraints (budget/tool access) whenever available.
    • Avoid: copying numbers without the evaluation setting that makes them meaningful.
  • Limitation Logger
    • Mission: capture the caveats that change interpretation.
    • Do: write paper-specific limitations (protocol mismatch, missing ablations, threat model gaps).
    • Avoid: repeated generic limitations like “may not generalize” without specifics.
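The Results Recorder's rule (task, metric, and constraints travel together) can be made concrete as a small record type. This is a minimal sketch: the KeyResult class and its field names are hypothetical, not a schema the skill defines, and the example values are placeholders rather than real results.

```python
from dataclasses import dataclass, field


@dataclass
class KeyResult:
    """Hypothetical shape for one key_results entry (field names are assumptions)."""

    task: str    # benchmark or task the paper evaluates on
    metric: str  # what the paper actually measures
    value: str   # reported number, kept as text so units and formatting survive
    constraints: list[str] = field(default_factory=list)  # e.g. budget, tool access

    def is_anchored(self) -> bool:
        """A number is only meaningful if its task and metric are recorded with it."""
        return bool(self.task.strip()) and bool(self.metric.strip())

    def render(self) -> str:
        """One short line, suitable as a bullet in a note."""
        setting = f" ({'; '.join(self.constraints)})" if self.constraints else ""
        return f"{self.task} / {self.metric}: {self.value}{setting}"


# Placeholder values only; replace them with what the paper reports.
example = KeyResult("<benchmark>", "success rate", "<value>", ["<step budget>", "<tool access>"])
print(example.render())
```

Keeping constraints as a list of short strings preserves the "short fields, not prose" rule while still recording the setting that makes a number comparable.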

When to use

  • After you have a core set of papers (ideally with a mapping to outline subsections) and need evidence-ready notes.
  • Before writing a survey draft.

Inputs

  • papers/core_set.csv
  • Optional: outline/mapping.tsv (to prioritize papers)
  • Optional: papers/fulltext_index.jsonl + papers/fulltext/*.txt (when running in fulltext mode)
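Loading these inputs needs only the standard library. A minimal sketch, assuming papers/core_set.csv has a header row with a paper_id column and outline/mapping.tsv is tab-separated with a header row; the function names are hypothetical and not part of the helper script.

```python
import csv
from pathlib import Path


def load_core_ids(workspace: Path) -> list[str]:
    """Return paper_id values from papers/core_set.csv, in file order (required input)."""
    with open(workspace / "papers" / "core_set.csv", newline="", encoding="utf-8") as f:
        return [row["paper_id"] for row in csv.DictReader(f)]


def load_mapping(workspace: Path) -> list[dict[str, str]]:
    """Return outline/mapping.tsv rows, or [] when the optional file is absent."""
    path = workspace / "outline" / "mapping.tsv"
    if not path.exists():
        return []
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def fulltext_mode_available(workspace: Path) -> bool:
    """Fulltext mode needs the index file plus at least one extracted .txt file."""
    papers = workspace / "papers"
    return (papers / "fulltext_index.jsonl").exists() and any(papers.glob("fulltext/*.txt"))
```

The core set is required, so its loader raises if the file is missing; the optional inputs degrade to an empty list or False instead.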

Outputs

  • papers/paper_notes.jsonl (JSONL; one record per paper)
  • papers/evidence_bank.jsonl (JSONL; addressable evidence snippets derived from the notes; A150++ target: >=7 items per paper on average)
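Both outputs are JSONL (one JSON object per line), and the A150++ density target is an average over the core set. A minimal sketch, assuming every evidence_bank record carries a paper_id field; to_jsonl, evidence_density, and meets_a150_target are hypothetical names.

```python
import json
from collections import Counter


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSONL: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def evidence_density(evidence: list[dict], paper_ids: list[str]) -> float:
    """Average evidence items per core paper; papers with zero items still count."""
    if not paper_ids:
        return 0.0
    counts = Counter(e["paper_id"] for e in evidence)
    # Items for papers outside the core set are ignored.
    return sum(counts[p] for p in paper_ids) / len(paper_ids)


def meets_a150_target(evidence: list[dict], paper_ids: list[str], target: float = 7.0) -> bool:
    return evidence_density(evidence, paper_ids) >= target
```

Averaging over core-set paper_ids keeps papers with no snippets in the denominator, so they pull the average down instead of being silently skipped.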

Decision: evidence depth

  • If you have extracted text (papers/fulltext/*.txt) → enrich key papers using full-text snippets and set evidence_level: "fulltext".
  • If you only have abstracts (the default) → keep long-tail notes at abstract level, but still fully enrich high-priority papers (see below).
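The decision can be applied per paper. A minimal sketch, assuming extracted text for a paper lives at papers/fulltext/<paper_id>.txt; that file naming is a guess for illustration (papers/fulltext_index.jsonl is where to confirm the real mapping), and choose_evidence_level is a hypothetical name.

```python
from pathlib import Path


def choose_evidence_level(fulltext_dir: Path, paper_id: str) -> str:
    """Return "fulltext" if non-empty extracted text exists for the paper, else "abstract".

    Assumes (hypothetically) that text is stored as <fulltext_dir>/<paper_id>.txt.
    """
    txt = fulltext_dir / f"{paper_id}.txt"
    if txt.is_file() and txt.read_text(encoding="utf-8", errors="ignore").strip():
        return "fulltext"
    return "abstract"
```

Treating an empty or whitespace-only extraction as "abstract" avoids labeling a note fulltext when the PDF extraction silently failed.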

Workflow (heuristic)

Uses: outline/mapping.tsv, papers/fulltext_index.jsonl.
  1. Ensure coverage: every paper_id in papers/core_set.csv must have one JSONL record.
  2. Use the mapping to choose high-priority papers:
    • papers heavily reused across subsections
    • pinned classics (ReAct/Toolformer/Reflexion… if in scope)
  3. For high-priority papers, capture:
    • 3–6 summary bullets (what is new, what the problem setting is, what the loop is)
    • method (mechanism and architecture; what differs from baselines)
    • key_results (benchmarks/metrics; include numbers if available)
    • limitations (specific assumptions/failure modes; avoid generic boilerplate)
  4. For long-tail papers:
    • keep summary bullets short (abstract-derived is OK)
    • still include at least one limitation, and make it specific when possible
  5. Assign each paper a stable bibkey for citation generation.
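Steps 2 and 5 can be sketched as follows. All of these are hypothetical choices, not the helper's behavior: mapping rows have paper_id and section_id columns, "heavily reused" means mapped to at least two distinct subsections, pinned classics are passed in as a set of paper_ids, and bibkeys follow an authorYEARword pattern with a/b/... suffixes on collisions.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "on", "of", "for", "towards"}


def high_priority(mapping_rows, pinned, min_sections=2):
    """Papers mapped to >= min_sections distinct subsections, plus pinned classics."""
    sections = defaultdict(set)
    for row in mapping_rows:
        sections[row["paper_id"]].add(row["section_id"])
    reused = {pid for pid, secs in sections.items() if len(secs) >= min_sections}
    return reused | set(pinned)


def _slug(text):
    """Lowercase and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def _first_title_word(title):
    """First title word that is not a stopword, lowercased."""
    for word in re.findall(r"[A-Za-z0-9]+", title):
        if word.lower() not in STOPWORDS:
            return word.lower()
    return "untitled"


def assign_bibkeys(papers):
    """Map paper_id -> bibkey such as 'smith2024agent', adding a/b/... on collisions.

    Expects dicts with paper_id, first_author, year, title (assumed fields).
    Colliding papers are ordered by paper_id, so input order does not matter.
    Handles up to 26 collisions per base key (a sketch, not a full scheme).
    """
    groups = defaultdict(list)
    for p in papers:
        base = f"{_slug(p['first_author']) or 'anon'}{p['year']}{_first_title_word(p['title'])}"
        groups[base].append(p["paper_id"])
    keys = {}
    for base, pids in groups.items():
        if len(pids) == 1:
            keys[pids[0]] = base
        else:
            for i, pid in enumerate(sorted(pids)):
                keys[pid] = base + chr(ord("a") + i)
    return keys
```

Adding a paper later can turn a unique key into a suffixed one, so once keys have been written out it is safer to reuse them than to regenerate them.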

Quality checklist

  • Coverage: every paper_id in papers/core_set.csv appears in papers/paper_notes.jsonl.
  • High-priority papers have non-TODO method/results/limitations.
  • Limitations are not copy-pasted across many papers.
  • evidence_level is set correctly (abstract vs fulltext).
  • Evidence bank: papers/evidence_bank.jsonl exists and is dense enough for A150++ (>=7 items per paper on average).
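Several of these checks can be automated. A minimal sketch, assuming each note is a dict with paper_id, evidence_level, and priority fields, and that priority=high is stored as "priority": "high"; quality_report is a hypothetical name, not the pipeline's gate. It only checks that evidence_level holds one of the two allowed values (not that the value matches the evidence actually used), and it does not cover repeated limitations or evidence-bank density.

```python
import json

VALID_LEVELS = {"abstract", "fulltext"}
REQUIRED_HIGH = ("method", "key_results", "limitations")


def quality_report(core_ids, notes):
    """Return human-readable problems; an empty list means these checks pass."""
    problems = []
    noted = {n["paper_id"] for n in notes}
    missing = [pid for pid in core_ids if pid not in noted]
    if missing:
        problems.append(f"missing notes for: {', '.join(missing)}")
    for n in notes:
        pid = n["paper_id"]
        if n.get("evidence_level") not in VALID_LEVELS:
            problems.append(f"{pid}: evidence_level must be 'abstract' or 'fulltext'")
        if n.get("priority") == "high":
            for fld in REQUIRED_HIGH:
                value = n.get(fld)
                # Empty values and TODO placeholders both count as incomplete.
                if not value or "TODO" in json.dumps(value, ensure_ascii=False):
                    problems.append(f"{pid}: {fld} is empty or contains TODO")
    return problems
```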

Helper script (optional)

Quick Start

  • python .codex/skills/paper-notes/scripts/run.py --help
  • python .codex/skills/paper-notes/scripts/run.py --workspace <workspace_dir>

All Options

  • See --help (this helper is intentionally minimal).

Examples

  • Generate notes, then optionally enrich priority=high papers:
    • Run the helper once, then refine papers/paper_notes.jsonl (e.g., add full-text details for key papers and diversify limitations).
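Refining papers/paper_notes.jsonl by hand gets error-prone once it holds many records. A minimal sketch of merging enriched fields into one record by paper_id; enrich_note is a hypothetical helper, not part of run.py. It writes to a temporary file and then replaces the original, so an interrupted write does not leave a half-written notes file.

```python
import json
from pathlib import Path


def enrich_note(path: Path, paper_id: str, updates: dict) -> bool:
    """Merge `updates` into the record for `paper_id` and rewrite the JSONL file.

    Returns False (and leaves the file untouched) if no record matches.
    Other records keep their content and order.
    """
    lines = path.read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    found = False
    for rec in records:
        if rec.get("paper_id") == paper_id:
            rec.update(updates)
            found = True
    if found:
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(
            "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records),
            encoding="utf-8",
        )
        tmp.replace(path)  # swap in the new file in one step
    return found
```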

Notes

  • The helper writes deterministic metadata/abstract-level notes and marks key papers with priority=high.
  • Under pipeline.py --strict, the pipeline blocks if any high-priority note is incomplete (missing method/key_results/limitations) or contains placeholders.

Troubleshooting

Common Issues

Issue: High-priority notes still look like scaffolds

Symptom:
  • The quality gate reports missing method/key_results or TODO placeholders.
Causes:
  • Notes were generated from abstracts only; key papers were not enriched.
Solutions:
  • Fully enrich priority=high papers: method, ≥1 key_results entry, ≥3 summary_bullets, ≥1 concrete limitations entry.
  • If you need full-text evidence, run pdf-text-extractor in fulltext mode for key papers.
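These enrichment thresholds are mechanical, so a note can be checked against them before rerunning the pipeline. A minimal sketch, assuming key_results, summary_bullets, and limitations are stored as lists and method as a string; completeness_gaps is hypothetical and is not the pipeline's actual quality gate. It can tell whether a limitation exists, not whether it is concrete.

```python
import json


def completeness_gaps(note: dict) -> list[str]:
    """List what a priority=high note still lacks; [] means the thresholds are met."""
    gaps = []
    if not (note.get("method") or "").strip():
        gaps.append("method")
    if len(note.get("key_results") or []) < 1:
        gaps.append("key_results (need >=1)")
    if len(note.get("summary_bullets") or []) < 3:
        gaps.append("summary_bullets (need >=3)")
    if len(note.get("limitations") or []) < 1:
        gaps.append("limitations (need >=1)")
    # A placeholder anywhere in the record still counts as unfinished.
    if "TODO" in json.dumps(note, ensure_ascii=False):
        gaps.append("TODO placeholder")
    return gaps
```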

Issue: Repeated limitations across many papers

Symptom:
  • The quality gate reports repeated limitation boilerplate.
Causes:
  • Limitations were copy-pasted instead of written as paper-specific failure modes/assumptions.
Solutions:
  • Replace boilerplate with paper-specific limitations (setup, data, evaluation gaps, failure cases).

Recovery Checklist

  • papers/paper_notes.jsonl covers all paper_ids in papers/core_set.csv.
  • ≥80% of priority=high notes satisfy method/results/limitations completeness.
  • No TODO remains in high-priority notes.
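The recovery checklist can be evaluated in one pass. A minimal sketch, assuming each note is a dict with paper_id, priority ("high" for key papers), method, key_results, and limitations fields; recovery_status and its threshold parameter mirror the checklist but are hypothetical, not the pipeline's API.

```python
import json

REQUIRED = ("method", "key_results", "limitations")


def recovery_status(core_ids, notes, threshold=0.8):
    """Evaluate the three recovery checklist items for a list of note dicts."""
    noted = {n["paper_id"] for n in notes}
    high = [n for n in notes if n.get("priority") == "high"]
    # A field counts as present when non-empty; TODO placeholders are checked separately.
    complete = [n for n in high if all(n.get(f) for f in REQUIRED)]
    ratio = len(complete) / len(high) if high else 1.0  # no high-priority notes: vacuously complete
    return {
        "covers_core_set": set(core_ids) <= noted,
        "complete_ratio": ratio,
        "ratio_ok": ratio >= threshold,
        "no_todo": not any("TODO" in json.dumps(n, ensure_ascii=False) for n in high),
    }
```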