nsfc-length-aligner

Goal: Turn proposal length from a subjective impression into a quantifiable, closed-loop metric, and use a budget to guide expansion and compression.

When to Use

  • You have an NSFC proposal and want to quickly tell whether certain sections are too short or too long
  • You need to meet the template's hard length requirements (page count, word count, or character count)
  • You want to expand or compress while changing the original meaning as little as possible (keeping the main line of argument and the evidence chain intact)

When Not to Use

  • You only need a character count and do not care about budgets or the rewrite loop (a simpler script is enough)
  • The proposal is not available locally (no text, file, or path can be provided)

Workflow (Strongly Recommended to Follow in Order)

1) Confirm Requirements (Budget Basis)

First confirm which "hard standard" you are aligning to:
  • The "golden ratio" from the 2026 survey consensus (General Program / Young Scientists Fund Category C, for cross-checking): Project Rationale (立项依据) 30% (6–10 pages, about 8000–10000 characters) / Research Content (研究内容) 50% (12–15 pages, about 12000–15000 characters) / Research Basis (研究基础) 20% (5–8 pages, about 5000–6000 characters); a total of ≤28 pages is recommended to leave a buffer (in principle no more than 30 pages)
  • Page count (hard constraint): after the 2026+ revision the rule is "in principle no more than 30 pages"; in practice aim for ≤28 pages to leave a buffer, and do not "squeeze pages" by shrinking the font size or line spacing
  • Character budget (proxy metric): Chinese characters, total characters, and similar counts drive the deterministic "rewrite → recheck" loop (the final page count is verified against the PDF)
  • Budget scope: total length plus a budget for each part or key chapter (covering at least Project Rationale, Research Content, and Research Basis)
Note: By default this skill uses the example standard in config.yaml:length_standard (already aligned with the 2026 survey recommendations). Check it against the current year's guide and template before relying on it.
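To make the proxy budget concrete, the sketch below counts Chinese characters and splits a total character budget by the 30/50/20 ratio quoted above. It is a minimal illustration under stated assumptions, not the logic of scripts/check_length.py: the names RATIOS, count_cjk, and split_budget, the section keys, and the 25000-character total in the usage note are all hypothetical, and the real config.yaml:length_standard schema may look different.

```python
import re

# Assumed example values taken from the ratio quoted above; the real
# config.yaml:length_standard schema may differ.
RATIOS = {"rationale": 0.30, "research_content": 0.50, "research_basis": 0.20}

# Basic CJK Unified Ideographs block only; punctuation and Latin text are not counted.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def count_cjk(text: str) -> int:
    """Count Chinese characters in a string."""
    return len(CJK_RE.findall(text))


def split_budget(total_chars: int, ratios: dict[str, float]) -> dict[str, int]:
    """Split a total character budget across sections by ratio."""
    return {name: round(total_chars * share) for name, share in ratios.items()}
```

With a hypothetical total of 25000 characters, split_budget(25000, RATIOS) allots 7500 to the rationale, 12500 to the research content, and 5000 to the research basis. These are proxy numbers only; the page count still has to be confirmed on the compiled PDF.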

2) Run the Length Check (Deterministic)

Run the check script on the target proposal directory (or a single file) to generate a report:

    python3 scripts/check_length.py --input <target proposal path> --config config.yaml

If your proposal is based on the NSFC_Young / NSFC_General template (the project root contains main.tex), point --input at the project root. The script then follows the \input/\include dependency tree of main.tex to collect only the files that are actually compiled into the PDF, and ignores commented-out \input{...} lines so that optional chapters are not counted by mistake.

If you have already compiled the final PDF (recommended, since page count is the hard constraint), pass it in as well so that pages are counted:

    python3 scripts/check_length.py --input <target proposal path> --config config.yaml --pdf <proposal.pdf>

Output:
  • Console summary (total length, items over or under budget)
  • <input>/_artifacts/nsfc-length-aligner/length_report.md (default output directory; override with --out-dir)
  • <input>/_artifacts/nsfc-length-aligner/length_report.json (default output directory; override with --out-dir)
Note: If your <input> directory is not writable (for example, the template repository is set to read-only), be sure to point --out-dir at a writable location.
After the run, you must read length_report.md (and length_report.json if needed) and use the file-level deviation table, plus the optional section-level statistics, as the input to Step 3.
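The dependency-tree collection described above can be pictured with the following sketch. It is not the code in scripts/check_length.py; the function name collect_tex_files and both regular expressions are assumptions made for illustration, and the comment handling is simplified (an unescaped % starts a comment; edge cases such as verbatim environments are ignored).

```python
import re
from pathlib import Path

# Matches \input{name} and \include{name}, but not \includegraphics{...}.
INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")
# An unescaped % starts a LaTeX comment that runs to the end of the line.
COMMENT_RE = re.compile(r"(?<!\\)%.*")


def collect_tex_files(root: Path, entry: str = "main.tex") -> list[Path]:
    """Return the entry file plus every file reachable through uncommented input/include commands."""
    seen: list[Path] = []

    def visit(path: Path) -> None:
        if path in seen or not path.is_file():
            return
        seen.append(path)
        for line in path.read_text(encoding="utf-8").splitlines():
            active = COMMENT_RE.sub("", line)  # drop the commented-out part
            for name in INPUT_RE.findall(active):
                target = root / name.strip()
                if target.suffix != ".tex":
                    # LaTeX appends .tex when the extension is omitted
                    target = target.with_name(target.name + ".tex")
                visit(target)

    visit(root / entry)
    return seen
```

Calling collect_tex_files(Path("<project root>")) returns main.tex followed by the files it pulls in, in first-visit order. A line such as % \input{optional} contributes nothing, which is what keeps optional chapters out of the count.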

3) Interpret the Gaps (Where the Difference Lies)

Do three things based on the report:
  1. Locate the files or sections that are too long or too short
  2. Decide which kind of gap it is:
    • Weak evidence chain (needs more data, controls, or stated limitations)
    • Logical jump (needs transitions, definitions, or assumptions)
    • Redundancy (needs merging or cutting)
  3. Produce an action list (expansion and compression items in priority order)
Using section-level data (for more precise targeting):
  • If length_report.md contains a section table (or the JSON has a sections field), first find, within each over-long or under-length file, the specific sections that contribute most to the gap, then rewrite those sections. Do not spread cuts or additions evenly across the file
  • When a file is too long or too short, compare its section statistics; if the gap is concentrated in 1–2 sections, change only those sections (this makes it easier to keep the original meaning and a stable structure)
Reference: references/MEANING_PRESERVING_REWRITE_RUBRIC.md
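One way to decide whether a gap is "concentrated in 1–2 sections" is sketched below. Everything here is an assumption for illustration: the source does not document the layout of the sections field, so the sketch takes a plain mapping of section name to per-section delta (characters over or under budget), and the 70% threshold is an arbitrary example value rather than a rule from the skill.

```python
def top_contributors(
    section_deltas: dict[str, int],
    max_sections: int = 2,
    share: float = 0.7,
) -> list[str]:
    """Return up to max_sections sections if they cover at least `share`
    of the file-level gap; return an empty list if the gap is spread out."""
    total = sum(section_deltas.values())
    if total == 0:
        return []
    # Keep only sections that push in the same direction as the file-level gap.
    same_dir = {k: v for k, v in section_deltas.items() if v * total > 0}
    ranked = sorted(same_dir, key=lambda k: abs(same_dir[k]), reverse=True)
    picked = ranked[:max_sections]
    covered = sum(abs(same_dir[k]) for k in picked)
    return picked if covered >= share * abs(total) else []
```

For hypothetical deltas {"1.1": 1200, "1.2": 100, "1.3": -50, "1.4": 900}, the function returns ["1.1", "1.4"], so only those two sections would be rewritten. An empty result suggests the gap is spread across the file and a targeted rewrite alone will not close it.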

4) Expand/Compress (Preserve the Original Meaning as Far as Possible)

Expansion Strategies (When Too Short)

  • First add verifiable information density: definitions, assumptions, controls, ablations, risks, and fallback plans
  • Then close the argument loop: why do it → how to do it → how the expected results will be verified → what to do if it fails
  • Avoid empty padding: introduce no new claims and do not pile up adjectives

Compression Strategies (When Too Long)

  • Remove repetition: state each argument once, in its strongest form
  • Cut background: compress general background into 1–2 sentences and leave the space for "problem, method, verification"
  • Restructure: break long paragraphs into bullet points (without changing the order of the facts)
⚠️ After rewriting, you must run the Step 5 recheck to confirm the deviation is gone. A rewrite that has not been rechecked counts as unfinished.

2026 "Trim/Enrich" Checklist for the Three Core Sections (for Setting Priorities)

Usage (turn static advice into gap-triggered actions):
  • First look at the deviation (delta) reported for the corresponding file: +N means over length (prioritize "trim"); -N means under length (prioritize "enrich"); OK means this part needs no change for budget reasons
  • The larger the absolute value of delta, the higher the priority. Suggested order: fix the file with the largest |delta| first, then the next largest
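The triage rule above can be sketched as a small sort. This is a minimal illustration, not the report format itself: the function name prioritize and the file names in the usage note are hypothetical, and the sketch assumes that an OK entry has already been mapped to a delta of 0.

```python
def prioritize(deltas: dict[str, int]) -> list[tuple[str, str, int]]:
    """Order files by |delta|, largest first, and label each with an action.

    Files with delta 0 (reported as OK) are skipped.
    """
    actions = []
    for name, delta in deltas.items():
        if delta == 0:
            continue  # on budget: no change needed for length reasons
        actions.append((name, "trim" if delta > 0 else "enrich", delta))
    return sorted(actions, key=lambda item: abs(item[2]), reverse=True)
```

For hypothetical deltas {"rationale.tex": -800, "content.tex": 2500, "basis.tex": 0}, the result is [("content.tex", "trim", 2500), ("rationale.tex", "enrich", -800)]: compress the research content first, then enrich the rationale, and leave the research basis alone.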
Project Rationale (why do it):
  • Trim: textbook-style explanations, broad literature surveys, loosely related "national needs" framing, repeated statements of significance, filler citations
  • Enrich: Gap (the bottleneck) → Key Idea (the breakthrough point) → value argument (why it is worth doing)
Research Content (what to do and how):
  • Trim: repeated statements, overly fine operational detail, methods listed one after another without structure
  • Enrich: the logical framework, key experimental designs with controls and ablations, expected results with verifiable metrics, figures that carry the argument
Research Basis (why you can do it):
  • Trim: unrelated achievements, excessive background lead-in
  • Enrich: closely related preliminary data, core technical capabilities, platform conditions (each mapped to the research content)

5) Recheck to Close the Loop

After revising, you must run the script again to confirm that every part meets its budget without exceeding it:

    python3 scripts/check_length.py --input <target proposal path> --config config.yaml
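A recheck is easier to read when the new deltas are compared with the previous run. The sketch below shows one possible comparison; it is not part of scripts/check_length.py. The function name recheck, the per-file delta mappings, and the tolerance parameter are all assumptions, and how the real report encodes OK may differ.

```python
def recheck(
    before: dict[str, int],
    after: dict[str, int],
    tolerance: int = 0,
) -> dict[str, list[str]]:
    """Classify each file by comparing its delta before and after the rewrite."""
    status: dict[str, list[str]] = {"resolved": [], "remaining": [], "regressed": []}
    for name, new_delta in after.items():
        was_off = abs(before.get(name, 0)) > tolerance
        is_off = abs(new_delta) > tolerance
        if was_off and not is_off:
            status["resolved"].append(name)   # gap closed by the rewrite
        elif was_off and is_off:
            status["remaining"].append(name)  # still off budget
        elif is_off:
            status["regressed"].append(name)  # newly pushed off budget
    return status
```

The loop can be treated as closed only when both "remaining" and "regressed" are empty. A file landing in "regressed" usually means a rewrite elsewhere moved content into it.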

Format Red Lines (Common for 2026+)

  • Do not shrink the font size or line spacing to "squeeze pages" (the page limit is a review risk)
  • Do not write right up to 30 pages: ≤28 pages is recommended to leave a buffer
  • If the current year's guide requires a declaration of generative AI use, state it truthfully as required (a compliance item)

Conventions and Output Format

  • Reports are presented at file level, with optional section-level detail
  • config.yaml:length_standard is the single source of truth for budgets
  • All rewrites should follow the principle of "minimal change, preserve the original meaning" (see references)