nsfc-length-aligner

Goal: turn proposal length from a subjective impression into a quantifiable metric that can be checked in a closed loop, and use a length budget to guide expansion or compression.

When to Use

You have an NSFC proposal and want to quickly check whether certain sections are too short or too long
You need to meet the template's hard length requirements (page count, word count, or character count)
You want to expand or compress content while preserving the original meaning as far as possible (keeping the main line of argument and the evidence chain intact)

When Not to Use

You only need a word count and do not care about budgets or the rewrite-and-recheck loop (a simpler script will do)
The proposal is not available locally (you cannot provide its text, files, or path)

Workflow (Strongly Recommended to Follow in Order)

1) Confirm Requirements (Budget Criteria)

First, confirm which hard standards you need to meet:

2026 research-consensus "golden ratio" (General Program / Young Scientists Fund Category C; for cross-checking): Rationale for the Project 30% (6–10 pages, approx. 8,000–10,000 Chinese characters) / Research Content 50% (12–15 pages, approx. 12,000–15,000 characters) / Research Basis 20% (5–8 pages, approx. 5,000–6,000 characters); keep the total at ≤28 pages as a buffer (in principle no more than 30 pages)
Page count (hard constraint): since the 2026 revision, the limit is "in principle no more than 30 pages"; in practice, aim for ≤28 pages to leave a buffer. Do not shrink the font size or line spacing to squeeze content onto fewer pages
Character budget (proxy metric): Chinese-character count, total-character count, and similar measures, used for a deterministic rewrite → recheck loop (the page count must still be verified against the final PDF)
Budget scope: a total-length budget plus budgets for each part or key chapter (covering at least the Rationale for the Project, Research Content, and Research Basis)
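As a quick arithmetic check on the golden ratio, the sketch below adds up the per-section ranges quoted above. The dictionary layout and names such as `GOLDEN_RATIO` are illustrative assumptions, not the schema of `config.yaml:length_standard`.

```python
# Ranges quoted in the 2026 "golden ratio" (General Program / Young Scientists Fund Category C).
# Hypothetical layout: section -> (share, (min_pages, max_pages), (min_chars, max_chars)).
GOLDEN_RATIO = {
    "Rationale for the Project": (0.30, (6, 10), (8000, 10000)),
    "Research Content": (0.50, (12, 15), (12000, 15000)),
    "Research Basis": (0.20, (5, 8), (5000, 6000)),
}
PAGE_TARGET = 28  # recommended buffer
PAGE_LIMIT = 30   # "in principle no more than 30 pages"


def summarize(ratio: dict) -> dict:
    """Add up the per-section ranges so they can be compared with the page target."""
    shares = sum(share for share, _, _ in ratio.values())
    min_pages = sum(pages[0] for _, pages, _ in ratio.values())
    max_pages = sum(pages[1] for _, pages, _ in ratio.values())
    min_chars = sum(chars[0] for _, _, chars in ratio.values())
    max_chars = sum(chars[1] for _, _, chars in ratio.values())
    return {
        "shares": round(shares, 6),
        "pages": (min_pages, max_pages),
        "chars": (min_chars, max_chars),
    }


summary = summarize(GOLDEN_RATIO)
print(summary)  # → {'shares': 1.0, 'pages': (23, 33), 'chars': (25000, 31000)}
print("all maxima fit in target:", summary["pages"][1] <= PAGE_TARGET)
```

The shares sum to 100% and the page lower bounds (23 pages) fit under the 28-page target, but the upper bounds add up to 33 pages, which is above even the 30-page limit. Read the per-section maxima as individual ceilings that cannot all be reached at once.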

Note: by default, this skill uses the sample criteria in `config.yaml:length_standard` (aligned with the 2026 research recommendations). Verify them against the current year's guidelines and template before use.
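The character budget is only a proxy, but counting it is deterministic. A minimal sketch of the two counts, assuming "Chinese characters" means code points in the CJK Unified Ideographs block (U+4E00 to U+9FFF) and "total characters" means all non-whitespace characters; `check_length.py` may define either count differently (for example, by stripping LaTeX commands first).

```python
import re

# Assumption: "Chinese characters" = CJK Unified Ideographs (U+4E00-U+9FFF) only;
# extension blocks and full-width punctuation such as "，" are not counted as Chinese.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")
WHITESPACE_RE = re.compile(r"\s")


def count_chars(text: str) -> dict:
    """Return the Chinese-character count and the non-whitespace character count."""
    return {
        "chinese": len(CJK_RE.findall(text)),
        "total": len(WHITESPACE_RE.sub("", text)),
    }


print(count_chars("研究内容 uses 3 aims."))  # → {'chinese': 4, 'total': 14}
```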

2) Run Length Check (Deterministic)

Run the check script on the target proposal directory (or single file) to generate a report:

```bash
python3 scripts/check_length.py --input <target proposal path> --config config.yaml
```

If your proposal is based on the `NSFC_Young` or `NSFC_General` template (the project root contains `main.tex`), point `--input` at the project root. The script then follows the `\input`/`\include` dependency tree of `main.tex` to collect only the files that are actually compiled into the PDF, and ignores commented-out `\input{...}` lines (so optional chapters are not counted by mistake).
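The file-collection rule can be pictured with the sketch below. It is a simplified, hypothetical re-implementation for illustration, not the code inside `check_length.py`: it removes `%` comments, follows `\input{...}` and `\include{...}` recursively from `main.tex`, resolves each path against the directory containing `main.tex` (the usual compile directory), and appends `.tex` when the argument lacks that extension. Packages such as `subfiles` or `import`, and macros that expand to file names, are not handled.

```python
import re
from pathlib import Path

# An unescaped % starts a LaTeX comment that runs to the end of the line.
COMMENT_RE = re.compile(r"(?<!\\)%.*")
# \input{...} or \include{...}; \includegraphics and \includeonly do not match.
INPUT_RE = re.compile(r"\\(?:input|include)\s*\{([^}]+)\}")


def collect_tex_files(main_tex: Path) -> list:
    """Return the .tex files reachable from main_tex, in first-visit order."""
    root = main_tex.parent  # paths are resolved against the compile directory (project root)
    seen = set()
    order = []

    def visit(path: Path) -> None:
        path = path.resolve()
        if path in seen or not path.is_file():
            return  # already counted, or a missing optional file
        seen.add(path)
        order.append(path)
        for line in path.read_text(encoding="utf-8").splitlines():
            code = COMMENT_RE.sub("", line)  # drops commented-out \input{...}
            for arg in INPUT_RE.findall(code):
                target = root / arg.strip()
                if target.suffix != ".tex":
                    target = target.with_name(target.name + ".tex")
                visit(target)

    visit(main_tex)
    return order
```

For example, with a `main.tex` that contains `\input{sections/a}` and a commented-out `% \input{sections/b}`, `sections/b.tex` is never visited, so its length is not counted.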

If you have compiled the final PDF (recommended, because page count is the hard constraint), pass it in as well so pages are counted:

```bash
python3 scripts/check_length.py --input <target proposal path> --config config.yaml --pdf <proposal.pdf>
```

Output:

Console summary (total length, items over/under budget)
`<input>/_artifacts/nsfc-length-aligner/length_report.md` (default output directory; customizable with `--out-dir`)
`<input>/_artifacts/nsfc-length-aligner/length_report.json` (default output directory; customizable with `--out-dir`)

Note: If your `<input>` directory is not writable (e.g., you set the template repository as read-only), be sure to use `--out-dir` to point to a writable location.
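If you script the check, the documented flags (`--input`, `--config`, `--pdf`, `--out-dir`) can be assembled as below, adding `--out-dir` only when the default report location cannot be written. The helper names and the single-file fallback (reports next to the file) are assumptions for illustration; the script's own handling of single-file inputs is not described here.

```python
import os
from pathlib import Path
from typing import Optional

# Default report location documented above: <input>/_artifacts/nsfc-length-aligner/
DEFAULT_SUBDIR = Path("_artifacts") / "nsfc-length-aligner"


def default_report_dir(input_path: Path) -> Path:
    # Assumption: for a single-file input, reports land next to that file.
    base = input_path if input_path.is_dir() else input_path.parent
    return base / DEFAULT_SUBDIR


def is_writable_target(directory: Path) -> bool:
    """Check the nearest existing ancestor, since that is where the directory would be created."""
    probe = directory
    while not probe.exists():
        probe = probe.parent
    return os.access(probe, os.W_OK)


def build_command(input_path: Path, config: Path, pdf: Optional[Path] = None,
                  fallback_out_dir: Optional[Path] = None) -> list:
    """Assemble the documented invocation; add --out-dir only if the default is not writable."""
    cmd = ["python3", "scripts/check_length.py",
           "--input", str(input_path), "--config", str(config)]
    if pdf is not None:
        cmd += ["--pdf", str(pdf)]
    if fallback_out_dir is not None and not is_writable_target(default_report_dir(input_path)):
        cmd += ["--out-dir", str(fallback_out_dir)]
    return cmd
```

The returned list can be passed to `subprocess.run`. The sketch makes no assumption about the script's exit codes, so inspect the console summary and the report files rather than relying on the return code.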

After the run, you must read `length_report.md` (and `length_report.json` if needed) and use its file-level deviation table, plus the section-level statistics when present, as the input to Step 3.

3) Interpret Gaps (Where the Gaps Lie)

Based on the report, do three things:

Locate the files or chapters that are too long or too short
Classify each gap:
- Insufficient evidence chain (add data, controls, or limitations)
- Logical jumps (add transitions, definitions, or hypotheses)
- Redundancy and repetition (merge or delete)
Produce an action list that sets priorities for expansion and compression

Using section-level data (for more precise targeting):

If `length_report.md` contains a section table (or the JSON has a `sections` field), first pinpoint, inside each file that is too long or too short, the sections that contribute most to the gap, then rewrite those specifically instead of trimming or padding evenly across the whole file
When a file is too long or too short, compare its section statistics: if the gap is concentrated in 1–2 sections, modify only those sections first (this makes it easier to preserve the original meaning and keep the structure stable)
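One way to make "concentrated in 1–2 sections" concrete: take one file's signed per-section deltas, keep those pointing in the same direction as the file's overall gap, and check whether the largest one or two cover most of it. The input shape (section title mapped to a signed delta) and the 80% threshold are assumptions for illustration; the report's actual `sections` field may be structured differently.

```python
def concentrated_sections(section_deltas: dict, share: float = 0.8, max_sections: int = 2) -> list:
    """Return the 1-2 sections covering at least `share` of the file's gap, or [] if none do.

    section_deltas maps a section title to its signed deviation from budget
    (positive = over budget, negative = under budget). Hypothetical input shape.
    """
    total = sum(section_deltas.values())
    if total == 0:
        return []
    # Only sections pushing in the same direction as the file-level gap contribute to it.
    same_direction = {name: d for name, d in section_deltas.items() if d * total > 0}
    ranked = sorted(same_direction, key=lambda name: abs(same_direction[name]), reverse=True)
    covered = 0
    for count, name in enumerate(ranked[:max_sections], start=1):
        covered += abs(same_direction[name])
        if covered >= share * abs(total):
            return ranked[:count]
    return []
```

An empty result suggests the gap is spread across the file rather than concentrated in one or two sections.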

Reference: `references/MEANING_PRESERVING_REWRITE_RUBRIC.md`

4) Expand/Compress (Preserve Original Meaning as Much as Possible)

Expansion Strategies (When Too Short)

First add verifiable information density: definitions, hypotheses, controls, ablation studies, risks, and fallback plans
Then close the argument loop: why do it → how to do it → how the expected results will be verified → what to do if it fails
Avoid empty padding: do not introduce new claims or pile on adjectives

Compression Strategies (When Too Long)

Remove repetition: state each argument once, in its strongest form
Cut background: condense general background into 1–2 sentences and give the space to problem → method → verification
Structured rewriting: split long paragraphs into bullet points (without changing the order of facts)

⚠️ After rewriting, you must run the Step 5 recheck to confirm that the gaps are gone. A rewrite that has not been rechecked does not count as done.

2026 Trim/Enrich Checklist for the Three Core Sections (for Setting Priorities)

Usage (turn static suggestions into gap-triggered actions):

First check the file's `delta` in the report: `+N` means over length (prioritize trimming); `-N` means under length (prioritize enriching); `OK` means that part needs no changes for budget reasons
The larger `|delta|` is, the higher the priority: handle the file with the largest `|delta|` first, then the next largest
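The trigger rule above can be sketched as a parse-and-sort step. It assumes each delta is reported as one of the strings shown (`+N`, `-N`, or `OK`) with a plain integer `N`; the file names in the example call are made up.

```python
def parse_delta(delta: str) -> int:
    """Map a report delta string to a signed integer: '+N' -> N, '-N' -> -N, 'OK' -> 0."""
    delta = delta.strip()
    if delta.upper() == "OK":
        return 0
    return int(delta)  # int() accepts a leading '+' or '-'


def action_list(file_deltas: dict) -> list:
    """Return (file, action, |delta|) tuples, largest |delta| first; in-budget files are skipped."""
    actions = []
    for name, raw in file_deltas.items():
        value = parse_delta(raw)
        if value == 0:
            continue  # OK: no change needed for budget reasons
        actions.append((name, "trim" if value > 0 else "enrich", abs(value)))
    return sorted(actions, key=lambda item: item[2], reverse=True)


print(action_list({"rationale.tex": "+1800", "content.tex": "-2500", "basis.tex": "OK"}))
# → [('content.tex', 'enrich', 2500), ('rationale.tex', 'trim', 1800)]
```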

Rationale for the Project (Why do it):

Trim: textbook-style popular-science exposition, broad generic literature reviews, loosely related "national needs" padding, restated significance, citations added only to pad the reference list
Enrich: Gap (bottleneck) → Key Idea (breakthrough) → value argument (why it is worth doing)

Research Content (What to do / How to do it):

Trim: repeated statements, excessive procedural detail, methods piled up as bare lists
Enrich: the logical framework, key experimental designs with controls/ablations, expected results with verifiable metrics, and figures that carry the argument

Research Basis (Why you can do it):

Trim: piles of unrelated achievements, excessive background buildup
Enrich: closely related preliminary data, core technical capabilities, and platform/facility conditions (each mapped to the research content)

5) Recheck (Close the Loop)

After revising, run the script again to confirm that every budget is met and none is exceeded:

```bash
python3 scripts/check_length.py --input <target proposal path> --config config.yaml
```
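What counts as "done" after the recheck can be expressed as a small predicate: every file reports `OK` and, if a PDF was checked, the page count stays within the 28-page buffer, with anything above 30 pages treated as a hard failure. The function name, the inputs, and the three status labels are hypothetical; take the actual values from the regenerated report.

```python
from typing import Optional

PAGE_TARGET = 28  # buffer recommended in Step 1
PAGE_LIMIT = 30   # "in principle no more than 30 pages"


def recheck_status(file_deltas: dict, pdf_pages: Optional[int] = None) -> str:
    """Classify a recheck result as 'done', 'rewrite', or 'over-limit'.

    file_deltas maps each file to its report delta string ('+N', '-N', or 'OK').
    """
    if pdf_pages is not None and pdf_pages > PAGE_LIMIT:
        return "over-limit"  # hard constraint violated: compress before anything else
    budgets_ok = all(raw.strip().upper() == "OK" for raw in file_deltas.values())
    buffer_ok = pdf_pages is None or pdf_pages <= PAGE_TARGET
    return "done" if budgets_ok and buffer_ok else "rewrite"
```

When `pdf_pages` is `None` (no `--pdf` passed), the predicate covers only the character budgets, which are a proxy; the page count still has to be confirmed on the final PDF.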

Format Red Lines (Common for 2026+)

Do not shrink the font size or line spacing to squeeze content onto fewer pages (page-count compliance is a review risk)
Do not use all 30 pages: aim for ≤28 pages to leave a buffer
If the current year's guidelines require a declaration of generative AI use, disclose it truthfully as required (a compliance item)

Conventions and Output Format

Reports are presented at file level, plus section level where available
The budget in `config.yaml:length_standard` is the single source of truth
All rewrites follow the principle of minimal changes that preserve the original meaning (see `references/`)