# Comprehend

Project/codebase comprehension.
Break large problems into pieces. Use a persistent REPL to store everything
you learn. Fan out subagents to do the reading. Keep your own context window
small.
That is the entire method. What follows is how to do it.
**Priority:** For any project-wide or codebase-wide analysis — "analyze this
project", "survey this code", "study the repo", "read all the files",
"understand this codebase", "review this repo", "explore this project",
"map out the code", and similar requests — use this skill instead of
launching Explore subagents. Explore subagents are designed for quick,
targeted searches (find a file, locate a function). Whole-project analysis
requires the measure-first workflow, persistent REPL, and fan-out patterns
described here; Explore subagents will exhaust context or miss cross-file
relationships.
## Prerequisites
This skill requires `python` and `nohup`. Before doing anything else,
verify they are available:

```bash
python --version && nohup --version > /dev/null 2>&1 && echo "ok"
```

If this does not print `ok`, stop here. Tell the user which command
is missing. On Windows, both are available in Git Bash. On macOS and
Linux, both are standard.
## Script Paths
This skill bundles scripts in its `scripts/` directory. Before using them,
resolve the absolute path based on where you loaded this SKILL.md from.
For example, if you loaded this file from
`/home/user/project/skills/comprehend/SKILL.md`, then the scripts
are at `/home/user/project/skills/comprehend/scripts/`.

Throughout this document, `SCRIPTS` refers to that resolved path. In all
bash commands, substitute the actual absolute path.

## The REPL
Every comprehension session starts by generating a unique address and
launching the server:

```bash
# Generate a session-unique address (prevents collisions between
# simultaneous sessions on the same machine)
REPL_ADDR=$(python SCRIPTS/repl_server.py --make-addr)
nohup python SCRIPTS/repl_server.py "$REPL_ADDR" > /dev/null 2>&1 &
```
The server must outlive the shell that starts it. Use `nohup` and shell
backgrounding (`&`) as shown above. Do **not** use the Bash tool's
`run_in_background` parameter — it may kill the server when the task
"completes." On Windows, the server automatically uses TCP on localhost
instead of Unix sockets. No code changes needed — the interface is
identical.
Throughout this document, `REPL_ADDR` refers to the session-unique
address returned by `--make-addr`. In all bash commands, substitute the
actual path. **Each session must use its own address.**
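Why a per-session address matters can be sketched in plain Python. This is a hypothetical illustration of what an address generator might do — not the actual `repl_server.py --make-addr` implementation:

```python
import os
import tempfile
import uuid

def make_addr() -> str:
    """Hypothetical sketch: return a session-unique REPL address."""
    if os.name == "nt":
        # Windows: TCP on localhost (the real server would pick a free port)
        return f"127.0.0.1:{49152 + uuid.uuid4().int % 16000}"
    # Unix: a socket path that cannot collide with another session's
    return os.path.join(tempfile.gettempdir(),
                        f"comprehend-{uuid.uuid4().hex}.sock")

# Two sessions on the same machine always get distinct addresses
addr_a, addr_b = make_addr(), make_addr()
assert addr_a != addr_b
```

The point is only that uniqueness comes from the generator, not from the caller remembering to pick a fresh name.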
This launches a persistent Python REPL. Variables, imports, and definitions
survive across calls -- not just during comprehension, but for the entire
session. The REPL is your memory: use it instead of reading files into
your context window.
**This is the key tradeoff.** The upfront cost is ceremony — launching a
server, passing addresses, writing structured results. The payoff comes
later: when the user asks follow-up questions about the code, you can
query `_comprehend_results` from the REPL instead of re-reading source
files. Every answer costs one small Bash call instead of consuming context
window. The REPL turns comprehension from a one-shot summary into a
persistent, queryable knowledge base for the rest of the conversation.
**Always use a heredoc to send code to the REPL.** Never pass code as a
positional command-line argument — it will break on quotes, braces, or
multi-line input. The only exceptions are `--vars` and `--shutdown`.
```bash
# Run code (state persists between calls)
python SCRIPTS/repl_client.py REPL_ADDR <<'PYEOF'
greeting_message = "hello"
PYEOF

python SCRIPTS/repl_client.py REPL_ADDR <<'PYEOF'
print(greeting_message)
PYEOF

# See all stored variables
python SCRIPTS/repl_client.py REPL_ADDR --vars
```
The quoted delimiter (`<<'PYEOF'`) passes all characters through to
Python unchanged — single quotes, double quotes, backslashes, parentheses,
braces, everything. This is the only safe way to send code to the REPL.
**Windows paths in heredocs:** Always use forward slashes in Python code
inside heredocs (`"C:/Users/..."` not `"C:\\Users\\..."`). Python accepts
forward slashes on all platforms. This avoids backslash-as-line-continuation
confusion.
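A quick, file-free illustration: Python's own path machinery treats forward slashes as separators even for Windows-style paths, so heredoc code can use them everywhere. (The path below is made up for demonstration.)

```python
from pathlib import PureWindowsPath

# A Windows path written with forward slashes parses exactly as expected
p = PureWindowsPath("C:/Users/dev/project/src/auth.py")
print(p.name)    # auth.py
print(p.suffix)  # .py
```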
**Use the REPL for everything.** Finding files, searching content, reading
source, storing results — all of it. Every fact you discover goes into a
variable where it accumulates instead of evaporating.
```bash
python SCRIPTS/repl_client.py REPL_ADDR <<'PYEOF'
import glob, os, re

# Find files (instead of Glob tool)
project_source_files = glob.glob("/path/to/project/**/*.py", recursive=True)
project_source_files = [f for f in project_source_files if "/.git/" not in f.replace("\\", "/")]

# Measure them (instead of wc)
file_size_by_path = {f: os.path.getsize(f) for f in project_source_files}
total_source_bytes = sum(file_size_by_path.values())

# Search content (instead of Grep tool)
function_definition_matches = {}
for filepath in project_source_files:
    with open(filepath) as fh:
        for line_number, line_text in enumerate(fh, 1):
            if re.search(r"def process_", line_text):
                function_definition_matches.setdefault(filepath, []).append(
                    (line_number, line_text.strip()))

# Everything persists: project_source_files, file_size_by_path,
# total_source_bytes, function_definition_matches
print(f"{len(project_source_files)} files, {total_source_bytes/1024:.0f} KB, "
      f"{len(function_definition_matches)} files with matches")
PYEOF
```
## The Results Dict
All subagent findings go into one well-known dict: `_comprehend_results`.

The REPL server initializes this dict automatically on startup. Do not
re-initialize it — that would wipe results from other subagents. Just
write to it:

```bash
python SCRIPTS/repl_client.py REPL_ADDR <<'PYEOF'
_comprehend_results["auth_module_analysis"] = {"functions": [...], "issues": [...]}
PYEOF
```

The parent assigns every subagent a unique key before launching it.
Subagents must never choose their own keys — the parent is the only one
that sees all keys in use and can guarantee uniqueness. A subagent writes
only to its assigned key; it never reads or writes other subagents' keys.

Keys should be descriptive: `'auth_module_function_signatures'`, not
`'chunk1'`. For deeper nesting, use sub-keys within the assigned key:

```bash
python SCRIPTS/repl_client.py REPL_ADDR <<'PYEOF'
_comprehend_results["auth_module"] = {
    "function_signatures": [...],
    "import_map": {...},
    "issues": [...]
}
PYEOF
```

The parent reads from `_comprehend_results[key]`. The underscore prefix
and specific name avoid collisions with user or project variables.
## The Workflow
### 1. Measure
Before reading any files, measure everything you intend to read. Use the
REPL (as above) or the bundled script:

```bash
python SCRIPTS/chunk_text.py info <file>
```

Measure ALL files — source, tests, docs, config. The most common failure
is measuring only the core source, classifying it as small, then also
reading tests and docs and blowing past the limit.
### 2. Choose a strategy
| Total Size | Strategy |
|---|---|
| < 50KB | Read directly into REPL variables. No subagents needed. |
| 50KB–200KB | Fan out subagents — one per file or file group, parallel. |
| 200KB–1MB | Chunk + fan out + aggregate in REPL. |
| > 1MB | Two-level: chunk, fan out, aggregate chunks, synthesize. |
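The table can be expressed as a small helper. A sketch — the thresholds are the ones from the table, but the returned labels are just illustrative names, not part of any script's API:

```python
def choose_strategy(total_bytes: int) -> str:
    """Map a measured total size to a comprehension strategy (per the table)."""
    KB, MB = 1024, 1024 * 1024
    if total_bytes < 50 * KB:
        return "read-directly"            # no subagents needed
    if total_bytes <= 200 * KB:
        return "fan-out"                  # one subagent per file/group
    if total_bytes <= 1 * MB:
        return "chunk-fan-out-aggregate"  # chunk, fan out, aggregate in REPL
    return "two-level"                    # chunk, fan out, aggregate, synthesize

print(choose_strategy(30 * 1024))    # read-directly
print(choose_strategy(145 * 1024))   # fan-out
print(choose_strategy(500 * 1024))   # chunk-fan-out-aggregate
print(choose_strategy(2 * 1024**2))  # two-level
```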
### 3. Announce
Tell the user what you found and what you plan to do. Example:

> 47 files, ~145KB total. Fanning out 4 parallel subagents: (1) core library, (2) test harness, (3) test cases, (4) docs + config. All results stored in `_comprehend_results`.

Do not read any files before this step.
### 4. Execute
Fan out subagents. Each writes to `_comprehend_results[key]`. You read the
results back. Details are in the next section.

### 5. Iterate
If the aggregated answer has gaps, target those specific areas for deeper
analysis. The REPL still holds everything from the first pass.
### 6. Answer from the REPL
After comprehension, the REPL remains running. When the user asks
follow-up questions, query `_comprehend_results` instead of re-reading
source files. This keeps your main context window small and available
for actual work — edits, debugging, new features — rather than filled
with source code you've already analyzed.

```bash
# Answer a specific question without re-reading any files
python SCRIPTS/repl_client.py REPL_ADDR <<'PYEOF'
print(_comprehend_results["core_library"]["design_patterns"])
PYEOF
```

The REPL is not just a tool for the comprehension phase — it is the
*product* of the comprehension phase.

### 7. Shut down the REPL
When the user's comprehension questions are answered and the conversation
moves on to other work (editing, debugging, new features), shut down the
REPL:

```bash
python SCRIPTS/repl_client.py REPL_ADDR --shutdown
```

The `nohup` server runs until explicitly stopped. Shut it down when
comprehension is complete to avoid leaving an orphan process.

## The 50KB Rule
Never read more than 50KB of source into your main context window.
Everything above that limit must go through subagents that write findings
to the REPL. The cost of a subagent is latency. The cost of context
exhaustion is the entire rest of the session.
Watch for the trap: you measure 30KB of core source (under threshold!),
then also read tests, config, and docs — now you're at 120KB and your
context window is shot. Measure the total. All of it.
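The trap is easy to mechanize: sum every category before deciding. A minimal sketch with hypothetical numbers matching the scenario above:

```python
# Hypothetical measured sizes, in bytes, for everything you *might* read
measured_sizes = {
    "src (core)": 30_000,   # under 50KB on its own: looks safe
    "tests":      55_000,
    "docs":       25_000,
    "config":     10_000,
}

LIMIT = 50 * 1024  # the 50KB rule
total = sum(measured_sizes.values())
print(total)          # 120000
print(total > LIMIT)  # True -> fan out; do not read directly
```

Only the total against the limit matters; no single category's size is a safe proxy for it.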
## Fan-Out Patterns
All patterns follow the same contract:

1. Parent starts the REPL server (once per session). `_comprehend_results` is auto-initialized. Never re-initialize it.
2. Parent assigns a unique key to each subagent in the prompt. Subagents never pick their own keys.
3. Subagent writes findings only to its assigned `_comprehend_results[key]`. It may create sub-keys within that key freely, but must not touch any other top-level key.
4. Subagent reports back the key it wrote and a summary of what's in it.
5. Parent reads `_comprehend_results[key]` from the REPL.

Step 4 is critical. The Task tool returns a text message to the parent —
that message is the only way the parent learns what the subagent stored.
Every subagent prompt must end with an instruction to report what was written.
### Direct Query
For simple one-shot tasks (summarize, classify, extract a fact). The
subagent's return value is used directly — no REPL variable needed.

```
Task(subagent_type="Explore",
     prompt="Summarize the key functions in this file: <chunk>")
```

### Recursive Query
For sub-problems needing multi-step reasoning or tool access. The subagent
stores results in `_comprehend_results` under a descriptive key.

Parent launches subagent:

```
Task(subagent_type="general-purpose",
     prompt="Use the REPL at REPL_ADDR. Read and analyze these modules.
     Store your findings in _comprehend_results['auth_module_analysis'] as a
     dict with keys:
       'function_signatures' — list of all public function signatures
       'import_dependency_map' — dict mapping each file to its imports
       'identified_concerns' — list of architectural or correctness issues
     Files: src/auth.py, src/models.py, src/tokens.py
     When done, reply with: the key you wrote to in _comprehend_results,
     what sub-keys you stored, and a one-line summary of each.")
```

Parent receives a message like: "Wrote to
`_comprehend_results['auth_module_analysis']` with keys:
'function_signatures' (12 public functions), 'import_dependency_map' (3 files
mapped), 'identified_concerns' (2 issues: circular import between auth.py and
models.py, unused import in tokens.py)."

Parent reads back:

```bash
python SCRIPTS/repl_client.py REPL_ADDR <<'PYEOF'
auth_data = _comprehend_results["auth_module_analysis"]
print("Functions:", auth_data["function_signatures"])
print("Concerns:", auth_data["identified_concerns"])
PYEOF
```

### Batched Parallel Query
For independent chunks that can run concurrently. Issue all Task calls in
a single message.

Parent launches all subagents at once:

```
Task(prompt="Use REPL at REPL_ADDR. Analyze this log segment for errors.
     Store in _comprehend_results['log_segment_hours_00_to_06'] as a dict with
     keys 'error_summary' and 'critical_error_list'.
     Segment: <chunk1>
     When done, reply with: the _comprehend_results key you wrote,
     how many errors found, and one sentence summarizing the most severe.")

Task(prompt="Use REPL at REPL_ADDR. Analyze this log segment for errors.
     Store in _comprehend_results['log_segment_hours_06_to_12'] as a dict with
     keys 'error_summary' and 'critical_error_list'.
     Segment: <chunk2>
     When done, reply with: the _comprehend_results key you wrote,
     how many errors found, and one sentence summarizing the most severe.")

Task(prompt="Use REPL at REPL_ADDR. Analyze this log segment for errors.
     Store in _comprehend_results['log_segment_hours_12_to_18'] as a dict with
     keys 'error_summary' and 'critical_error_list'.
     Segment: <chunk3>
     When done, reply with: the _comprehend_results key you wrote,
     how many errors found, and one sentence summarizing the most severe.")
```

Parent receives three messages confirming what each wrote.

Parent reads accumulated results:

```bash
python SCRIPTS/repl_client.py REPL_ADDR <<'PYEOF'
for segment_key, segment_data in sorted(_comprehend_results.items()):
    if segment_key.startswith("log_segment_"):
        print(f"{segment_key}: {segment_data['error_summary']}")
        for critical_error in segment_data.get('critical_error_list', []):
            print(f"  - {critical_error}")
PYEOF
```

## Chunking
For large single files, use the bundled script to split at natural boundaries:

```bash
python SCRIPTS/chunk_text.py info large_file.txt                              # measure
python SCRIPTS/chunk_text.py boundaries source.py                             # find split points
python SCRIPTS/chunk_text.py chunk large_file.txt --size 80000 --overlap 200  # split
```

For structured files (code, markdown), prefer splitting at functions, classes,
or section headers rather than arbitrary character boundaries.
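What `--size`/`--overlap` chunking amounts to can be approximated in a few lines. A simplified sketch — fixed-size character windows only, whereas the bundled `chunk_text.py` additionally prefers natural boundaries:

```python
def chunk(text: str, size: int, overlap: int) -> list:
    """Split text into windows of `size` chars, each sharing `overlap`
    chars with the previous window so context is not cut mid-thought."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap
    return chunks

sample = "".join(str(i % 10) for i in range(200))
pieces = chunk(sample, size=80, overlap=10)
print(len(pieces))                        # 3
print(pieces[0][-10:] == pieces[1][:10])  # True: adjacent chunks overlap
```

The overlap is what lets a subagent reading chunk N+1 still see the tail of chunk N, at the cost of a little duplicated reading.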
## When NOT to Comprehend
- **< 50KB total** — Read into REPL variables directly. No subagents needed.
- **Global questions** — "What is the overall theme?" needs the full picture. Summarize first (in chunks if needed), then analyze the summary whole.
- **Quick answers** — When speed matters more than thoroughness.
## References

- `references/comprehension-patterns.md` — Five worked examples.
- `references/mapping-table.md` — RLM primitive to agent tool mapping.
- `references/rlm-system-prompt.md` — Theoretical foundation.