# Graph: Extraction, Analysis & Categorical Compression
Systematic extraction and analysis of entities, relationships, and ontological structures from unstructured text—enhanced with categorical metagraph compression enabling scale-invariant representation through structural equivalence, k-bisimulation summarization, and quotient constructions that preserve query-answering capabilities while achieving dramatic size reductions.
## Quick Start
### Basic Extraction
- Load schema: Read `/mnt/skills/user/knowledge-graph/schemas/core_ontology.md` for entity/relationship types
- Extract entities and relationships using the schema as a guide
- Format as JSON following `/mnt/skills/user/knowledge-graph/templates/extraction_template.md`
- Validate: Run the validation script on the extracted graph
### Compression Workflow
```bash
# 1. Extract → validate → analyze topology
python scripts/validate_graph.py graph.json
python scripts/analyze_graph.py graph.json --topology

# 2. Compute structural equivalence and compress
python scripts/compress_graph.py graph.json --method k-bisim --k 5

# 3. Verify query preservation
python scripts/verify_compression.py original.json compressed.json --queries reachability,pattern
```

## Theoretical Foundation
### The Compression Mechanism
Structural equivalence enables compression through a precise mechanistic chain:

**Equivalence → Redundancy → Quotient → Preservation**

- **Equivalence relations partition structures**: Graph automorphisms and categorical isomorphisms identify structurally interchangeable elements—vertices with identical connection patterns to equivalent neighbors belong to the same automorphism orbit
- **Orbits represent information redundancy**: For k vertices in one orbit, (k−1) are informationally redundant since they encode the same structural relationships
- **Quotient constructions eliminate redundancy**: Categorical quotients collapse equivalence classes to single representatives, while the universal property guarantees that any construction respecting the equivalence factors uniquely through the compressed representation
- **Functors preserve structure across scales**: The quotient functor Q: C → C/R is full and bijective on objects—no essential categorical information is lost
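The equivalence → quotient chain can be sketched in code. The following is a minimal illustration (a hypothetical helper, not one of this skill's scripts) that partitions nodes by a depth-1 structural signature and collapses each class to a canonical representative:

```python
from collections import defaultdict

def quotient_graph(nodes, edges):
    """Collapse nodes with identical outgoing (label, target) patterns
    into one representative per class.  Depth-1 signature only; full
    structural equivalence would iterate this refinement."""
    sig = defaultdict(set)
    for src, label, dst in edges:
        sig[src].add((label, dst))
    # Nodes with identical signatures fall in the same equivalence class.
    classes = defaultdict(list)
    for n in nodes:
        classes[frozenset(sig[n])].append(n)
    # Each class -> a single canonical representative.
    rep = {n: min(members) for members in classes.values() for n in members}
    q_nodes = sorted(set(rep.values()))
    q_edges = sorted({(rep[s], l, rep[d]) for s, l, d in edges})
    return q_nodes, q_edges
```

For three nodes where `a` and `b` both point to `c` via the same relation, `a` and `b` collapse, giving a 3:2 compression ratio.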
### Quantitative Foundation
The connection between automorphisms and Kolmogorov complexity:

    K(G) ≤ K(G/Aut(G)) + log|Aut(G)| + O(1)

Graphs with large automorphism groups have lower complexity because only one representative from each orbit needs encoding. For highly symmetric structures, compression can reach a factor of n/log n.
### Why This Matters for Knowledge Graphs
Knowledge graphs exhibit natural structural regularities:
| Pattern | Compression Mechanism | Typical Reduction |
|---|---|---|
| Type hierarchies | Automorphism orbits | 40-60% |
| Repeated subgraphs | k-bisimulation equivalence | 50-80% |
| Community structure | Block quotients | 30-50% |
| Self-similar patterns | Scale-invariant quotients | 60-95% |
## Core Capabilities
### 1. Structured Entity Extraction
Extract entities with confidence scores, provenance tracking, and property attribution:
- Entity types: Person, Organization, Concept, Event, Document, Technology, Location
- Confidence scoring: 0.0-1.0 scale based on evidence clarity
- Provenance metadata: Source document, location, timestamp
- Alias tracking: Capture all name variations
Key principle: Every extraction must include confidence score and source tracking for auditability.
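Taken together, these fields suggest an entity record shaped like the following. This is a hypothetical illustration; the authoritative format lives in `extraction_template.md`:

```json
{
  "id": "person_jane_smith",
  "type": "Person",
  "name": "Jane Smith",
  "aliases": ["Dr. Jane Smith", "J. Smith"],
  "confidence": 0.95,
  "provenance": {
    "source_document": "faculty_bio.txt",
    "source_location": "page 1, lines 3-5",
    "extraction_timestamp": "2024-01-15T09:30:00Z",
    "extractor_version": "1.0.0"
  }
}
```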
### 2. Relationship Mapping
Identify and classify relationships between entities:

- Core relationships: WORKS_FOR, AFFILIATED_WITH, RELATED_TO, AUTHORED, CITES, USES, LOCATED_IN, IMPLEMENTS
- Domain-specific relationships: Load schemas from `/mnt/skills/user/knowledge-graph/schemas/`
- Bidirectional awareness: Track relationship directionality
- Property attribution: Capture relationship metadata (dates, roles, contexts)
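A relationship record following the same conventions might look like this (a hypothetical shape; the actual format is governed by `extraction_template.md`):

```json
{
  "source": "person_jane_smith",
  "target": "org_mit",
  "type": "WORKS_FOR",
  "directional": true,
  "confidence": 0.9,
  "properties": {"role": "Professor", "start_date": "2019-09-01"}
}
```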
### 3. Domain-Specific Schemas
General domains use `core_ontology.md`. Coding/software domains additionally use `coding_domain.md`, which adds:

- CodeEntity, Repository, API, Library, Architecture, Bug types
- DEPENDS_ON, CALLS, INHERITS_FROM, FIXES, DEPLOYED_ON relationships
- Language-specific extraction patterns
### 4. Structural Equivalence Analysis
Identify and exploit structural redundancy through automorphism detection:

**Automorphism-Based Compression**:

```python
# Compute automorphism group
aut_group = compute_automorphisms(graph)
orbits = partition_by_orbits(graph.nodes, aut_group)

# Each orbit → single representative
compressed_nodes = [orbit.canonical_representative() for orbit in orbits]
compression_ratio = len(graph.nodes) / len(compressed_nodes)
```

**Equivalence Types**:
- **Structural equivalence**: Identical connection patterns (strictest)
- **Regular equivalence**: Same relationship *types* to equivalent alters
- **Automorphic equivalence**: Permutable without changing structure

### 5. k-Bisimulation Summarization
Compress graphs while preserving query semantics using k-bisimulation:

**Definition**: Two nodes are k-bisimilar if they have:
- Same labels
- Same edge types to k-bisimilar neighbors
- This property holds recursively to depth k

**Implementation**:

```bash
# k=5 is sufficient for most graphs
python scripts/compress_graph.py graph.json \
  --method k-bisim \
  --k 5 \
  --preserve-queries reachability,pattern
```

**Empirical Results**:
- k > 5 yields minimal additional partition refinement
- Achieves 95% reduction for reachability queries
- Achieves 57% reduction for pattern matching
- Incremental update cost: O(Δ·d^k), where Δ is the number of changes and d is the maximum degree
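The definition above translates directly into partition refinement: start from labels, then repeatedly split blocks by (label, edge types to current blocks) until the partition stabilizes or depth k is reached. A compact sketch, illustrative only (`scripts/compress_graph.py` is the actual implementation):

```python
def k_bisimulation_blocks(nodes, edges, labels, k=5):
    """Assign block ids so that two nodes share a block iff they are
    k-bisimilar: same label, same edge types to k-bisimilar neighbors,
    recursively to depth k."""
    block = {n: labels[n] for n in nodes}          # depth 0: labels only
    for _ in range(k):
        # Signature: own block plus (edge label, neighbor block) pairs.
        sig = {n: (block[n],
                   frozenset((lbl, block[dst])
                             for src, lbl, dst in edges if src == n))
               for n in nodes}
        ids = {}
        new_block = {n: ids.setdefault(sig[n], len(ids)) for n in nodes}
        # Refinement only splits blocks, so equal class counts = fixed point.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block
        block = new_block
    return block
```

Two people with the same label pointing to the same organization via the same edge type land in one block, so the compressed summary stores that pattern once.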
### 6. Categorical Quotient Construction
Apply category-theoretic compression with provable structure preservation:

**The Universal Property Guarantee**:

For any quotient Q: C → C/R, if H: C → D is any functor such that H(f) = H(g) whenever f ~ g in R, then H factors uniquely as H = H' ∘ Q.

This unique factorization means the quotient is the "freest" (most compressed) object respecting the equivalence—any construction built on the original that respects the equivalence can be equivalently built on the quotient.

**Skeleton Construction**:

```python
# Every category is equivalent to its skeleton
skeleton = compute_skeleton(category)

# The skeleton contains exactly one representative per isomorphism class.
# All categorical properties are preserved (limits, colimits, exactness).
```

### 7. Metagraph Hierarchical Modeling
Support edge-of-edge structures for multi-scale representation:

**Metagraph Definition**: MG = ⟨V, MV, E, ME⟩
- V: vertices
- MV: metavertices (each containing an embedded metagraph fragment)
- E: edges connecting sets of vertices
- ME: metaedges connecting vertices, edges, or both

**Why Metagraphs Enable Scale Invariance**:

The edge-of-edge capability creates holonic structure—self-similar patterns where the relationship between a metavertex and its contents mirrors the relationship between the entire metagraph and its top-level components. Automorphisms operate at multiple levels simultaneously, creating compression opportunities at each scale when these automorphism structures are isomorphic across levels.

**2-Category Interpretation**:
- 0-cells: vertices/elements
- 1-morphisms: edges connecting sets
- 2-morphisms: metaedges relating edges

The interchange law ensures scale-independent composition.
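One way to realize MG = ⟨V, MV, E, ME⟩ as a data structure is sketched below. This is a hedged illustration; the actual serialization is specified in `metagraph_template.md`:

```python
from dataclasses import dataclass, field

@dataclass
class Metagraph:
    """MG = ⟨V, MV, E, ME⟩: metavertices embed fragments, edges join
    vertex sets, and metaedges may reference edges as endpoints."""
    vertices: set = field(default_factory=set)        # V
    metavertices: dict = field(default_factory=dict)  # MV: name -> embedded Metagraph
    edges: list = field(default_factory=list)         # E: frozensets of vertices
    metaedges: list = field(default_factory=list)     # ME: (vertex-or-metavertex name, edge index) pairs
```

The recursive `metavertices` field is what gives the holonic, self-similar shape: an embedded fragment has exactly the same type as the whole.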
### 8. Topology Metrics & Quality Validation
**Graph Quality Metrics**:

| Metric | Formula | Target | Significance |
|---|---|---|---|
| Edge-to-Node Ratio | \|E\|/\|V\| | ≥4:1 | Enables emergence through dense connectivity |
| Isolation Rate | \|V_isolated\|/\|V\| | <20% | Measures integration completeness |
| Clustering Coefficient | Local triangles / possible triangles | >0.3 | Small-world property indicator |
| Fractal Dimension | d_B from box-covering | Finite | Self-similarity/compressibility |
| Average Path Length | Mean geodesic distance | Low | Information flow efficiency |

**Scale-Invariance Indicators**:

    N_B(l_B) ~ l_B^(-d_B)

Networks with finite fractal dimension d_B are self-similar and can be compressed at multiple resolutions, with compression ratio scaling as l^(d_B).
**Validation Script**:

```bash
python scripts/validate_graph.py graph.json --topology --compression-potential
```
| 指标 | 公式 | 目标 | 意义 |
|---|---|---|---|
| 边-节点比率 | |E|/|V| | ≥4:1 | 通过密集连接实现涌现性 |
| 孤立率 | |V_isolated|/|V| | <20% | 衡量集成完整性 |
| 聚类系数 | 局部三角形数/可能的三角形数 | >0.3 | 小世界特性指标 |
| 分形维数 | 盒覆盖法得到的d_B | 有限 | 自相似性/可压缩性 |
| 平均路径长度 | 平均测地距离 | 低 | 信息流效率 |
尺度不变性指标:
N_B(l_B) ~ l_B^(-d_B)具有有限分形维数d_B的网络是自相似的,可在多个分辨率下压缩,压缩比随l^(d_B)缩放。
验证脚本:
bash
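The first two metrics are cheap to compute directly from an edge list. A minimal sketch (not the actual `topology_metrics.py`):

```python
def basic_topology(nodes, edges):
    """Edge-to-node ratio and isolation rate for an undirected edge list."""
    touched = set()
    for u, v in edges:
        touched.update((u, v))
    ratio = len(edges) / len(nodes)                                  # |E|/|V|
    isolation = sum(1 for n in nodes if n not in touched) / len(nodes)
    return ratio, isolation
```

A graph failing the ≥4:1 ratio or the <20% isolation target is flagged for refinement before any compression step.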
### 9. Information-Theoretic Analysis
**Structural Entropy**:

    H_s(G) = (n choose 2)·h(p) - n·log(n) + O(n)

The term -n·log(n) represents the compression gain from removing label information.

**Minimum Description Length (MDL)**:

For graph G and model M:

    L(G,M) = L(M) + L(G|M)

Optimal compression minimizes this total description length. Community structure reduces entropy by ~k·log(n) bits for k communities.

**Compressibility Predictors**:
- High transitivity → higher compressibility
- Degree heterogeneity → higher compressibility
- Hierarchical structure → enables predictable transitions, lower entropy rates
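To make the leading terms of the structural entropy concrete, here is a numeric instantiation for an Erdős–Rényi-style graph. This helper is illustrative and assumes base-2 logarithms; the O(n) term is omitted:

```python
import math

def structural_entropy_estimate(n, p):
    """Leading terms of H_s(G) = C(n,2)·h(p) - n·log(n), with h the
    binary entropy function (logs base 2, O(n) term dropped)."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # binary entropy h(p)
    return (n * (n - 1) // 2) * h - n * math.log2(n)
```

For n = 100 and p = 0.5 (so h(p) = 1 bit), the pair term contributes 4950 bits and the label-removal term subtracts about 664, showing how the -n·log(n) gain scales.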
## Extraction Guidelines
### Confidence Scoring Rules
| Score | Criteria | Example |
|---|---|---|
| 0.9-1.0 | Explicitly stated with clear evidence | "Dr. Jane Smith works for MIT" |
| 0.7-0.89 | Strongly implied by context | Person with @mit.edu email |
| 0.5-0.69 | Reasonably inferred but ambiguous | Co-authorship implies collaboration |
| 0.3-0.49 | Weak inference, requires validation | Similar domain suggests relationship |
| 0.0-0.29 | Speculative, likely incorrect | Pure assumption |
### ID Generation Strategy
Create stable, meaningful identifiers:

- Format: `{type}_{normalized_name}` (e.g., `person_jane_smith`, `org_mit`)
- Normalization: Lowercase, replace spaces with underscores, remove special chars
- Uniqueness: Add numeric suffix if collision occurs
- Stability: Same entity in different documents should generate same ID
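The normalization and collision rules above can be sketched as a small helper (hypothetical; the function name is illustrative):

```python
import re

def generate_id(entity_type, name, taken):
    """`{type}_{normalized_name}`: lowercase, spaces -> underscores,
    special characters removed, numeric suffix on collision."""
    normalized = re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()
    normalized = re.sub(r"\s+", "_", normalized)
    base = f"{entity_type}_{normalized}"
    candidate, suffix = base, 2
    while candidate in taken:               # uniqueness via numeric suffix
        candidate = f"{base}_{suffix}"
        suffix += 1
    taken.add(candidate)
    return candidate
```

Because normalization is deterministic, the same entity mentioned in different documents maps to the same base ID, satisfying the stability rule.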
### Provenance Best Practices
Always include:

- `source_document`: Document ID or filename
- `source_location`: Page number, section, line range
- `extraction_timestamp`: ISO 8601 format
- `extractor_version`: Skill version identifier
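A helper that stamps these four fields might look like this (the field names come from the checklist above; the function itself is hypothetical):

```python
from datetime import datetime, timezone

def provenance(source_document, source_location, extractor_version):
    """Provenance block with an ISO 8601 UTC timestamp."""
    return {
        "source_document": source_document,
        "source_location": source_location,
        "extraction_timestamp": datetime.now(timezone.utc).isoformat(),
        "extractor_version": extractor_version,
    }
```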
## Advanced Workflows
### Compression Pipeline
```bash
# 1. Initial extraction
#    (Extract to graph.json)

# 2. Validate and analyze
python scripts/validate_graph.py graph.json
python scripts/analyze_graph.py graph.json --full

# 3. Compute structural equivalence
python scripts/structural_equivalence.py graph.json \
  --output equivalence_classes.json \
  --method automorphism

# 4. Apply k-bisimulation compression
python scripts/compress_graph.py graph.json \
  --equivalence equivalence_classes.json \
  --method k-bisim --k 5 \
  --output compressed.json

# 5. Verify preservation
python scripts/verify_compression.py graph.json compressed.json \
  --queries reachability,pattern,neighborhood

# 6. Generate topology report
python scripts/topology_metrics.py compressed.json --report
```

### Iterative Refinement with Compression
- Initial extraction: Broad pass capturing entities/relationships
- Topology analysis: Compute |E|/|V| ratio, isolation rate, clustering
- Compression analysis: Identify automorphism orbits, k-bisimilar classes
- Strategic refinement: Focus on:
  - Central concepts with weak connections
  - Isolated high-confidence entities
  - Low-compression-potential regions (may need restructuring)
- Compress and validate: Apply quotient construction, verify query preservation
- Repeat: Continue until quality thresholds are met AND the compression ratio stabilizes

Termination criteria:
- Isolation rate < 20%
- |E|/|V| ratio ≥ 4:1
- Compression ratio improvement < 5% between iterations
- Query preservation verified
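The termination criteria combine into a single predicate. A sketch under the thresholds listed above (the metric dictionary keys are assumptions, not a defined API):

```python
def should_terminate(metrics, prev_ratio, curr_ratio):
    """True when all quality thresholds hold and the compression ratio
    has stabilized (improvement < 5% between iterations)."""
    improvement = abs(curr_ratio - prev_ratio) / prev_ratio
    return (metrics["isolation_rate"] < 0.20
            and metrics["edge_node_ratio"] >= 4.0
            and improvement < 0.05
            and metrics["queries_preserved"])
```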
### Multi-Scale Metagraph Construction
For complex domains requiring hierarchical representation:

```bash
# 1. Extract at multiple granularities
python scripts/extract_hierarchical.py source.txt \
  --levels strategic,tactical,operational \
  --output metagraph.json

# 2. Compute cross-level automorphisms
python scripts/metagraph_automorphisms.py metagraph.json

# 3. Apply scale-invariant compression
python scripts/compress_metagraph.py metagraph.json \
  --preserve-hierarchy \
  --output compressed_metagraph.json
```

## Common Patterns
### Pattern: Query-Preserving Compression
Compress while guaranteeing specific query types remain answerable:

```python
# Define query preservation requirements
queries = {
    "reachability": True,    # 95% reduction possible
    "pattern_match": True,   # 57% reduction possible
    "neighborhood_k": 3,     # Preserve 3-hop neighborhoods
}

# Compress with guarantees
compressed = compress_with_guarantees(
    graph,
    method="k-bisimulation",
    k=max(5, queries["neighborhood_k"]),
    preserve=queries,
)
```

### Pattern: Incremental Compression Maintenance
Maintain compression as the graph evolves:

```python
# Update cost: O(Δ·d^k), where
#   Δ = number of changes
#   d = maximum degree
#   k = bisimulation depth
def update_compression(compressed_graph, changes):
    affected_classes = identify_affected_equivalence_classes(changes)
    recompute_local_bisimulation(affected_classes, k=5)
    return compressed_graph
```

### Pattern: Categorical Ontology Integration
Use ologs (ontology logs) for categorical knowledge representation:

```python
# Olog: a category where objects = noun phrases, morphisms = verb phrases
olog = {
    "objects": ["a person", "an organization", "a concept"],
    "morphisms": [
        {"source": "a person", "target": "an organization", "label": "works for"},
        {"source": "a concept", "target": "a concept", "label": "relates to"},
    ],
}

# Yoneda embedding: an object is determined by the morphisms into it
# Compression: store relationships, not internal structure
```

## Error Handling
### Compression Quality Issues
When compression produces unexpected results:
- Over-compression: Raise k value in k-bisimulation (default k=5)
- Under-compression: Check for missing type labels, inconsistent schemas
- Query degradation: Verify query type is in preservation set
- Scale-invariance failure: Check for unbalanced hierarchical structure
### Topology Violations
When graph metrics fall outside targets:
- |E|/|V| < 4: Graph too sparse—identify disconnected concepts, add bridging relationships
- Isolation > 20%: Too many orphan nodes—run connectivity analysis
- Clustering < 0.3: Lacks small-world property—add local triangulation
## File Structure
```
/mnt/skills/user/knowledge-graph/
├── SKILL.md                       # This file
├── schemas/
│   ├── core_ontology.md           # Universal entity/relationship types
│   ├── coding_domain.md           # Software development extension
│   └── categorical_ontology.md    # Category-theoretic type system
├── templates/
│   ├── extraction_template.md     # JSON format specification
│   └── metagraph_template.md      # Hierarchical metagraph format
└── scripts/
    ├── validate_graph.py          # Quality validation
    ├── merge_graphs.py            # Deduplication and merging
    ├── analyze_graph.py           # Refinement strategy generation
    ├── compress_graph.py          # k-bisimulation compression
    ├── structural_equivalence.py  # Automorphism computation
    ├── topology_metrics.py        # Graph topology analysis
    └── verify_compression.py      # Query preservation verification
```

## Dependencies
All scripts require Python 3.7+ with standard library only (no external packages for core functionality). Optional NetworkX for advanced topology metrics.
## Best Practices Summary
- Always start with schema: Load appropriate ontology before extraction
- Include confidence scores: Never omit—use 0.5 if uncertain
- Track provenance: Every entity/relationship needs source metadata
- Validate early: Run validation after each extraction
- Analyze topology: Check |E|/|V| ratio before refinement
- Compress strategically: Use k=5 for k-bisimulation (sufficient for most graphs)
- Preserve queries: Specify which query types must remain answerable
- Iterate with metrics: Let topology and compression metrics guide improvement
## Integration with Other Skills
This skill composes naturally with:
- hierarchical-reasoning: Strategic→tactical→operational maps to metagraph levels
- obsidian-markdown: Compressed graphs export as linked note structures
- knowledge-orchestrator: Automatic routing for extraction→compression→documentation workflows
- infranodus-orchestrator: Text network analysis → k-bisimulation compression
### Hierarchical Reasoning Integration
```yaml
mapping:
  strategic_level: metagraph_level_0
  tactical_level: metagraph_level_1
  operational_level: metagraph_level_2
convergence_metrics: compression_ratio, query_preservation
```

## Evaluation Criteria
A high-quality extraction with compression demonstrates:
- Completeness: Major entities and relationships captured
- Accuracy: High confidence scores (avg >0.7) and validated
- Connectivity: |E|/|V| ≥ 4:1, isolation <20%
- Compressibility: Achieves ≥50% reduction via k-bisimulation
- Preservation: Specified queries remain answerable post-compression
- Scale-invariance: Finite fractal dimension for hierarchical structures
Core Philosophy: Knowledge graphs emerge through iterative refinement—initial extraction establishes structure, topology analysis reveals density gaps, structural equivalence enables compression, and categorical quotients preserve essential relationships while eliminating redundancy. The compression is "lossy but structure-preserving" because categorical equivalence guarantees that compressed representations support all the same inferences as their originals.