# Graph: Extraction, Analysis & Categorical Compression
Systematic extraction and analysis of entities, relationships, and ontological structures from unstructured text—enhanced with categorical metagraph compression enabling scale-invariant representation through structural equivalence, k-bisimulation summarization, and quotient constructions that preserve query-answering capabilities while achieving dramatic size reductions.
## Quick Start
### Basic Extraction
- Load schema: Read `/mnt/skills/user/knowledge-graph/schemas/core_ontology.md` for entity/relationship types
- Extract entities and relationships using the schema as a guide
- Format as JSON following `/mnt/skills/user/knowledge-graph/templates/extraction_template.md`
- Validate: Run the validation script on the extracted graph
### Compression Workflow
```bash
# 1. Extract → validate → analyze topology
python scripts/validate_graph.py graph.json
python scripts/analyze_graph.py graph.json --topology

# 2. Compute structural equivalence and compress
python scripts/compress_graph.py graph.json --method k-bisim --k 5

# 3. Verify query preservation
python scripts/verify_compression.py original.json compressed.json --queries reachability,pattern
```

## Theoretical Foundation
### The Compression Mechanism
Structural equivalence enables compression through a precise mechanistic chain:

**Equivalence → Redundancy → Quotient → Preservation**

- **Equivalence relations partition structures**: Graph automorphisms and categorical isomorphisms identify structurally interchangeable elements—vertices with identical connection patterns to equivalent neighbors belong to the same automorphism orbit
- **Orbits represent information redundancy**: For k vertices in one orbit, (k−1) are informationally redundant since they encode the same structural relationships
- **Quotient constructions eliminate redundancy**: Categorical quotients collapse equivalence classes to single representatives, while the universal property guarantees that any construction respecting the equivalence factors uniquely through the compressed representation
- **Functors preserve structure across scales**: The quotient functor Q: C → C/R is full and bijective on objects—no essential categorical information is lost
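The equivalence → quotient chain can be sketched in code. The following is a minimal illustration (a hypothetical helper, not one of this skill's scripts) that partitions nodes by a depth-1 structural signature and collapses each class to a canonical representative:

```python
from collections import defaultdict

def quotient_graph(nodes, edges):
    """Collapse nodes with identical outgoing (label, target) patterns
    into one representative per class.  Depth-1 signature only; full
    structural equivalence would iterate this refinement."""
    sig = defaultdict(set)
    for src, label, dst in edges:
        sig[src].add((label, dst))
    # Nodes with identical signatures fall in the same equivalence class.
    classes = defaultdict(list)
    for n in nodes:
        classes[frozenset(sig[n])].append(n)
    # Each class -> a single canonical representative.
    rep = {n: min(members) for members in classes.values() for n in members}
    q_nodes = sorted(set(rep.values()))
    q_edges = sorted({(rep[s], l, rep[d]) for s, l, d in edges})
    return q_nodes, q_edges
```

For three nodes where `a` and `b` both point to `c` via the same relation, `a` and `b` collapse, giving a 3:2 compression ratio.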
### Quantitative Foundation
The connection between automorphisms and Kolmogorov complexity:

    K(G) ≤ K(G/Aut(G)) + log|Aut(G)| + O(1)

Graphs with large automorphism groups have lower complexity because only one representative from each orbit needs encoding. For highly symmetric structures, compression can reach a factor of n/log n.
### Why This Matters for Knowledge Graphs
Knowledge graphs exhibit natural structural regularities:
| Pattern | Compression Mechanism | Typical Reduction |
|---|---|---|
| Type hierarchies | Automorphism orbits | 40-60% |
| Repeated subgraphs | k-bisimulation equivalence | 50-80% |
| Community structure | Block quotients | 30-50% |
| Self-similar patterns | Scale-invariant quotients | 60-95% |
## Core Capabilities
### 1. Structured Entity Extraction
Extract entities with confidence scores, provenance tracking, and property attribution:
- Entity types: Person, Organization, Concept, Event, Document, Technology, Location
- Confidence scoring: 0.0-1.0 scale based on evidence clarity
- Provenance metadata: Source document, location, timestamp
- Alias tracking: Capture all name variations
Key principle: Every extraction must include confidence score and source tracking for auditability.
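Taken together, these fields suggest an entity record shaped like the following. This is a hypothetical illustration; the authoritative format lives in `extraction_template.md`:

```json
{
  "id": "person_jane_smith",
  "type": "Person",
  "name": "Jane Smith",
  "aliases": ["Dr. Jane Smith", "J. Smith"],
  "confidence": 0.95,
  "provenance": {
    "source_document": "faculty_bio.txt",
    "source_location": "page 1, lines 3-5",
    "extraction_timestamp": "2024-01-15T09:30:00Z",
    "extractor_version": "1.0.0"
  }
}
```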
### 2. Relationship Mapping
Identify and classify relationships between entities:

- Core relationships: WORKS_FOR, AFFILIATED_WITH, RELATED_TO, AUTHORED, CITES, USES, LOCATED_IN, IMPLEMENTS
- Domain-specific relationships: Load schemas from `/mnt/skills/user/knowledge-graph/schemas/`
- Bidirectional awareness: Track relationship directionality
- Property attribution: Capture relationship metadata (dates, roles, contexts)
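A relationship record following the same conventions might look like this (a hypothetical shape; the actual format is governed by `extraction_template.md`):

```json
{
  "source": "person_jane_smith",
  "target": "org_mit",
  "type": "WORKS_FOR",
  "directional": true,
  "confidence": 0.9,
  "properties": {"role": "Professor", "start_date": "2019-09-01"}
}
```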
### 3. Domain-Specific Schemas
General domains use `core_ontology.md`. Coding/software domains additionally use `coding_domain.md`, which adds:

- CodeEntity, Repository, API, Library, Architecture, Bug types
- DEPENDS_ON, CALLS, INHERITS_FROM, FIXES, DEPLOYED_ON relationships
- Language-specific extraction patterns
### 4. Structural Equivalence Analysis
Identify and exploit structural redundancy through automorphism detection:

**Automorphism-Based Compression**:

```python
# Compute automorphism group
aut_group = compute_automorphisms(graph)
orbits = partition_by_orbits(graph.nodes, aut_group)

# Each orbit → single representative
compressed_nodes = [orbit.canonical_representative() for orbit in orbits]
compression_ratio = len(graph.nodes) / len(compressed_nodes)
```

**Equivalence Types**:
- **Structural equivalence**: Identical connection patterns (strictest)
- **Regular equivalence**: Same relationship *types* to equivalent alters
- **Automorphic equivalence**: Permutable without changing structure

### 5. k-Bisimulation Summarization
Compress graphs while preserving query semantics using k-bisimulation:

**Definition**: Two nodes are k-bisimilar if they have:
- Same labels
- Same edge types to k-bisimilar neighbors
- This property holds recursively to depth k

**Implementation**:

```bash
# k=5 is sufficient for most graphs
python scripts/compress_graph.py graph.json \
  --method k-bisim \
  --k 5 \
  --preserve-queries reachability,pattern
```

**Empirical Results**:
- k > 5 yields minimal additional partition refinement
- Achieves 95% reduction for reachability queries
- Achieves 57% reduction for pattern matching
- Incremental update cost: O(Δ·d^k), where Δ is the number of changes and d is the maximum degree
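The definition above translates directly into partition refinement: start from labels, then repeatedly split blocks by (label, edge types to current blocks) until the partition stabilizes or depth k is reached. A compact sketch, illustrative only (`scripts/compress_graph.py` is the actual implementation):

```python
def k_bisimulation_blocks(nodes, edges, labels, k=5):
    """Assign block ids so that two nodes share a block iff they are
    k-bisimilar: same label, same edge types to k-bisimilar neighbors,
    recursively to depth k."""
    block = {n: labels[n] for n in nodes}          # depth 0: labels only
    for _ in range(k):
        # Signature: own block plus (edge label, neighbor block) pairs.
        sig = {n: (block[n],
                   frozenset((lbl, block[dst])
                             for src, lbl, dst in edges if src == n))
               for n in nodes}
        ids = {}
        new_block = {n: ids.setdefault(sig[n], len(ids)) for n in nodes}
        # Refinement only splits blocks, so equal class counts = fixed point.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block
        block = new_block
    return block
```

Two people with the same label pointing to the same organization via the same edge type land in one block, so the compressed summary stores that pattern once.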
### 6. Categorical Quotient Construction
Apply category-theoretic compression with provable structure preservation:

**The Universal Property Guarantee**:

For any quotient Q: C → C/R, if H: C → D is any functor such that H(f) = H(g) whenever f ~ g in R, then H factors uniquely as H = H' ∘ Q.

This unique factorization means the quotient is the "freest" (most compressed) object respecting the equivalence—any construction built on the original that respects the equivalence can be equivalently built on the quotient.

**Skeleton Construction**:

```python
# Every category is equivalent to its skeleton
skeleton = compute_skeleton(category)

# The skeleton contains exactly one representative per isomorphism class.
# All categorical properties are preserved (limits, colimits, exactness).
```

### 7. Metagraph Hierarchical Modeling
Support edge-of-edge structures for multi-scale representation:

**Metagraph Definition**: MG = ⟨V, MV, E, ME⟩
- V: vertices
- MV: metavertices (each containing an embedded metagraph fragment)
- E: edges connecting sets of vertices
- ME: metaedges connecting vertices, edges, or both

**Why Metagraphs Enable Scale Invariance**:

The edge-of-edge capability creates holonic structure—self-similar patterns where the relationship between a metavertex and its contents mirrors the relationship between the entire metagraph and its top-level components. Automorphisms operate at multiple levels simultaneously, creating compression opportunities at each scale when these automorphism structures are isomorphic across levels.

**2-Category Interpretation**:
- 0-cells: vertices/elements
- 1-morphisms: edges connecting sets
- 2-morphisms: metaedges relating edges

The interchange law ensures scale-independent composition.
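One way to realize MG = ⟨V, MV, E, ME⟩ as a data structure is sketched below. This is a hedged illustration; the actual serialization is specified in `metagraph_template.md`:

```python
from dataclasses import dataclass, field

@dataclass
class Metagraph:
    """MG = ⟨V, MV, E, ME⟩: metavertices embed fragments, edges join
    vertex sets, and metaedges may reference edges as endpoints."""
    vertices: set = field(default_factory=set)        # V
    metavertices: dict = field(default_factory=dict)  # MV: name -> embedded Metagraph
    edges: list = field(default_factory=list)         # E: frozensets of vertices
    metaedges: list = field(default_factory=list)     # ME: (vertex-or-metavertex name, edge index) pairs
```

The recursive `metavertices` field is what gives the holonic, self-similar shape: an embedded fragment has exactly the same type as the whole.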
### 8. Topology Metrics & Quality Validation
**Graph Quality Metrics**:

| Metric | Formula | Target | Significance |
|---|---|---|---|
| Edge-to-Node Ratio | \|E\|/\|V\| | ≥4:1 | Enables emergence through dense connectivity |
| Isolation Rate | \|V_isolated\|/\|V\| | <20% | Measures integration completeness |
| Clustering Coefficient | Local triangles / possible triangles | >0.3 | Small-world property indicator |
| Fractal Dimension | d_B from box-covering | Finite | Self-similarity/compressibility |
| Average Path Length | Mean geodesic distance | Low | Information flow efficiency |

**Scale-Invariance Indicators**:

    N_B(l_B) ~ l_B^(-d_B)

Networks with finite fractal dimension d_B are self-similar and can be compressed at multiple resolutions, with compression ratio scaling as l^(d_B).
**Validation Script**:

```bash
python scripts/validate_graph.py graph.json --topology --compression-potential
```
| 指标 | 公式 | 目标 | 意义 |
|---|---|---|---|
| 边-节点比率 | |E|/|V| | ≥4:1 | 通过密集连接实现涌现性 |
| 孤立率 | |V_isolated|/|V| | <20% | 衡量集成完整性 |
| 聚类系数 | 局部三角形数/可能的三角形数 | >0.3 | 小世界特性指标 |
| 分形维数 | 盒覆盖法得到的d_B | 有限 | 自相似性/可压缩性 |
| 平均路径长度 | 平均测地距离 | 低 | 信息流效率 |
尺度不变性指标:
N_B(l_B) ~ l_B^(-d_B)具有有限分形维数d_B的网络是自相似的,可在多个分辨率下压缩,压缩比随l^(d_B)缩放。
验证脚本:
bash
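The first two metrics are cheap to compute directly from an edge list. A minimal sketch (not the actual `topology_metrics.py`):

```python
def basic_topology(nodes, edges):
    """Edge-to-node ratio and isolation rate for an undirected edge list."""
    touched = set()
    for u, v in edges:
        touched.update((u, v))
    ratio = len(edges) / len(nodes)                                  # |E|/|V|
    isolation = sum(1 for n in nodes if n not in touched) / len(nodes)
    return ratio, isolation
```

A graph failing the ≥4:1 ratio or the <20% isolation target is flagged for refinement before any compression step.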
### 9. Information-Theoretic Analysis
**Structural Entropy**:

    H_s(G) = (n choose 2)·h(p) - n·log(n) + O(n)

The term -n·log(n) represents the compression gain from removing label information.

**Minimum Description Length (MDL)**:

For graph G and model M:

    L(G,M) = L(M) + L(G|M)

Optimal compression minimizes this total description length. Community structure reduces entropy by ~k·log(n) bits for k communities.

**Compressibility Predictors**:
- High transitivity → higher compressibility
- Degree heterogeneity → higher compressibility
- Hierarchical structure → enables predictable transitions, lower entropy rates
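To make the leading terms of the structural entropy concrete, here is a numeric instantiation for an Erdős–Rényi-style graph. This helper is illustrative and assumes base-2 logarithms; the O(n) term is omitted:

```python
import math

def structural_entropy_estimate(n, p):
    """Leading terms of H_s(G) = C(n,2)·h(p) - n·log(n), with h the
    binary entropy function (logs base 2, O(n) term dropped)."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # binary entropy h(p)
    return (n * (n - 1) // 2) * h - n * math.log2(n)
```

For n = 100 and p = 0.5 (so h(p) = 1 bit), the pair term contributes 4950 bits and the label-removal term subtracts about 664, showing how the -n·log(n) gain scales.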
## Extraction Guidelines
### Confidence Scoring Rules
| Score | Criteria | Example |
|---|---|---|
| 0.9-1.0 | Explicitly stated with clear evidence | "Dr. Jane Smith works for MIT" |
| 0.7-0.89 | Strongly implied by context | Person with @mit.edu email |
| 0.5-0.69 | Reasonably inferred but ambiguous | Co-authorship implies collaboration |
| 0.3-0.49 | Weak inference, requires validation | Similar domain suggests relationship |
| 0.0-0.29 | Speculative, likely incorrect | Pure assumption |
### ID Generation Strategy
Create stable, meaningful identifiers:

- Format: `{type}_{normalized_name}` (e.g., `person_jane_smith`, `org_mit`)
- Normalization: Lowercase, replace spaces with underscores, remove special chars
- Uniqueness: Add numeric suffix if collision occurs
- Stability: Same entity in different documents should generate same ID
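The normalization and collision rules above can be sketched as a small helper (hypothetical; the function name is illustrative):

```python
import re

def generate_id(entity_type, name, taken):
    """`{type}_{normalized_name}`: lowercase, spaces -> underscores,
    special characters removed, numeric suffix on collision."""
    normalized = re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()
    normalized = re.sub(r"\s+", "_", normalized)
    base = f"{entity_type}_{normalized}"
    candidate, suffix = base, 2
    while candidate in taken:               # uniqueness via numeric suffix
        candidate = f"{base}_{suffix}"
        suffix += 1
    taken.add(candidate)
    return candidate
```

Because normalization is deterministic, the same entity mentioned in different documents maps to the same base ID, satisfying the stability rule.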
### Provenance Best Practices
Always include:

- `source_document`: Document ID or filename
- `source_location`: Page number, section, line range
- `extraction_timestamp`: ISO 8601 format
- `extractor_version`: Skill version identifier
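A helper that stamps these four fields might look like this (the field names come from the checklist above; the function itself is hypothetical):

```python
from datetime import datetime, timezone

def provenance(source_document, source_location, extractor_version):
    """Provenance block with an ISO 8601 UTC timestamp."""
    return {
        "source_document": source_document,
        "source_location": source_location,
        "extraction_timestamp": datetime.now(timezone.utc).isoformat(),
        "extractor_version": extractor_version,
    }
```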
## Advanced Workflows
### Compression Pipeline
```bash
# 1. Initial extraction
#    (Extract to graph.json)

# 2. Validate and analyze
python scripts/validate_graph.py graph.json
python scripts/analyze_graph.py graph.json --full

# 3. Compute structural equivalence
python scripts/structural_equivalence.py graph.json \
  --output equivalence_classes.json \
  --method automorphism

# 4. Apply k-bisimulation compression
python scripts/compress_graph.py graph.json \
  --equivalence equivalence_classes.json \
  --method k-bisim --k 5 \
  --output compressed.json

# 5. Verify preservation
python scripts/verify_compression.py graph.json compressed.json \
  --queries reachability,pattern,neighborhood

# 6. Generate topology report
python scripts/topology_metrics.py compressed.json --report
```

### Iterative Refinement with Compression
- Initial extraction: Broad pass capturing entities/relationships
- Topology analysis: Compute |E|/|V| ratio, isolation rate, clustering
- Compression analysis: Identify automorphism orbits, k-bisimilar classes
- Strategic refinement: Focus on:
  - Central concepts with weak connections
  - Isolated high-confidence entities
  - Low-compression-potential regions (may need restructuring)
- Compress and validate: Apply quotient construction, verify query preservation
- Repeat: Continue until quality thresholds are met AND the compression ratio stabilizes

Termination criteria:
- Isolation rate < 20%
- |E|/|V| ratio ≥ 4:1
- Compression ratio improvement < 5% between iterations
- Query preservation verified
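The termination criteria combine into a single predicate. A sketch under the thresholds listed above (the metric dictionary keys are assumptions, not a defined API):

```python
def should_terminate(metrics, prev_ratio, curr_ratio):
    """True when all quality thresholds hold and the compression ratio
    has stabilized (improvement < 5% between iterations)."""
    improvement = abs(curr_ratio - prev_ratio) / prev_ratio
    return (metrics["isolation_rate"] < 0.20
            and metrics["edge_node_ratio"] >= 4.0
            and improvement < 0.05
            and metrics["queries_preserved"])
```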
### Multi-Scale Metagraph Construction
For complex domains requiring hierarchical representation:

```bash
# 1. Extract at multiple granularities
python scripts/extract_hierarchical.py source.txt \
  --levels strategic,tactical,operational \
  --output metagraph.json

# 2. Compute cross-level automorphisms
python scripts/metagraph_automorphisms.py metagraph.json

# 3. Apply scale-invariant compression
python scripts/compress_metagraph.py metagraph.json \
  --preserve-hierarchy \
  --output compressed_metagraph.json
```

## Common Patterns
### Pattern: Query-Preserving Compression
Compress while guaranteeing specific query types remain answerable:

```python
# Define query preservation requirements
queries = {
    "reachability": True,    # 95% reduction possible
    "pattern_match": True,   # 57% reduction possible
    "neighborhood_k": 3,     # Preserve 3-hop neighborhoods
}

# Compress with guarantees
compressed = compress_with_guarantees(
    graph,
    method="k-bisimulation",
    k=max(5, queries["neighborhood_k"]),
    preserve=queries,
)
```

### Pattern: Incremental Compression Maintenance
Maintain compression as the graph evolves:

```python
# Update cost: O(Δ·d^k), where
#   Δ = number of changes
#   d = maximum degree
#   k = bisimulation depth
def update_compression(compressed_graph, changes):
    affected_classes = identify_affected_equivalence_classes(changes)
    recompute_local_bisimulation(affected_classes, k=5)
    return compressed_graph
```

### Pattern: Categorical Ontology Integration
Use ologs (ontology logs) for categorical knowledge representation:

```python
# Olog: a category where objects = noun phrases, morphisms = verb phrases
olog = {
    "objects": ["a person", "an organization", "a concept"],
    "morphisms": [
        {"source": "a person", "target": "an organization", "label": "works for"},
        {"source": "a concept", "target": "a concept", "label": "relates to"},
    ],
}

# Yoneda embedding: an object is determined by the morphisms into it
# Compression: store relationships, not internal structure
```

## Error Handling
### Compression Quality Issues
When compression produces unexpected results:
- Over-compression: Raise k value in k-bisimulation (default k=5)
- Under-compression: Check for missing type labels, inconsistent schemas
- Query degradation: Verify query type is in preservation set
- Scale-invariance failure: Check for unbalanced hierarchical structure
### Topology Violations
When graph metrics fall outside targets:
- |E|/|V| < 4: Graph too sparse—identify disconnected concepts, add bridging relationships
- Isolation > 20%: Too many orphan nodes—run connectivity analysis
- Clustering < 0.3: Lacks small-world property—add local triangulation
## File Structure
```
/mnt/skills/user/knowledge-graph/
├── SKILL.md                       # This file
├── schemas/
│   ├── core_ontology.md           # Universal entity/relationship types
│   ├── coding_domain.md           # Software development extension
│   └── categorical_ontology.md    # Category-theoretic type system
├── templates/
│   ├── extraction_template.md     # JSON format specification
│   └── metagraph_template.md      # Hierarchical metagraph format
└── scripts/
    ├── validate_graph.py          # Quality validation
    ├── merge_graphs.py            # Deduplication and merging
    ├── analyze_graph.py           # Refinement strategy generation
    ├── compress_graph.py          # k-bisimulation compression
    ├── structural_equivalence.py  # Automorphism computation
    ├── topology_metrics.py        # Graph topology analysis
    └── verify_compression.py      # Query preservation verification
```

## Dependencies
All scripts require Python 3.7+ with standard library only (no external packages for core functionality). Optional NetworkX for advanced topology metrics.
## Best Practices Summary
- Always start with schema: Load appropriate ontology before extraction
- Include confidence scores: Never omit—use 0.5 if uncertain
- Track provenance: Every entity/relationship needs source metadata
- Validate early: Run validation after each extraction
- Analyze topology: Check |E|/|V| ratio before refinement
- Compress strategically: Use k=5 for k-bisimulation (sufficient for most graphs)
- Preserve queries: Specify which query types must remain answerable
- Iterate with metrics: Let topology and compression metrics guide improvement
## Integration with Other Skills
This skill composes naturally with:
- hierarchical-reasoning: Strategic→tactical→operational maps to metagraph levels
- obsidian-markdown: Compressed graphs export as linked note structures
- knowledge-orchestrator: Automatic routing for extraction→compression→documentation workflows
- infranodus-orchestrator: Text network analysis → k-bisimulation compression
### Hierarchical Reasoning Integration
```yaml
mapping:
  strategic_level: metagraph_level_0
  tactical_level: metagraph_level_1
  operational_level: metagraph_level_2
convergence_metrics: compression_ratio, query_preservation
```

## Evaluation Criteria
A high-quality extraction with compression demonstrates:
- Completeness: Major entities and relationships captured
- Accuracy: High confidence scores (avg >0.7) and validated
- Connectivity: |E|/|V| ≥ 4:1, isolation <20%
- Compressibility: Achieves ≥50% reduction via k-bisimulation
- Preservation: Specified queries remain answerable post-compression
- Scale-invariance: Finite fractal dimension for hierarchical structures
Core Philosophy: Knowledge graphs emerge through iterative refinement—initial extraction establishes structure, topology analysis reveals density gaps, structural equivalence enables compression, and categorical quotients preserve essential relationships while eliminating redundancy. The compression is "lossy but structure-preserving" because categorical equivalence guarantees that compressed representations support all the same inferences as their originals.