Use when designing and building knowledge graphs from unstructured data. Invoke when user mentions entity extraction, schema design, LPG vs RDF, graph data model, ontology alignment, knowledge graph construction, or building a KG for RAG. Provides extraction pipelines, schema patterns, and data model selection guidance.
```shell
npx skill4agent add lyndonkl/claude knowledge-graph-construction
```

KG Construction Progress:
- [ ] Step 1: Identify data sources and domain scope
- [ ] Step 2: Select graph data model
- [ ] Step 3: Design schema and ontology
- [ ] Step 4: Configure extraction pipeline
- [ ] Step 5: Define layered architecture
- [ ] Step 6: Validate and quality-check the graph

| Model | Flexibility | Standardization | Reasoning | Vector Integration | Query Language | Best For |
|---|---|---|---|---|---|---|
| LPG | High | Low | Limited | Native (Neo4j) | Cypher, Gremlin | Rapid development, RAG pipelines |
| RDF/OWL | Medium | High | Full (OWL-DL) | Via extensions | SPARQL | Interoperability, ontology-heavy domains |
| Hypergraph | High | Low | Limited | Custom | Custom APIs | N-ary relations, multi-entity events |
| Temporal | Medium | Low | Time-based | Via extensions | Temporal Cypher | Evolving knowledge, episodic memory |
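The core trade-off in the table above is how edge properties are represented. A minimal sketch (illustrative names only, no specific graph library assumed): an LPG edge carries a property map directly, while plain RDF must reify a statement (or use RDF-star) to annotate it.

```python
# Sketch: the same fact ("Alice works at Acme since 2020") in both models.
# All identifiers and the example.org namespace are illustrative.

# LPG: nodes and edges carry property maps directly.
lpg_nodes = {
    "p1": {"labels": ["Person"], "props": {"name": "Alice", "role": "Engineer"}},
    "o1": {"labels": ["Organization"], "props": {"name": "Acme", "type": "Company"}},
}
lpg_edges = [("p1", "WORKS_AT", "o1", {"since": 2020})]

# RDF: everything is a triple; annotating the edge requires reification
# (or RDF-star), because a plain triple cannot be the subject of another.
EX = "http://example.org/"
rdf_triples = [
    (EX + "alice", EX + "worksAt", EX + "acme"),
    (EX + "stmt1", EX + "subject", EX + "alice"),
    (EX + "stmt1", EX + "predicate", EX + "worksAt"),
    (EX + "stmt1", EX + "object", EX + "acme"),
    (EX + "stmt1", EX + "since", "2020"),
]

# One LPG edge with a property expands to several RDF triples.
print(len(lpg_edges), len(rdf_triples))
```

This expansion is one reason LPG stores tend to feel faster for rapid development, while RDF's uniform triple shape is what enables standardized tooling and OWL reasoning.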

| Domain | Recommended Model | Rationale |
|---|---|---|
| Biomedical / Clinical | RDF/OWL | UMLS/SNOMED ontologies, reasoning needed |
| Enterprise / RAG | LPG | Fast iteration, vector search integration |
| Event-centric (news, logs) | Hypergraph or Temporal | Multi-participant events, time evolution |
| Legal / Compliance | RDF/OWL | Formal reasoning, provenance chains |
| Scientific Literature | LPG + Layered | Flexible extraction, layered trust |
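The selection table above can be condensed into a decision helper. This is a sketch under an assumed priority order (formal reasoning outranks n-ary relations, which outrank temporal needs); the function name and flags are illustrative, not part of any API.

```python
def recommend_model(needs_formal_reasoning: bool,
                    needs_nary_relations: bool,
                    needs_temporal: bool) -> str:
    """Mirror the domain table above; ties default to LPG for fast iteration."""
    if needs_formal_reasoning:      # legal, biomedical: ontology reasoning
        return "RDF/OWL"
    if needs_nary_relations:        # multi-participant events
        return "Hypergraph"
    if needs_temporal:              # evolving knowledge, episodic memory
        return "Temporal"
    return "LPG"                    # enterprise / RAG default

print(recommend_model(False, False, False))
```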
```cypher
(:Person {name, role}) -[:WORKS_AT {since}]-> (:Organization {name, type})
(:Drug {name, class}) -[:TREATS {efficacy}]-> (:Disease {name, icd_code})

(:ClinicalTrial {id, phase, start_date})
  -[:HAS_DRUG]-> (:Drug {name})
  -[:HAS_CONDITION]-> (:Disease {name})
  -[:HAS_OUTCOME]-> (:Outcome {measure, value})
  -[:CONDUCTED_BY]-> (:Organization {name})
```

Layer 3 (Canonical Ontology): Formal class hierarchy, relation definitions, constraints
Layer 2 (Domain Knowledge): Curated facts from literature, expert-validated
Layer 1 (Instance Data): Extracted from user documents, case-specific, lower confidence

KNOWLEDGE GRAPH CONSTRUCTION SPECIFICATION
============================================
Domain: [Target domain and scope]
Use Case: [RAG / Reasoning / Analytics / Hybrid]
Data Sources: [List of input data types and volumes]
Data Model: [LPG / RDF / Hypergraph / Temporal]
Query Language: [Cypher / SPARQL / Gremlin / Custom]
Storage Backend: [Neo4j / Amazon Neptune / Virtuoso / etc.]
Schema Definition:
Node Types:
1. [EntityType] - [description]
Properties: [list with types]
2. [EntityType] - [description]
Properties: [list with types]
3. [Continue for each node type...]
Edge Types:
1. [RelationType] (source -> target) - [description]
Properties: [list with types]
2. [Continue for each edge type...]
Constraints:
- [Cardinality, uniqueness, required properties]
Extraction Pipeline:
1. Entity Extraction
- Method: [LLM-assisted / NER / Hybrid]
- Prompt template: [summary or reference]
- Verification: [Multi-round / Second-LLM / Manual sample]
2. Relation Extraction
- Method: [Prompt-based / Dependency parsing / Hybrid]
- Few-shot examples: [count and source]
3. Normalization
- Deduplication: [method]
- Ontology linking: [target ontology]
- Synonym resolution: [approach]
Layered Architecture:
Layer 1 (Instance): [description of instance-level data]
Layer 2 (Domain): [description of curated domain knowledge]
Layer 3 (Ontology): [description of formal schema]
Provenance: [How source/confidence/timestamp are tracked]
Validation Plan:
- Schema conformance: [automated checks]
- Coverage: [expected entity/relation counts]
- Consistency: [contradiction detection method]
- Human review: [sampling strategy]
Estimated Scale: [node count, edge count, properties per node]
Key Dependencies: [libraries, APIs, ontologies]
NEXT STEPS:
- Implement extraction pipeline on sample data
- Populate graph and run validation suite
- Iterate schema based on extraction results
- Integrate with downstream application (RAG, reasoning, etc.)
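The extraction pipeline in the specification can be sketched end to end. This is a stub under stated assumptions: `extract_entities` uses a toy capitalization pattern where a real pipeline would call an LLM or NER model, the relation rule and synonym table are hypothetical, and all names are illustrative.

```python
import re
from collections import defaultdict

def extract_entities(text):
    # Toy NER: capitalized multi-word spans stand in for model output.
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)

def extract_relations(text, entities):
    # Toy relation rule: "X treats Y" in the source text -> (X, TREATS, Y).
    rels = []
    for a in entities:
        for b in entities:
            if a != b and re.search(re.escape(a) + r" treats " + re.escape(b), text):
                rels.append((a, "TREATS", b))
    return rels

def normalize(entities, synonyms):
    # Synonym resolution via a lookup table; deduplication preserves order.
    seen, out = set(), []
    for e in entities:
        canon = synonyms.get(e, e)
        if canon not in seen:
            seen.add(canon)
            out.append(canon)
    return out

text = "Aspirin treats Fever. Acetylsalicylic Acid is sold widely."
ents = normalize(extract_entities(text), {"Acetylsalicylic Acid": "Aspirin"})
rels = extract_relations(text, ents)

# Load step: adjacency-list graph keyed by source entity.
graph = defaultdict(list)
for s, p, o in rels:
    graph[s].append((p, o))
```

Swapping the stubs for an LLM-assisted extractor and an ontology-linking service preserves the same three-stage shape: extract entities, extract relations, normalize, then load.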