When to Use
- Designing graph model from scratch (domain → nodes, rels, props)
- Reviewing existing model for anti-patterns
- Deciding node vs property vs relationship vs label
- Migrating relational or document schema to graph
- Designing intermediate nodes for n-ary or complex relationships
- Detecting and mitigating supernode / high-fanout problems
- Choosing and creating constraints + indexes for a model
When NOT to Use
- Writing or optimizing Cypher →
- Spring Data Neo4j (@Node, @Relationship) →
- GraphQL type definitions →
- Importing data (LOAD CSV, APOC import) →
Inspect Before Designing
On existing database, run first — never propose changes without current state:
```cypher
CALL db.schema.visualization() YIELD nodes, relationships RETURN nodes, relationships;
SHOW CONSTRAINTS YIELD name, type, labelsOrTypes, properties RETURN name, type, labelsOrTypes, properties;
SHOW INDEXES YIELD name, type, labelsOrTypes, state WHERE state = 'ONLINE' RETURN name, type, labelsOrTypes;
```
If APOC available:
```cypher
CALL apoc.meta.schema() YIELD value RETURN value;
```
MCP tool map:
| Operation | Tool |
|---|---|
| Inspect schema | |
| , | |
| `CREATE CONSTRAINT ... IF NOT EXISTS` | (show + confirm first) |
Defaults — Apply to Every Model
- Use-case first — list 5+ queries the model must answer before designing
- Nodes = entities (nouns) with identity; rels = connections (verbs) with direction
- Labels PascalCase; rel types SCREAMING_SNAKE_CASE; properties camelCase
- Every node type used in MERGE has a uniqueness constraint on its key property
- No generic labels or rel types that could describe any node or relationship
- Rel direction encodes semantic meaning — not arbitrary
- Inspect schema before proposing any change on an existing database
- All constraint/index DDL uses `IF NOT EXISTS`, so it is safe to rerun
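As a minimal sketch of the MERGE default above: the constraint is created first, then MERGE matches on the key property only and sets the remaining properties afterwards. The `Person` label and `id` key mirror the examples later in this document; the `$id` and `$name` parameters are assumptions for illustration.

```cypher
// Uniqueness constraint on the key property (idempotent thanks to IF NOT EXISTS)
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE;

// MERGE on the key only; non-key properties go in ON CREATE / ON MATCH
MERGE (p:Person {id: $id})
ON CREATE SET p.name = $name
ON MATCH SET p.name = coalesce($name, p.name);
```

Merging on the key alone keeps the lookup index-backed and avoids creating a second node when a non-key property differs.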
Key Patterns
Node vs Relationship vs Property — Decision Table
| Question | Answer | Model as |
|---|---|---|
| Is it a thing with identity, queried as entry point? | Yes | Node |
| Is it a connection between two things with direction? | Yes | Relationship |
| Does the connection have its own properties or multiple targets? | Yes | Intermediate node |
| Is it a scalar always returned with its parent, never filtered alone? | Yes | Property on parent |
| Is it a category used for type-based filtering or path traversal? | Yes | Label (not a property) |
| Does the same attribute value repeat across many nodes (low cardinality)? | Yes | Label, not a property node |
| Is it a fact connecting >2 entities? | Yes | Intermediate node |
Property vs Label — Decision Table
| Use label when | Use property when |
|---|---|
| Values are few, fixed, used as traversal filters | Values are many, dynamic, or unique per node |
| You traverse by type | You filter by value |
| Category drives index selection | Fine-grained value drives range scans |
Rule: adding a label is cheap, and scanning all nodes that carry a given label is fast. Never model high-cardinality values as labels.
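The two sides of the table can be contrasted in a short sketch. The `Active` label and the `email` property are hypothetical names chosen for illustration; `Person`, `Employment`, and `WORKED_AT` come from the examples in this document.

```cypher
// Low-cardinality category as a label: the label narrows the match up front
MATCH (p:Person:Active)-[:WORKED_AT]->(e:Employment)
RETURN p.name, e.startDate;

// High-cardinality value as a property: filtered by value, ideally index-backed
MATCH (p:Person)
WHERE p.email = $email
RETURN p.name;
```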
Intermediate Node Pattern
Use when a relationship needs its own properties, connects >2 entities, or is independently queryable.
Before (relationship with property — limited):
```cypher
(Person)-[:ACTED_IN {role: "Neo"}]->(Movie)
// Cannot query roles independent of movies
```
After (intermediate node — queryable, extensible):
```cypher
(Person)-[:PLAYED]->(Role {name: "Neo"})-[:IN]->(Movie)
// MATCH (r:Role) WHERE r.name STARTS WITH 'Neo' RETURN r
```
Employment overlap example:
```cypher
// Find colleagues who overlapped at same company
MATCH (p1:Person)-[:WORKED_AT]->(e1:Employment)-[:AT]->(c:Company)<-[:AT]-(e2:Employment)<-[:WORKED_AT]-(p2:Person)
WHERE p1 <> p2
  AND e1.startDate <= e2.endDate AND e2.startDate <= e1.endDate
RETURN p1.name, p2.name, c.name
```
Promote relationship to intermediate node when:
- Relationship has >2 properties
- Relationship is the subject of another query
- Multiple entities share the same connection context
- You need to connect >2 entities in one fact
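A minimal sketch of promoting the `ACTED_IN {role}` relationship from the example above to a `Role` intermediate node. It assumes every `ACTED_IN` relationship stores a single `role` string, that `Movie` nodes have a `title` property, and that the graph is small enough to rewrite in one transaction. The old relationship is left in place so the result can be checked before anything is deleted.

```cypher
// Create one Role node per ACTED_IN relationship that carries a role
MATCH (p:Person)-[a:ACTED_IN]->(m:Movie)
WHERE a.role IS NOT NULL
CREATE (p)-[:PLAYED]->(:Role {name: a.role})-[:IN]->(m);

// Verify: roles are now queryable without starting from a movie
MATCH (r:Role)-[:IN]->(m:Movie)
WHERE r.name STARTS WITH 'Neo'
RETURN r.name, m.title;
```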
Relational → Graph Migration Table
| Relational construct | Graph equivalent | Notes |
|---|---|---|
| Table row | Node | One label per table (add more as needed) |
| Column (scalar) | Node property | |
| Primary key | Uniqueness constraint property | Use a domain-specific key name, not a generic one |
| Foreign key | Relationship | Direction: from dependent → referenced |
| Many-to-many junction table | Intermediate node | Especially if junction has own columns |
| Junction table (no own columns) | Direct relationship | Simpler; upgrade to intermediate node later |
| NULL FK (optional relation) | Absent relationship | No node created; absence is the signal |
| Polymorphic FK (Rails-style) | Multiple labels or relationship types | Split into type-specific rels |
| Self-referential FK | Same-label relationship | |
| Audit/history columns | Intermediate versioning node | See References for versioning pattern |
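The foreign-key row can be sketched as a post-import step. All names here (`Order`, `Customer`, `customerId`, `PLACED_BY`) are hypothetical; the sketch assumes rows were first imported as nodes with the FK value copied into a property.

```cypher
// Assumes (:Order {id, customerId}) and (:Customer {id}) already exist,
// with a uniqueness constraint on Customer.id so the lookup is index-backed
MATCH (o:Order)
WHERE o.customerId IS NOT NULL
MATCH (c:Customer {id: o.customerId})
MERGE (o)-[:PLACED_BY]->(c)
// The FK value is redundant once the relationship exists
REMOVE o.customerId;
```

Orders with a NULL FK get no relationship, and orders whose FK matches no customer keep their `customerId`, which makes orphaned references easy to find afterwards.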
Supernode Detection and Mitigation
Detect:
```cypher
// Find top-10 highest-degree nodes
MATCH (n)
RETURN labels(n) AS labels, elementId(n) AS id, COUNT { (n)--() } AS degree
ORDER BY degree DESC LIMIT 10
```
Node with degree >> median for its label = supernode candidate. Any node with >100K relationships will degrade traversal queries that pass through it.
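To compare a candidate against the median for its label, the degree distribution can be summarized per label. A minimal sketch, using `Person` only as an example label:

```cypher
// Degree distribution for one label
MATCH (n:Person)
WITH COUNT { (n)--() } AS degree
RETURN percentileCont(degree, 0.5)  AS medianDegree,
       percentileCont(degree, 0.99) AS p99Degree,
       max(degree)                  AS maxDegree;
```

A maximum that sits orders of magnitude above the median is the signal described above.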
Causes:
- Domain supernodes: airports, celebrities, popular hashtags — unavoidable
- Modeling supernodes: gender, country, status modeled as nodes with millions of edges — avoidable
Mitigation strategies (in priority order):
| Strategy | When to use | Implementation |
|---|---|---|
| Query direction | Directional asymmetry exists | Query from low-degree side; exploit direction |
| Relationship type split | Supernode serves multiple roles | Several specific rel types instead of a single generic one |
| Label segregation | Supernode conflates entity types | One label per subtype → query only relevant subtype |
| Bucket pattern | Time-series or high-volume event nodes | See below |
| Avoid modeling | Low-cardinality categoricals | Use label instead of node (`:Active`, not `(:Status {name:"Active"})`) |
| Join hint | Query tuning last resort | `USING JOIN ON` in Cypher |
Bucket pattern (time-series / high-volume):
```cypher
// Instead of: (:User)-[:VIEWED]->(:Page)  (millions of rels per user)
// Bucket by hour (pattern shown as a comment; it is not a runnable statement):
//   (u:User)-[:VIEWED_IN]->(b:ViewBucket {userId: u.id, hour: '2025-04-28T14'})-[:VIEWED]->(p:Page)
// Query last hour's views without traversing full history:
MATCH (u:User {id: $uid})-[:VIEWED_IN]->(b:ViewBucket {hour: $hour})-[:VIEWED]->(p)
RETURN p.url
```
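The write side of the bucket pattern might look like the sketch below. It assumes `Page` nodes are keyed by `url`, that `$hour` is a string in the same format as the bucket above (for example `'2025-04-28T14'`), and that `$uid` and `$url` are supplied by the caller.

```cypher
// Record one page view into the hourly bucket for this user
MATCH (u:User {id: $uid})
MATCH (p:Page {url: $url})
MERGE (b:ViewBucket {userId: u.id, hour: $hour})
MERGE (u)-[:VIEWED_IN]->(b)
CREATE (b)-[:VIEWED]->(p);
```

Because `ViewBucket` is used in MERGE, it should get a constraint on the `(userId, hour)` pair, in line with the defaults earlier in this document.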
Naming Conventions
| Element | Convention | Good | Bad |
|---|---|---|---|
| Node label | PascalCase, singular noun | `Person`, `Movie` | |
| Relationship type | SCREAMING_SNAKE_CASE, verb phrase | `ACTED_IN`, `WORKED_AT` | |
| Property key | camelCase | `startDate`, `tmdbId` | |
| Constraint name | snake_case descriptive | `person_id_unique` | |
| Index name | snake_case descriptive | `person_name_idx` | |
Schema Enforcement — What to Create for Each Element
Run all DDL with `IF NOT EXISTS`. Apply before importing data.
```cypher
// 1. Uniqueness constraint: every node type used in MERGE
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE;
// 2. Existence constraint (Enterprise): mandatory properties
CREATE CONSTRAINT person_name_exists IF NOT EXISTS
FOR (p:Person) REQUIRE p.name IS NOT NULL;
// 3. Property type constraint (Enterprise): enforce data type
CREATE CONSTRAINT person_born_integer IF NOT EXISTS
FOR (p:Person) REQUIRE p.born IS :: INTEGER;
// 4. Key constraint (Enterprise): unique + exists in one
CREATE CONSTRAINT movie_tmdbid_key IF NOT EXISTS
FOR (m:Movie) REQUIRE m.tmdbId IS NODE KEY;
// 5. Range index: equality, range, and STARTS WITH filters on properties
CREATE INDEX person_name_idx IF NOT EXISTS
FOR (p:Person) ON (p.name);
// 6. Fulltext index: tokenized free-text search via db.index.fulltext.queryNodes
//    (CONTAINS / ENDS WITH predicates are served by a TEXT index, not this one)
CREATE FULLTEXT INDEX person_fulltext IF NOT EXISTS
FOR (n:Person) ON EACH [n.name, n.bio];
// 7. Vector index: embedding similarity search
CREATE VECTOR INDEX chunk_embedding_idx IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS { indexConfig: { `vector.dimensions`: 1536, `vector.similarity_function`: 'cosine' } };
// 8. Relationship index: filter on rel properties
CREATE INDEX acted_in_year_idx IF NOT EXISTS
FOR ()-[r:ACTED_IN]-() ON (r.year);
```
After creating indexes, poll until ONLINE:
```cypher
SHOW INDEXES YIELD name, state WHERE state <> 'ONLINE' RETURN name, state;
```
Do NOT use an index until state = `ONLINE`.
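Instead of polling by hand, the built-in `db.awaitIndexes` procedure blocks until all indexes are online or the timeout expires. A minimal sketch, assuming a 300-second timeout is acceptable for the data volume:

```cypher
// Wait up to 300 seconds for every index to come online
CALL db.awaitIndexes(300);

// Then confirm nothing is still populating or failed
SHOW INDEXES YIELD name, state WHERE state <> 'ONLINE' RETURN name, state;
```

An empty result from the second statement means every index is ready to serve queries.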
Vector / Embedding Property Modeling
Store embeddings on dedicated `Chunk` nodes, never on business nodes:
```cypher
(:Document)-[:HAS_CHUNK]->(c:Chunk {text: "...", embedding: [...]})
```
Rules:
- Chunk node: `text` (source text), `embedding` (float array), and an int-typed property
- Parent document: metadata only (title, url, createdAt)
- Vector index on `Chunk.embedding` only
- Chunk size 200–500 tokens with 20% overlap is production default [field]
- Do NOT put embedding on `Document`: it makes the node too large and pollutes traversal
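With this layout, a similarity search starts at the `Chunk` index and hops to the parent document for metadata. A minimal sketch, assuming the `chunk_embedding_idx` vector index from this document exists and that `$queryEmbedding` is a list of floats with the same dimension as the index:

```cypher
// Top 5 nearest chunks, joined to their parent document
CALL db.index.vector.queryNodes('chunk_embedding_idx', 5, $queryEmbedding)
YIELD node AS chunk, score
MATCH (d:Document)-[:HAS_CHUNK]->(chunk)
RETURN d.title, chunk.text, score
ORDER BY score DESC;
```

The `Document` node stays small because the embedding never leaves the chunk, so ordinary traversals over documents do not load vector data.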
Anti-Patterns Table
| Anti-pattern | Problem | Fix |
|---|---|---|
| Generic labels | No filtering benefit; all nodes scan | Use domain labels |
| Generic rel types | Can't filter by relationship type | Use semantic types |
| Low-cardinality value as node | Supernode (one node with millions of edges) | Use label instead |
| Property as label (same fact stored as both a property and a label) | Inconsistency, duplication | Pick one; prefer label if used in traversal |
| Storing embeddings on business node | Node bloat, slow traversal | Dedicated `Chunk` node |
| MERGE without uniqueness constraint | Duplicate nodes silently created | Add constraint before any MERGE |
| Missing relationship direction meaning | Arbitrary direction; confusing model | Direction = semantic flow of action |
| Junction table modeled as bare property | Loses history and extensibility | Intermediate node with its own properties |
| as property name | Conflicts with driver internal | Use , , |
| All dates as strings | No range queries; no temporal operators | Use Neo4j `DATE` or `DATETIME` type |
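The last row can be fixed in place. A minimal sketch, assuming `Employment.startDate` currently holds ISO 8601 strings such as `'2021-03-15'` and a Neo4j 5 release that supports type predicate expressions (`IS :: STRING NOT NULL`):

```cypher
// Convert string dates to the DATE type, skipping values already converted
MATCH (e:Employment)
WHERE e.startDate IS :: STRING NOT NULL
SET e.startDate = date(e.startDate);

// Temporal comparison now works directly
MATCH (e:Employment)
WHERE e.startDate >= date('2020-01-01')
RETURN count(e) AS startedSince2020;
```

Strings that are not valid ISO dates make `date()` fail, so it is worth checking the data for malformed values before running the conversion.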
Output Format — Schema Assessment
When reviewing an existing model:
```markdown
## Schema Assessment
### Compliant
- [constraint / pattern that is correct]
### Issues Found
#### [Title] — Severity: ERROR / WARNING / INFO
- **Current**: what the model does
- **Problem**: why it is an issue
- **Fix**: specific Cypher DDL or model change
## Recommended Schema
### Node Labels
- :Label {key: TYPE, prop: TYPE, ...} → constraints: [list]
### Relationships
- (:LabelA)-[:TYPE {prop: TYPE}]->(:LabelB)
### Constraints to Create
[CREATE CONSTRAINT ... statements]
### Indexes to Create
[CREATE INDEX ... statements]
```
Severity semantics:
| Severity | Meaning | Action |
|---|---|---|
| ERROR | Model correctness failure (duplicates possible, data loss risk) | Stop; fix before proceeding |
| WARNING | Performance or extensibility risk | Report; ask user before proceeding |
| INFO | Style or convention deviation | Surface; continue |
Provenance Labels
- — stated directly in Neo4j docs
- — follows from documented behavior
- `[field]`: community heuristic; treat as default but validate
Checklist
References
Load on demand: