neo4j-graphrag-skill
Neo4j GraphRAG Skill
When to Use
- Building GraphRAG retrieval pipelines with the `neo4j-graphrag` Python package
- Choosing between VectorRetriever, HybridRetriever, VectorCypherRetriever, HybridCypherRetriever
- Writing `retrieval_query` Cypher fragments that traverse the graph after vector lookup
- Wiring retriever + LLM into a GraphRAG pipeline
- Debugging low retrieval quality (when to use graph traversal vs plain vector)
- Integrating Neo4j with LangChain (`langchain-neo4j`), LlamaIndex, or Haystack
When NOT to Use
- KG construction from documents → neo4j-document-import-skill
- Plain vector/semantic search without graph traversal → neo4j-vector-index-skill
- GDS algorithms (PageRank, Louvain, node embeddings) → neo4j-gds-skill
- Agent long-term memory → neo4j-agent-memory-skill
- Writing raw Cypher queries → neo4j-cypher-skill
Step 1 — Install
```bash
pip install neo4j-graphrag
```

LLM/embedder extras (choose one or more):
```bash
pip install neo4j-graphrag[openai]                 # OpenAI + AzureOpenAI
pip install neo4j-graphrag[google]                 # VertexAI
pip install neo4j-graphrag[anthropic]              # Anthropic
pip install neo4j-graphrag[ollama]                 # Ollama (local)
pip install neo4j-graphrag[cohere]                 # Cohere
pip install neo4j-graphrag[sentence-transformers]  # local embeddings
```
BREAKING: the old package `neo4j-genai` is deprecated — imports also changed:

```bash
pip uninstall neo4j-genai
```

- neo4j_genai.retrievers → neo4j_graphrag.retrievers
- neo4j_genai.generation → neo4j_graphrag.generation
Requires: Python ≥ 3.10, Neo4j ≥ 5.18.1 or Aura ≥ 5.18.0.
---
Step 2 — Choose Retriever
Has fulltext index? YES → Hybrid variants (better recall)
NO → Vector variants (baseline)
Needs graph context after vector lookup? YES → Cypher variants
NO → plain variants
For natural-language-to-Cypher? → Text2CypherRetriever (no embedder needed)
For multi-tool LLM routing? → ToolsRetriever
Using external vector DB? → WeaviateNeo4jRetriever / PineconeNeo4jRetriever / QdrantNeo4jRetriever

| Retriever | Vector | Fulltext | Graph | When to use |
|---|---|---|---|---|
| VectorRetriever | ✓ | — | — | Baseline; quick start |
| HybridRetriever | ✓ | ✓ | — | Better recall; no graph context |
| VectorCypherRetriever | ✓ | — | ✓ | GraphRAG without fulltext |
| HybridCypherRetriever | ✓ | ✓ | ✓ | Production GraphRAG — default choice |
| Text2CypherRetriever | — | — | ✓ | LLM generates Cypher; no embedder |
| ToolsRetriever | varies | varies | varies | Multi-retriever LLM routing |
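The two YES/NO questions above can be captured as a small lookup table. This is a sketch for clarity only; `choose_retriever` is an illustrative name, not part of the neo4j-graphrag API:

```python
def choose_retriever(has_fulltext: bool, needs_graph_context: bool) -> str:
    """Map the two Step 2 questions to a retriever class name."""
    table = {
        (False, False): "VectorRetriever",        # baseline
        (True,  False): "HybridRetriever",        # better recall
        (False, True):  "VectorCypherRetriever",  # graph context, no fulltext
        (True,  True):  "HybridCypherRetriever",  # production default
    }
    return table[(has_fulltext, needs_graph_context)]
```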
Step 3 — Create Indexes (run once)
```cypher
// Vector index (all retrievers need this)
CREATE VECTOR INDEX chunk_embedding IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS { indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
} };

// Fulltext index (Hybrid retrievers only)
CREATE FULLTEXT INDEX chunk_fulltext IF NOT EXISTS
FOR (c:Chunk) ON EACH [c.text];

// Confirm ONLINE before ingesting:
SHOW INDEXES YIELD name, state
WHERE name IN ['chunk_embedding', 'chunk_fulltext']
RETURN name, state;
// Both must show state = 'ONLINE'
```

If an index is not ONLINE: wait, poll every 5s. Do NOT start ingestion until ONLINE.
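That polling loop can be scripted. A sketch, with the state check factored out so it works on any iterable of `{name, state}` rows; `wait_for_indexes` and `run_query` are illustrative names, and `run_query` is assumed to be a thin wrapper you write around the official driver (e.g. `driver.execute_query`):

```python
import time

def indexes_online(rows, names):
    """True when every named index reports state 'ONLINE'."""
    states = {r["name"]: r["state"] for r in rows}
    return all(states.get(n) == "ONLINE" for n in names)

def wait_for_indexes(run_query, names, poll_seconds=5, timeout=300):
    """Poll SHOW INDEXES until all names are ONLINE or the timeout expires.

    run_query: callable(cypher, **params) returning dict-like rows.
    """
    cypher = (
        "SHOW INDEXES YIELD name, state "
        "WHERE name IN $names RETURN name, state"
    )
    deadline = time.time() + timeout
    while time.time() < deadline:
        if indexes_online(run_query(cypher, names=names), names):
            return True
        time.sleep(poll_seconds)
    return False
```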
Step 4 — Core Pattern (HybridCypherRetriever)
```python
from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import HybridCypherRetriever
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.llm import OpenAILLM

driver = GraphDatabase.driver("neo4j+s://<host>:7687", auth=("neo4j", "<password>"))
embedder = OpenAIEmbeddings(model="text-embedding-3-small")  # 1536 dims — match index

# retrieval_query: Cypher fragment executed after vector lookup.
#   `node`  = matched node from the vector index (AUTO-INJECTED — do NOT declare)
#   `score` = similarity float (AUTO-INJECTED — do NOT declare)
# MUST include a RETURN clause. MUST return a `score` column.
retrieval_query = """
MATCH (node)<-[:HAS_CHUNK]-(article:Article)
OPTIONAL MATCH (article)-[:MENTIONS]->(org:Organization)
RETURN node.text AS chunk_text,
       article.title AS article_title,
       collect(DISTINCT org.name) AS mentioned_organizations,
       score
"""

retriever = HybridCypherRetriever(
    driver=driver,
    vector_index_name="chunk_embedding",
    fulltext_index_name="chunk_fulltext",
    retrieval_query=retrieval_query,
    embedder=embedder,
)

llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})
rag = GraphRAG(retriever=retriever, llm=llm)
response = rag.search(query_text="Who does Alice work for?", retriever_config={"top_k": 5})
print(response.answer)
```
---

Step 5 — query_params (Parameterized retrieval_query)

Pass runtime parameters into `retrieval_query` via `retriever_config`:

```python
from neo4j_graphrag.retrievers import VectorCypherRetriever

retrieval_query = """
MATCH (node)<-[:HAS_CHUNK]-(article:Article)-[:MENTIONS]->(org:Organization)
WHERE org.name = $entity_name
RETURN node.text AS chunk_text, article.title AS title, score
"""
retriever = VectorCypherRetriever(
    driver=driver,
    index_name="chunk_embedding",
    retrieval_query=retrieval_query,
    embedder=embedder,
)
```

Pass query_params inside retriever_config on each search:

```python
response = rag.search(
    query_text="What happened at Apple?",
    retriever_config={"top_k": 10, "query_params": {"entity_name": "Apple"}},
)
```

Direct retriever call (without the GraphRAG wrapper):

```python
results = retriever.search(
    query_text="What happened at Apple?",
    top_k=10,
    query_params={"entity_name": "Apple"},
)
```
---

Step 6 — Filters (Pre-filter before vector search)

Filters reduce the candidate pool BEFORE vector similarity ranking:

```python
results = retriever.search(
    query_text="quarterly results",
    top_k=5,
    filters={"date": {"$gte": "2024-01-01"}},
)
```

Supported operators: $eq $ne $lt $lte $gt $gte $between $in $like $ilike
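An operator typo in a `filters` dict only fails at query time, so a cheap client-side pre-check can help. A sketch only: `validate_filters` is an illustrative helper (not a library function), and it deliberately checks just the simple `{property: {operator: value}}` shape from the example above:

```python
SUPPORTED_OPS = {
    "$eq", "$ne", "$lt", "$lte", "$gt", "$gte",
    "$between", "$in", "$like", "$ilike",
}

def validate_filters(filters):
    """Shallow check: every operator key must be a supported one."""
    for prop, cond in filters.items():
        if isinstance(cond, dict):
            for op in cond:
                if op not in SUPPORTED_OPS:
                    raise ValueError(f"Unsupported operator {op!r} on {prop!r}")
    return filters
```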
---

Step 7 — VectorRetriever (return_properties)

```python
from neo4j_graphrag.retrievers import VectorRetriever

retriever = VectorRetriever(
    driver=driver,
    index_name="chunk_embedding",
    embedder=embedder,
    return_properties=["text", "source", "page_number"],  # subset of node props
)
```

No retrieval_query needed — returns node properties directly.
---

Step 8 — Text2CypherRetriever (no embedder)

LLM generates Cypher from natural language; no vector index needed:

```python
from neo4j_graphrag.retrievers import Text2CypherRetriever

retriever = Text2CypherRetriever(
    driver=driver,
    llm=OpenAILLM(model_name="gpt-4o"),
    neo4j_schema=None,  # auto-fetched from db; or pass a string
    examples=["Q: Who works at Neo4j? A: MATCH (p:Person)-[:WORKS_AT]->(c:Company {name:'Neo4j'}) RETURN p.name"],
)
results = retriever.search(query_text="Which people work at Neo4j?")
```

If `neo4j_schema=None`, the retriever fetches the schema automatically. For large schemas, pass a trimmed string to reduce LLM prompt size.
---

Step 9 — Custom Prompt Template

```python
from neo4j_graphrag.generation.prompts import RagTemplate

custom_template = RagTemplate(
    template="""Answer the question using ONLY the context below.
Context: {context}
Question: {query_text}
Answer:""",
    expected_inputs=["context", "query_text"],
)
rag = GraphRAG(retriever=retriever, llm=llm, prompt_template=custom_template)
```
neo4j-genai;已更新导入路径neo4j-graphrag - 向量索引在导入数据或查询前已处于ONLINE状态
- 使用混合检索器时,全文索引已处于ONLINE状态
- 嵌入维度与索引配置中的匹配
vector.dimensions - 的RETURN子句包含
retrieval_query和node(两者均为必填)score - 中未重新声明
retrieval_query和node——它们是自动注入的score - 通过
query_params或直接retriever_config参数传递retriever.search() - 在中设置了
rag.search()(默认5)retriever_config={"top_k": N} - 凭证存储在环境变量中;从未硬编码
Common Errors
—
| Error | Cause | Fix |
|---|---|---|
| Old package installed | |
| Missing | Add |
| | Add |
| Declared | Remove it — |
| Wrong variable name in retrieval_query | Use exactly |
| Embedding dimension mismatch | Index created with different dims | Drop index, recreate with correct |
| Index name typo or index not ONLINE | |
| Low recall on hybrid search | Fulltext index not on right property | Fulltext index must cover same property as |
| Large corpus with many entities | Set |
| Calling | Wrap in |
| Empty KG after pipeline run | | Temporarily set |
—
---

Embedder Quick Reference

```python
from neo4j_graphrag.embeddings import (
    OpenAIEmbeddings,               # OpenAI text-embedding-3-*
    AzureOpenAIEmbeddings,          # Azure-hosted OpenAI
    VertexAIEmbeddings,             # Google Vertex AI
    MistralAIEmbeddings,            # Mistral
    CohereEmbeddings,               # Cohere embed-v3
    OllamaEmbeddings,               # local via Ollama
    SentenceTransformerEmbeddings,  # local HuggingFace
)
```

Dimension mapping (must match vector index):

- text-embedding-3-small → 1536
- text-embedding-3-large → 3072
- text-embedding-ada-002 → 1536
- all-MiniLM-L6-v2 → 384

All embedders include automatic rate limiting with exponential backoff.
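A dimension mismatch between embedder output and `vector.dimensions` only surfaces at query time, so a cheap guard at startup can help. A sketch: `EXPECTED_DIMS` and `check_dims` are illustrative names, seeded from the mapping above:

```python
# Known output sizes (from the mapping above).
EXPECTED_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
    "all-MiniLM-L6-v2": 384,
}

def check_dims(model_name, index_dims):
    """Raise early if the embedder's output size cannot match the vector index."""
    expected = EXPECTED_DIMS.get(model_name)
    if expected is None:
        raise ValueError(f"unknown model: {model_name!r}")
    if expected != index_dims:
        raise ValueError(
            f"{model_name} emits {expected}-dim vectors, "
            f"but the index is configured for {index_dims}"
        )
```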
---

LLM Quick Reference

```python
from neo4j_graphrag.llm import (
    OpenAILLM,
    AzureOpenAILLM,
    AnthropicLLM,
    VertexAILLM,
    MistralAILLM,
    CohereLLM,
    OllamaLLM,
)
```

Any LangChain chat model is also accepted by GraphRAG.

---
GraphRAG.search() Full Signature

```python
response = rag.search(
    query_text="...",
    retriever_config={
        "top_k": 5,             # candidates per search (default 5)
        "query_params": {...},  # passed to the retrieval_query Cypher
        "filters": {...},       # pre-filter before vector search
    },
    return_context=False,                   # True: include retrieved chunks in response
    response_fallback="No context found.",  # returned when the retriever yields nothing
)
# response.answer           → str
# response.retriever_result → RawSearchResult (if return_context=True)
```

---
Failure Recovery

- 0 results from retrieval: run `retriever.search()` directly (skip the LLM); check `top_k`, index name, embedding dims
- LLM hallucinating: reduce `top_k`, improve `retrieval_query` to return more specific context
- Slow queries: add LIMIT inside `retrieval_query` on expensive expansions; use `filters` to pre-reduce candidates
- Embedding dimension mismatch: run `SHOW INDEXES YIELD name, options` — check `vector.dimensions`

---
References

- references/retrievers.md — full retriever API, all constructor params, result_formatter, ToolsRetriever, external DB retrievers
- GraphRAG Python Docs
- neo4j-graphrag GitHub

---
Checklist

- `neo4j-genai` uninstalled; `neo4j-graphrag` installed; import paths updated
- Vector index ONLINE before ingesting or querying
- Fulltext index ONLINE if using a Hybrid retriever
- Embedding dims match `vector.dimensions` in the index config
- `retrieval_query` includes `node` and `score` in the RETURN clause (both required)
- `node` and `score` NOT re-declared in `retrieval_query` — auto-injected
- `query_params` passed via `retriever_config` or as a direct `retriever.search()` arg
- `retriever_config={"top_k": N}` set on `rag.search()` (default 5)
- Credentials in env vars; never hardcoded