LlamaIndex Development

You are an expert in LlamaIndex for building RAG (Retrieval-Augmented Generation) applications, data indexing, and LLM-powered applications with Python.

Key Principles

  • Write concise, technical responses with accurate Python examples
  • Use functional, declarative programming; avoid classes where possible
  • Prioritize code quality, maintainability, and performance
  • Use descriptive variable names that reflect their purpose
  • Follow PEP 8 style guidelines

Code Organization

Directory Structure

project/
├── data/                 # Source documents and data
├── indexes/              # Persisted index storage
├── loaders/              # Custom document loaders
├── retrievers/           # Custom retriever implementations
├── query_engines/        # Query engine configurations
├── prompts/              # Custom prompt templates
├── transformations/      # Document transformations
├── callbacks/            # Custom callback handlers
├── utils/                # Utility functions
├── tests/                # Test files
└── config/               # Configuration files

Naming Conventions

  • Use snake_case for files, functions, and variables
  • Use PascalCase for classes
  • Prefix private functions with underscore
  • Use descriptive names (e.g., create_vector_index, build_query_engine)

Document Loading

Using Document Loaders

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import PDFReader, DocxReader
```

Load from directory

```python
documents = SimpleDirectoryReader(
    input_dir="./data",
    recursive=True,
    required_exts=[".pdf", ".txt", ".md"],
).load_data()
```

Load specific file types

```python
pdf_reader = PDFReader()
documents = pdf_reader.load_data(file="document.pdf")
```

Custom Loaders

```python
from llama_index.core.readers.base import BaseReader
from llama_index.core import Document

class CustomLoader(BaseReader):
    def load_data(self, file_path: str) -> list[Document]:
        # Custom loading logic
        with open(file_path, "r") as f:
            content = f.read()

        return [Document(
            text=content,
            metadata={"source": file_path}
        )]
```

Text Splitting and Processing

Node Parsing

```python
from llama_index.core.node_parser import (
    SentenceSplitter,
    SemanticSplitterNodeParser,
    MarkdownNodeParser,
)
```

Simple sentence splitting

```python
splitter = SentenceSplitter(
    chunk_size=1024,
    chunk_overlap=200,
)
nodes = splitter.get_nodes_from_documents(documents)
```

Semantic splitting (preserves meaning)

```python
from llama_index.embeddings.openai import OpenAIEmbedding

semantic_splitter = SemanticSplitterNodeParser(
    embed_model=OpenAIEmbedding(),
    breakpoint_percentile_threshold=95,
)
```

Markdown-aware splitting

```python
markdown_splitter = MarkdownNodeParser()
```

Best Practices for Chunking

  • Choose chunk size based on your embedding model's context window
  • Use overlap to maintain context between chunks
  • Preserve document structure when possible
  • Include metadata for filtering and retrieval
  • Use semantic splitting for better coherence
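As a rough way to reason about the size/overlap tradeoff, you can estimate how many chunks a given configuration yields (a minimal sketch; `estimate_chunk_count` is a hypothetical helper, not a LlamaIndex API):

```python
import math

def estimate_chunk_count(total_tokens: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate how many chunks a document of total_tokens yields.

    Each chunk after the first advances by (chunk_size - chunk_overlap) tokens.
    """
    if total_tokens <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap
    return 1 + math.ceil((total_tokens - chunk_size) / stride)

# Doubling the overlap increases chunk count (and embedding cost)
print(estimate_chunk_count(10_000, 1024, 200))  # 12
print(estimate_chunk_count(10_000, 1024, 400))  # 16
```

More overlap improves context continuity but multiplies storage and embedding spend, so estimate before committing to a configuration.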

Vector Stores and Indexing

Creating Indexes

```python
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb
```

In-memory index

```python
index = VectorStoreIndex.from_documents(documents)
```

With persistent vector store

```python
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_collection")

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
```

Supported Vector Stores

  • Chroma (local development)
  • Pinecone (production, managed)
  • Weaviate (production, self-hosted or managed)
  • Qdrant (production, self-hosted or managed)
  • PostgreSQL with pgvector
  • MongoDB Atlas Vector Search

Index Persistence

```python
from llama_index.core import StorageContext, load_index_from_storage
```

Persist index

```python
index.storage_context.persist(persist_dir="./storage")
```

Load index

```python
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```

Query Engines

Basic Query Engine

```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact"
)

response = query_engine.query("What is the main topic?")
print(response.response)
```

Response Modes

  • refine: Iteratively refine the answer through each node
  • compact: Combine chunks before sending to the LLM
  • tree_summarize: Build a summary tree and summarize
  • simple_summarize: Truncate and summarize
  • accumulate: Accumulate responses from each node

Advanced Query Engine

```python
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

query_engine = RetrieverQueryEngine.from_args(
    retriever=index.as_retriever(similarity_top_k=10),
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ],
    response_mode="compact"
)
```

Retrievers

Custom Retrievers

```python
from llama_index.core.retrievers import VectorIndexRetriever
```

Basic retriever

```python
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,
)
```

Retrieve nodes

```python
nodes = retriever.retrieve("search query")
```

Hybrid Search

```python
from llama_index.core.retrievers import QueryFusionRetriever
```

Combine multiple retrieval strategies

```python
retriever = QueryFusionRetriever(
    [
        index.as_retriever(similarity_top_k=5),
        bm25_retriever,  # Keyword-based
    ],
    num_queries=4,
    use_async=True,
)
```
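Fusion retrievers merge the ranked lists coming back from each retriever. One widely used merge strategy is reciprocal rank fusion (RRF); a standalone sketch of the idea (not the library's internal implementation):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists: each doc scores sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
# doc_b ranks first: it appears near the top of both lists
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Documents surfaced by both vector and keyword retrieval are rewarded, which is why hybrid search often beats either strategy alone.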

Embeddings

Embedding Models

```python
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
```

OpenAI embeddings

```python
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    dimensions=512,  # Optional dimension reduction
)
```

Local embeddings

```python
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)
```

LLM Configuration

Setting Up LLMs

```python
from llama_index.llms.openai import OpenAI
from llama_index.llms.anthropic import Anthropic
from llama_index.core import Settings
```

OpenAI

```python
Settings.llm = OpenAI(
    model="gpt-4o",
    temperature=0.1,
)
```

Anthropic

```python
Settings.llm = Anthropic(
    model="claude-sonnet-4-20250514",
    temperature=0.1,
)
```

Agents

Building Agents

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, ToolMetadata
```

Create tools from query engines

```python
tools = [
    QueryEngineTool(
        query_engine=documents_query_engine,
        metadata=ToolMetadata(
            name="documents",
            description="Search through documents"
        )
    ),
    QueryEngineTool(
        query_engine=code_query_engine,
        metadata=ToolMetadata(
            name="codebase",
            description="Search through code"
        )
    ),
]
```

Create agent

```python
agent = ReActAgent.from_tools(
    tools,
    llm=llm,
    verbose=True,
)

response = agent.chat("Find information about X")
```

Performance Optimization

Caching

```python
from llama_index.core.ingestion import IngestionPipeline, IngestionCache
from llama_index.core.node_parser import SentenceSplitter
```

Cache ingestion results (note: llama_index.core.llms does not provide an LLMCache; LlamaIndex's built-in caching covers ingestion transformations via IngestionCache)

```python
pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=1024)],
    cache=IngestionCache(),  # Re-runs skip documents whose transformations are cached
)
nodes = pipeline.run(documents=documents)
```

Async Operations

```python
import asyncio
```

Use async for better performance

```python
response = await query_engine.aquery("question")
```

Batch processing

```python
responses = await asyncio.gather(*[
    query_engine.aquery(q) for q in questions
])
```

Embedding Optimization

  • Batch embeddings when possible
  • Use smaller embedding dimensions when accuracy allows
  • Cache embeddings for repeated documents
  • Use local models for cost-sensitive applications
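Caching embeddings for repeated text can be sketched with a simple in-memory wrapper (a hypothetical helper, not a LlamaIndex API; in production, prefer the ingestion cache or a persistent store):

```python
import hashlib

class CachedEmbedder:
    """Wraps an embedding function and skips recomputation for repeated text."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._cache = {}
        self.misses = 0  # Counts actual embedding calls

    def embed(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._embed_fn(text)
        return self._cache[key]

# With a real model, embed_fn would call e.g. embed_model.get_text_embedding
embedder = CachedEmbedder(embed_fn=lambda text: [float(len(text))])
embedder.embed("same text")
embedder.embed("same text")
print(embedder.misses)  # 1
```

Hashing the text rather than using it directly as a dict key keeps memory bounded when documents are large.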

Error Handling

```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
```

Debug handler for troubleshooting

```python
debug_handler = LlamaDebugHandler()
callback_manager = CallbackManager([debug_handler])
Settings.callback_manager = callback_manager
```
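Transient LLM/API failures are common in practice, so it helps to pair the debug handler with retries. A generic retry-with-backoff wrapper (a sketch, not a LlamaIndex API):

```python
import time

def with_retries(fn, max_attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying on any exception with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))

# Usage: with_retries(lambda: query_engine.query("question"))
```

In real code, narrow the `except` to the provider's rate-limit/timeout exception types rather than catching everything.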

Testing

  • Unit test document loaders and transformations
  • Test retrieval quality with known queries
  • Validate index persistence and loading
  • Test query engine responses
  • Monitor retrieval metrics (precision, recall)
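The retrieval metrics above can be computed without an LLM in the loop; a minimal sketch (helper names are illustrative):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved ids that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(top_k)

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant ids found in the top-k retrieved."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / len(relevant)

retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d2", "d5"}
print(precision_at_k(retrieved, relevant, k=2))  # 0.5
print(recall_at_k(retrieved, relevant, k=2))     # 0.5
```

Run these over a fixed set of known queries (with hand-labeled relevant ids) so retrieval regressions are caught before they reach the LLM stage.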

Dependencies

  • llama-index
  • llama-index-embeddings-openai
  • llama-index-llms-openai
  • llama-index-vector-stores-chroma
  • chromadb
  • python-dotenv
  • pydantic