
RAG Implementation

Build Retrieval-Augmented Generation systems that extend AI capabilities with external knowledge sources.

Overview

RAG (Retrieval-Augmented Generation) enhances AI applications by retrieving relevant information from knowledge bases and incorporating it into AI responses, reducing hallucinations and providing accurate, grounded answers.

When to Use

Use this skill when:
  • Building Q&A systems over proprietary documents
  • Creating chatbots with current, factual information
  • Implementing semantic search with natural language queries
  • Reducing hallucinations with grounded responses
  • Enabling AI systems to access domain-specific knowledge
  • Building documentation assistants
  • Creating research tools with source citation
  • Developing knowledge management systems

Instructions

Step 1: Choose Vector Database

Select an appropriate vector database based on your requirements:
  1. For production scalability: Use Pinecone or Milvus
  2. For open-source requirements: Use Weaviate or Qdrant
  3. For local development: Use Chroma or FAISS
  4. For hybrid search needs: Use Weaviate with BM25 support

Step 2: Select Embedding Model

Choose an embedding model based on your use case:
  1. General purpose: text-embedding-ada-002 (OpenAI)
  2. Fast and lightweight: all-MiniLM-L6-v2
  3. Multilingual support: e5-large-v2
  4. Best performance: bge-large-en-v1.5

Step 3: Implement Document Processing Pipeline

  1. Load documents from your source (file system, database, API)
  2. Clean and preprocess documents (remove formatting artifacts, normalize text)
  3. Split documents into chunks using appropriate chunking strategy
  4. Generate embeddings for each chunk
  5. Store embeddings in your vector database with metadata
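The splitting stage of this pipeline can be sketched without any framework. A minimal fixed-size chunker with character overlap (the `Chunker` class here is illustrative, not part of any library):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal fixed-size chunker; chunkSize and overlap are measured in characters.
class Chunker {
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;  // each chunk starts this far after the last
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;  // final chunk reached end of text
        }
        return chunks;
    }
}
```

Production splitters additionally respect sentence and paragraph boundaries rather than cutting mid-word, but the windowing logic is the same.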

Step 4: Configure Retrieval Strategy

  1. Dense Retrieval: Use semantic similarity via embeddings for most use cases
  2. Hybrid Search: Combine dense + sparse retrieval for better coverage
  3. Metadata Filtering: Add filters based on document attributes
  4. Reranking: Implement cross-encoder reranking for high-precision requirements

Step 5: Build RAG Pipeline

  1. Create content retriever with your embedding store
  2. Configure AI service with retriever and chat memory
  3. Implement prompt template with context injection
  4. Add response validation and grounding checks

Step 6: Evaluate and Optimize

  1. Measure retrieval metrics (precision@k, recall@k, MRR)
  2. Evaluate answer quality (faithfulness, relevance)
  3. Monitor performance and user feedback
  4. Iterate on chunking, retrieval, and prompt parameters
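The retrieval metrics in step 6 are straightforward to compute yourself; a plain-Java sketch over a ranked result list and a set of relevant document ids (class name illustrative):

```java
import java.util.List;
import java.util.Set;

// Standard retrieval metrics for one query.
class RetrievalMetrics {
    // Fraction of the top-k results that are relevant.
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        int hits = 0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) hits++;
        }
        return (double) hits / k;
    }

    // Fraction of all relevant documents that appear in the top-k results.
    static double recallAtK(List<String> ranked, Set<String> relevant, int k) {
        int hits = 0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) hits++;
        }
        return relevant.isEmpty() ? 0.0 : (double) hits / relevant.size();
    }

    // Reciprocal rank of the first relevant result (0 if none);
    // MRR is this value averaged over a query set.
    static double reciprocalRank(List<String> ranked, Set<String> relevant) {
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) return 1.0 / (i + 1);
        }
        return 0.0;
    }
}
```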

Examples

Example 1: Basic Document Q&A System

```java
// Simple RAG setup for document Q&A
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/docs");

InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(documents, store);

DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
    .chatModel(chatModel)
    .contentRetriever(EmbeddingStoreContentRetriever.from(store))
    .build();

String answer = assistant.answer("What is the company policy on remote work?");
```

Example 2: Metadata-Filtered Retrieval

```java
// RAG with metadata filtering for specific document categories
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(store)
    .embeddingModel(embeddingModel)
    .maxResults(5)
    .minScore(0.7)
    .filter(metadataKey("category").isEqualTo("technical"))
    .build();
```

Example 3: Multi-Source RAG Pipeline

```java
// Combine multiple knowledge sources
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever docRetriever = EmbeddingStoreContentRetriever.from(docStore);

List<Content> results = new ArrayList<>();
results.addAll(webRetriever.retrieve(query));
results.addAll(docRetriever.retrieve(query));

// Rerank and return up to the top 5 (guard against fewer than 5 results)
List<Content> topResults = reranker.reorder(query, results)
    .subList(0, Math.min(5, results.size()));
```

Example 4: RAG with Chat Memory

```java
// Conversational RAG with context retention
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(retriever)
    .build();

// Multi-turn conversation with context
assistant.chat("Tell me about the product features");
assistant.chat("What about pricing for those features?");  // Maintains context
```

Core Components

Vector Databases

Store and efficiently retrieve document embeddings for semantic search.
Key Options:
  • Pinecone: Managed, scalable, production-ready
  • Weaviate: Open-source, hybrid search capabilities
  • Milvus: High performance, on-premise deployment
  • Chroma: Lightweight, easy local development
  • Qdrant: Fast, advanced filtering
  • FAISS: Meta's library, full control
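Whichever store you choose, the core operation is the same: nearest-neighbor search over embedding vectors. A brute-force version (the exact computation that ANN indexes in FAISS, Milvus, etc. approximate for speed) fits in a few lines; class and method names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Brute-force top-k cosine-similarity search: what a vector store does,
// minus the ANN indexing that makes it fast at scale.
class BruteForceSearch {
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns indices of the k vectors most similar to the query.
    static List<Integer> topK(double[] query, List<double[]> store, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < store.size(); i++) ids.add(i);
        ids.sort((x, y) -> Double.compare(
                cosine(query, store.get(y)), cosine(query, store.get(x))));
        return ids.subList(0, Math.min(k, ids.size()));
    }
}
```

Brute force is exact and perfectly fine for thousands of vectors; the managed stores earn their keep at millions.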

Embedding Models

Convert text to numerical vectors for similarity search.
Popular Models:
  • text-embedding-ada-002 (OpenAI): General purpose, 1536 dimensions
  • all-MiniLM-L6-v2: Fast, lightweight, 384 dimensions
  • e5-large-v2: High quality, multilingual
  • bge-large-en-v1.5: State-of-the-art performance

Retrieval Strategies

Find relevant content based on user queries.
Approaches:
  • Dense Retrieval: Semantic similarity via embeddings
  • Sparse Retrieval: Keyword matching (BM25, TF-IDF)
  • Hybrid Search: Combine dense + sparse for best results
  • Multi-Query: Generate multiple query variations
  • Contextual Compression: Extract only relevant parts
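Hybrid search needs a way to merge dense and sparse result lists whose scores live on incompatible scales. Reciprocal Rank Fusion (RRF) sidesteps the problem by scoring on rank alone; a minimal sketch (the constant k = 60 is the conventional choice from the RRF literature, and the class name is illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank) per document,
// so documents ranked well by several retrievers rise to the top.
class RrfFusion {
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int rank = 0; rank < ranking.size(); rank++) {
                scores.merge(ranking.get(rank), 1.0 / (k + rank + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return fused;
    }
}
```

Here "b" wins because both retrievers rank it near the top, even though each ranks some other document first.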

Quick Implementation

Basic RAG Setup

```java
// Load documents from file system
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/docs");

// Create embedding store
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

// Ingest documents into the store
EmbeddingStoreIngestor.ingest(documents, embeddingStore);

// Create AI service with RAG capability
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
    .build();
```

Document Processing Pipeline

```java
// Split documents into chunks (500 chars per chunk, 100 chars overlap)
DocumentSplitter splitter = DocumentSplitters.recursive(500, 100);

// Create embedding model
EmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
    .apiKey("your-api-key")
    .build();

// Create embedding store
EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
    .host("localhost")
    .database("postgres")
    .user("postgres")
    .password("password")
    .table("embeddings")
    .dimension(1536)
    .build();

// Process and store documents
for (Document document : documents) {
    List<TextSegment> segments = splitter.split(document);
    for (TextSegment segment : segments) {
        Embedding embedding = embeddingModel.embed(segment).content();
        embeddingStore.add(embedding, segment);
    }
}
```

Implementation Patterns

Pattern 1: Simple Document Q&A

Create a basic Q&A system over your documents.
```java
public interface DocumentAssistant {
    String answer(String question);
}

DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
    .chatModel(chatModel)
    .contentRetriever(retriever)
    .build();
```

Pattern 2: Metadata-Filtered Retrieval

Filter results based on document metadata.
```java
// Add metadata during document loading
Document document = Document.builder()
    .text("Content here")
    .metadata("source", "technical-manual.pdf")
    .metadata("category", "technical")
    .metadata("date", "2024-01-15")
    .build();

// Filter during retrieval
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(embeddingStore)
    .embeddingModel(embeddingModel)
    .maxResults(5)
    .minScore(0.7)
    .filter(metadataKey("category").isEqualTo("technical"))
    .build();
```

Pattern 3: Multi-Source Retrieval

Combine results from multiple knowledge sources.
```java
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever documentRetriever = EmbeddingStoreContentRetriever.from(documentStore);
ContentRetriever databaseRetriever = EmbeddingStoreContentRetriever.from(databaseStore);

// Combine results
List<Content> allResults = new ArrayList<>();
allResults.addAll(webRetriever.retrieve(query));
allResults.addAll(documentRetriever.retrieve(query));
allResults.addAll(databaseRetriever.retrieve(query));

// Rerank combined results
List<Content> rerankedResults = reranker.reorder(query, allResults);
```

Best Practices

Document Preparation

  • Clean and preprocess documents before ingestion
  • Remove irrelevant content and formatting artifacts
  • Standardize document structure for consistent processing
  • Add relevant metadata for filtering and context

Chunking Strategy

  • Use 500-1000 tokens per chunk for optimal balance
  • Include 10-20% overlap to preserve context at boundaries
  • Consider document structure when determining chunk boundaries
  • Test different chunk sizes for your specific use case

Retrieval Optimization

  • Start with high k values (10-20) then filter/rerank
  • Use metadata filtering to improve relevance
  • Combine multiple retrieval strategies for better coverage
  • Monitor retrieval quality and user feedback

Performance Considerations

  • Cache embeddings for frequently accessed content
  • Use batch processing for document ingestion
  • Optimize vector store configuration for your scale
  • Monitor query performance and system resources
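The embedding-cache advice above can be as simple as a memoizing wrapper around the embedding call. A sketch assuming the embedder is exposed as a `Function<String, float[]>` (that interface choice, and the class name, are illustrative rather than any library's API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Memoizing wrapper: identical texts are embedded once, then served from memory.
class EmbeddingCache {
    private final Map<String, float[]> cache = new ConcurrentHashMap<>();
    private final Function<String, float[]> embedder;
    private int misses = 0;

    EmbeddingCache(Function<String, float[]> embedder) {
        this.embedder = embedder;
    }

    float[] embed(String text) {
        // computeIfAbsent only invokes the (possibly expensive) embedder on a miss
        return cache.computeIfAbsent(text, t -> {
            misses++;
            return embedder.apply(t);
        });
    }

    int misses() { return misses; }
}
```

For a long-running service you would bound the map (e.g. an LRU policy) and key on a hash of the normalized text, but the computeIfAbsent pattern is the core of it.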

Common Issues and Solutions

Poor Retrieval Quality

Problem: Retrieved documents don't match user queries.
Solutions:
  • Improve document preprocessing and cleaning
  • Adjust chunk size and overlap parameters
  • Try different embedding models
  • Use hybrid search combining semantic and keyword matching

Irrelevant Results

Problem: Retrieved documents contain relevant information but are not specific enough.
Solutions:
  • Add metadata filtering for domain-specific constraints
  • Implement reranking with cross-encoder models
  • Use contextual compression to extract relevant parts
  • Fine-tune retrieval parameters (k values, similarity thresholds)

Performance Issues

Problem: Slow response times during retrieval.
Solutions:
  • Optimize vector store configuration and indexing
  • Implement caching for frequently retrieved content
  • Use smaller embedding models for faster inference
  • Consider approximate nearest neighbor algorithms

Hallucination Prevention

Problem: AI generates information not present in retrieved documents.
Solutions:
  • Improve prompt engineering to emphasize grounding
  • Add verification steps to check answer alignment
  • Include confidence scoring for responses
  • Implement fact-checking mechanisms
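As a cheap first-pass grounding signal, you can measure what fraction of answer tokens appear anywhere in the retrieved context. Anything robust needs an NLI model or an LLM judge, so treat this sketch (class name illustrative) as a tripwire for obviously ungrounded answers, not a faithfulness metric:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Crude grounding check: fraction of answer tokens present in the context.
// Ignores word order and meaning; only useful as a cheap first filter.
class GroundingCheck {
    static double tokenOverlap(String answer, String context) {
        Set<String> contextTokens =
                new HashSet<>(Arrays.asList(context.toLowerCase().split("\\W+")));
        String[] answerTokens = answer.toLowerCase().split("\\W+");
        if (answerTokens.length == 0) return 0;
        int grounded = 0;
        for (String t : answerTokens) {
            if (contextTokens.contains(t)) grounded++;
        }
        return (double) grounded / answerTokens.length;
    }
}
```

A score well below 1.0 flags an answer that introduces many terms absent from its sources, which is worth routing to a stronger verification step.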

Evaluation Framework

Retrieval Metrics

  • Precision@k: Percentage of relevant documents in top-k results
  • Recall@k: Percentage of all relevant documents found in top-k results
  • Mean Reciprocal Rank (MRR): Mean, over queries, of the reciprocal rank of the first relevant result
  • Normalized Discounted Cumulative Gain (nDCG): Ranking quality metric
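Of these, nDCG is the least obvious to compute: it is the discounted cumulative gain of the actual ranking divided by that of the ideal ranking. A binary-relevance sketch (class name illustrative):

```java
import java.util.List;

// nDCG@k with binary relevance labels (1 = relevant, 0 = not).
class Ndcg {
    // rels: relevance of each returned result, in rank order.
    // totalRelevant: number of relevant documents in the corpus for this query.
    static double ndcgAtK(List<Integer> rels, int totalRelevant, int k) {
        double dcg = 0;
        for (int i = 0; i < Math.min(k, rels.size()); i++) {
            dcg += rels.get(i) / log2(i + 2);  // discount by log2(rank + 1)
        }
        // Ideal ranking puts all relevant documents first.
        double idcg = 0;
        for (int i = 0; i < Math.min(k, totalRelevant); i++) {
            idcg += 1.0 / log2(i + 2);
        }
        return idcg == 0 ? 0 : dcg / idcg;
    }

    private static double log2(double x) { return Math.log(x) / Math.log(2); }
}
```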

Answer Quality Metrics

  • Faithfulness: Degree to which answers are grounded in retrieved documents
  • Answer Relevance: How well answers address user questions
  • Context Recall: Percentage of relevant context used in answers
  • Context Precision: Percentage of retrieved context that is relevant

User Experience Metrics

  • Response Time: Time from query to answer
  • User Satisfaction: Feedback ratings on answer quality
  • Task Completion: Rate of successful task completion
  • Engagement: User interaction patterns with the system

Resources

Reference Documentation

  • Vector Database Comparison - Detailed comparison of vector database options
  • Embedding Models Guide - Model selection and optimization
  • Retrieval Strategies - Advanced retrieval techniques
  • Document Chunking - Chunking strategies and best practices
  • LangChain4j RAG Guide - Official implementation patterns

Assets

  • assets/vector-store-config.yaml
    - Configuration templates for different vector stores
  • assets/retriever-pipeline.java
    - Complete RAG pipeline implementation
  • assets/evaluation-metrics.java
    - Evaluation framework code

Constraints and Limitations

  1. Token Limits: Respect model context window limitations
  2. API Rate Limits: Manage external API rate limits and costs
  3. Data Privacy: Ensure compliance with data protection regulations
  4. Resource Requirements: Consider memory and computational requirements
  5. Maintenance: Plan for regular updates and system monitoring
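For constraint 1, a rough budget check before prompting avoids context-window overflows. The ~4 characters/token rule of thumb is only approximate for English and should be replaced by the model's actual tokenizer wherever accuracy matters; the class below is an illustrative sketch:

```java
import java.util.List;

// Rough context-window guard using the ~4 chars/token English heuristic.
class TokenBudget {
    static int estimateTokens(String text) {
        return (int) Math.ceil(text.length() / 4.0);
    }

    // true if prompt + retrieved chunks still leave room for the answer.
    static boolean fits(String prompt, List<String> chunks,
                        int contextWindow, int reservedForAnswer) {
        int used = estimateTokens(prompt);
        for (String chunk : chunks) used += estimateTokens(chunk);
        return used + reservedForAnswer <= contextWindow;
    }
}
```

When the budget doesn't fit, drop the lowest-scoring chunks first rather than truncating chunks mid-passage.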

Constraints and Warnings

System Constraints

  • Embedding models have maximum token limits per document
  • Vector databases require proper indexing for performance
  • Chunk boundaries may lose context for complex documents
  • Hybrid search requires additional infrastructure components

Quality Considerations

  • Retrieval quality depends heavily on chunking strategy
  • Embedding models may not capture domain-specific semantics
  • Metadata filtering requires proper document annotation
  • Reranking adds latency to query responses

Operational Warnings

  • Monitor vector database storage and query performance
  • Implement proper data backup and recovery procedures
  • Regular embedding model updates may affect retrieval quality
  • Document processing pipelines require ongoing maintenance

Security Considerations

  • Secure access to vector databases and embedding services
  • Implement proper authentication and authorization
  • Validate and sanitize user inputs
  • Monitor for abuse and unusual usage patterns
  • Regular security audits and penetration testing