chroma
Instructions
Before writing any code, gather this information:
- Deployment target: Local Chroma or Chroma Cloud?
  - If Cloud: they'll need API key, tenant, and database configured
  - If Local: determine if they need persistence or ephemeral storage
- Search type (Cloud only): Dense only, or hybrid search?
  - Dense only: simpler setup, good for most semantic search
  - Hybrid (dense + sparse): better for keyword-heavy queries, use SPLADE
- Embedding model: Which provider/model?
  - Default: @chroma-core/default-embed (TypeScript) or built-in (Python)
  - OpenAI: text-embedding-3-large is most popular, requires @chroma-core/openai
  - Ask the user if they have a preference or existing provider
- Data structure: What are they indexing?
  - Needed to determine chunking strategy
  - Needed to design metadata schema for filtering
Decision workflow
- User wants to add search
  - Ask: Local Chroma or Chroma Cloud?
    - Local Chroma
      - Use collection.query() with a dense embedding model
    - Chroma Cloud
      - Ask if hybrid search is needed
        - Yes
          - Use Schema() + Search() APIs with SPLADE sparse index
        - No
          - Use collection.query() with a dense embedding model
  - Ask which embedding model to use
  - Design metadata schema
  - Implement data sync strategy
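The branching part of this workflow can be sketched as a small pure function. This is only an illustration: `SearchPlan` and `plan_search` are hypothetical names, not part of Chroma, and the strings they return simply echo the wording of the workflow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchPlan:
    """Outcome of the decision workflow (hypothetical helper type)."""
    client: str                  # "local" or "cloud"
    query_api: str               # which API family to use for retrieval
    sparse_index: Optional[str]  # sparse strategy name, if any


def plan_search(target: str, wants_hybrid: bool = False) -> SearchPlan:
    """Map the two workflow answers onto a retrieval approach."""
    if target not in ("local", "cloud"):
        raise ValueError(f"unknown deployment target: {target!r}")
    # Hybrid search relies on Schema() and Search(), which are Cloud only.
    if target == "cloud" and wants_hybrid:
        return SearchPlan("cloud", "Schema() + Search()", "SPLADE")
    # Every other path uses collection.query() with a dense embedding model.
    return SearchPlan(target, "collection.query()", None)
```

Note that asking for hybrid search on a local deployment falls back to dense-only here; a real assistant would more likely surface that mismatch to the user.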
When to ask questions vs proceed
Ask first:
- Embedding model choice (cost and quality implications)
- Cloud vs local deployment
- Hybrid vs dense-only search
- Multi-tenant data isolation strategy
Proceed with sensible defaults:
- Use getOrCreateCollection() / get_or_create_collection()
- Use cosine similarity (most common)
- Chunk size under 8KB
- Store source IDs in metadata for updates/deletes
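As a rough sketch of the last two defaults, the helper below splits text into chunks of at most 8KB and tags each chunk with the ID of the record it came from. The names `chunk_text` and `build_records`, the `source_id` and `chunk_index` metadata keys, and the choice to measure size in UTF-8 bytes are all assumptions made for illustration.

```python
MAX_CHUNK_BYTES = 8 * 1024  # "under 8KB", measured as UTF-8 bytes (assumption)


def chunk_text(text: str, max_bytes: int = MAX_CHUNK_BYTES) -> list:
    """Greedily pack whitespace-separated words into chunks of at most max_bytes.

    A single word longer than max_bytes still becomes its own oversized chunk.
    """
    chunks, current, size = [], [], 0
    for word in text.split():
        word_bytes = len(word.encode("utf-8"))
        # +1 accounts for the joining space when the chunk is non-empty.
        extra = word_bytes + (1 if current else 0)
        if current and size + extra > max_bytes:
            chunks.append(" ".join(current))
            current, size = [], 0
            extra = word_bytes
        current.append(word)
        size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def build_records(source_id: str, text: str, max_bytes: int = MAX_CHUNK_BYTES) -> dict:
    """Return ids/documents/metadatas shaped like the arguments to collection.add()."""
    chunks = chunk_text(text, max_bytes)
    return {
        "ids": [f"{source_id}:{i}" for i in range(len(chunks))],
        "documents": chunks,
        "metadatas": [
            {"source_id": source_id, "chunk_index": i} for i in range(len(chunks))
        ],
    }
```

The result can be passed along as `collection.add(**build_records("article-42", text))`. Because every chunk carries `source_id`, a later `collection.delete(where={"source_id": "article-42"})` can remove all chunks of a record before re-adding the updated version.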
What to validate
- Environment variables are set for Cloud deployments
- Correct client import (CloudClient vs Client)
- Embedding function package is installed (TypeScript)
- Schema and Search APIs only used with Cloud
- Important: get_or_create_collection() accepts either an embedding_function OR a schema, but not both. Use Schema when you need multiple indexes (hybrid search) or sparse embeddings; use embedding_function for simple dense-only search.
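The first check can be automated with a few lines of standard-library Python. This is a minimal sketch: `missing_cloud_vars` is a hypothetical helper, and the variable names are the ones the Quick Start section says the CLI writes to the .env file.

```python
import os

REQUIRED_CLOUD_VARS = ("CHROMA_API_KEY", "CHROMA_TENANT", "CHROMA_DATABASE")


def missing_cloud_vars(env=None):
    """Return the required Chroma Cloud variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_CLOUD_VARS if not env.get(name)]
```

Calling `missing_cloud_vars()` before constructing a Cloud client lets the application fail early with a clear message instead of a less obvious authentication error later.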
Quick Start
Chroma Cloud Setup (CLI)
To get started with Chroma Cloud, use the CLI to log in, create a database, and write your credentials to a .env file:

```bash
chroma login
chroma db create <my_database_name>
chroma db connect <my_database_name> --env-file
```

This writes a .env file with CHROMA_API_KEY, CHROMA_TENANT, and CHROMA_DATABASE to the current directory. The code examples below read from these environment variables.

TypeScript (Chroma Cloud):
```typescript
import { CloudClient } from 'chromadb';
import { DefaultEmbeddingFunction } from '@chroma-core/default-embed';

const client = new CloudClient({
  apiKey: process.env.CHROMA_API_KEY,
  tenant: process.env.CHROMA_TENANT,
  database: process.env.CHROMA_DATABASE,
});

const embeddingFunction = new DefaultEmbeddingFunction();
const collection = await client.getOrCreateCollection({
  name: 'my_collection',
  embeddingFunction,
});

// Add documents
await collection.add({
  ids: ['doc1', 'doc2'],
  documents: ['First document text', 'Second document text'],
});

// Query
const results = await collection.query({
  queryTexts: ['search query'],
  nResults: 5,
});
```

Python (Chroma Cloud):
```python
import os
import chromadb

client = chromadb.CloudClient(
    api_key=os.environ["CHROMA_API_KEY"],
    tenant=os.environ["CHROMA_TENANT"],
    database=os.environ["CHROMA_DATABASE"],
)
collection = client.get_or_create_collection(name="my_collection")

# Add documents
collection.add(
    ids=["doc1", "doc2"],
    documents=["First document text", "Second document text"],
)

# Query
results = collection.query(
    query_texts=["search query"],
    n_results=5,
)
```

Understanding Chroma
Chroma is a database.
A Chroma database contains collections.
A collection contains documents.
Unlike tables in a relational database, collections are created and destroyed at the application level. Each Chroma database can have millions of collections. There may be a collection for each user, team, or organization. Rather than partitioning a table by some key, Chroma uses the collection itself as the partition.
Collections don't have rows; they have documents. A document is the text data that is to be searched. When data is created or updated, the client creates an embedding of the data. This is done on the client side, based on the embedding function(s) provided to the client. To create the embedding, the client uses its configuration to call the configured embedding model provider via the embedding function. This can happen in process, but it overwhelmingly happens on a third-party service over HTTP.
Data can be further partitioned or filtered with document metadata. Each document has a key/value object of metadata. Keys are strings, and values can be strings, ints, or booleans. A variety of operators can be applied to the metadata.
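As a small illustration of metadata filtering, the helper below builds a filter dictionary that requires several metadata keys to match. `where_all` is a hypothetical convenience function and the `team` and `published` keys are made-up examples; `$eq` and `$and` are among Chroma's filter operators.

```python
def where_all(**equals):
    """Build a where filter requiring every key to equal the given value."""
    clauses = [{key: {"$eq": value}} for key, value in equals.items()]
    if not clauses:
        return None
    # A single condition is passed on its own; several are combined with $and.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```

The result is meant to be passed as the `where` argument, for example `collection.query(query_texts=["search query"], n_results=5, where=where_all(team="search", published=True))`.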
At query time, the query text is embedded using the collection's embedding function and then sent to Chroma with the rest of the query parameters. Chroma first applies query parameters such as metadata filters to reduce the candidate result set, then searches for the nearest neighbors using a distance function between the query vector and the indexed vectors in the collection being queried.
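To make the two query steps concrete, here is a brute-force toy version in plain Python. It is not how Chroma executes queries (Chroma searches an index rather than scanning every vector); it only mirrors the filter-then-rank order described above, assumes non-zero vectors, and uses hypothetical names throughout.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def brute_force_query(query_vec, records, where=None, n_results=5):
    """records: list of (id, vector, metadata); where: exact-match metadata dict."""
    where = where or {}
    # Step 1: metadata filters shrink the candidate set.
    candidates = [
        r for r in records if all(r[2].get(k) == v for k, v in where.items())
    ]
    # Step 2: rank the survivors by distance to the query vector.
    ranked = sorted(candidates, key=lambda r: cosine_distance(query_vec, r[1]))
    return [r[0] for r in ranked[:n_results]]
```

Filtering before ranking means a close vector with non-matching metadata never appears in the results, which is why metadata design affects what a search can return.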
Working with collections is made easy by get_or_create_collection() (getOrCreateCollection() in TypeScript) on the Chroma client, which avoids annoying boilerplate code.
Local vs Cloud
Chroma can be run locally as a process or can be used in the cloud with Chroma Cloud.
Everything that can be done locally can be done in the cloud, but not everything that can be done in the cloud can be done locally.
The biggest difference in developer experience is the Schema() and Search() APIs, which are only available on Chroma Cloud.
Otherwise, the only thing that needs to change is the client imported from the Chroma package; the interface is the same.
If you're using cloud, you probably want to use the Schema() and Search() APIs.
Also, if the user wants to use cloud, ask them which type of search they want: dense embeddings only, or hybrid. If hybrid, you probably want to use SPLADE as the sparse embedding strategy.
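Since only the client differs between deployments, the choice can be isolated in one place. The sketch below returns the name of the `chromadb` client constructor and its keyword arguments; `client_spec` and `persist_path` are hypothetical names, and the environment variable names are the ones written by the CLI in the Quick Start.

```python
import os


def client_spec(target, persist_path=None, env=None):
    """Return (chromadb constructor name, kwargs) for the chosen deployment."""
    env = os.environ if env is None else env
    if target == "cloud":
        return "CloudClient", {
            "api_key": env["CHROMA_API_KEY"],
            "tenant": env["CHROMA_TENANT"],
            "database": env["CHROMA_DATABASE"],
        }
    if target == "local":
        # A path means data should survive restarts; otherwise keep it in memory.
        if persist_path:
            return "PersistentClient", {"path": persist_path}
        return "EphemeralClient", {}
    raise ValueError(f"unknown deployment target: {target!r}")
```

With `chromadb` installed, the client can then be built as `name, kwargs = client_spec("local", persist_path="./chroma")` followed by `client = getattr(chromadb, name)(**kwargs)`, and the rest of the collection code stays the same.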
Embeddings
When working with embedding functions, the default embedding function is available, but it's often not the best option. The recommended option is Chroma Cloud Qwen. TypeScript: npm install @chroma-core/chroma-cloud-qwen. Python: included, but needs pip install httpx.
In TypeScript, you need to install a package for each embedding function; install the correct one based on what the user says.
Note that Chroma has server-side embedding support for SPLADE and Qwen (via @chroma-core/chroma-cloud-qwen in TypeScript); all other embedding functions are external.
Learn More
If you need more detailed information about Chroma beyond what's covered in this skill, fetch Chroma's llms.txt for comprehensive documentation: https://docs.trychroma.com/llms.txt
Available Topics
TypeScript
- Chroma Regex Filtering - Learn how to use regex filters in Chroma queries
- Query and Get - Query and Get Data from Chroma Collections
- Schema - Schema() configures collections with multiple indexes
- Updating and Deleting - Update existing documents and delete data from collections
- Error Handling - Handling errors and failures when working with Chroma
- Local Chroma - How to run and use local chroma
- Search() API - An expressive and flexible API for doing dense and sparse vector search on collections, as well as hybrid search
Python
- Chroma Regex Filtering - Learn how to use regex filters in Chroma queries
- Query and Get - Query and Get Data from Chroma Collections
- Schema - Schema() configures collections with multiple indexes
- Updating and Deleting - Update existing documents and delete data from collections
- Error Handling - Handling errors and failures when working with Chroma
- Local Chroma - How to run and use local chroma
- Search() API - An expressive and flexible API for doing dense and sparse vector search on collections, as well as hybrid search
General
- Chroma CLI - Getting started with the Chroma CLI for cloud database management
- Data Model - An overview of how Chroma stores data
- Integrating Chroma into an existing system - Guidance for adding Chroma search to an existing application