chroma

Instructions

Before writing any code, gather this information:

Deployment target: Local Chroma or Chroma Cloud?
- If Cloud: they'll need an API key, tenant, and database configured
- If Local: determine whether they need persistent or ephemeral storage
Search type (Cloud only): Dense only, or hybrid search?
- Dense only: simpler setup, good for most semantic search
- Hybrid (dense + sparse): better for keyword-heavy queries, use SPLADE
Embedding model: Which provider/model?
- Default: `@chroma-core/default-embed` (TypeScript) or built-in (Python)
- OpenAI: `text-embedding-3-large` is most popular, requires `@chroma-core/openai`
- Ask the user if they have a preference or existing provider
Data structure: What are they indexing?
- Needed to determine chunking strategy
- Needed to design metadata schema for filtering

Decision workflow

User wants to add search
Ask Local Chroma or Chroma Cloud?
- Local Chroma
  - Use collection.query() with a dense embedding model
- Chroma Cloud
  - Ask if hybrid search is needed
    - Yes
      - Use Schema() + Search() APIs with SPLADE sparse index
    - No
      - Use collection.query() with a dense embedding model
Ask for which embedding model
Design metadata schema
Implement data sync strategy
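
The branching above can be written down as a small lookup, which may help when the answers arrive as structured input. This is only a sketch: `SearchPlan` and `plan_search` are hypothetical names, and the strings they return are labels taken from the workflow, not calls into the Chroma client.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchPlan:
    client: str                  # client class named in this guide
    query_api: str               # which query surface to use
    sparse_index: Optional[str]  # sparse embedding strategy, if any


def plan_search(deployment: str, hybrid: bool = False) -> SearchPlan:
    """Map the deployment and hybrid-search answers onto a plan (hypothetical helper)."""
    if deployment == "local":
        # Schema() and Search() are Cloud-only, so local always means dense + query()
        return SearchPlan("Client", "collection.query()", None)
    if deployment == "cloud":
        if hybrid:
            return SearchPlan("CloudClient", "Schema() + Search()", "SPLADE")
        return SearchPlan("CloudClient", "collection.query()", None)
    raise ValueError(f"unknown deployment target: {deployment!r}")
```

The embedding model, metadata schema, and sync strategy still have to be settled afterwards; the function only covers the first two questions.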

When to ask questions vs proceed

Ask first:

Embedding model choice (cost and quality implications)
Cloud vs local deployment
Hybrid vs dense-only search
Multi-tenant data isolation strategy

Proceed with sensible defaults:

Use `getOrCreateCollection()` (TypeScript) / `get_or_create_collection()` (Python)

Use cosine similarity (most common)
Chunk size under 8KB
Store source IDs in metadata for updates/deletes
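
One way to apply the last two defaults together is sketched below, assuming plain-text input and a byte budget measured in UTF-8. The helper names (`chunk_text`, `build_records`), the `source_id` and `chunk_index` metadata keys, and the `source:index` ID format are illustrative choices, not something Chroma requires.

```python
def chunk_text(text: str, max_bytes: int = 8000) -> list[str]:
    """Greedily pack whitespace-separated words into chunks of at most max_bytes (UTF-8).

    A single word longer than max_bytes is kept whole, so it can exceed the budget.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_size = 0
    for word in text.split():
        word_size = len(word.encode("utf-8"))
        # +1 accounts for the joining space once the chunk already has words
        extra = word_size + (1 if current else 0)
        if current and current_size + extra > max_bytes:
            chunks.append(" ".join(current))
            current, current_size = [word], word_size
        else:
            current.append(word)
            current_size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def build_records(source_id: str, text: str, max_bytes: int = 8000) -> dict:
    """Return ids, documents, and metadatas lists that all trace back to source_id."""
    chunks = chunk_text(text, max_bytes)
    return {
        "ids": [f"{source_id}:{i}" for i in range(len(chunks))],
        "documents": chunks,
        "metadatas": [
            {"source_id": source_id, "chunk_index": i} for i in range(len(chunks))
        ],
    }
```

The returned dictionary has the same three keys that `collection.add()` takes as keyword arguments, and keeping `source_id` in metadata makes it possible to find every chunk of a source record when that record changes or is deleted.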

What to validate

Environment variables are set for Cloud deployments
Correct client import (`CloudClient` vs `Client`)
Embedding function package is installed (TypeScript)
Schema and Search APIs only used with Cloud
Important: `get_or_create_collection()` accepts either an `embedding_function` OR a `schema`, but not both. Use Schema when you need multiple indexes (hybrid search) or sparse embeddings; use embedding_function for simple dense-only search.
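
Two of these checks are easy to automate before any client is created. The sketch below uses the three variable names from the Quick Start section; `missing_cloud_vars` and `collection_kwargs` are hypothetical helpers, and the second one only builds a keyword-argument dictionary rather than calling Chroma.

```python
import os
from typing import Any, Mapping, Optional

REQUIRED_CLOUD_VARS = ("CHROMA_API_KEY", "CHROMA_TENANT", "CHROMA_DATABASE")


def missing_cloud_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required Cloud variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_CLOUD_VARS if not env.get(name)]


def collection_kwargs(
    name: str, embedding_function: Any = None, schema: Any = None
) -> dict:
    """Build keyword arguments for get_or_create_collection(), refusing both options at once."""
    if embedding_function is not None and schema is not None:
        raise ValueError("pass either embedding_function or schema, not both")
    kwargs: dict = {"name": name}
    if embedding_function is not None:
        kwargs["embedding_function"] = embedding_function
    if schema is not None:
        kwargs["schema"] = schema
    return kwargs
```

Failing early with a list of missing variable names tends to be easier to debug than an authentication error raised later by the client.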

Quick Start

Chroma Cloud Setup (CLI)

To get started with Chroma Cloud, use the CLI to log in, create a database, and write your credentials to a `.env` file:

```bash
chroma login
chroma db create <my_database_name>
chroma db connect <my_database_name> --env-file
```

This writes a `.env` file with `CHROMA_API_KEY`, `CHROMA_TENANT`, and `CHROMA_DATABASE` to the current directory. The code examples below read from these environment variables.

TypeScript (Chroma Cloud):

```typescript
import { CloudClient } from 'chromadb';
import { DefaultEmbeddingFunction } from '@chroma-core/default-embed';

const client = new CloudClient({
  apiKey: process.env.CHROMA_API_KEY,
  tenant: process.env.CHROMA_TENANT,
  database: process.env.CHROMA_DATABASE,
});

const embeddingFunction = new DefaultEmbeddingFunction();
const collection = await client.getOrCreateCollection({
  name: 'my_collection',
  embeddingFunction,
});

// Add documents
await collection.add({
  ids: ['doc1', 'doc2'],
  documents: ['First document text', 'Second document text'],
});

// Query
const results = await collection.query({
  queryTexts: ['search query'],
  nResults: 5,
});
```

Python (Chroma Cloud):

```python
import os
import chromadb

client = chromadb.CloudClient(
    api_key=os.environ["CHROMA_API_KEY"],
    tenant=os.environ["CHROMA_TENANT"],
    database=os.environ["CHROMA_DATABASE"],
)

collection = client.get_or_create_collection(name="my_collection")

# Add documents
collection.add(
    ids=["doc1", "doc2"],
    documents=["First document text", "Second document text"],
)

# Query
results = collection.query(
    query_texts=["search query"],
    n_results=5,
)
```

Understanding Chroma

Chroma is a database. A Chroma database contains collections. A collection contains documents.

Unlike tables in a relational database, collections are created and destroyed at the application level. Each Chroma database can have millions of collections. There may be a collection for each user, team, or organization. Rather than partitioning a table by some key, Chroma uses the collection itself as the partition.
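
As an illustration of one collection per user, team, or organization, the sketch below derives a collection name from a scope and an identifier. `collection_name` is a hypothetical helper, and the character set and three-character minimum are conservative assumptions; check the naming rules of the Chroma version in use.

```python
import re


def collection_name(scope: str, tenant_id: str) -> str:
    """Build a name such as 'user-42' from a scope and an identifier (hypothetical helper)."""
    raw = f"{scope}-{tenant_id}".lower()
    # Replace anything outside a conservative character set with a hyphen,
    # then trim separators so the name starts and ends with an alphanumeric
    cleaned = re.sub(r"[^a-z0-9_-]+", "-", raw).strip("-_")
    if len(cleaned) < 3:
        raise ValueError(f"collection name too short: {cleaned!r}")
    return cleaned
```

The result can be passed as `name` to `get_or_create_collection()`, so each tenant's documents live in their own collection and never appear in another tenant's query results.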

Collections don't have rows; they have documents. A document is the text data to be searched. When data is created or updated, the client creates an embedding of it. This happens on the client side, using the embedding function(s) provided to the client: the client uses its configuration to call the configured embedding model provider through the embedding function. This can happen in-process, but overwhelmingly it is a call to a third-party service over HTTP.

Data can be further partitioned or filtered with document metadata. Each document has a key/value metadata object. Keys are strings, and values can be strings, ints, floats, or booleans. A variety of operators are available for filtering on metadata.
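
A short sketch of what this looks like in practice: one helper checks that a metadata object only uses scalar value types (assumed here to be str, int, float, and bool), and another builds a `where` filter with Chroma's `$eq` and `$and` operators. Both helper names are hypothetical.

```python
from typing import Any

ALLOWED_VALUE_TYPES = (str, int, float, bool)


def validate_metadata(metadata: dict) -> dict:
    """Raise TypeError if a key is not a string or a value is not a supported scalar."""
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key must be a string: {key!r}")
        if not isinstance(value, ALLOWED_VALUE_TYPES):
            raise TypeError(f"unsupported metadata value for {key!r}: {value!r}")
    return metadata


def where_all(**equals: Any) -> dict:
    """Build a where filter that requires every given key to equal its value."""
    clauses = [{key: {"$eq": value}} for key, value in equals.items()]
    if not clauses:
        raise ValueError("at least one condition is required")
    # A single clause is used as-is; several clauses are combined with $and
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```

The resulting dictionary can be passed as the `where` argument, for example `collection.query(query_texts=["search query"], n_results=5, where=where_all(team="search"))`.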

At query time, the query text is embedded using the collection's embedding function and then sent to Chroma with the rest of the query parameters. Chroma first applies query parameters such as metadata filters to reduce the candidate set, then searches for the nearest neighbors by computing a distance between the query vector and the vectors indexed in the queried collection.
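
The two stages of a query, filter first and rank second, can be mimicked in a few lines of plain Python. This is a brute-force stand-in to show the idea, not how Chroma's index works internally; the record layout and the equality-only `where` argument are simplifications made for the example.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def query(records, query_vector, where=None, n_results=5):
    """Return the ids of the nearest records that match every key in where.

    records is a list of dicts with 'id', 'embedding', and 'metadata' keys.
    """
    where = where or {}
    # Stage 1: the metadata filter shrinks the candidate set
    candidates = [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in where.items())
    ]
    # Stage 2: rank the remaining candidates by distance to the query vector
    ranked = sorted(
        candidates, key=lambda r: cosine_distance(query_vector, r["embedding"])
    )
    return [r["id"] for r in ranked[:n_results]]
```

Because the filter runs before ranking, a record that is close to the query but fails the metadata condition never reaches the result list.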

Working with collections is made easy by `get_or_create_collection()` (`getOrCreateCollection()` in TypeScript) on the Chroma client, which avoids annoying boilerplate code.

Local vs Cloud

Chroma can be run locally as a process or can be used in the cloud with Chroma Cloud.

Everything that can be done locally can be done in the cloud, but not everything that can be done in the cloud can be done locally.

The biggest difference in developer experience is the Schema() and Search() APIs, which are only available on Chroma Cloud.

Otherwise, the only thing that needs to change is the client imported from the Chroma package; the interface is the same.

If you're using cloud, you probably want to use the Schema() and Search() APIs.

Also, if the user wants to use cloud, ask them what type of search they want: dense embeddings only, or hybrid. If hybrid, you probably want to use SPLADE as the sparse embedding strategy.
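
Since only the client changes between deployments, the choice can be isolated in one factory function. The sketch below is one possible arrangement: the target labels, the `make_client` name, and the default `./chroma_data` path are made up for the example, while `CloudClient`, `PersistentClient`, and `EphemeralClient` are the constructors exposed by the `chromadb` Python package.

```python
import os

TARGETS = ("cloud", "local-persistent", "local-ephemeral")


def make_client(target: str, path: str = "./chroma_data"):
    """Return a Chroma client for the given deployment target (hypothetical helper)."""
    if target not in TARGETS:
        raise ValueError(f"target must be one of {TARGETS}, got {target!r}")
    # Imported here so the argument check above runs even without chromadb installed
    import chromadb

    if target == "cloud":
        return chromadb.CloudClient(
            api_key=os.environ["CHROMA_API_KEY"],
            tenant=os.environ["CHROMA_TENANT"],
            database=os.environ["CHROMA_DATABASE"],
        )
    if target == "local-persistent":
        # Data is written under `path` and survives restarts
        return chromadb.PersistentClient(path=path)
    # In-memory only; data is lost when the process exits
    return chromadb.EphemeralClient()
```

Code that calls `get_or_create_collection()`, `add()`, and `query()` on the returned client stays the same for every target; only Schema() and Search() usage needs to be guarded for Cloud.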

Embeddings

When working with embedding functions, the default embedding function is available, but it's often not the best option. The recommended option is Chroma Cloud Qwen. In TypeScript, install it with `npm install @chroma-core/chroma-cloud-qwen`; in Python it is included but needs `pip install httpx`.

In TypeScript, you need to install a package for each embedding function, so install the one that matches what the user asks for.

Note that Chroma has server-side embedding support for SPLADE and Qwen (via `@chroma-core/chroma-cloud-qwen` in TypeScript); all other embedding functions are external.
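
To keep the provider-to-package mapping in one place, a small lookup like the following can be used. The package names are the ones mentioned in this guide; the dictionary keys and the `npm_install_command` helper are hypothetical.

```python
EMBEDDING_PACKAGES = {
    "default": "@chroma-core/default-embed",
    "openai": "@chroma-core/openai",
    "chroma-cloud-qwen": "@chroma-core/chroma-cloud-qwen",
}


def npm_install_command(provider: str) -> str:
    """Return the npm command for a provider's TypeScript embedding package."""
    try:
        package = EMBEDDING_PACKAGES[provider.lower()]
    except KeyError:
        raise ValueError(f"no known package for provider {provider!r}") from None
    return f"npm install {package}"
```

Providers that are not in the table raise an error instead of guessing a package name, which is a prompt to ask the user or check the documentation.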

Learn More

If you need more detailed information about Chroma beyond what's covered in this skill, fetch Chroma's llms.txt for comprehensive documentation: https://docs.trychroma.com/llms.txt

Available Topics

TypeScript

Chroma Regex Filtering - Learn how to use regex filters in Chroma queries
Query and Get - Query and Get Data from Chroma Collections
Schema - Schema() configures collections with multiple indexes
Updating and Deleting - Update existing documents and delete data from collections
Error Handling - Handling errors and failures when working with Chroma
Local Chroma - How to run and use local Chroma
Search() API - An expressive and flexible API for doing dense and sparse vector search on collections, as well as hybrid search

Python

Chroma Regex Filtering - Learn how to use regex filters in Chroma queries
Query and Get - Query and Get Data from Chroma Collections
Schema - Schema() configures collections with multiple indexes
Updating and Deleting - Update existing documents and delete data from collections
Error Handling - Handling errors and failures when working with Chroma
Local Chroma - How to run and use local Chroma
Search() API - An expressive and flexible API for doing dense and sparse vector search on collections, as well as hybrid search

General

Chroma CLI - Getting started with the Chroma CLI for cloud database management
Data Model - An overview of how Chroma stores data
Integrating Chroma into an existing system - Guidance for adding Chroma search to an existing application
