chroma-local


Instructions

Determine these before writing code. Prefer discovering them from the repo and the user request. Ask only when the choice materially changes the implementation.
  1. Runtime shape
    • Are they connecting to a running local server, embedding Chroma into tests, or setting up local development from scratch?
    • Decide whether they need `chroma run`, a Docker or service command, `HttpClient` or `ChromaClient`, or Python `EphemeralClient`.
  2. Persistence
    • Persistent local data: choose an intentional data path.
    • Disposable test data: use defaults or a temp directory.
  3. Embedding model
    • Reuse the app's existing embedding provider when possible.
    • Otherwise default to `@chroma-core/default-embed` in TypeScript or the standard local default in Python.
    • If the user explicitly wants OpenAI embeddings in TypeScript, install and use `@chroma-core/openai`.
  4. Indexed data shape
    • Determine what is being indexed, how it should be chunked, and what metadata is needed for filtering and updates.
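The runtime-shape and persistence decisions above can be captured in one small settings object so the choice is explicit rather than implied. This is a minimal sketch, not a prescribed layout: the `APP_CHROMA_*` variable names, `ChromaSettings`, `resolve_settings`, and `make_client` are hypothetical, and `chromadb.PersistentClient` is assumed here as the on-disk option even though this skill only names `HttpClient` and `EphemeralClient`.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ChromaSettings:
    mode: str                   # "http", "persistent", or "ephemeral"
    host: str = "localhost"
    port: int = 8000
    path: Optional[str] = None  # only used in persistent mode


def resolve_settings(env: Mapping[str, str]) -> ChromaSettings:
    """Pick a runtime shape from hypothetical APP_CHROMA_* variables."""
    mode = env.get("APP_CHROMA_MODE", "http")
    if mode == "http":
        return ChromaSettings(
            mode="http",
            host=env.get("APP_CHROMA_HOST", "localhost"),
            port=int(env.get("APP_CHROMA_PORT", "8000")),
        )
    if mode == "persistent":
        path = env.get("APP_CHROMA_PATH")
        if not path:
            # Refuse to guess a data path: persistence should be intentional.
            raise ValueError("persistent mode needs an explicit APP_CHROMA_PATH")
        return ChromaSettings(mode="persistent", path=path)
    if mode == "ephemeral":
        return ChromaSettings(mode="ephemeral")
    raise ValueError(f"unknown APP_CHROMA_MODE: {mode!r}")


def make_client(settings: ChromaSettings):
    """Build the matching Chroma client for the resolved settings."""
    import chromadb  # imported lazily so resolve_settings has no dependency

    if settings.mode == "http":
        return chromadb.HttpClient(host=settings.host, port=settings.port)
    if settings.mode == "persistent":
        return chromadb.PersistentClient(path=settings.path)
    return chromadb.EphemeralClient()  # data is lost when the process exits
```

Calling `make_client(resolve_settings(os.environ))` at startup keeps the mode in one place, which makes a silent switch between ephemeral and persistent storage harder to introduce.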

Routing

  • Existing local server
    • Confirm host and port before changing client code.
    • Validate the server is reachable before assuming collections are missing.
  • Fresh local development
    • Add a local startup path such as `chroma run` or the repo's existing Docker or service command.
    • Default to `localhost:8000` unless the repo already uses another address.
  • Python tests or disposable local workflows
    • Prefer `EphemeralClient` when persistence is unnecessary.
    • Call out that data is lost when the process exits.
  • Persistent local development
    • Use a stable data path and make persistence explicit in code or config.
    • Do not silently switch between ephemeral and persistent modes.
  • Search integration work
    • Use `getOrCreateCollection()` in TypeScript or `get_or_create_collection()` in Python.
    • Design document IDs and metadata so upserts and deletes are straightforward.
    • Batch writes when syncing large datasets.
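For the batching advice under search integration work, one possible shape is a helper that slices records and upserts each slice. This is a sketch under assumptions: `batched`, `sync_records`, the record dictionary layout, and the `batch_size` default of 500 are hypothetical choices, and `collection` is expected to be a Chroma collection exposing `upsert(ids=..., documents=..., metadatas=...)`.

```python
from typing import Any, Dict, Iterator, List, Sequence


def batched(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sync_records(collection, records: List[Dict[str, Any]], batch_size: int = 500) -> int:
    """Upsert records in batches and return how many were written.

    Each record is assumed to be a dict with "id", "document", and a
    non-empty "metadata" dict.
    """
    written = 0
    for batch in batched(records, batch_size):
        collection.upsert(
            ids=[r["id"] for r in batch],
            documents=[r["document"] for r in batch],
            metadatas=[r["metadata"] for r in batch],
        )
        written += len(batch)
    return written
```

Because the helper uses upsert with stable IDs, re-running a sync over the same source updates existing entries instead of duplicating them.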

Ask vs proceed

Ask first:
  • Embedding model choice (cost and quality implications)
  • Whether they need persistent local data
  • How they are starting the local server
  • Multi-tenant data isolation strategy
Proceed with sensible defaults:
  • Use `getOrCreateCollection()` (TypeScript) / `get_or_create_collection()` (Python)
  • Use cosine similarity (most common)
  • Chunk size under 8KB
  • Store source IDs in metadata for updates/deletes
  • Use a local server on `localhost:8000` unless the repo already configures another address or is using Python `EphemeralClient`
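Two of the defaults above, chunks under 8KB and source IDs stored in metadata, can be combined in a small record builder. The sketch below is illustrative only: `chunk_text`, `build_records`, the `source_id:index` ID scheme, the `chunk_index` field, and the 8000-byte limit measured in UTF-8 bytes are all assumptions, not something this skill prescribes.

```python
from typing import Any, Dict, List


def chunk_text(text: str, max_bytes: int = 8000) -> List[str]:
    """Greedily pack whitespace-separated words into chunks.

    Chunks stay at or under max_bytes of UTF-8, except that a single word
    longer than max_bytes becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        current = word
    if current:
        chunks.append(current)
    return chunks


def build_records(source_id: str, text: str, max_bytes: int = 8000) -> List[Dict[str, Any]]:
    """Turn one source document into records with deterministic IDs."""
    return [
        {
            "id": f"{source_id}:{index}",
            "document": chunk,
            "metadata": {"source_id": source_id, "chunk_index": index},
        }
        for index, chunk in enumerate(chunk_text(text, max_bytes))
    ]
```

With the source ID in metadata, removing everything that came from one source can be a single filtered delete such as `collection.delete(where={"source_id": source_id})`, and deterministic IDs let an upsert replace chunks in place.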

What to validate

  • Correct client import (`ChromaClient`, `HttpClient`, or `Client`)
  • Embedding function package is installed (TypeScript)
  • Local server is reachable before assuming collections are missing
  • Local path and persistence mode are intentional
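The reachability check in the list above can be done before any collection call. A minimal sketch using only the standard library follows; `server_is_reachable` and its timeout default are hypothetical, and a plain TCP connect only shows that something is listening on the port, not that it is Chroma.

```python
import socket


def server_is_reachable(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

If this returns False, the likely fix is starting the server or correcting the address, not creating collections. For a Chroma-level check once the port is open, the Python client's `heartbeat()` call is one option.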

Implementation notes

  • Local Chroma is the right default for development, tests, and self-hosted deployments.
  • OSS Chroma does not include Chroma Cloud-only features such as `Schema()` and `Search()`.
  • If the user asks for hybrid dense and sparse retrieval, treat that as a likely Chroma Cloud requirement unless the repo already implements an OSS workaround.
  • For open source Chroma, dense retrieval with a single embedding function is the normal baseline.

Minimal patterns

Start a local Chroma server when the repo needs one:
```bash
chroma run
```
Default address: `localhost:8000`.
TypeScript local client:
```typescript
import { ChromaClient } from 'chromadb';
import { DefaultEmbeddingFunction } from '@chroma-core/default-embed';

const client = new ChromaClient();

const embeddingFunction = new DefaultEmbeddingFunction();
const collection = await client.getOrCreateCollection({
  name: 'my_collection',
  embeddingFunction,
});

// Add documents
await collection.add({
  ids: ['doc1', 'doc2'],
  documents: ['First document text', 'Second document text'],
});

// Query
const results = await collection.query({
  queryTexts: ['search query'],
  nResults: 5,
});
```
Python local client:
```python
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)

collection = client.get_or_create_collection(name="my_collection")

# Add documents
collection.add(
    ids=["doc1", "doc2"],
    documents=["First document text", "Second document text"],
)

# Query
results = collection.query(
    query_texts=["search query"],
    n_results=5,
)
```

Learn More

Fetch Chroma's `llms.txt` only when you need API or product details that are not already in the repo or this skill: https://docs.trychroma.com/llms.txt

Available Topics

TypeScript

  • Chroma Regex Filtering - Learn how to use regex filters in Chroma queries
  • Query and Get - Query and Get Data from Chroma Collections
  • Metadata - Store and query metadata, including filters and array values
  • Updating and Deleting - Update existing documents and delete data from collections
  • Error Handling - Handling errors and failures when working with Chroma
  • Local Chroma - How to run and use local Chroma

Python

  • Chroma Regex Filtering - Learn how to use regex filters in Chroma queries
  • Query and Get - Query and Get Data from Chroma Collections
  • Metadata - Store and query metadata, including filters and array values
  • Updating and Deleting - Update existing documents and delete data from collections
  • Error Handling - Handling errors and failures when working with Chroma
  • Local Chroma - How to run and use local Chroma

General

  • Data Model - An overview of how Chroma stores data
  • Integrating Chroma into an existing system - Guidance for adding Chroma search to an existing application
  • Chroma CLI - Starting and managing a local open source Chroma server from the CLI