langsmith-dataset
<oneliner>
Create, manage, and upload evaluation datasets to LangSmith for testing and validation.
</oneliner>
<setup>
Environment Variables
```bash
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here # Required
LANGSMITH_PROJECT=your-project-name # Check this to know which project has traces
LANGSMITH_WORKSPACE_ID=your-workspace-id # Optional: for org-scoped keys
```
IMPORTANT: Always check the environment variables or `.env` file for `LANGSMITH_PROJECT` before querying or interacting with LangSmith. This tells you which project contains the relevant traces and data. If the LangSmith project is not set, use your best judgement to identify the right one.
Python Dependencies
```bash
pip install langsmith
```
JavaScript Dependencies
```bash
npm install langsmith
```
CLI Tool
```bash
curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh
```
</setup>
<usage>
Use the `langsmith` CLI to manage datasets and examples.
Dataset Commands
- `langsmith dataset list` - List datasets in LangSmith
- `langsmith dataset get <name-or-id>` - View dataset details
- `langsmith dataset create --name <name>` - Create a new empty dataset
- `langsmith dataset delete <name-or-id>` - Delete a dataset
- `langsmith dataset export <name-or-id> <output-file>` - Export dataset to local JSON file
- `langsmith dataset upload <file> --name <name>` - Upload a local JSON file as a dataset
Example Commands
- `langsmith example list --dataset <name>` - List examples in a dataset
- `langsmith example create --dataset <name> --inputs <json>` - Add an example to a dataset
- `langsmith example delete <example-id>` - Delete an example
Experiment Commands
- `langsmith experiment list --dataset <name>` - List experiments for a dataset
- `langsmith experiment get <name>` - View experiment results
Common Flags
- `--limit N` - Limit number of results
- `--yes` - Skip confirmation prompts (use with caution)
IMPORTANT - Safety Prompts:
- The CLI prompts for confirmation before destructive operations (delete, overwrite)
- If you are running with user input: ALWAYS wait for user input; NEVER use `--yes` unless the user explicitly requests it
- If you are running non-interactively: Use `--yes` to skip confirmation prompts
</usage>
<dataset_types_overview>
Common evaluation dataset types:
- final_response - Full conversation with expected output. Tests complete agent behavior.
- single_step - Single node inputs/outputs. Tests specific node behavior (e.g., one LLM call or tool).
- trajectory - Tool call sequence. Tests execution path (ordered list of tool names).
- rag - Question/chunks/answer/citations. Tests retrieval quality.
</dataset_types_overview>
<creating_datasets>
Creating Datasets
Datasets are JSON files with an array of examples. Each example has `inputs` and `outputs`.
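As a minimal sketch of that layout, the snippet below writes a two-example dataset file using only the standard library. The file name `example_dataset.json` and the `query`/`answer` keys are illustrative assumptions, not names LangSmith requires.

```python
import json
import tempfile
from pathlib import Path

# Each example is an object with "inputs" and "outputs".
# The "query" and "answer" keys are placeholders chosen for illustration.
examples = [
    {"inputs": {"query": "What is AI?"}, "outputs": {"answer": "AI is..."}},
    {"inputs": {"query": "Explain RAG"}, "outputs": {"answer": "RAG is..."}},
]

# Hypothetical output location; any writable path works.
dataset_path = Path(tempfile.gettempdir()) / "example_dataset.json"
dataset_path.write_text(json.dumps(examples, indent=2), encoding="utf-8")

# Read the file back to confirm it is a JSON array of examples.
loaded = json.loads(dataset_path.read_text(encoding="utf-8"))
print(f"Wrote {len(loaded)} examples to {dataset_path}")
```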
From Exported Traces (Programmatic)
Export traces first, then process them into dataset format using code:
```bash
# 1. Export traces to JSONL files
langsmith trace export ./traces --project my-project --limit 20 --full
```
<python>
```python
import json
from pathlib import Path
from langsmith import Client

client = Client()

# 2. Process traces into dataset examples
examples = []
for jsonl_file in Path("./traces").glob("*.jsonl"):
    # Skip blank lines so an empty or padded file does not break json.loads
    runs = [json.loads(line) for line in jsonl_file.read_text().splitlines() if line.strip()]
    # The root run is the one without a parent
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root and root.get("inputs") and root.get("outputs"):
        examples.append({
            "trace_id": root.get("trace_id"),
            "inputs": root["inputs"],
            "outputs": root["outputs"]
        })

# 3. Save locally
with open("/tmp/dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```
</python>
<typescript>
```typescript
import { Client } from "langsmith";
import { readFileSync, writeFileSync, readdirSync } from "fs";
import { join } from "path";

const client = new Client();

// 2. Process traces into dataset examples
const examples: Array<{trace_id?: string, inputs: Record<string, any>, outputs: Record<string, any>}> = [];
const files = readdirSync("./traces").filter(f => f.endsWith(".jsonl"));
for (const file of files) {
  // Skip blank lines so JSON.parse never sees an empty string
  const lines = readFileSync(join("./traces", file), "utf-8").split("\n").filter(line => line.trim() !== "");
  const runs = lines.map(line => JSON.parse(line));
  // The root run is the one without a parent
  const root = runs.find(r => r.parent_run_id == null);
  if (root?.inputs && root?.outputs) {
    examples.push({ trace_id: root.trace_id, inputs: root.inputs, outputs: root.outputs });
  }
}

// 3. Save locally
writeFileSync("/tmp/dataset.json", JSON.stringify(examples, null, 2));
```
</typescript>
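The same exported runs can be reshaped into other dataset types. As a rough sketch, the function below builds a trajectory example from one trace's runs by collecting tool names in start order. It assumes each exported run carries `run_type`, `name`, and `start_time` fields alongside `parent_run_id`; check a file from your own export before relying on those names. The helper name `trajectory_example` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def trajectory_example(runs: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Build one trajectory example from the runs of a single trace.

    Assumes each run dict has "run_type", "name", and "start_time" keys
    (an assumption about the export format, not something confirmed here).
    """
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root is None or not root.get("inputs"):
        return None

    # Keep only tool runs and order them by when they started.
    tool_runs = [r for r in runs if r.get("run_type") == "tool"]
    tool_runs.sort(key=lambda r: r.get("start_time") or "")

    return {
        "trace_id": root.get("trace_id"),
        "inputs": root["inputs"],
        "outputs": {"expected_trajectory": [r.get("name", "") for r in tool_runs]},
    }


# Tiny in-memory trace used only to show the shape of the result.
sample_runs = [
    {"trace_id": "t1", "parent_run_id": None, "run_type": "chain",
     "name": "agent", "start_time": "2024-01-01T00:00:00", "inputs": {"query": "hi"}},
    {"trace_id": "t1", "parent_run_id": "r0", "run_type": "tool",
     "name": "tool_b", "start_time": "2024-01-01T00:00:02"},
    {"trace_id": "t1", "parent_run_id": "r0", "run_type": "tool",
     "name": "tool_a", "start_time": "2024-01-01T00:00:01"},
]
example = trajectory_example(sample_runs)
print(example["outputs"]["expected_trajectory"])
```

Sorting by `start_time` as a string only works when every timestamp uses the same ISO 8601 layout; parse the values into datetimes if your export mixes formats.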
Upload to LangSmith
```bash
# Upload local JSON file as a dataset
langsmith dataset upload /tmp/dataset.json --name "My Evaluation Dataset"
```
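The uploaded file stores each example as one object, while the SDK calls in the next section take parallel `inputs` and `outputs` lists. A small sketch of that conversion follows, assuming every example has `inputs` and treating a missing `outputs` as an empty object. The helper name `split_examples` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def split_examples(
    examples: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Turn a list of example objects into parallel inputs/outputs lists."""
    inputs = [ex["inputs"] for ex in examples]
    # Missing outputs become an empty object so both lists stay the same length.
    outputs = [ex.get("outputs") or {} for ex in examples]
    return inputs, outputs


examples = [
    {"trace_id": "t1", "inputs": {"query": "What is AI?"}, "outputs": {"answer": "AI is..."}},
    {"trace_id": "t2", "inputs": {"query": "Explain RAG"}},
]
inputs, outputs = split_examples(examples)
print(inputs)
print(outputs)
```

The two lists can then be passed as the `inputs` and `outputs` arguments of the SDK call that adds examples.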
Using the SDK Directly
<python>
```python
from langsmith import Client

client = Client()

# Create the dataset, then add examples to it
dataset = client.create_dataset("My Dataset", description="Evaluation dataset")
client.create_examples(
    inputs=[{"query": "What is AI?"}, {"query": "Explain RAG"}],
    outputs=[{"answer": "AI is..."}, {"answer": "RAG is..."}],
    dataset_name="My Dataset",
)
```
</python>
<typescript>
```typescript
import { Client } from "langsmith";

const client = new Client();

// Create the dataset, then add examples to it
const dataset = await client.createDataset("My Dataset", {
  description: "Evaluation dataset",
});
await client.createExamples({
  inputs: [{ query: "What is AI?" }, { query: "Explain RAG" }],
  outputs: [{ answer: "AI is..." }, { answer: "RAG is..." }],
  datasetName: "My Dataset",
});
```
</typescript>
</creating_datasets>
<dataset_structures>
Dataset Structures by Type
Final Response
```json
{"trace_id": "...", "inputs": {"query": "What are the top genres?"}, "outputs": {"response": "The top genres are..."}}
```
Single Step
```json
{"trace_id": "...", "inputs": {"messages": [...]}, "outputs": {"content": "..."}, "metadata": {"node_name": "model"}}
```
Trajectory
```json
{"trace_id": "...", "inputs": {"query": "..."}, "outputs": {"expected_trajectory": ["tool_a", "tool_b", "tool_c"]}}
```
RAG
```json
{"trace_id": "...", "inputs": {"question": "How do I..."}, "outputs": {"answer": "...", "retrieved_chunks": ["..."], "cited_chunks": ["..."]}}
```
</dataset_structures>
<script_usage>
CLI Usage
```bash
# List all datasets
langsmith dataset list

# Get dataset details
langsmith dataset get "My Dataset"

# Create an empty dataset
langsmith dataset create --name "New Dataset" --description "For evaluation"

# Upload a local JSON file
langsmith dataset upload /tmp/dataset.json --name "My Dataset"

# Export a dataset to local file
langsmith dataset export "My Dataset" /tmp/exported.json --limit 100

# Delete a dataset
langsmith dataset delete "My Dataset"

# List examples in a dataset
langsmith example list --dataset "My Dataset" --limit 10

# Add an example
langsmith example create --dataset "My Dataset" \
  --inputs '{"query": "test"}' \
  --outputs '{"answer": "result"}'

# List experiments
langsmith experiment list --dataset "My Dataset"
langsmith experiment get "eval-v1"
```
</script_usage>
<example_workflow>
Complete workflow from traces to uploaded LangSmith dataset:
```bash
# 1. Export traces from LangSmith
langsmith trace export ./traces --project my-project --limit 20 --full

# 2. Process traces into dataset format (using Python/JS code)
# See "Creating Datasets" section above

# 3. Upload to LangSmith
langsmith dataset upload /tmp/final_response.json --name "Skills: Final Response"
langsmith dataset upload /tmp/trajectory.json --name "Skills: Trajectory"

# 4. Verify upload
langsmith dataset list
langsmith dataset get "Skills: Final Response"
langsmith example list --dataset "Skills: Final Response" --limit 3

# 5. List experiments run against the dataset
langsmith experiment list --dataset "Skills: Final Response"
```
</example_workflow>
<troubleshooting>
**Dataset upload fails:**
- Verify `LANGSMITH_API_KEY` is set
- Check that the JSON file is valid: each element needs `inputs` (and optionally `outputs`)
- Dataset name must be unique; otherwise delete the existing dataset first with `langsmith dataset delete`
**Empty dataset after upload:**
- Verify the JSON file contains an array of objects, each with an `inputs` key
- Check whether any examples were created: `langsmith example list --dataset "Name"`
**Export has no data:**
- Ensure traces were exported with the `--full` flag to include inputs/outputs
- Verify traces have both `inputs` and `outputs` populated
**Example count mismatch:**
- Use `langsmith dataset get "Name"` to check the remote count
- Compare with the local file to verify upload completeness
</troubleshooting>
</output>
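Several of the file checks listed under troubleshooting can be run locally before uploading. The sketch below is one way to do that with the standard library. `validate_dataset_file` is a hypothetical helper, and the rules it applies (top-level array, each element an object with an `inputs` object) are taken from the troubleshooting notes rather than from the CLI's own validation.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def validate_dataset_file(path: Path) -> List[str]:
    """Return a list of problems found in a dataset JSON file (empty if none)."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        return [f"cannot read {path}: {exc}"]

    if not isinstance(data, list):
        return ["top-level value must be a JSON array of examples"]
    if not data:
        return ["file contains an empty array"]

    problems = []
    for i, example in enumerate(data):
        if not isinstance(example, dict):
            problems.append(f"example {i}: not a JSON object")
        elif not isinstance(example.get("inputs"), dict):
            problems.append(f"example {i}: missing or non-object 'inputs'")
    return problems


# Demonstration on a temporary file with one good and one bad example.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "dataset.json"
    sample.write_text(json.dumps([
        {"inputs": {"query": "test"}, "outputs": {"answer": "result"}},
        {"outputs": {"answer": "no inputs here"}},
    ]), encoding="utf-8")
    problems = validate_dataset_file(sample)

print(problems or "dataset file looks valid")
```

Counting the elements of the same parsed array also gives the local number to compare against the remote example count.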