# Embeddings Visualization in FiftyOne
## Key Directives

ALWAYS follow these rules:
### 1. Set context first

```python
set_context(dataset_name="my-dataset")
```

### 2. Launch FiftyOne App

Brain operators are delegated and require the app:

```python
launch_app()
```

Wait 5-10 seconds for initialization.
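Instead of a fixed sleep, a client could poll the App's HTTP endpoint until it responds. This is a minimal stdlib sketch, not part of the MCP tools; the default port 5151 and the polling policy are assumptions:

```python
import time
import urllib.error
import urllib.request


def wait_for_app(url: str = "http://localhost:5151/", timeout: float = 10.0) -> bool:
    """Poll the FiftyOne App URL until it responds or the timeout expires.

    Illustrative alternative to a fixed sleep; the launch_app() tool
    itself does not expose a readiness check.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1.0):
                return True  # App answered; it is ready
        except (urllib.error.URLError, OSError):
            time.sleep(0.5)  # Not up yet; retry shortly
    return False
```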
### 3. Discover operators dynamically

```python
# List all brain operators
list_operators(builtin_only=False)

# Get schema for a specific operator
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
```
### 4. Compute embeddings before visualization

Embeddings are required for dimensionality reduction:

```python
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_sim",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)
```

### 5. Close app when done

```python
close_app()
```

## Complete Workflow
### Step 1: Setup

```python
# Set context
set_context(dataset_name="my-dataset")

# Launch app (required for brain operators)
launch_app()
```
### Step 2: Verify Brain Plugin

```python
# Check if the brain plugin is available
list_plugins(enabled=True)

# If not installed:
download_plugin(
    url_or_repo="voxel51/fiftyone-plugins",
    plugin_names=["@voxel51/brain"]
)
enable_plugin(plugin_name="@voxel51/brain")
```
### Step 3: Discover Brain Operators

```python
# List all available operators
list_operators(builtin_only=False)

# Get schema for compute_visualization
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
```
### Step 4: Check for Existing Embeddings or Compute New Ones

First, check whether the dataset already has embeddings by looking at the operator schema:

```python
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# Look for existing embeddings fields in the "embeddings" choices
# (e.g., "clip_embeddings", "dinov2_embeddings")
```

**If embeddings exist:** Skip to Step 5 and use the existing embeddings field.

**If no embeddings exist:** Compute them:

```python
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_viz",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",  # Field name to store embeddings
        "backend": "sklearn",
        "metric": "cosine"
    }
)
```

Required parameters for `compute_similarity`:
- `brain_key` - Unique identifier for this brain run
- `model` - Model from the FiftyOne Model Zoo to generate embeddings
- `embeddings` - Field name where embeddings will be stored
- `backend` - Similarity backend (use `"sklearn"`)
- `metric` - Distance metric (use `"cosine"` or `"euclidean"`)

Recommended embedding models:
- `clip-vit-base32-torch` - Best for general visual + semantic similarity
- `dinov2-vits14-torch` - Best for visual similarity only
- `resnet50-imagenet-torch` - Classic CNN features
- `mobilenet-v2-imagenet-torch` - Fast, lightweight option
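The `metric` choice matters because embedding magnitudes can vary while direction carries the semantics. A pure-Python toy illustration (not FiftyOne code) of why cosine is usually preferred for embeddings:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: ignores vector magnitude, compares direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def euclidean_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two embeddings pointing the same way but with different magnitudes
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]  # same direction, twice the norm

print(cosine_distance(u, v))     # ~0.0: treated as identical
print(euclidean_distance(u, v))  # ~3.74: magnitude difference dominates
```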
### Step 5: Compute 2D Visualization

Use an existing embeddings field OR the `brain_key` from Step 4:

```python
# Option A: Use existing embeddings field (e.g., clip_embeddings)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "img_viz",
        "embeddings": "clip_embeddings",  # Use existing field
        "method": "umap",
        "num_dims": 2
    }
)

# Option B: Use brain_key from compute_similarity
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "img_viz",  # Same key used in compute_similarity
        "method": "umap",
        "num_dims": 2
    }
)
```

**Dimensionality reduction methods:**
- `umap` - (Recommended) Preserves local and global structure, faster. Requires the `umap-learn` package.
- `tsne` - Better local structure, slower on large datasets. No extra dependencies.
- `pca` - Linear reduction, fastest but least informative.
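Because `umap` needs `umap-learn` while `tsne` and `pca` ship with scikit-learn, an agent could choose the `method` parameter based on what is importable. A small stdlib sketch; the fallback policy is an assumption, not part of the operator:

```python
import importlib.util


def pick_reduction_method(preferred: str = "umap") -> str:
    """Return "umap" only if umap-learn is importable, else fall back to "tsne".

    The umap-learn package installs a module named "umap"; "tsne" and
    "pca" require no extra dependencies.
    """
    if preferred == "umap" and importlib.util.find_spec("umap") is None:
        return "tsne"  # umap-learn not installed; t-SNE needs nothing extra
    return preferred
```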
### Step 6: Direct User to Embeddings Panel

After computing the visualization, direct the user to open the FiftyOne App at http://localhost:5151/ and:
- Click the Embeddings panel icon (scatter plot icon, looks like a grid of dots) in the top toolbar
- Select the brain key (e.g., `img_viz`) from the dropdown
- Points represent samples in 2D embedding space
- Use the "Color by" dropdown to color points by a field (e.g., `ground_truth`, `predictions`)
- Click points to select samples; use the lasso tool to select groups

IMPORTANT: Do NOT use `set_view(exists=["brain_key"])` - this filters samples and is not needed for visualization. The Embeddings panel automatically shows all samples with computed coordinates.
### Step 7: Explore and Filter (Optional)

To filter samples while viewing the Embeddings panel:

```python
# Filter to a specific class
set_view(filters={"ground_truth.label": "dog"})

# Filter by tag
set_view(tags=["validated"])

# Clear filters to show all samples
clear_view()
```

These filters update the Embeddings panel to show only matching samples.
### Step 8: Find Outliers

Outliers appear as isolated points far from clusters:

```python
# Compute uniqueness scores (higher = more unique/outlier)
execute_operator(
    operator_uri="@voxel51/brain/compute_uniqueness",
    params={
        "brain_key": "img_viz"
    }
)

# View the most unique samples (potential outliers)
set_view(sort_by="uniqueness", reverse=True, limit=50)
```
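Conceptually, a uniqueness score rewards samples that sit far from their nearest neighbors in embedding space. FiftyOne's actual algorithm differs; this brute-force pure-Python sketch only conveys the idea:

```python
import math


def uniqueness_scores(embeddings: list[list[float]], k: int = 3) -> list[float]:
    """Score each sample by its mean distance to its k nearest neighbors.

    Higher score = farther from everything else = more likely an outlier.
    Illustrative sketch only, not FiftyOne's compute_uniqueness.
    """
    scores = []
    for i, a in enumerate(embeddings):
        dists = sorted(
            math.dist(a, b) for j, b in enumerate(embeddings) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Tight cluster near the origin plus one isolated point
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
scores = uniqueness_scores(points, k=2)
print(scores.index(max(scores)))  # 4: the isolated point scores highest
```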
### Step 9: Find Clusters

Use the App's Embeddings panel to visually identify clusters, then:

**Option A: Lasso selection in the App**
- Use the lasso tool to select a cluster
- Selected samples are highlighted
- Tag or export the selected samples

**Option B: Use similarity to find cluster members**

```python
# Sort by similarity to a representative sample
execute_operator(
    operator_uri="@voxel51/brain/sort_by_similarity",
    params={
        "brain_key": "img_viz",
        "query_id": "sample_id_from_cluster",
        "k": 100
    }
)
```
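What `sort_by_similarity` does conceptually is rank samples by similarity to the query's embedding and keep the top `k`. A toy pure-Python version (not the operator's implementation):

```python
import math


def top_k_similar(query: list[float], embeddings: list[list[float]], k: int) -> list[int]:
    """Return indices of the k embeddings most cosine-similar to the query."""
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    ranked = sorted(
        range(len(embeddings)),
        key=lambda i: cos(query, embeddings[i]),
        reverse=True,
    )
    return ranked[:k]


query = [1.0, 0.0]
db = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0], [0.9, 0.9]]
print(top_k_similar(query, db, k=2))  # [1, 3]: closest directions first
```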
### Step 10: Clean Up

```python
close_app()
```

## Available Tools
### Session View Tools

| Tool | Description |
|---|---|
| `set_view(filters=...)` | Filter samples by field values |
| `set_view(tags=...)` | Filter samples by tags |
| `set_view(sort_by=...)` | Sort samples by field |
| `set_view(limit=...)` | Limit to N samples |
| `clear_view()` | Clear filters, show all samples |
### Brain Operators for Visualization

Use `list_operators()` to discover operators and `get_operator_schema()` to see their parameters:

| Operator | Description |
|---|---|
| `compute_similarity` | Compute embeddings and a similarity index |
| `compute_visualization` | Reduce embeddings to 2D/3D for visualization |
| `compute_uniqueness` | Score samples by uniqueness (outlier detection) |
| `sort_by_similarity` | Sort by similarity to a query sample |
## Common Use Cases

### Use Case 1: Basic Dataset Exploration

Visualize dataset structure and explore clusters:

```python
set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in the schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If embeddings exist (e.g., clip_embeddings), use them directly:
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "exploration",
        "embeddings": "clip_embeddings",
        "method": "umap",  # or "tsne" if umap-learn is not installed
        "num_dims": 2
    }
)

# Direct the user to the App Embeddings panel at http://localhost:5151/
# 1. Click the Embeddings panel icon
# 2. Select "exploration" from the dropdown
# 3. Use "Color by" to color by ground_truth or predictions
```
### Use Case 2: Find Outliers in a Dataset

Identify anomalous or mislabeled samples:

```python
set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in the schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "outliers",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Compute uniqueness scores
execute_operator(
    operator_uri="@voxel51/brain/compute_uniqueness",
    params={"brain_key": "outliers"}
)

# Generate the visualization (use an existing embeddings field or the brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "outliers",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn is not installed
        "num_dims": 2
    }
)

# Direct the user to the App at http://localhost:5151/
# 1. Click the Embeddings panel icon
# 2. Select "outliers" from the dropdown
# 3. Outliers appear as isolated points far from clusters
# 4. Optionally sort by the uniqueness field in the App sidebar
```
### Use Case 3: Compare Classes in Embedding Space

See how different classes cluster:

```python
set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in the schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "class_viz",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Generate the visualization (use an existing embeddings field or the brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "class_viz",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn is not installed
        "num_dims": 2
    }
)

# Direct the user to the App at http://localhost:5151/
# 1. Click the Embeddings panel icon
# 2. Select "class_viz" from the dropdown
# 3. Use the "Color by" dropdown to color by ground_truth or predictions
```

Look for:
- Well-separated clusters = good class distinction
- Overlapping clusters = similar classes or model confusion
- Scattered points = high variance within a class
### Use Case 4: Analyze Model Predictions

Compare ground truth vs predictions in embedding space:

```python
set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in the schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "pred_analysis",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Generate the visualization (use an existing embeddings field or the brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "pred_analysis",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn is not installed
        "num_dims": 2
    }
)

# Direct the user to the App at http://localhost:5151/
# 1. Click the Embeddings panel icon
# 2. Select "pred_analysis" from the dropdown
# 3. Color by ground_truth - see the true class distribution
# 4. Color by predictions - see the model's view
# 5. Look for mismatches to find errors
```
### Use Case 5: t-SNE for Publication-Quality Plots

Use t-SNE for better local structure (no extra dependencies):

```python
set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in the schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them (DINOv2 for visual similarity):
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "tsne_viz",
        "model": "dinov2-vits14-torch",
        "embeddings": "dinov2_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Generate the t-SNE visualization (no umap-learn dependency needed)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "tsne_viz",
        "embeddings": "dinov2_embeddings",  # Use existing field if available
        "method": "tsne",
        "num_dims": 2
    }
)

# Direct the user to the App at http://localhost:5151/
# 1. Click the Embeddings panel icon
# 2. Select "tsne_viz" from the dropdown
# 3. t-SNE provides better local cluster structure than UMAP
```
## Troubleshooting

**Error: "No executor available"**
- Cause: Delegated operators require the App executor
- Solution: Ensure `launch_app()` was called and wait 5-10 seconds

**Error: "Brain key not found"**
- Cause: Embeddings were not computed
- Solution: Run `compute_similarity` first with a `brain_key`

**Error: "Operator not found"**
- Cause: Brain plugin not installed
- Solution: Install it with `download_plugin()` and `enable_plugin()`

**Error: "You must install the `umap-learn>=0.5` package"**
- Cause: The UMAP method requires the `umap-learn` package
- Solutions:
  - Install umap-learn: Ask the user if they want to run `pip install umap-learn`
  - Use t-SNE instead: Change `method` to `"tsne"` (no extra dependencies)
  - Use PCA instead: Change `method` to `"pca"` (fastest, no extra dependencies)
- After installing umap-learn, restart Claude Code/the MCP server and retry

**Visualization is slow**
- Use UMAP instead of t-SNE for large datasets
- Use a faster embedding model: `mobilenet-v2-imagenet-torch`
- Process a subset first: `set_view(limit=1000)`

**Embeddings panel not showing**
- Ensure the visualization was computed (not just the embeddings)
- Check that the `brain_key` matches in both `compute_similarity` and `compute_visualization`
- Refresh the App page

**Points not colored correctly**
- Verify the field exists on the samples
- Check that the field type is compatible (Classification, Detections, or string)
## Best Practices

- **Discover dynamically** - Use `list_operators()` and `get_operator_schema()` to get current operator names and parameters
- **Choose the right model** - CLIP for semantic similarity, DINOv2 for visual similarity
- **Start with UMAP** - Faster and often better than t-SNE for exploration
- **Use uniqueness for outliers** - More reliable than visual inspection alone
- **Store embeddings** - Reuse them for multiple visualizations via `brain_key`
- **Subset large datasets** - Compute on a subset first, then on the full dataset
## Performance Notes

**Embedding computation time:**
- 1,000 images: ~1-2 minutes
- 10,000 images: ~10-15 minutes
- 100,000 images: ~1-2 hours

**Visualization computation time:**
- UMAP: ~30 seconds for 10,000 samples
- t-SNE: ~5-10 minutes for 10,000 samples
- PCA: ~5 seconds for 10,000 samples

**Memory requirements:**
- ~2KB per image for embeddings
- ~16 bytes per image for 2D coordinates
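The memory figures above are consistent with 512-dimensional float32 embeddings (the output size of CLIP ViT-B/32) and float64 2D coordinates. A quick back-of-the-envelope check, assuming those dtypes:

```python
# 512-dim float32 embedding: 512 values x 4 bytes each
embedding_bytes = 512 * 4
print(embedding_bytes)  # 2048 -> ~2KB per image

# 2D coordinates stored as float64: 2 values x 8 bytes each
coords_bytes = 2 * 8
print(coords_bytes)  # 16 bytes per image

# Rough total for a 100,000-image dataset
print(100_000 * embedding_bytes / 1e6)  # ~204.8 MB of embeddings
```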