fiftyone-embeddings-visualization


Embeddings Visualization in FiftyOne

Key Directives

ALWAYS follow these rules:

1. Set context first

```python
set_context(dataset_name="my-dataset")
```

2. Launch FiftyOne App

Brain operators are delegated and require the App:

```python
launch_app()
```

Wait 5-10 seconds for initialization.

3. Discover operators dynamically

```python
# List all brain operators
list_operators(builtin_only=False)

# Get schema for a specific operator
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
```

4. Compute embeddings before visualization

Embeddings are required for dimensionality reduction:

```python
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_sim",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)
```

5. Close app when done

```python
close_app()
```

Complete Workflow

Step 1: Setup

```python
# Set context
set_context(dataset_name="my-dataset")

# Launch the App (required for brain operators)
launch_app()
```

Step 2: Verify Brain Plugin

```python
# Check if the brain plugin is available
list_plugins(enabled=True)
```

If not installed:

```python
download_plugin(
    url_or_repo="voxel51/fiftyone-plugins",
    plugin_names=["@voxel51/brain"]
)
enable_plugin(plugin_name="@voxel51/brain")
```

Step 3: Discover Brain Operators

```python
# List all available operators
list_operators(builtin_only=False)

# Get the schema for compute_visualization
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")
```

Step 4: Check for Existing Embeddings or Compute New Ones

First, check whether the dataset already has embeddings by looking at the operator schema:

```python
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# Look for existing embeddings fields in the "embeddings" choices
# (e.g., "clip_embeddings", "dinov2_embeddings")
```
**If embeddings exist:** Skip to Step 5 and use the existing embeddings field.

**If no embeddings exist:** Compute them:
```python
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_viz",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",  # Field name to store embeddings
        "backend": "sklearn",
        "metric": "cosine"
    }
)
```

Required parameters for `compute_similarity`:

- `brain_key` - Unique identifier for this brain run
- `model` - Model from the FiftyOne Model Zoo used to generate embeddings
- `embeddings` - Field name where the embeddings will be stored
- `backend` - Similarity backend (use `"sklearn"`)
- `metric` - Distance metric (use `"cosine"` or `"euclidean"`)

Recommended embedding models:

- `clip-vit-base32-torch` - Best for general visual + semantic similarity
- `dinov2-vits14-torch` - Best for visual similarity only
- `resnet50-imagenet-torch` - Classic CNN features
- `mobilenet-v2-imagenet-torch` - Fast, lightweight option
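For reference, the `"cosine"` metric above is standard cosine distance. A minimal NumPy illustration of what that metric computes (the helper name is ours, not FiftyOne's internals):

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b), as selected by metric="cosine"."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Parallel vectors have distance 0; orthogonal vectors have distance 1
print(cosine_distance([1, 0], [2, 0]))  # → 0.0
print(cosine_distance([1, 0], [0, 1]))  # → 1.0
```

Because cosine distance ignores vector magnitude, it is usually a better default than Euclidean for normalized deep embeddings.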

Step 5: Compute 2D Visualization

Use an existing embeddings field OR the `brain_key` from Step 4.

Option A: Use an existing embeddings field (e.g., `clip_embeddings`)

```python
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "img_viz",
        "embeddings": "clip_embeddings",  # Use existing field
        "method": "umap",
        "num_dims": 2
    }
)
```

Option B: Use the `brain_key` from `compute_similarity`

```python
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "img_viz",  # Same key used in compute_similarity
        "method": "umap",
        "num_dims": 2
    }
)
```

**Dimensionality reduction methods:**
- `umap` - (Recommended) Preserves local and global structure, faster. Requires `umap-learn` package.
- `tsne` - Better local structure, slower on large datasets. No extra dependencies.
- `pca` - Linear reduction, fastest but less informative
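For intuition, here is a minimal NumPy sketch of what the `pca` option does conceptually: project centered embeddings onto their top-2 principal components. This is illustrative only, not FiftyOne's implementation:

```python
import numpy as np

def pca_2d(embeddings):
    """Project (n_samples, n_dims) embeddings onto their top-2 principal axes."""
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0)                    # center the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                       # (n_samples, 2) coordinates

# Points lying on a line in 3D collapse onto the first component
points = np.array([[0, 0, 0], [1, 1, 0], [2, 2, 0], [3, 3, 0]], dtype=float)
coords = pca_2d(points)
print(coords.shape)  # → (4, 2)
```

UMAP and t-SNE replace this linear projection with nonlinear neighborhood-preserving layouts, which is why they reveal cluster structure that PCA can miss.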

Step 6: Direct User to Embeddings Panel

After computing the visualization, direct the user to open the FiftyOne App at http://localhost:5151/ and:

1. Click the Embeddings panel icon (a scatter-plot icon that looks like a grid of dots) in the top toolbar
2. Select the brain key (e.g., `img_viz`) from the dropdown
3. Points represent samples in the 2D embedding space
4. Use the "Color by" dropdown to color points by a field (e.g., `ground_truth`, `predictions`)
5. Click points to select samples; use the lasso tool to select groups

IMPORTANT: Do NOT use `set_view(exists=["brain_key"])` - this filters samples and is not needed for visualization. The Embeddings panel automatically shows all samples with computed coordinates.

Step 7: Explore and Filter (Optional)

To filter samples while viewing the Embeddings panel:

```python
# Filter to a specific class
set_view(filters={"ground_truth.label": "dog"})

# Filter by tag
set_view(tags=["validated"])

# Clear filters to show all samples
clear_view()
```

These filters update the Embeddings panel to show only matching samples.

Step 8: Find Outliers

Outliers appear as isolated points far from clusters:

```python
# Compute uniqueness scores (higher = more unique/outlier)
execute_operator(
    operator_uri="@voxel51/brain/compute_uniqueness",
    params={"brain_key": "img_viz"}
)

# View the most unique samples (potential outliers)
set_view(sort_by="uniqueness", reverse=True, limit=50)
```
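For intuition only, the idea behind uniqueness-style scoring can be sketched in plain NumPy: samples far from their nearest neighbors in embedding space score higher. This is an illustrative approximation, not the actual `compute_uniqueness` algorithm:

```python
import numpy as np

def uniqueness_scores(embeddings, k=3):
    """Score each sample by its mean distance to its k nearest neighbors,
    normalized to [0, 1]. Higher = more isolated = more outlier-like."""
    X = np.asarray(embeddings, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)         # ignore self-distance
    knn = np.sort(dists, axis=1)[:, :k]     # k nearest-neighbor distances
    scores = knn.mean(axis=1)
    return scores / scores.max()

# A tight cluster plus one far-away point: the outlier scores highest
emb = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [5, 5]], dtype=float)
print(uniqueness_scores(emb, k=2).argmax())  # → 4 (the outlier)
```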

Step 9: Find Clusters

Use the App's Embeddings panel to visually identify clusters, then:

Option A: Lasso selection in the App

1. Use the lasso tool to select a cluster
2. Selected samples are highlighted
3. Tag or export the selected samples

Option B: Use similarity to find cluster members

```python
# Sort by similarity to a representative sample
execute_operator(
    operator_uri="@voxel51/brain/sort_by_similarity",
    params={
        "brain_key": "img_viz",
        "query_id": "sample_id_from_cluster",
        "k": 100
    }
)
```
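Conceptually, `sort_by_similarity` ranks samples by their similarity to a query embedding. A minimal NumPy sketch of that ranking (illustrative only, not the operator's implementation):

```python
import numpy as np

def top_k_similar(embeddings, query, k=100):
    """Return indices of the k samples most cosine-similar to the query."""
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    sims = E @ q                    # cosine similarity to the query
    return np.argsort(-sims)[:k]    # most similar first

emb = np.array([[1, 0], [0, 1], [1, 1], [-1, 0]], dtype=float)
print(top_k_similar(emb, [1, 0], k=2))  # → [0 2]
```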

Step 10: Clean Up

```python
close_app()
```

Available Tools

可用工具

Session View Tools

| Tool | Description |
| --- | --- |
| `set_view(filters={...})` | Filter samples by field values |
| `set_view(tags=[...])` | Filter samples by tags |
| `set_view(sort_by="...", reverse=True)` | Sort samples by a field |
| `set_view(limit=N)` | Limit to N samples |
| `clear_view()` | Clear filters, show all samples |

Brain Operators for Visualization

Use `list_operators()` to discover operators and `get_operator_schema()` to see their parameters:

| Operator | Description |
| --- | --- |
| `@voxel51/brain/compute_similarity` | Compute embeddings and a similarity index |
| `@voxel51/brain/compute_visualization` | Reduce embeddings to 2D/3D for visualization |
| `@voxel51/brain/compute_uniqueness` | Score samples by uniqueness (outlier detection) |
| `@voxel51/brain/sort_by_similarity` | Sort by similarity to a query sample |

Common Use Cases

Use Case 1: Basic Dataset Exploration

Visualize dataset structure and explore clusters:

```python
set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in the schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If embeddings exist (e.g., clip_embeddings), use them directly:
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "exploration",
        "embeddings": "clip_embeddings",
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)
```

Direct the user to the App's Embeddings panel at http://localhost:5151/:

1. Click the Embeddings panel icon
2. Select "exploration" from the dropdown
3. Use "Color by" to color by ground_truth or predictions

Use Case 2: Find Outliers in Dataset

Identify anomalous or mislabeled samples:

```python
set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in the schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "outliers",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Compute uniqueness scores
execute_operator(
    operator_uri="@voxel51/brain/compute_uniqueness",
    params={"brain_key": "outliers"}
)

# Generate the visualization (use existing embeddings field or brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "outliers",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)
```

Direct the user to the App at http://localhost:5151/:

1. Click the Embeddings panel icon
2. Select "outliers" from the dropdown
3. Outliers appear as isolated points far from clusters
4. Optionally sort by the uniqueness field in the App sidebar

Use Case 3: Compare Classes in Embedding Space

See how different classes cluster:

```python
set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in the schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "class_viz",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Generate the visualization (use existing embeddings field or brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "class_viz",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)
```

Direct the user to the App at http://localhost:5151/:

1. Click the Embeddings panel icon
2. Select "class_viz" from the dropdown
3. Use the "Color by" dropdown to color by ground_truth or predictions

Look for:

- Well-separated clusters = good class distinction
- Overlapping clusters = similar classes or confusion
- Scattered points = high variance within a class

Use Case 4: Analyze Model Predictions

Compare ground truth vs predictions in embedding space:

```python
set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in the schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them:
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "pred_analysis",
        "model": "clip-vit-base32-torch",
        "embeddings": "clip_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Generate the visualization (use existing embeddings field or brain_key)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "pred_analysis",
        "embeddings": "clip_embeddings",  # Use existing field if available
        "method": "umap",  # or "tsne" if umap-learn not installed
        "num_dims": 2
    }
)
```

Direct the user to the App at http://localhost:5151/:

1. Click the Embeddings panel icon
2. Select "pred_analysis" from the dropdown
3. Color by ground_truth to see the true class distribution
4. Color by predictions to see the model's view
5. Look for mismatches to find errors

Use Case 5: t-SNE for Publication-Quality Plots

Use t-SNE for better local structure (no extra dependencies):

```python
set_context(dataset_name="my-dataset")
launch_app()

# Check for existing embeddings in the schema
get_operator_schema(operator_uri="@voxel51/brain/compute_visualization")

# If no embeddings exist, compute them (DINOv2 for visual similarity):
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "tsne_viz",
        "model": "dinov2-vits14-torch",
        "embeddings": "dinov2_embeddings",
        "backend": "sklearn",
        "metric": "cosine"
    }
)

# Generate the t-SNE visualization (no umap-learn dependency needed)
execute_operator(
    operator_uri="@voxel51/brain/compute_visualization",
    params={
        "brain_key": "tsne_viz",
        "embeddings": "dinov2_embeddings",  # Use existing field if available
        "method": "tsne",
        "num_dims": 2
    }
)
```

Direct the user to the App at http://localhost:5151/:

1. Click the Embeddings panel icon
2. Select "tsne_viz" from the dropdown
3. t-SNE provides better local cluster structure than UMAP

Troubleshooting

Error: "No executor available"

- Cause: Delegated operators require the App executor
- Solution: Ensure `launch_app()` was called and wait 5-10 seconds

Error: "Brain key not found"

- Cause: Embeddings not computed
- Solution: Run `compute_similarity` first with a `brain_key`

Error: "Operator not found"

- Cause: Brain plugin not installed
- Solution: Install it with `download_plugin()` and `enable_plugin()`

Error: "You must install the umap-learn>=0.5 package"

- Cause: The UMAP method requires the `umap-learn` package
- Solutions:
  1. Install umap-learn: Ask the user if they want to run `pip install umap-learn`
  2. Use t-SNE instead: Change `method` to `"tsne"` (no extra dependencies)
  3. Use PCA instead: Change `method` to `"pca"` (fastest, no extra dependencies)
- After installing umap-learn, restart Claude Code/MCP server and retry

Visualization is slow

- Use UMAP instead of t-SNE for large datasets
- Use a faster embedding model: `mobilenet-v2-imagenet-torch`
- Process a subset first: `set_view(limit=1000)`

Embeddings panel not showing

- Ensure the visualization was computed (not just the embeddings)
- Check that the brain_key matches in both compute_similarity and compute_visualization
- Refresh the App page

Points not colored correctly

- Verify the field exists on the samples
- Check that the field type is compatible (Classification, Detections, or string)

Best Practices

1. Discover dynamically - Use `list_operators()` and `get_operator_schema()` to get current operator names and parameters
2. Choose the right model - CLIP for semantic similarity, DINOv2 for visual similarity
3. Start with UMAP - Faster and often better than t-SNE for exploration
4. Use uniqueness for outliers - More reliable than visual inspection alone
5. Store embeddings - Reuse them for multiple visualizations via the `brain_key`
6. Subset large datasets - Compute on a subset first, then on the full dataset

Performance Notes

Embedding computation time:

- 1,000 images: ~1-2 minutes
- 10,000 images: ~10-15 minutes
- 100,000 images: ~1-2 hours

Visualization computation time:

- UMAP: ~30 seconds for 10,000 samples
- t-SNE: ~5-10 minutes for 10,000 samples
- PCA: ~5 seconds for 10,000 samples

Memory requirements:

- ~2KB per image for embeddings
- ~16 bytes per image for 2D coordinates
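These memory figures can be sanity-checked with NumPy, assuming 512-dimensional float32 embeddings (the output size of CLIP ViT-B/32) and two float64 coordinates per sample; the exact dtypes FiftyOne stores are an assumption here:

```python
import numpy as np

# Assumed storage layout: 512 float32 values per embedding, 2 float64 coords
embedding = np.zeros(512, dtype=np.float32)  # one image's embedding: 512 * 4 bytes
coords_2d = np.zeros(2, dtype=np.float64)    # one image's 2D point: 2 * 8 bytes

print(embedding.nbytes)  # → 2048 (≈2KB per image)
print(coords_2d.nbytes)  # → 16
```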

Resources
