scvi-tools


Overview

scvi-tools is a comprehensive Python framework for probabilistic models in single-cell genomics. Built on PyTorch and PyTorch Lightning, it provides deep generative models using variational inference for analyzing diverse single-cell data modalities.

When to Use This Skill

Use this skill when:

- Analyzing single-cell RNA-seq data (dimensionality reduction, batch correction, integration)
- Working with single-cell ATAC-seq or chromatin accessibility data
- Integrating multimodal data (CITE-seq, multiome, paired/unpaired datasets)
- Analyzing spatial transcriptomics data (deconvolution, spatial mapping)
- Performing differential expression analysis on single-cell data
- Conducting cell type annotation or transfer learning tasks
- Working with specialized single-cell modalities (methylation, cytometry, RNA velocity)
- Building custom probabilistic models for single-cell analysis

Core Capabilities

scvi-tools provides models organized by data modality:

1. Single-Cell RNA-seq Analysis

Core models for expression analysis, batch correction, and integration. See `references/models-scrna-seq.md` for:

- **scVI**: Unsupervised dimensionality reduction and batch correction
- **scANVI**: Semi-supervised cell type annotation and integration
- **AUTOZI**: Zero-inflation detection and modeling
- **VeloVI**: RNA velocity analysis
- **contrastiveVI**: Perturbation effect isolation

2. Chromatin Accessibility (ATAC-seq)

Models for analyzing single-cell chromatin data. See `references/models-atac-seq.md` for:

- **PeakVI**: Peak-based ATAC-seq analysis and integration
- **PoissonVI**: Quantitative fragment count modeling
- **scBasset**: Deep learning approach with motif analysis

3. Multimodal & Multi-omics Integration

Joint analysis of multiple data types. See `references/models-multimodal.md` for:

- **totalVI**: CITE-seq protein and RNA joint modeling
- **MultiVI**: Paired and unpaired multi-omic integration
- **MrVI**: Multi-resolution cross-sample analysis

4. Spatial Transcriptomics

Spatially resolved transcriptomics analysis. See `references/models-spatial.md` for:

- **DestVI**: Multi-resolution spatial deconvolution
- **Stereoscope**: Cell type deconvolution
- **Tangram**: Spatial mapping and integration
- **scVIVA**: Cell-environment relationship analysis

5. Specialized Modalities

Additional specialized analysis tools. See `references/models-specialized.md` for:

- **MethylVI/MethylANVI**: Single-cell methylation analysis
- **CytoVI**: Flow/mass cytometry batch correction
- **Solo**: Doublet detection
- **CellAssign**: Marker-based cell type annotation

Typical Workflow

All scvi-tools models follow a consistent API pattern:

```python
import scanpy as sc
import scvi

# 1. Load and preprocess data (AnnData format)
adata = scvi.data.heart_cell_atlas_subsampled()
sc.pp.filter_genes(adata, min_counts=3)
adata.layers["counts"] = adata.X.copy()  # Keep raw counts for the model
sc.pp.highly_variable_genes(
    adata,
    n_top_genes=1200,
    flavor="seurat_v3",  # Works on raw counts; requires scikit-misc
    layer="counts",
    subset=True,  # Keep only the selected genes
)

# 2. Register data with model (specify layers, covariates)
scvi.model.SCVI.setup_anndata(
    adata,
    layer="counts",  # Use raw counts, not log-normalized
    batch_key="cell_source",  # Batch column in this dataset's adata.obs
    categorical_covariate_keys=["donor"],
    continuous_covariate_keys=["percent_mito"],
)

# 3. Create and train model
model = scvi.model.SCVI(adata)
model.train()

# 4. Extract latent representations and normalized values
latent = model.get_latent_representation()
normalized = model.get_normalized_expression(library_size=1e4)

# 5. Store in AnnData for downstream analysis
adata.obsm["X_scVI"] = latent
adata.layers["scvi_normalized"] = normalized

# 6. Downstream analysis with scanpy
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)
sc.tl.leiden(adata)
```


**Key Design Principles:**
- **Raw counts required**: Models expect unnormalized count data for optimal performance
- **Unified API**: Consistent interface across all models (setup → train → extract)
- **AnnData-centric**: Seamless integration with the scanpy ecosystem
- **GPU acceleration**: Automatic utilization of available GPUs
- **Batch correction**: Handle technical variation through covariate registration
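
Because the models expect raw counts, it can help to sanity-check a matrix before registering it. The helper below is a minimal, dependency-free sketch; `looks_like_raw_counts` is a hypothetical name, not part of scvi-tools, and it assumes the input is an iterable of rows (for example, a small dense slice of a count layer).

```python
from math import isfinite


def looks_like_raw_counts(rows, tol=1e-8):
    """Return True if every value is a finite, non-negative integer.

    Hypothetical helper for illustration; not a scvi-tools function.
    `rows` is any iterable of rows, each an iterable of numbers.
    """
    for row in rows:
        for value in row:
            v = float(value)
            # Log-normalized or scaled data typically fails one of these checks.
            if not isfinite(v) or v < 0 or abs(v - round(v)) > tol:
                return False
    return True


raw = [[0, 3, 1], [2, 0, 5]]              # plausible UMI counts
normalized = [[0.0, 1.39, 0.69], [1.1, 0.0, 1.79]]  # log1p-like values

print(looks_like_raw_counts(raw))
print(looks_like_raw_counts(normalized))
```

Checking a subset of cells is usually enough to catch a matrix that was normalized by mistake; for a sparse layer, convert the slice to a dense array first.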

Common Analysis Tasks

Differential Expression

Probabilistic DE analysis using the learned generative models:

```python
de_results = model.differential_expression(
    groupby="cell_type",
    group1="TypeA",
    group2="TypeB",
    mode="change",  # Use composite hypothesis testing
    delta=0.25,     # Minimum effect size threshold
)
```

See `references/differential-expression.md` for detailed methodology and interpretation.
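
To make the role of `delta` concrete, the sketch below illustrates the idea behind composite hypothesis testing: a gene is scored by the fraction of posterior log fold change (LFC) samples whose magnitude exceeds `delta`. This is a simplified illustration with simulated samples, not the scvi-tools implementation; the function name `proba_de` and the Gaussian samples are assumptions made for the example.

```python
import random


def proba_de(lfc_samples, delta=0.25):
    """Fraction of posterior LFC samples with |LFC| > delta.

    Illustrative only: the samples would come from the trained model's
    posterior in practice; here they are supplied by the caller.
    """
    samples = list(lfc_samples)
    return sum(abs(x) > delta for x in samples) / len(samples)


rng = random.Random(0)
# Simulated posterior LFC samples for two hypothetical genes.
strong_effect = [rng.gauss(1.0, 0.2) for _ in range(2000)]   # centered far from 0
no_effect = [rng.gauss(0.0, 0.05) for _ in range(2000)]      # centered at 0

print(proba_de(strong_effect))  # close to 1
print(proba_de(no_effect))      # close to 0
```

A larger `delta` demands a bigger effect before a gene counts as differentially expressed, which guards against flagging genes whose change is statistically detectable but biologically negligible.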

Model Persistence

Save and load trained models:

```python
# Save model
model.save("./model_directory", overwrite=True)

# Load model
model = scvi.model.SCVI.load("./model_directory", adata=adata)
```

Batch Correction and Integration

Integrate datasets across batches or studies:

```python
# Register batch information
scvi.model.SCVI.setup_anndata(adata, batch_key="study")

# Model automatically learns batch-corrected representations
model = scvi.model.SCVI(adata)
model.train()
latent = model.get_latent_representation()  # Batch-corrected
```

Theoretical Foundations

scvi-tools is built on:

- **Variational inference**: Approximate posterior distributions for scalable Bayesian inference
- **Deep generative models**: VAE architectures that learn complex data distributions
- **Amortized inference**: Shared neural networks for efficient learning across cells
- **Probabilistic modeling**: Principled uncertainty quantification and statistical testing

See `references/theoretical-foundations.md` for detailed background on the mathematical framework.
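
As a brief reminder of what variational inference optimizes, a generic VAE maximizes the evidence lower bound (ELBO) shown below. The notation is an assumption of this sketch: $x$ is a cell's observed counts, $z$ its latent representation, and $\theta$ and $\phi$ the decoder and encoder parameters. Individual scvi-tools models extend this basic form, for example by conditioning on batch covariates.

$$
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
$$

The first term rewards accurate reconstruction of the counts; the second keeps the approximate posterior close to the prior. Amortization means the same encoder network produces $q_\phi(z \mid x)$ for every cell.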

Additional Resources

- **Workflows**: `references/workflows.md` contains common workflows, best practices, hyperparameter tuning, and GPU optimization
- **Model References**: Detailed documentation for each model category in the `references/` directory
- **Official Documentation**: https://docs.scvi-tools.org/en/stable/
- **Tutorials**: https://docs.scvi-tools.org/en/stable/tutorials/index.html
- **API Reference**: https://docs.scvi-tools.org/en/stable/api/index.html

Installation

```bash
uv pip install scvi-tools

# For GPU support (quoted so shells such as zsh do not expand the brackets)
uv pip install "scvi-tools[cuda]"
```

Best Practices

- **Use raw counts**: Always provide unnormalized count data to models
- **Filter genes**: Remove low-count genes before analysis (e.g., `min_counts=3`)
- **Register covariates**: Include known technical factors (batch, donor, etc.) in `setup_anndata`
- **Feature selection**: Use highly variable genes for improved performance
- **Model saving**: Always save trained models to avoid retraining
- **GPU usage**: Enable GPU acceleration for large datasets (`accelerator="gpu"`)
- **Scanpy integration**: Store outputs in AnnData objects for downstream analysis
