scvi-tools
Overview
scvi-tools is a comprehensive Python framework for probabilistic models in single-cell genomics. Built on PyTorch and PyTorch Lightning, it provides deep generative models using variational inference for analyzing diverse single-cell data modalities.
When to Use This Skill
Use this skill when:
- Analyzing single-cell RNA-seq data (dimensionality reduction, batch correction, integration)
- Working with single-cell ATAC-seq or chromatin accessibility data
- Integrating multimodal data (CITE-seq, multiome, paired/unpaired datasets)
- Analyzing spatial transcriptomics data (deconvolution, spatial mapping)
- Performing differential expression analysis on single-cell data
- Conducting cell type annotation or transfer learning tasks
- Working with specialized single-cell modalities (methylation, cytometry, RNA velocity)
- Building custom probabilistic models for single-cell analysis
Core Capabilities
scvi-tools provides models organized by data modality:
1. Single-Cell RNA-seq Analysis
Core models for expression analysis, batch correction, and integration. See `references/models-scrna-seq.md` for:
- scVI: Unsupervised dimensionality reduction and batch correction
- scANVI: Semi-supervised cell type annotation and integration
- AUTOZI: Zero-inflation detection and modeling
- VeloVI: RNA velocity analysis
- contrastiveVI: Perturbation effect isolation
2. Chromatin Accessibility (ATAC-seq)
Models for analyzing single-cell chromatin data. See `references/models-atac-seq.md` for:
- PeakVI: Peak-based ATAC-seq analysis and integration
- PoissonVI: Quantitative fragment count modeling
- scBasset: Deep learning approach with motif analysis
3. Multimodal & Multi-omics Integration
Joint analysis of multiple data types. See `references/models-multimodal.md` for:
- totalVI: CITE-seq protein and RNA joint modeling
- MultiVI: Paired and unpaired multi-omic integration
- MrVI: Multi-resolution cross-sample analysis
4. Spatial Transcriptomics
Spatially resolved transcriptomics analysis. See `references/models-spatial.md` for:
- DestVI: Multi-resolution spatial deconvolution
- Stereoscope: Cell type deconvolution
- Tangram: Spatial mapping and integration
- scVIVA: Cell-environment relationship analysis
5. Specialized Modalities
Additional specialized analysis tools. See `references/models-specialized.md` for:
- MethylVI/MethylANVI: Single-cell methylation analysis
- CytoVI: Flow/mass cytometry batch correction
- Solo: Doublet detection
- CellAssign: Marker-based cell type annotation
Typical Workflow
All scvi-tools models follow a consistent API pattern:
```python
# 1. Load and preprocess data (AnnData format)
import scvi
import scanpy as sc

adata = scvi.data.heart_cell_atlas_subsampled()
sc.pp.filter_genes(adata, min_counts=3)
adata.layers["counts"] = adata.X.copy()  # keep the raw counts that step 2 registers
# seurat_v3 selects variable genes from raw counts (requires scikit-misc)
sc.pp.highly_variable_genes(adata, n_top_genes=1200, flavor="seurat_v3", layer="counts")

# 2. Register data with model (specify layers, covariates)
scvi.model.SCVI.setup_anndata(
    adata,
    layer="counts",  # Use raw counts, not log-normalized
    batch_key="batch",  # adata.obs column holding batch labels; rename to match your data
    categorical_covariate_keys=["donor"],
    continuous_covariate_keys=["percent_mito"],
)

# 3. Create and train model
model = scvi.model.SCVI(adata)
model.train()

# 4. Extract latent representations and normalized values
latent = model.get_latent_representation()
normalized = model.get_normalized_expression(library_size=1e4)

# 5. Store in AnnData for downstream analysis
adata.obsm["X_scVI"] = latent
adata.layers["scvi_normalized"] = normalized

# 6. Downstream analysis with scanpy
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)
sc.tl.leiden(adata)
```
**Key Design Principles:**
- **Raw counts required**: Models expect unnormalized count data for optimal performance
- **Unified API**: Consistent interface across all models (setup → train → extract)
- **AnnData-centric**: Seamless integration with the scanpy ecosystem
- **GPU acceleration**: Automatic utilization of available GPUs
- **Batch correction**: Handle technical variation through covariate registration
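The raw-counts requirement can be checked before calling `setup_anndata`. The sketch below is a hypothetical helper (not part of scvi-tools) that works on plain nested lists; it treats a matrix as raw counts when every value is a non-negative whole number. Log-normalized or scaled data will usually fail this check.

```python
from typing import Iterable, Sequence


def looks_like_raw_counts(matrix: Iterable[Sequence[float]]) -> bool:
    """Return True if every entry is a non-negative whole number.

    Hypothetical helper for illustration; `matrix` is a cells x genes
    table given as nested sequences.
    """
    for row in matrix:
        for value in row:
            # Negative or fractional entries suggest normalized/scaled data.
            if value < 0 or float(value) != int(value):
                return False
    return True


raw = [[0, 3, 1], [5, 0, 2]]
log_normalized = [[0.0, 1.386, 0.693], [1.792, 0.0, 1.099]]
print(looks_like_raw_counts(raw))             # True
print(looks_like_raw_counts(log_normalized))  # False
```

The same idea applies to whichever AnnData layer is registered with the model; checking a small slice of that layer is usually enough to catch a normalized matrix.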
Common Analysis Tasks
Differential Expression
Probabilistic DE analysis using the learned generative models:
```python
de_results = model.differential_expression(
    groupby="cell_type",
    group1="TypeA",
    group2="TypeB",
    mode="change",  # Use composite hypothesis testing
    delta=0.25,  # Minimum effect size threshold
)
```
See `references/differential-expression.md` for detailed methodology and interpretation.
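To make `mode="change"` and `delta` concrete: under a composite hypothesis, a gene counts as differentially expressed when the absolute log fold change between the two groups exceeds `delta`, and the quantity of interest is the posterior probability of that event. The sketch below estimates that probability from a list of posterior log-fold-change samples. It is a simplified illustration in plain Python, not the scvi-tools implementation; the function name and the sample values are hypothetical.

```python
from typing import Sequence


def prob_effect_exceeds(lfc_samples: Sequence[float], delta: float = 0.25) -> float:
    """Estimate P(|LFC| > delta) as the fraction of samples beyond the threshold."""
    if not lfc_samples:
        raise ValueError("need at least one posterior sample")
    hits = sum(1 for lfc in lfc_samples if abs(lfc) > delta)
    return hits / len(lfc_samples)


# Hypothetical posterior samples of the log fold change for two genes.
gene_a = [0.9, 1.1, 0.7, 1.3, 0.8]     # consistently large effect
gene_b = [0.1, -0.2, 0.3, 0.05, -0.1]  # mostly within the threshold

print(prob_effect_exceeds(gene_a))  # 1.0
print(prob_effect_exceeds(gene_b))  # 0.2
```

A larger `delta` demands a bigger effect before a gene is flagged, so fewer genes pass.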
Model Persistence
Save and load trained models:
```python
# Save model
model.save("./model_directory", overwrite=True)

# Load model
model = scvi.model.SCVI.load("./model_directory", adata=adata)
```
Batch Correction and Integration
Integrate datasets across batches or studies:
```python
# Register batch information
scvi.model.SCVI.setup_anndata(adata, batch_key="study")

# Model automatically learns batch-corrected representations
model = scvi.model.SCVI(adata)
model.train()
latent = model.get_latent_representation()  # Batch-corrected
```
Theoretical Foundations
scvi-tools is built on:
- Variational inference: Approximate posterior distributions for scalable Bayesian inference
- Deep generative models: VAE architectures that learn complex data distributions
- Amortized inference: Shared neural networks for efficient learning across cells
- Probabilistic modeling: Principled uncertainty quantification and statistical testing
See `references/theoretical-foundations.md` for detailed background on the mathematical framework.
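For orientation, the objective behind variational inference in a VAE can be written in its generic textbook form. The notation is an assumption made for illustration ($x$ for a cell's observed counts, $z$ for its latent representation, $q_\phi$ for the encoder, $p_\theta$ for the decoder); individual scvi-tools models add model-specific terms such as batch covariates.

```latex
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```

Training maximizes the right-hand side, the evidence lower bound. Amortized inference means that one encoder network outputs the parameters of $q_\phi(z \mid x)$ for every cell, rather than fitting separate variational parameters per cell.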
Additional Resources
- Workflows: `references/workflows.md` contains common workflows, best practices, hyperparameter tuning, and GPU optimization
- Model References: Detailed documentation for each model category in the `references/` directory
- Official Documentation: https://docs.scvi-tools.org/en/stable/
- Tutorials: https://docs.scvi-tools.org/en/stable/tutorials/index.html
- API Reference: https://docs.scvi-tools.org/en/stable/api/index.html
Installation
```bash
uv pip install scvi-tools

# For GPU support (quoted so shells such as zsh do not expand the brackets)
uv pip install "scvi-tools[cuda]"
```
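After installation, a quick check that the packages resolve in the active environment can save a failed training run later. The sketch below uses only the standard library, and the helper name is hypothetical. It reports versions only and does not verify that a GPU is usable.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(module_name: str, dist_name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not available."""
    if util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# scvi-tools is imported as `scvi`; PyTorch is imported as `torch`.
for module_name, dist_name in [("scvi", "scvi-tools"), ("torch", "torch")]:
    version = installed_version(module_name, dist_name)
    print(f"{dist_name}: {version or 'not installed'}")
```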
Best Practices
- Use raw counts: Always provide unnormalized count data to models
- Filter genes: Remove low-count genes before analysis (e.g., `min_counts=3`)
- Register covariates: Include known technical factors (batch, donor, etc.) in `setup_anndata`
- Feature selection: Use highly variable genes for improved performance
- Model saving: Always save trained models to avoid retraining
- GPU usage: Enable GPU acceleration for large datasets (`accelerator="gpu"`)
- Scanpy integration: Store outputs in AnnData objects for downstream analysis
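As an illustration of the gene-filtering practice, the sketch below drops genes whose total count across all cells falls below a threshold, which is the idea behind `sc.pp.filter_genes(adata, min_counts=3)`. It operates on a plain nested list, and the helper name and gene names are hypothetical; on real data the scanpy call on the AnnData object is the one to use.

```python
from typing import List, Sequence, Tuple


def filter_low_count_genes(
    counts: Sequence[Sequence[int]], genes: Sequence[str], min_counts: int = 3
) -> Tuple[List[List[int]], List[str]]:
    """Keep genes (columns) whose summed count over all cells is >= min_counts."""
    totals = [sum(column) for column in zip(*counts)]
    keep = [j for j, total in enumerate(totals) if total >= min_counts]
    filtered = [[row[j] for j in keep] for row in counts]
    return filtered, [genes[j] for j in keep]


# Toy cells x genes matrix with hypothetical gene names.
counts = [[0, 4, 1], [1, 2, 0], [0, 5, 1]]
genes = ["geneA", "geneB", "geneC"]
filtered, kept = filter_low_count_genes(counts, genes)
print(kept)      # ['geneB']
print(filtered)  # [[4], [2], [5]]
```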