tiledbvcf
TileDB-VCF
Overview
TileDB-VCF is a high-performance C++ library with Python and CLI interfaces for efficient storage and retrieval of genomic variant-call data. Built on TileDB's sparse array technology, it enables scalable ingestion of VCF/BCF files, incremental sample addition without expensive merging operations, and efficient parallel queries of variant data stored locally or in the cloud.
When to Use This Skill
Use this skill when you are:
- Learning TileDB-VCF concepts and workflows
- Prototyping genomics analyses and pipelines
- Working with small-to-medium datasets (< 1000 samples)
- Adding new samples incrementally to existing datasets
- Querying specific genomic regions efficiently across many samples
- Working with cloud-stored variant data (S3, Azure, GCS)
- Exporting subsets of large VCF datasets
- Building variant databases for cohort studies
- Running educational projects or developing methods
- Handling variant data operations where performance is critical
Quick Start
Installation
**Preferred Method: Conda/Mamba**
```bash
# Enter the following two lines if you are on an M1 Mac
CONDA_SUBDIR=osx-64
conda config --env --set subdir osx-64

# Create the conda environment
conda create -n tiledb-vcf "python<3.10"
conda activate tiledb-vcf

# Mamba is a faster and more reliable alternative to conda
conda install -c conda-forge mamba

# Install TileDB-Py and TileDB-VCF, along with other useful libraries
mamba install -y -c conda-forge -c bioconda -c tiledb tiledb-py tiledbvcf-py pandas pyarrow numpy
```
**Alternative: Docker Images**
```bash
docker pull tiledb/tiledbvcf-py   # Python interface
docker pull tiledb/tiledbvcf-cli  # Command-line interface
```
Basic Examples
**Create and populate a dataset:**
```python
import tiledbvcf

# Open a dataset URI in write mode
ds = tiledbvcf.Dataset(uri="my_dataset", mode="w",
                       cfg=tiledbvcf.ReadConfig(memory_budget_mb=1024))

# Create the empty dataset on disk before the first ingestion
ds.create_dataset()

# Ingest VCF files (must be single-sample with indexes)
# Requirements:
# - VCFs must be single-sample (not multi-sample)
# - Must have indexes: .csi (bcftools) or .tbi (tabix)
ds.ingest_samples(["sample1.vcf.gz", "sample2.vcf.gz"])
```
**Query variant data:**
```python
# Open existing dataset for reading
ds = tiledbvcf.Dataset(uri="my_dataset", mode="r")

# Query specific regions and samples
df = ds.read(
    attrs=["sample_name", "pos_start", "pos_end", "alleles", "fmt_GT"],
    regions=["chr1:1000000-2000000", "chr2:500000-1500000"],
    samples=["sample1", "sample2", "sample3"]
)
print(df.head())
```
**Export to VCF:**
```python
import os

# Export two VCF samples
ds.export(
    regions=["chr21:8220186-8405573"],
    samples=["HG00101", "HG00097"],
    output_format="v",
    output_dir=os.path.expanduser("~"),
)
```
Core Capabilities
1. Dataset Creation and Ingestion
Create TileDB-VCF datasets and incrementally ingest variant data from multiple VCF/BCF files. This is appropriate for building population genomics databases and cohort studies.
Requirements:
- Single-sample VCFs only: Multi-sample VCFs are not supported
- Index files required: VCF/BCF files must have indexes (.csi or .tbi)
Common operations:
- Create new datasets with optimized array schemas
- Ingest single or multiple VCF/BCF files in parallel
- Add new samples incrementally without re-processing existing data
- Configure memory usage and compression settings
- Handle various VCF formats and INFO/FORMAT fields
- Resume interrupted ingestion processes
- Validate data integrity during ingestion
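The two ingestion requirements above can be checked before any data is written. The following is a minimal sketch, not part of TileDB-VCF: the helper names (`has_index`, `header_samples`, `precheck`) are hypothetical, and it assumes local bgzip-compressed `.vcf.gz` files whose index is named by appending `.csi` or `.tbi` to the VCF filename. BCF files and remote URIs are out of scope here.

```python
import gzip
from pathlib import Path

INDEX_SUFFIXES = (".csi", ".tbi")


def has_index(vcf_path):
    """True if a .csi or .tbi file sits next to the VCF (e.g. sample1.vcf.gz.tbi)."""
    return any(Path(f"{vcf_path}{suffix}").exists() for suffix in INDEX_SUFFIXES)


def header_samples(vcf_path):
    """Return the sample names from the #CHROM header line of a .vcf.gz file."""
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns (CHROM..FORMAT) are fixed; the rest are samples.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached records without seeing a #CHROM line
    return []


def precheck(vcf_paths):
    """Map each path that fails a requirement to a short reason string."""
    problems = {}
    for path in vcf_paths:
        if not has_index(path):
            problems[str(path)] = "missing .csi/.tbi index"
        elif len(header_samples(path)) != 1:
            problems[str(path)] = "not a single-sample VCF"
    return problems
```

Running `precheck` on the list of files intended for ingestion and ingesting only when it returns an empty dict keeps a long ingestion from failing partway through on a malformed input.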
2. Efficient Querying and Filtering
Query variant data with high performance across genomic regions, samples, and variant attributes. This is appropriate for association studies, variant discovery, and population analysis.
Common operations:
- Query specific genomic regions (single or multiple)
- Filter by sample names or sample groups
- Extract specific variant attributes (position, alleles, genotypes, quality)
- Access INFO and FORMAT fields efficiently
- Combine spatial and attribute-based filtering
- Stream large query results
- Perform aggregations across samples or regions
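Streaming a large result can be sketched as a generator that keeps reading until the dataset reports completion. This assumes the dataset object exposes `read`, `read_completed`, and `continue_read` in the way the tiledbvcf Python API does for incomplete reads; check this against the installed version. The wrapper name `iter_batches` is hypothetical.

```python
def iter_batches(ds, **read_kwargs):
    """Yield one result batch at a time until the read is reported complete.

    `ds` is assumed to provide read(), read_completed(), and continue_read().
    """
    yield ds.read(**read_kwargs)
    while not ds.read_completed():
        yield ds.continue_read()


# Example usage (requires a real dataset opened in read mode):
# for batch in iter_batches(ds, attrs=["sample_name", "fmt_GT"],
#                           regions=["chr1:1000000-2000000"]):
#     process(batch)  # process() is a placeholder for your own logic
```

Processing each batch as it arrives keeps peak memory bounded by the read buffer size rather than by the size of the full result.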
3. Data Export and Interoperability
Export data in various formats for downstream analysis or integration with other genomics tools. This is appropriate for sharing datasets, creating analysis subsets, or feeding other pipelines.
Common operations:
- Export to standard VCF/BCF formats
- Generate TSV files with selected fields
- Create sample/region-specific subsets
- Maintain data provenance and metadata
- Lossless data export preserving all annotations
- Compressed output formats
- Streaming exports for large datasets
4. Population Genomics Workflows
TileDB-VCF excels at large-scale population genomics analyses requiring efficient access to variant data across many samples and genomic regions.
Common workflows:
- Genome-wide association studies (GWAS) data preparation
- Rare variant burden testing
- Population stratification analysis
- Allele frequency calculations across populations
- Quality control across large cohorts
- Variant annotation and filtering
- Cross-population comparative analysis
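As an illustration of the allele frequency step, the sketch below counts alleles from genotype calls at a single site. It is plain Python rather than a TileDB-VCF function, and it assumes each genotype is a sequence of integer allele indices (0 for the reference allele) with `-1` or `None` marking a missing call. Grouping rows by site is left to the caller.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Frequency of each allele index among called alleles at one site.

    Missing calls (-1 or None) are excluded from both numerator and denominator.
    """
    counts = Counter(
        allele
        for gt in genotypes
        for allele in gt
        if allele is not None and allele >= 0
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in sorted(counts.items())}


# Two diploid samples: one heterozygous, one homozygous reference.
print(allele_frequencies([[0, 1], [0, 0]]))  # {0: 0.75, 1: 0.25}
```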
Key Concepts
Array Schema and Data Model
TileDB-VCF Data Model:
- Variants are stored in sparse arrays with genomic coordinates as dimensions
- Samples are stored as a dimension alongside the genomic coordinates, allowing efficient sample-specific queries
- INFO and FORMAT fields are preserved with their original data types
- Compression and chunking are applied automatically for efficient storage
Read Configuration:
```python
import tiledbvcf

# Read-time configuration (this does not change the array schema).
# Partition tuples are (partition index, number of partitions).
config = tiledbvcf.ReadConfig(
    memory_budget_mb=2048,     # MB
    region_partition=(0, 1),   # Partition 0 of 1: all requested regions
    sample_partition=(0, 1)    # Partition 0 of 1: all requested samples
)
```
Coordinate Systems and Regions
Critical: TileDB-VCF uses 1-based genomic coordinates following VCF standard:
- Positions are 1-based (first base is position 1)
- Ranges are inclusive on both ends
- Region "chr1:1000-2000" includes positions 1000-2000 (1001 bases total)
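The inclusive-range arithmetic is easy to get wrong when intervals come from BED files, which are 0-based and half-open. A minimal sketch of both calculations follows; the helper names are hypothetical and nothing here calls TileDB-VCF.

```python
def region_length(start, end):
    """Number of bases in a 1-based, inclusive range."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def bed_to_region(contig, bed_start, bed_end):
    """Convert a 0-based, half-open BED interval to a 1-based inclusive region string."""
    if bed_start < 0 or bed_end <= bed_start:
        raise ValueError("expected 0 <= bed_start < bed_end")
    # Only the start shifts by one; the half-open end equals the inclusive end.
    return f"{contig}:{bed_start + 1}-{bed_end}"


print(region_length(1000, 2000))                  # 1001
print(bed_to_region("chr1", 999999, 2000000))     # chr1:1000000-2000000
```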
Region specification formats:
```python
# Single region
regions = ["chr1:1000000-2000000"]

# Multiple regions
regions = ["chr1:1000000-2000000", "chr2:500000-1500000"]

# Whole chromosome
regions = ["chr1"]

# Region strings are always 1-based and inclusive. BED intervals are 0-based
# and half-open, so convert them first: BED (chr1, 999999, 2000000)
# corresponds to the region string below.
regions = ["chr1:1000000-2000000"]
```
Memory Management
Performance considerations:
- Set appropriate memory budget based on available system memory
- Use streaming queries for very large result sets
- Partition large ingestions to avoid memory exhaustion
- Configure tile cache for repeated region access
- Use parallel ingestion for multiple files
- Optimize region queries by combining nearby regions
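Combining nearby regions can be done before the query is issued. The sketch below merges 1-based inclusive region strings on the same contig when they overlap or are separated by at most `max_gap` bases. The function names and the `max_gap` parameter are hypothetical, and whole-chromosome regions such as `"chr1"` are not handled.

```python
def parse_region(region):
    """Split 'contig:start-end' into (contig, start, end) with integer coordinates."""
    contig, _, span = region.partition(":")
    start, _, end = span.partition("-")
    return contig, int(start), int(end)


def merge_regions(regions, max_gap=0):
    """Merge regions on the same contig that overlap or lie within max_gap bases."""
    merged = []
    for contig, start, end in sorted(parse_region(r) for r in regions):
        last = merged[-1] if merged else None
        if last and last[0] == contig and start <= last[2] + max_gap + 1:
            last[2] = max(last[2], end)  # extend the previous region
        else:
            merged.append([contig, start, end])
    return [f"{c}:{s}-{e}" for c, s, e in merged]


print(merge_regions(["chr1:100-200", "chr1:150-300", "chr1:1000-1100"]))
# ['chr1:100-300', 'chr1:1000-1100']
```

With a non-zero `max_gap` the merged region also covers the bases between the original regions, so the result may contain variants outside the regions originally requested; filter afterwards if that matters.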
Cloud Storage Integration
TileDB-VCF seamlessly works with cloud storage:
```python
# S3 dataset
ds = tiledbvcf.Dataset(uri="s3://bucket/dataset", mode="r")

# Azure Blob Storage
ds = tiledbvcf.Dataset(uri="azure://container/dataset", mode="r")

# Google Cloud Storage
ds = tiledbvcf.Dataset(uri="gcs://bucket/dataset", mode="r")
```
Common Pitfalls
- Memory exhaustion during ingestion: Use appropriate memory budget and batch processing for large VCF files
- Inefficient region queries: Combine nearby regions instead of many separate queries
- Missing sample names: Ensure sample names in VCF headers match query sample specifications
- Coordinate system confusion: Remember TileDB-VCF uses 1-based coordinates like VCF standard
- Large result sets: Use streaming or pagination for queries returning millions of variants
- Cloud permissions: Ensure proper authentication for cloud storage access
- Concurrent access: Multiple writers to the same dataset can cause corruption—use appropriate locking
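Two of these pitfalls, memory exhaustion during ingestion and mismatched sample names, lend themselves to small guards. The helpers below are a hypothetical sketch in plain Python; the batch size is an arbitrary illustration, not a recommended value.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def unknown_samples(requested, available):
    """Requested sample names that are absent from the dataset, in request order."""
    known = set(available)
    return [name for name in requested if name not in known]


# Example usage with a dataset object `ds` (not created here):
# for batch in batched(vcf_paths, 50):
#     ds.ingest_samples(batch)
# missing = unknown_samples(["sample1", "sampleX"], ds.samples())
```

Checking `unknown_samples` before a query surfaces a misspelled or differently cased sample name up front, rather than leaving it to show up as missing rows in the result.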
CLI Usage
TileDB-VCF provides a command-line interface with the following subcommands:
Available Subcommands:
- create: Creates an empty TileDB-VCF dataset
- store: Ingests samples into a TileDB-VCF dataset
- export: Exports data from a TileDB-VCF dataset
- list: Lists all sample names present in a TileDB-VCF dataset
- stat: Prints high-level statistics about a TileDB-VCF dataset
- utils: Utils for working with a TileDB-VCF dataset
- version: Print the version information and exit
```bash
# Create empty dataset
tiledbvcf create --uri my_dataset

# Ingest samples (requires single-sample VCFs with indexes)
tiledbvcf store --uri my_dataset --samples sample1.vcf.gz,sample2.vcf.gz

# Export data
tiledbvcf export --uri my_dataset \
  --regions "chr1:1000000-2000000" \
  --sample-names "sample1,sample2"

# List all samples
tiledbvcf list --uri my_dataset

# Show dataset statistics
tiledbvcf stat --uri my_dataset
```
Advanced Features
Allele Frequency Analysis
```python
# Calculate allele frequencies
af_df = tiledbvcf.read_allele_frequency(
    uri="my_dataset",
    regions=["chr1:1000000-2000000"],
    samples=["sample1", "sample2", "sample3"]
)
```
Sample Quality Control
```python
# Perform sample QC
qc_results = tiledbvcf.sample_qc(
    uri="my_dataset",
    samples=["sample1", "sample2"]
)
```
Custom Configurations
```python
import tiledbvcf

# Advanced configuration
config = tiledbvcf.ReadConfig(
    memory_budget_mb=4096,  # MB
    tiledb_config={
        "sm.tile_cache_size": "1000000000",
        "vfs.s3.region": "us-east-1"
    }
)
```
Resources
Getting Help
Open Source TileDB-VCF Resources
Open Source Documentation:
- TileDB Academy: https://cloud.tiledb.com/academy/
- Population Genomics Guide: https://cloud.tiledb.com/academy/structure/life-sciences/population-genomics/
- TileDB-VCF GitHub: https://github.com/TileDB-Inc/TileDB-VCF
TileDB-Cloud Resources
For Large-Scale/Production Genomics:
- TileDB-Cloud Platform: https://cloud.tiledb.com
- TileDB Academy (All Documentation): https://cloud.tiledb.com/academy/
Getting Started:
- Free account signup: https://cloud.tiledb.com
- Contact: sales@tiledb.com for enterprise needs
Scaling to TileDB-Cloud
When your genomics workloads outgrow single-node processing, TileDB-Cloud provides enterprise-scale capabilities for production genomics pipelines.
Note: This section covers TileDB-Cloud capabilities based on available documentation. For complete API details and current functionality, consult the official TileDB-Cloud documentation and API reference.
Setting Up TileDB-Cloud
**1. Create Account and Get API Token**
```bash
# Sign up at https://cloud.tiledb.com
# Generate API token in your account settings
```
**2. Install TileDB-Cloud Python Client**
```bash
# Base installation
pip install tiledb-cloud

# With genomics-specific functionality (quoted so the shell does not expand the brackets)
pip install "tiledb-cloud[life-sciences]"
```
**3. Configure Authentication**
```bash
# Set environment variable with your API token
export TILEDB_REST_TOKEN="your_api_token"
```
```python
import tiledb.cloud

# Authentication is automatic via TILEDB_REST_TOKEN
# No explicit login required in code
```
Migrating from Open Source to TileDB-Cloud
**Large-Scale Ingestion**
```python
# TileDB-Cloud: Distributed VCF ingestion
import tiledb.cloud.vcf

# Use specialized VCF ingestion module
# Note: Exact API requires TileDB-Cloud documentation
# This represents the available functionality structure
tiledb.cloud.vcf.ingestion.ingest_vcf_dataset(
    source="s3://my-bucket/vcf-files/",
    output="tiledb://my-namespace/large-dataset",
    namespace="my-namespace",
    acn="my-s3-credentials",
    ingest_resources={"cpu": "16", "memory": "64Gi"}
)
```
**Distributed Query Processing**
```python
# TileDB-Cloud: VCF querying across distributed storage
import os

import tiledb.cloud.vcf
import tiledbvcf

# Define the dataset URI
dataset_uri = "tiledb://TileDB-Inc/gvcf-1kg-dragen-v376"

# Assumption: cfg was undefined in the original example; here the API token
# is passed through the TileDB "rest.token" configuration parameter.
cfg = {"rest.token": os.environ["TILEDB_REST_TOKEN"]}

# Get all samples from the dataset
ds = tiledbvcf.Dataset(dataset_uri, tiledb_config=cfg)
samples = ds.samples()

# Define attributes and ranges to query on
attrs = ["sample_name", "fmt_GT", "fmt_AD", "fmt_DP"]
regions = ["chr13:32396898-32397044", "chr13:32398162-32400268"]

# Perform the read, which is executed in a distributed fashion
df = tiledb.cloud.vcf.read(
    dataset_uri=dataset_uri,
    regions=regions,
    samples=samples,
    attrs=attrs,
    namespace="my-namespace",  # specifies which account to charge
)
df.to_pandas()
```
Enterprise Features
**Data Sharing and Collaboration**
```python
# TileDB-Cloud provides enterprise data sharing capabilities
# through namespace-based permissions and group management

# Access shared datasets via TileDB-Cloud URIs
dataset_uri = "tiledb://shared-namespace/population-study"

# Collaborate through shared notebooks and compute resources
# (Specific API requires TileDB-Cloud documentation)
```
**Cost Optimization**
- **Serverless Compute**: Pay only for actual compute time
- **Auto-scaling**: Automatically scale up/down based on workload
- **Spot Instances**: Use cost-optimized compute for batch jobs
- **Data Tiering**: Automatic hot/cold storage management
**Security and Compliance**
- **End-to-end Encryption**: Data encrypted in transit and at rest
- **Access Controls**: Fine-grained permissions and audit logs
- **HIPAA/SOC2 Compliance**: Enterprise security standards
- **VPC Support**: Deploy in private cloud environments
When to Migrate Checklist
✅ Migrate to TileDB-Cloud if any of the following apply:
- Datasets exceed 1000 samples
- You need to process more than 100GB of VCF data
- You require distributed computing
- Multiple team members need access
- You need enterprise security/compliance
- You want cost-optimized serverless compute
- You require 24/7 production uptime
Getting Started with TileDB-Cloud
- Start Free: TileDB-Cloud offers a free tier for evaluation
- Migration Support: The TileDB team provides migration assistance
- Training: Access to genomics-specific tutorials and examples
- Professional Services: Custom deployment and optimization
Next Steps:
- Visit https://cloud.tiledb.com to create an account
- Review the documentation at https://cloud.tiledb.com/academy/
- Contact sales@tiledb.com for enterprise needs