boltzgen

BoltzGen All-Atom Design

Prerequisites

| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| Python | 3.10+ | 3.11 |
| CUDA | 12.0+ | 12.1+ |
| GPU VRAM | 24GB | 48GB (L40S) |
| RAM | 32GB | 64GB |

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Modal (recommended)

```bash
# Clone biomodals
git clone https://github.com/hgbrian/biomodals && cd biomodals

# Run BoltzGen (requires YAML config file)
modal run modal_boltzgen.py \
  --input-yaml binder_config.yaml \
  --protocol protein-anything \
  --num-designs 50

# With custom GPU
GPU=L40S modal run modal_boltzgen.py \
  --input-yaml binder_config.yaml \
  --protocol protein-anything \
  --num-designs 100
```

**GPU**: L40S (48GB) recommended | **Timeout**: 120min default

**Available protocols**: `protein-anything`, `peptide-anything`, `protein-small_molecule`, `nanobody-anything`, `antibody-anything`

Option 2: Local installation

```bash
git clone https://github.com/HannesStark/boltzgen.git
cd boltzgen
pip install -e .

python sample.py config=config.yaml
```

Option 3: Python API

```python
from boltzgen import BoltzGen

model = BoltzGen.load_pretrained()
designs = model.sample(
    target_pdb="target.pdb",
    num_samples=50,
    binder_length=80
)
```

GPU: L40S (48GB) | Time: ~30-60s per design

Key parameters (CLI)

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--input-yaml` | required | Path to YAML design specification |
| `--protocol` | `protein-anything` | Design protocol |
| `--num-designs` | 10 | Number of designs to generate |
| `--steps` | all | Pipeline steps to run (e.g., `design inverse_folding`) |
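
When launching several campaigns from a script, it can help to assemble the command line in one place and reject unknown protocol names before anything is submitted. The sketch below is an illustration only: `build_modal_command` is a hypothetical helper, and it uses just the flags and protocol names listed in this guide. `--steps` is left out because its exact argument syntax is not shown here.

```python
import shlex

# Protocol names as listed in this guide.
PROTOCOLS = {
    "protein-anything",
    "peptide-anything",
    "protein-small_molecule",
    "nanobody-anything",
    "antibody-anything",
}


def build_modal_command(input_yaml, protocol="protein-anything", num_designs=10):
    """Return the argv list for `modal run modal_boltzgen.py` (hypothetical helper)."""
    if protocol not in PROTOCOLS:
        raise ValueError(f"unknown protocol: {protocol!r}")
    if num_designs < 1:
        raise ValueError("num_designs must be a positive integer")
    return [
        "modal", "run", "modal_boltzgen.py",
        "--input-yaml", str(input_yaml),
        "--protocol", protocol,
        "--num-designs", str(num_designs),
    ]


cmd = build_modal_command("binder_config.yaml", num_designs=50)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from inside the biomodals checkout.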

YAML configuration

BoltzGen uses an entity-based YAML format where you specify designed proteins and target structures as entities.

Important notes:
  • Residue indices use `label_seq_id` (1-indexed), not author residue numbers
  • File paths are relative to the YAML file location
  • Target files should be in CIF format (PDB also works, but CIF is preferred)
  • Run `boltzgen check config.yaml` to verify your specification before running
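
Because papers and viewers usually quote author residue numbers, a small lookup from author numbering to `label_seq_id` can prevent off-by-offset binding sites. The following is a minimal sketch, not part of BoltzGen: `auth_to_label` and `load_columns` are hypothetical helpers, the chain is matched on `auth_asym_id` (an assumption; adjust if your chain IDs follow `label_asym_id`), and insertion codes are ignored. `load_columns` assumes Biopython is installed.

```python
def auth_to_label(chain_ids, auth_seq_ids, label_seq_ids, chain):
    """Map author residue numbers to label_seq_id for one chain.

    The three arguments are parallel per-atom columns from the mmCIF
    _atom_site loop: auth_asym_id, auth_seq_id, label_seq_id.
    """
    mapping = {}
    for asym, auth, label in zip(chain_ids, auth_seq_ids, label_seq_ids):
        # label_seq_id is "." or "?" for non-polymer atoms such as waters.
        if asym == chain and label not in (".", "?"):
            mapping.setdefault(auth, int(label))
    return mapping


def load_columns(cif_path):
    """Read the three columns from a CIF file using Biopython."""
    from Bio.PDB.MMCIF2Dict import MMCIF2Dict

    data = MMCIF2Dict(cif_path)
    return (
        data["_atom_site.auth_asym_id"],
        data["_atom_site.auth_seq_id"],
        data["_atom_site.label_seq_id"],
    )


# Toy columns: chain A starts at author residue 101, which is label_seq_id 1.
chains = ["A", "A", "A", "B"]
auth = ["101", "101", "102", "5"]
label = ["1", "1", "2", "1"]
mapping = auth_to_label(chains, auth, label, "A")
print(mapping)  # {'101': 1, '102': 2}
```

With a real target, `auth_to_label(*load_columns("target.cif"), "A")` gives the numbers to put in the `binding:` field.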

Basic Binder Config

```yaml
entities:
  # Designed protein (variable length 80-140 residues)
  - protein:
      id: B
      sequence: 80..140

  # Target from structure file
  - file:
      path: target.cif
      include:
        - chain:
            id: A
      # Specify binding site residues (optional but recommended)
      binding_types:
        - chain:
            id: A
            binding: 45,67,89
```
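
If you need the same basic config for several targets or binding sites, the file can be generated rather than hand-edited. This sketch uses plain string formatting and only the keys shown in the config above; `render_binder_config` is a hypothetical helper, not a BoltzGen function, and its output should still be checked with `boltzgen check`.

```python
def render_binder_config(target_path, chain, binding_residues,
                         min_len=80, max_len=140, binder_id="B"):
    """Render a basic binder config as YAML text (hypothetical helper)."""
    # The binding field is a comma-separated list of label_seq_id values.
    binding = ",".join(str(r) for r in binding_residues)
    return (
        "entities:\n"
        "  - protein:\n"
        f"      id: {binder_id}\n"
        f"      sequence: {min_len}..{max_len}\n"
        "  - file:\n"
        f"      path: {target_path}\n"
        "      include:\n"
        "        - chain:\n"
        f"            id: {chain}\n"
        "      binding_types:\n"
        "        - chain:\n"
        f"            id: {chain}\n"
        f"            binding: {binding}\n"
    )


config_text = render_binder_config("target.cif", "A", [45, 67, 89])
print(config_text)
```

Write the result next to the target file (for example with `pathlib.Path("binder_config.yaml").write_text(config_text)`), since paths in the config are resolved relative to the YAML file.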

Binder with Specific Binding Site

```yaml
entities:
  - protein:
      id: G
      sequence: 60..100

  - file:
      path: 5cqg.cif
      include:
        - chain:
            id: A
      binding_types:
        - chain:
            id: A
            binding: 343,344,251
      structure_groups: "all"
```

Peptide Design (Cyclic)

```yaml
entities:
  - protein:
      id: S
      sequence: 10..14C6C3  # With cysteines for disulfide

  - file:
      path: target.cif
      include:
        - chain:
            id: A

constraints:
  - bond:
      atom1: [S, 11, SG]
      atom2: [S, 18, SG]  # Disulfide bond
```
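
The residue numbers in the bond constraint follow from the sequence string. The sketch below works through that arithmetic under an assumed reading of the notation: a bare number is a run of designed residues, `a..b` is a run of variable length, and a letter is a fixed residue. `fixed_positions` is a hypothetical helper, and how BoltzGen treats the constraint indices when the variable run is longer than its minimum is not covered here.

```python
import re

# A range (10..14), a plain count (6), or a fixed one-letter residue (C).
TOKEN = re.compile(r"(\d+)\.\.(\d+)|(\d+)|([A-Z])")


def fixed_positions(spec, use_max=False):
    """Return {1-indexed position: residue letter} for fixed residues in spec."""
    positions = {}
    pos = 0
    for lo, hi, count, letter in TOKEN.findall(spec):
        if lo:  # variable-length run: take its shortest or longest length
            pos += int(hi) if use_max else int(lo)
        elif count:  # fixed-length run of designed residues
            pos += int(count)
        else:  # a single fixed residue
            pos += 1
            positions[pos] = letter
    return positions


print(fixed_positions("10..14C6C3"))                # {11: 'C', 18: 'C'}
print(fixed_positions("10..14C6C3", use_max=True))  # {15: 'C', 22: 'C'}
```

With the leading run at its minimum of 10 residues, the cysteines land at positions 11 and 18, which matches `atom1` and `atom2` in the config.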

Design protocols

| Protocol | Use Case |
|----------|----------|
| `protein-anything` | Design proteins to bind proteins or peptides |
| `peptide-anything` | Design cyclic peptides to bind proteins |
| `protein-small_molecule` | Design proteins to bind small molecules |
| `nanobody-anything` | Design nanobody CDRs |
| `antibody-anything` | Design antibody CDRs |

Output format

```
output/
├── sample_0/
│   ├── design.cif         # All-atom structure (CIF format)
│   ├── metrics.json       # Confidence scores
│   └── sequence.fasta     # Sequence
├── sample_1/
│   └── ...
└── summary.csv
```

Note: BoltzGen outputs CIF format. Convert to PDB if needed:

```python
from Bio.PDB import MMCIFParser, PDBIO

# Parse the designed structure from mmCIF
parser = MMCIFParser()
structure = parser.get_structure("design", "design.cif")

# Write the same structure out in PDB format
io = PDBIO()
io.set_structure(structure)
io.save("design.pdb")
```

Sample output

Successful run

```
$ modal run modal_boltzgen.py --input-yaml binder.yaml --protocol protein-anything --num-designs 10
Running: boltzgen run binder.yaml --output /tmp/out --protocol protein-anything --num_designs 10
[INFO] Loading BoltzGen model...
[INFO] Generating designs...
[INFO] Running inverse folding...
[INFO] Running structure prediction...
[INFO] Filtering and ranking...
[INFO] Pipeline complete

Results saved to: ./out/boltzgen/2501161234/
```

Output directory structure:

```
out/boltzgen/2501161234/
├── intermediate_designs/           # Raw diffusion outputs
│   ├── design_0.cif
│   └── design_0.npz
├── intermediate_designs_inverse_folded/
│   ├── refold_cif/                 # Refolded complexes
│   └── aggregate_metrics_analyze.csv
└── final_ranked_designs/
    ├── final_10_designs/           # Top designs
    └── results_overview.pdf        # Summary plots
```

What good output looks like:
  • Refolding RMSD < 2.0 Å (design folds as predicted)
  • ipTM > 0.5 (confident interface)
  • All designs complete the pipeline without errors
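
The two numeric thresholds above can be applied as a quick first-pass filter over a metrics table. This is a sketch on made-up rows: the column names `design`, `refold_rmsd`, and `iptm` are placeholders, so check the header of your own metrics CSV and rename accordingly.

```python
import csv
import io


def passing_designs(rows, max_rmsd=2.0, min_iptm=0.5):
    """Return names of designs with refolding RMSD < max_rmsd and ipTM > min_iptm."""
    return [
        row["design"]
        for row in rows
        if float(row["refold_rmsd"]) < max_rmsd and float(row["iptm"]) > min_iptm
    ]


# Toy table with hypothetical column names and invented values.
example_csv = """design,refold_rmsd,iptm
design_0,1.2,0.71
design_1,3.4,0.80
design_2,1.8,0.42
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
print(passing_designs(rows))  # ['design_0']
```

For a real run, replace the `io.StringIO` source with an open file handle to the metrics CSV.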

Decision tree

```
Should I use BoltzGen?
├─ What type of design?
│  ├─ All-atom precision needed → BoltzGen ✓
│  ├─ Ligand binding pocket → BoltzGen ✓
│  └─ Standard miniprotein → RFdiffusion (faster)
├─ What matters most?
│  ├─ Side-chain packing → BoltzGen ✓
│  ├─ Speed / diversity → RFdiffusion
│  ├─ Highest success rate → BindCraft
│  └─ AF2 optimization → ColabDesign
└─ Compute resources?
   ├─ Have L40S/A100 (48GB+) → BoltzGen ✓
   └─ Only A10G (24GB) → Consider RFdiffusion
```

Typical performance

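| Campaign Size | Time (L40S) | Cost (Modal) | Notes |
|---------------|-------------|--------------|-------|
| 50 designs | 30-45 min | ~$8 | Quick exploration |
| 100 designs | 1-1.5h | ~$15 | Standard campaign |
| 500 designs | 5-8h | ~$70 | Large campaign |

Per-design: ~30-60s for typical binder.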
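
For campaign sizes not in the table, the per-design figure gives a rough bracket. This sketch simply multiplies the 30-60 s range by the number of designs; it assumes designs run one after another on a single GPU and ignores container start-up and model loading, so treat the result as a loose bound. `estimate_minutes` is a hypothetical helper.

```python
def estimate_minutes(num_designs, per_design_seconds=(30, 60)):
    """Return (low, high) wall-time estimates in minutes for a campaign."""
    low_s, high_s = per_design_seconds
    return num_designs * low_s / 60, num_designs * high_s / 60


for n in (50, 100, 500):
    low, high = estimate_minutes(n)
    print(f"{n} designs: {low:.0f}-{high:.0f} min")
# 50 designs: 25-50 min
# 100 designs: 50-100 min
# 500 designs: 250-500 min
```

These brackets contain the table's figures (30-45 min, 1-1.5 h, 5-8 h), which is the consistency you would expect if the per-design rate holds across campaign sizes.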


Verify

```bash
find output -name "*.cif" | wc -l  # Should match num_samples
```
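
The same check can be scripted so a pipeline fails fast when designs are missing. A minimal sketch using only the standard library; `count_cif_files` and `matches_expected` are hypothetical helpers, and the directory you point them at should contain only final designs, since intermediate CIF files would inflate the count.

```python
from pathlib import Path


def count_cif_files(output_dir):
    """Count *.cif files anywhere under output_dir (like `find ... | wc -l`)."""
    return sum(1 for path in Path(output_dir).rglob("*.cif") if path.is_file())


def matches_expected(output_dir, expected):
    """Return True when the number of CIF files equals the requested design count."""
    return count_cif_files(output_dir) == expected


# Example: matches_expected("output", 50) after a 50-design run.
```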


Troubleshooting

  • Verify config first: Always run `boltzgen check config.yaml` before running the full pipeline
  • Slow generation: Use fewer designs for initial testing, then scale up
  • OOM errors: Use A100-80GB or reduce `--num-designs`
  • Wrong binding site: Residue indices use `label_seq_id` (1-indexed); check them in the Molstar viewer

Error interpretation

| Error | Cause | Fix |
|-------|-------|-----|
| `RuntimeError: CUDA out of memory` | Large design or long protein | Use A100-80GB or reduce designs |
| `FileNotFoundError: *.cif` | Target file not found | File paths are relative to YAML location |
| `ValueError: invalid chain` | Chain not in target | Verify chain IDs with Molstar or PyMOL |
| `modal: command not found` | Modal CLI not installed | Run `pip install modal && modal setup` |
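
The `FileNotFoundError` case can be caught before a run is submitted by resolving each `path:` entry against the YAML file's own directory. The sketch below is a lightweight pre-flight check, not a replacement for `boltzgen check`: `missing_target_files` is a hypothetical helper that scans for `path:` lines with a regular expression instead of parsing the YAML properly.

```python
import re
from pathlib import Path

# Matches lines such as "      path: target.cif".
PATH_LINE = re.compile(r"^\s*path:\s*(\S+)\s*$")


def missing_target_files(yaml_path):
    """Return resolved paths of `path:` entries that do not exist on disk."""
    yaml_path = Path(yaml_path)
    missing = []
    for line in yaml_path.read_text().splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = PATH_LINE.match(line)
        if match:
            # Paths are resolved relative to the YAML file, not the working directory.
            target = (yaml_path.parent / match.group(1).strip("'\"")).resolve()
            if not target.is_file():
                missing.append(target)
    return missing


# Example: missing_target_files("configs/binder_config.yaml") returns [] when
# every referenced structure file sits where the config expects it.
```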

Next: Validate with `boltz` or `chai`, then use `protein-qc` for filtering.