boltzgen

BoltzGen All-Atom Design

Prerequisites

| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.10+ | 3.11 |
| CUDA | 12.0+ | 12.1+ |
| GPU VRAM | 24 GB | 48 GB (L40S) |
| RAM | 32 GB | 64 GB |
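
Before a long run, it can help to compare a machine against the minimums in the table. The sketch below is a hypothetical helper (`Machine` and `unmet_requirements` are illustrative names, not part of BoltzGen); it detects only the Python version itself and expects you to fill in the CUDA, VRAM and RAM figures by hand, for example from `nvidia-smi`.

```python
from __future__ import annotations

import sys
from dataclasses import dataclass

# Minimum values copied from the prerequisites table.
MIN_PYTHON = (3, 10)
MIN_CUDA = (12, 0)
MIN_VRAM_GB = 24
MIN_RAM_GB = 32


@dataclass
class Machine:
    python: tuple[int, int]
    cuda: tuple[int, int]
    vram_gb: float
    ram_gb: float


def unmet_requirements(m: Machine) -> list[str]:
    """Return a human-readable list of the minimums this machine misses."""
    problems = []
    if m.python < MIN_PYTHON:
        problems.append(f"Python {m.python[0]}.{m.python[1]} < 3.10")
    if m.cuda < MIN_CUDA:
        problems.append(f"CUDA {m.cuda[0]}.{m.cuda[1]} < 12.0")
    if m.vram_gb < MIN_VRAM_GB:
        problems.append(f"GPU VRAM {m.vram_gb} GB < {MIN_VRAM_GB} GB")
    if m.ram_gb < MIN_RAM_GB:
        problems.append(f"RAM {m.ram_gb} GB < {MIN_RAM_GB} GB")
    return problems


if __name__ == "__main__":
    # Only the Python version is detected; the other values are typed in by hand.
    here = Machine(python=tuple(sys.version_info[:2]), cuda=(12, 1), vram_gb=48, ram_gb=64)
    print(unmet_requirements(here) or "meets minimum requirements")
```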

How to run

**First time?** See the Installation Guide to set up Modal and biomodals.

Option 1: Modal (recommended)

```bash
# Clone biomodals
git clone https://github.com/hgbrian/biomodals && cd biomodals

# Run BoltzGen (requires YAML config file)
modal run modal_boltzgen.py \
  --input-yaml binder_config.yaml \
  --protocol protein-anything \
  --num-designs 50

# With custom GPU
GPU=L40S modal run modal_boltzgen.py \
  --input-yaml binder_config.yaml \
  --protocol protein-anything \
  --num-designs 100
```


**GPU**: L40S (48 GB) recommended | **Timeout**: 120 min by default

**Available protocols**: `protein-anything`, `peptide-anything`, `protein-small_molecule`, `nanobody-anything`, `antibody-anything`
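
If you drive Modal runs from a Python script instead of the shell, a small wrapper can reject a misspelled protocol before any GPU time is spent. This is a minimal sketch: `modal_boltzgen_command` is a hypothetical helper, and it assumes `modal_boltzgen.py` reads the `GPU` environment variable, as the `GPU=L40S` shell prefix above implies.

```python
import os

# Protocol names as listed above.
PROTOCOLS = {
    "protein-anything",
    "peptide-anything",
    "protein-small_molecule",
    "nanobody-anything",
    "antibody-anything",
}


def modal_boltzgen_command(input_yaml, protocol="protein-anything",
                           num_designs=10, gpu=None):
    """Build (argv, env) for `modal run modal_boltzgen.py` without running it."""
    if protocol not in PROTOCOLS:
        raise ValueError(f"unknown protocol {protocol!r}; expected one of {sorted(PROTOCOLS)}")
    if num_designs < 1:
        raise ValueError("num_designs must be >= 1")
    argv = [
        "modal", "run", "modal_boltzgen.py",
        "--input-yaml", input_yaml,
        "--protocol", protocol,
        "--num-designs", str(num_designs),
    ]
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    if gpu is not None:
        env["GPU"] = gpu  # equivalent to the `GPU=L40S` prefix in the shell example
    return argv, env
```

Run the result from the `biomodals` checkout with `subprocess.run(argv, env=env, check=True)`. Returning `(argv, env)` instead of launching directly keeps the builder easy to log and test.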

Option 2: Local installation

```bash
git clone https://github.com/HannesStark/boltzgen.git
cd boltzgen
pip install -e .

boltzgen run config.yaml --output output --protocol protein-anything --num_designs 10
```

Option 3: Python API

```python
from boltzgen import BoltzGen

model = BoltzGen.load_pretrained()
designs = model.sample(
    target_pdb="target.pdb",
    num_samples=50,
    binder_length=80
)
```

GPU: L40S (48GB) | Time: ~30-60s per design

Key parameters (CLI)

| Parameter | Default | Description |
|---|---|---|
| `--input-yaml` | required | Path to YAML design specification |
| `--protocol` | `protein-anything` | Design protocol |
| `--num-designs` | 10 | Number of designs to generate |
| `--steps` | all | Pipeline steps to run (e.g., `design inverse_folding`) |
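
To make the defaults in the table concrete, here is a hypothetical `argparse` parser that mirrors them. It is not the parser inside `modal_boltzgen.py`; in particular, accepting `--steps` as separate words rather than one quoted string is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the CLI parameter table (not the real script)."""
    p = argparse.ArgumentParser(prog="modal_boltzgen")
    p.add_argument("--input-yaml", required=True,
                   help="Path to YAML design specification")
    p.add_argument("--protocol", default="protein-anything",
                   help="Design protocol")
    p.add_argument("--num-designs", type=int, default=10,
                   help="Number of designs to generate")
    # Space-separated step names; None stands for "run all steps".
    p.add_argument("--steps", nargs="+", default=None,
                   help="Pipeline steps to run, e.g. design inverse_folding")
    return p
```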

YAML configuration

BoltzGen uses an entity-based YAML format where you specify designed proteins and target structures as entities.

Important notes:

- Residue indices use `label_seq_id` (1-indexed), not author residue numbers.
- File paths are relative to the YAML file location.
- Target files should be in CIF format (PDB also works, but CIF is preferred).
- Run `boltzgen check config.yaml` to verify your specification before running.
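
Because binding residues are given as `label_seq_id` values, author residue numbers taken from a paper or a structure viewer may need translating. A minimal sketch, assuming you already have the parallel `_atom_site` columns (for example from Biopython's `MMCIF2Dict`) and that the chain is selected by its author chain ID; whether BoltzGen's `chain: id:` refers to author or label chain IDs is not stated here, so treat that as an assumption to check. `auth_to_label_seq_ids` is a hypothetical helper, and it ignores insertion codes.

```python
def auth_to_label_seq_ids(auth_asym_ids, auth_seq_ids, label_seq_ids, chain):
    """Map author residue numbers to label_seq_id for one chain.

    The three lists are parallel _atom_site columns (one entry per atom).
    Rows with an unknown value ("." or "?"), such as waters and ligands
    without a label_seq_id, are skipped. Insertion codes are ignored, so
    residues like 52 and 52A collapse onto the first one seen.
    """
    mapping = {}
    for asym, auth, label in zip(auth_asym_ids, auth_seq_ids, label_seq_ids):
        if asym != chain or label in (".", "?") or auth in (".", "?"):
            continue
        mapping.setdefault(int(auth), int(label))
    return mapping
```

With Biopython installed, `d = MMCIF2Dict("target.cif")` supplies the inputs as `d["_atom_site.auth_asym_id"]`, `d["_atom_site.auth_seq_id"]` and `d["_atom_site.label_seq_id"]`; join the mapped values with commas to fill the `binding:` field.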

Basic Binder Config

```yaml
entities:
  # Designed protein (variable length 80-140 residues)
  - protein:
      id: B
      sequence: 80..140

  # Target from structure file
  - file:
      path: target.cif
      include:
        - chain:
            id: A
      # Specify binding site residues (optional but recommended)
      binding_types:
        - chain:
            id: A
            binding: 45,67,89
```
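
When generating many configs (for example, one per target), rendering the YAML from a template avoids indentation mistakes. This sketch builds plain strings so it has no PyYAML dependency; `binder_config_yaml` is a hypothetical helper and emits only the keys shown in the example above.

```python
def binder_config_yaml(target_path, target_chain, binder_chain,
                       min_len, max_len, binding_residues=()):
    """Render a basic binder config in the entity-based layout shown above."""
    if not 1 <= min_len <= max_len:
        raise ValueError("need 1 <= min_len <= max_len")
    if any(r < 1 for r in binding_residues):
        raise ValueError("binding residues are 1-indexed label_seq_id values")
    lines = [
        "entities:",
        "  - protein:",
        f"      id: {binder_chain}",
        f"      sequence: {min_len}..{max_len}",
        "  - file:",
        f"      path: {target_path}",
        "      include:",
        "        - chain:",
        f"            id: {target_chain}",
    ]
    if binding_residues:  # the binding site block is optional
        lines += [
            "      binding_types:",
            "        - chain:",
            f"            id: {target_chain}",
            "            binding: " + ",".join(str(r) for r in binding_residues),
        ]
    return "\n".join(lines) + "\n"
```

Write the result next to the target file (paths are relative to the YAML location) and run `boltzgen check` on it before launching a run.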

Binder with Specific Binding Site

```yaml
entities:
  - protein:
      id: G
      sequence: 60..100

  - file:
      path: 5cqg.cif
      include:
        - chain:
            id: A
      binding_types:
        - chain:
            id: A
            binding: 343,344,251
      structure_groups: "all"
```

Peptide Design (Cyclic)

```yaml
entities:
  - protein:
      id: S
      sequence: 10..14C6C3  # With cysteines for disulfide

  - file:
      path: target.cif
      include:
        - chain:
            id: A

constraints:
  - bond:
      atom1: [S, 11, SG]
      atom2: [S, 18, SG]  # Disulfide bond
```
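
The `sequence` string mixes designed segments and fixed residues, so the positions named in `constraints` depend on how it expands. The sketch below assumes `N..M` means a designed segment of N to M residues, a bare integer means a designed segment of exactly that length, and an uppercase letter means a fixed residue; this reading is an assumption, and `expand` and `positions` are hypothetical helpers.

```python
import re

# Assumed grammar: "N..M" = designed segment of N to M residues,
# "N" = designed segment of exactly N residues, "A"-"Z" = fixed residue.
TOKEN = re.compile(r"(\d+)\.\.(\d+)|(\d+)|([A-Z])")


def expand(spec, pick=min):
    """Expand a sequence spec into a string, using 'X' for designed positions.

    `pick` chooses a length for each N..M range (the shortest by default).
    """
    out = []
    pos = 0
    for m in TOKEN.finditer(spec):
        if m.start() != pos:  # an unparseable character was skipped
            raise ValueError(f"cannot parse {spec!r} at position {pos}")
        pos = m.end()
        lo, hi, fixed_len, residue = m.groups()
        if lo is not None:
            out.append("X" * pick(int(lo), int(hi)))
        elif fixed_len is not None:
            out.append("X" * int(fixed_len))
        else:
            out.append(residue)
    if pos != len(spec):
        raise ValueError(f"cannot parse {spec!r} at position {pos}")
    return "".join(out)


def positions(seq, residue):
    """1-indexed positions of `residue` in an expanded sequence."""
    return [i + 1 for i, aa in enumerate(seq) if aa == residue]
```

Under this reading, the bond constraint on residues 11 and 18 matches the two cysteines only when the first segment is 10 residues long; a 14-residue first segment moves them to 15 and 22. It is worth confirming how BoltzGen resolves constraint indices for variable-length segments before relying on this config.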

Design protocols

| Protocol | Use Case |
|---|---|
| `protein-anything` | Design proteins to bind proteins or peptides |
| `peptide-anything` | Design cyclic peptides to bind proteins |
| `protein-small_molecule` | Design proteins to bind small molecules |
| `nanobody-anything` | Design nanobody CDRs |
| `antibody-anything` | Design antibody CDRs |

Output format

```
output/
├── sample_0/
│   ├── design.cif         # All-atom structure (CIF format)
│   ├── metrics.json       # Confidence scores
│   └── sequence.fasta     # Sequence
├── sample_1/
│   └── ...
└── summary.csv
```

Note: BoltzGen outputs CIF format. Convert to PDB if needed:

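```python
from Bio.PDB import MMCIFParser, PDBIO

# Parse the mmCIF design (QUIET suppresses PDBConstructionWarning noise)
parser = MMCIFParser(QUIET=True)
structure = parser.get_structure("design", "design.cif")

# Write the same structure back out in legacy PDB format
io = PDBIO()
io.set_structure(structure)
io.save("design.pdb")
```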

Sample output

Successful run

```
$ modal run modal_boltzgen.py --input-yaml binder.yaml --protocol protein-anything --num-designs 10
Running: boltzgen run binder.yaml --output /tmp/out --protocol protein-anything --num_designs 10
[INFO] Loading BoltzGen model...
[INFO] Generating designs...
[INFO] Running inverse folding...
[INFO] Running structure prediction...
[INFO] Filtering and ranking...
[INFO] Pipeline complete

Results saved to: ./out/boltzgen/2501161234/
```

Output directory structure:

```
out/boltzgen/2501161234/
├── intermediate_designs/           # Raw diffusion outputs
│   ├── design_0.cif
│   └── design_0.npz
├── intermediate_designs_inverse_folded/
│   ├── refold_cif/                 # Refolded complexes
│   └── aggregate_metrics_analyze.csv
└── final_ranked_designs/
    ├── final_10_designs/           # Top designs
    └── results_overview.pdf        # Summary plots
```

What good output looks like:

- Refolding RMSD < 2.0 Å (design folds as predicted)
- ipTM > 0.5 (confident interface)
- All designs complete the pipeline without errors
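
To apply these thresholds to a whole campaign, a metrics table such as `aggregate_metrics_analyze.csv` can be filtered row by row. The column names in this sketch (`refold_rmsd`, `iptm`) are placeholders, since the actual headers are not documented here; `passing_designs` is a hypothetical helper.

```python
import csv


def passing_designs(rows, rmsd_col="refold_rmsd", iptm_col="iptm",
                    max_rmsd=2.0, min_iptm=0.5):
    """Keep rows with RMSD < max_rmsd and ipTM > min_iptm.

    Default column names are placeholders: check your CSV header and pass
    the real names.
    """
    kept = []
    for row in rows:
        # A wrong column name raises KeyError here on purpose.
        rmsd_raw, iptm_raw = row[rmsd_col], row[iptm_col]
        try:
            rmsd, iptm = float(rmsd_raw), float(iptm_raw)
        except ValueError:
            continue  # empty or non-numeric metric: treat as failing
        if rmsd < max_rmsd and iptm > min_iptm:
            kept.append(row)
    return kept


def passing_designs_from_csv(path, **kwargs):
    """Read a metrics CSV with a header row and filter it."""
    with open(path, newline="") as fh:
        return passing_designs(csv.DictReader(fh), **kwargs)
```

Raising on a missing column means a placeholder name that does not match your CSV fails loudly instead of silently filtering out every design.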

Decision tree

```
Should I use BoltzGen?
│
├─ What type of design?
│  ├─ All-atom precision needed → BoltzGen ✓
│  ├─ Ligand binding pocket → BoltzGen ✓
│  └─ Standard miniprotein → RFdiffusion (faster)
│
├─ What matters most?
│  ├─ Side-chain packing → BoltzGen ✓
│  ├─ Speed / diversity → RFdiffusion
│  ├─ Highest success rate → BindCraft
│  └─ AF2 optimization → ColabDesign
│
└─ Compute resources?
   ├─ Have L40S/A100 (48GB+) → BoltzGen ✓
   └─ Only A10G (24GB) → Consider RFdiffusion
```

Typical performance

| Campaign size | Time (L40S) | Cost (Modal) | Notes |
|---|---|---|---|
| 50 designs | 30-45 min | ~$8 | Quick exploration |
| 100 designs | 1-1.5 h | ~$15 | Standard campaign |
| 500 designs | 5-8 h | ~$70 | Large campaign |

Per design: ~30-60 s for a typical binder.
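
The per-design figure is enough to sanity-check the table: 100 designs at 30-60 s each is 50-100 min, in the same range as the 1-1.5 h row. A minimal sketch, assuming designs run one after another on a single GPU and ignoring fixed startup time such as model loading; `campaign_minutes` is a hypothetical helper.

```python
def campaign_minutes(num_designs, sec_per_design=(30, 60)):
    """Rough (low, high) wall-clock estimate in minutes for sequential designs."""
    lo, hi = sec_per_design
    return num_designs * lo / 60, num_designs * hi / 60
```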

Verify

```bash
find output -name "*.cif" | wc -l  # Should match the number of designs requested
```
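
The same check can run inside a Python pipeline script. This is a sketch with hypothetical helper names (`count_cif`, `check_design_count`). With the directory layout from the sample run, raw, refolded and final designs likely all contain CIF files, so a recursive count over the whole run directory can exceed the number of designs; in that case point it at a single subdirectory such as `intermediate_designs/`.

```python
from pathlib import Path


def count_cif(output_dir):
    """Count .cif files under output_dir recursively (like `find ... -name "*.cif"`)."""
    return sum(1 for p in Path(output_dir).rglob("*.cif") if p.is_file())


def check_design_count(output_dir, expected):
    """Raise if the number of CIF files differs from the expected design count."""
    found = count_cif(output_dir)
    if found != expected:
        raise RuntimeError(f"expected {expected} CIF files in {output_dir}, found {found}")
    return found
```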

Troubleshooting

- **Verify config first**: Always run `boltzgen check config.yaml` before running the full pipeline.
- **Slow generation**: Use fewer designs for initial testing, then scale up.
- **OOM errors**: Use an A100-80GB or reduce `--num-designs`.
- **Wrong binding site**: Residue indices use `label_seq_id` (1-indexed); check them in the Molstar viewer.

Error interpretation

| Error | Cause | Fix |
|---|---|---|
| `RuntimeError: CUDA out of memory` | Large design or long protein | Use A100-80GB or reduce designs |
| `FileNotFoundError: *.cif` | Target file not found | File paths are relative to YAML location |
| `ValueError: invalid chain` | Chain not in target | Verify chain IDs with Molstar or PyMOL |
| `modal: command not found` | Modal CLI not installed | Run `pip install modal && modal setup` |
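
When a long run fails, scanning its log for the messages in the table can point straight at the fix. A minimal sketch using plain substring matching; `KNOWN_ERRORS` and `suggest_fixes` are hypothetical names, and real tracebacks may word errors differently than the table does.

```python
# Substrings from the error table, mapped to the suggested fix.
KNOWN_ERRORS = [
    ("CUDA out of memory", "Use an A100-80GB or reduce --num-designs"),
    ("FileNotFoundError", "Check paths: they are relative to the YAML file location"),
    ("invalid chain", "Verify chain IDs in Molstar or PyMOL"),
    ("modal: command not found", "Run `pip install modal && modal setup`"),
]


def suggest_fixes(log_text):
    """Return the fixes whose error pattern appears in log_text, in table order."""
    return [fix for pattern, fix in KNOWN_ERRORS if pattern in log_text]
```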

**Next**: Validate with boltz or chai → protein-qc for filtering.
