tooluniverse-vaccine-design

Vaccine Design

Computational pipeline for designing peptide/subunit vaccine candidates through epitope prediction, population coverage optimization, and immunogenicity assessment.

Reasoning Strategy

Vaccine design requires presenting the right epitopes to elicit protective immunity — not just any immune response, but one that is neutralizing, durable, and broadly applicable. For T-cell vaccines, the core tool is MHC binding prediction (IEDB tools): predict peptide-MHC affinity across multiple HLA alleles, then select epitopes with broad coverage of the target population. For antibody vaccines, prioritize surface-exposed conserved regions — a deeply buried or hypervariable region makes a poor antibody target. MHC binding does not equal immunogenicity; many good binders are not immunogenic in vivo due to tolerance, poor processing, or lack of T-cell help. A multi-epitope strategy (combining MHC-I for CD8+ CTL response, MHC-II for CD4+ helper response, and B-cell epitopes for antibody induction) is more robust than any single epitope. Conservation across pathogen strains is critical — an epitope that mutates under immune pressure (like HIV envelope hypervariable regions) is a poor vaccine target.
LOOK UP DON'T GUESS: Do not predict MHC binding or population coverage from memory. Use `IEDB_predict_mhci_binding` and `IEDB_predict_mhcii_binding` for predictions and `IEDB_search_epitopes` for validated experimental data. Do not assume what is on the pathogen surface; retrieve annotated sequences from UniProt or BVBRC.
Key principles:
  1. Epitope-driven — vaccines work by presenting epitopes to T/B cells; start with epitope prediction
  2. Population coverage matters — HLA diversity means no single epitope covers everyone; design for breadth
  3. Multi-epitope is better — combine CD8+ (MHC-I) and CD4+ (MHC-II) epitopes for robust immunity
  4. Conservation = broad protection — conserved epitopes across strains provide cross-protective immunity
  5. Evidence grading — T1: clinical trial data, T2: in-vivo immunogenicity, T3: in-vitro binding, T4: computational prediction only

When to Use

  • "Design a vaccine against [pathogen]"
  • "Predict T-cell epitopes for [protein]"
  • "What MHC-I epitopes does [protein] have?"
  • "Assess population coverage of these epitopes"
  • "Find conserved epitopes across [pathogen] strains"
Not this skill: For HLA typing or allele frequency only, use `tooluniverse-hla-immunogenomics`. For antibody engineering, use `tooluniverse-antibody-engineering`.

Core Tools

| Tool | Use For |
|------|---------|
| `IEDB_search_epitopes` | Search experimentally validated epitopes |
| `IEDB_get_epitope` | Get detailed epitope data (assay results, MHC restriction) |
| `iedb_search_mhc` | Search validated MHC binding assay data |
| `IEDB_predict_mhci_binding` | Predict MHC-I binding (NetMHCpan EL; percentile rank < 0.5% = strong binder) |
| `IEDB_predict_mhcii_binding` | Predict MHC-II binding (NetMHCIIpan EL; CD4+ helper epitopes) |
| `UniProt_get_entry_by_accession` | Get antigen protein sequence |
| `UniProt_search` | Find pathogen protein sequences |
| `BVBRC_search_genome_features` | Search pathogen proteomes |
| `alphafold_get_prediction` | Get/predict antigen 3D structure |
| `EnsemblVEP_annotate_hgvs` | Check epitope conservation across variants |
| `PubMed_search_articles` | Find published vaccine studies |
| `search_clinical_trials` | Find ongoing vaccine clinical trials |

Workflow

Phase 0: Antigen Selection
  Pathogen → essential surface proteins → sequence retrieval
    |
Phase 1: T-Cell Epitope Prediction
  MHC-I (CD8+ CTL) and MHC-II (CD4+ helper) binding prediction
    |
Phase 2: B-Cell Epitope Prediction
  Linear and conformational B-cell epitopes for antibody response
    |
Phase 3: Population Coverage
  HLA allele frequencies → design for target population
    |
Phase 4: Conservation Analysis
  Cross-strain epitope conservation → broad protection
    |
Phase 5: Candidate Assembly & Report
  Multi-epitope construct design → immunogenicity assessment

Phase 0: Antigen Selection

Best antigens for vaccines: surface-exposed, essential for pathogen function, and conserved across strains.

```python
# Find pathogen surface proteins
UniProt_search(query="[organism] AND locations:(location:cell surface) AND reviewed:true")

# Or search BVBRC for annotated pathogen proteomes
BVBRC_search_genome_features(keyword="surface protein", genome_id="[taxon_id]")
```

**Antigen prioritization**: prefer surface-exposed (secreted/outer membrane) over cytoplasmic proteins; >95% conserved across strains; essential for pathogen viability; a known immunogen in natural infection. Use UniProt subcellular location annotations and PubMed to verify these properties.
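The four prioritization criteria above can be applied as a simple filter once the lookups are done. This is a minimal sketch, not a ToolUniverse function: the `AntigenCandidate` record, its field names, and the `SURFACE_LOCATIONS` label set are hypothetical, and you would fill them in yourself from UniProt, BVBRC, and PubMed results.

```python
from dataclasses import dataclass

# Hypothetical record: fill each field from UniProt / BVBRC / PubMed lookups.
@dataclass
class AntigenCandidate:
    accession: str
    location: str          # UniProt subcellular location label
    conservation: float    # fraction conserved across strains, 0-1
    essential: bool        # essential for pathogen viability
    known_immunogen: bool  # immunogenic in natural infection

# Assumed set of location labels treated as surface-exposed.
SURFACE_LOCATIONS = {"secreted", "cell outer membrane", "cell surface", "virion membrane"}

def prioritize(candidates, min_conservation=0.95):
    """Keep candidates meeting all four criteria, most conserved first."""
    passing = [
        c for c in candidates
        if c.location.lower() in SURFACE_LOCATIONS
        and c.conservation > min_conservation
        and c.essential
        and c.known_immunogen
    ]
    return sorted(passing, key=lambda c: c.conservation, reverse=True)

# Usage with placeholder accessions and values (not real data):
candidates = [
    AntigenCandidate("ANTIGEN_1", "Secreted", 0.99, True, True),
    AntigenCandidate("ANTIGEN_2", "Cytoplasm", 0.99, True, True),
    AntigenCandidate("ANTIGEN_3", "Cell surface", 0.97, True, True),
    AntigenCandidate("ANTIGEN_4", "Cell surface", 0.90, True, True),
]
for c in prioritize(candidates):
    print(c.accession, c.conservation)
```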

Phase 1: T-Cell Epitope Prediction

MHC-I epitopes (CD8+ cytotoxic T cells, which kill infected cells):

```python
# Option A: Search for KNOWN validated epitopes from IEDB
iedb_search_mhc(
    mhc_class="I",
    qualitative_measure="Positive",
    filters={"source_organism_iri": "eq.NCBITaxon:2697049"},  # SARS-CoV-2
    select=["linear_sequence", "mhc_restriction", "qualitative_measure"],
    limit=50,
)

# Option B: PREDICT novel peptide binding (recommended for new proteins)
IEDB_predict_mhci_binding(
    sequence="YOUR_PROTEIN_SEQUENCE",  # full protein or peptide
    allele="HLA-A*02:01",              # or H-2-Kd for mouse
    method="netmhcpan_el",             # EL = eluted ligand (recommended)
    length=9,                          # 8-11 for MHC-I
)

# Returns peptides ranked by percentile_rank (lower = stronger):
#   < 0.5%  = strong binder (include in vaccine)
#   0.5-2%  = moderate binder (consider)
#   > 2%    = weak/non-binder (exclude)
```
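To make the rank thresholds concrete, here is a sketch that enumerates the overlapping k-mers a sequence-scanning predictor evaluates and bins result rows by `percentile_rank`. It assumes each result row is a dict with `peptide` and `percentile_rank` keys; the real output shape of `IEDB_predict_mhci_binding` may differ, so adapt the key names.

```python
def candidate_peptides(sequence, lengths=(8, 9, 10, 11)):
    """Every overlapping k-mer as (1-based start, peptide)."""
    seq = sequence.upper()
    return [
        (start + 1, seq[start:start + k])
        for k in lengths
        for start in range(len(seq) - k + 1)
    ]

def classify_rank(percentile_rank):
    """Bin a percentile rank with the thresholds above (lower = stronger)."""
    if percentile_rank < 0.5:
        return "strong"           # include in vaccine
    if percentile_rank <= 2.0:
        return "moderate"         # consider
    return "weak/non-binder"      # exclude

def select_binders(rows, keep=("strong", "moderate")):
    """rows: assumed list of dicts with 'peptide' and 'percentile_rank' keys.
    Returns kept rows, annotated with a 'binder' class, best rank first."""
    labeled = [dict(r, binder=classify_rank(r["percentile_rank"])) for r in rows]
    kept = [r for r in labeled if r["binder"] in keep]
    return sorted(kept, key=lambda r: r["percentile_rank"])

# Usage with made-up rows (not real predictions):
rows = [
    {"peptide": "PEPTIDE01", "percentile_rank": 1.2},
    {"peptide": "PEPTIDE02", "percentile_rank": 0.1},
    {"peptide": "PEPTIDE03", "percentile_rank": 5.0},
]
print([r["peptide"] for r in select_binders(rows)])
```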


**MHC-II epitopes** (CD4+ helper T cells, which activate B cells and CD8+ T cells):

```python
iedb_search_mhc(
    mhc_class="II",
    qualitative_measure="Positive",
    filters={"source_organism_iri": "eq.NCBITaxon:2697049"},  # SARS-CoV-2
    limit=50,
)
```

Binding affinity interpretation:

| IC50 (nM) | Classification | Vaccine Relevance |
|-----------|----------------|-------------------|
| < 50 | Strong binder | Include: high presentation probability |
| 50-500 | Moderate binder | Consider: may contribute to response |
| 500-5000 | Weak binder | Exclude: unlikely to be presented |
| > 5000 | Non-binder | Exclude |
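When your data carry IC50 values (for example IEDB assay records, or a binding-affinity method rather than the eluted-ligand method, which reports percentile ranks), the table can be applied directly. A minimal sketch; assigning the boundary values 50 and 500 nM to the weaker class is a choice the table leaves open.

```python
def classify_ic50(ic50_nm):
    """Return (class, action) for a predicted or measured IC50 in nM.

    Boundary values 50 and 500 go to the weaker class; 5000 stays 'weak'
    because the table reserves 'non-binder' for > 5000 nM."""
    if ic50_nm < 0:
        raise ValueError("IC50 must be non-negative")
    if ic50_nm < 50:
        return ("strong", "include")
    if ic50_nm < 500:
        return ("moderate", "consider")
    if ic50_nm <= 5000:
        return ("weak", "exclude")
    return ("non-binder", "exclude")

for value in (12.0, 220.0, 5000.0, 18000.0):
    print(value, classify_ic50(value))
```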
HLA supertype strategy: For broad coverage, predict against representative alleles of the major HLA supertypes:
  • A2 supertype (`A*02:01`, `A*02:06`, `A*68:02`): covers ~40% globally
  • A3 supertype (`A*03:01`, `A*11:01`, `A*31:01`): covers ~25%
  • B7 supertype (`B*07:02`, `B*35:01`, `B*51:01`): covers ~25%
  • A2 + A3 + B7 + B44 combined: covers >90% of most populations
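The supertype strategy amounts to running the predictor once per representative allele and then checking that the chosen epitope set hits every supertype. The sketch below uses only the alleles listed above; B44 representatives are not named in this section, so they are left for you to add. The `epitope_alleles` mapping (peptide to the set of alleles it binds strongly) is a hypothetical structure you would build from the prediction results.

```python
# Representative alleles per supertype, exactly as listed above.
# B44 representatives are not named in this section; add them before relying on it.
SUPERTYPE_PANEL = {
    "A2": ["HLA-A*02:01", "HLA-A*02:06", "HLA-A*68:02"],
    "A3": ["HLA-A*03:01", "HLA-A*11:01", "HLA-A*31:01"],
    "B7": ["HLA-B*07:02", "HLA-B*35:01", "HLA-B*51:01"],
}

def prediction_jobs(lengths=(9,)):
    """(allele, length) pairs: one IEDB_predict_mhci_binding call each."""
    return [
        (allele, k)
        for alleles in SUPERTYPE_PANEL.values()
        for allele in alleles
        for k in lengths
    ]

def supertype_coverage(epitope_alleles):
    """epitope_alleles: hypothetical {peptide: set of alleles it binds strongly}.
    Returns {supertype: sorted peptides restricted to any of its alleles}."""
    return {
        st: sorted(p for p, bound in epitope_alleles.items() if bound & set(alleles))
        for st, alleles in SUPERTYPE_PANEL.items()
    }

def missing_supertypes(epitope_alleles):
    """Supertypes that no selected epitope covers yet."""
    return [st for st, hits in supertype_coverage(epitope_alleles).items() if not hits]

# Usage with a made-up epitope set:
selected = {"PEPTIDE01": {"HLA-A*02:01"}, "PEPTIDE02": {"HLA-A*11:01"}}
print(missing_supertypes(selected))
```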

Phase 2: B-Cell Epitope Prediction

B-cell epitopes trigger antibody production. Look for:
  • Linear epitopes: Continuous peptide sequences (easier to synthesize)
  • Conformational epitopes: 3D surface patches (require structural data)

```python
# Check for known B-cell epitopes
IEDB_search_epitopes(query="[protein_name]", epitope_type="B cell")

# Get structure for conformational epitope prediction
alphafold_get_prediction(uniprot_id="[accession]")
```

**B-cell epitope criteria**: Surface-exposed loops, hydrophilic regions, flexible regions (high B-factor). Combine computational prediction with structural analysis.
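For the "hydrophilic regions" criterion on linear epitopes, one classic and easily reproduced heuristic is a Hopp-Woods hydrophilicity scan. This is a swapped-in technique, not what the IEDB tools above run (IEDB-hosted B-cell methods such as BepiPred are generally considered more accurate), and the 6-residue window and 1.0 threshold are assumptions to tune.

```python
# Hopp-Woods (1981) hydrophilicity value per residue.
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0, "Q": 0.2, "E": 3.0,
    "G": 0.0, "H": -0.5, "I": -1.8, "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5,
    "P": 0.0, "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}

def hydrophilic_windows(sequence, window=6, threshold=1.0):
    """(1-based start, peptide, mean score) for windows whose mean
    hydrophilicity exceeds threshold. Unknown residues score 0.0."""
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - window + 1):
        pep = seq[i:i + window]
        score = sum(HOPP_WOODS.get(aa, 0.0) for aa in pep) / window
        if score > threshold:
            hits.append((i + 1, pep, round(score, 2)))
    return hits

def merge_regions(hits, window=6):
    """Merge overlapping or adjacent hit windows into (start, end) regions,
    1-based and inclusive. `window` must match the scan."""
    regions = []
    for start, _, _ in hits:
        end = start + window - 1
        if regions and start <= regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions

seq = "IIIIIIKKKKKKIIIIII"  # toy sequence, not a real antigen
hits = hydrophilic_windows(seq)
print(merge_regions(hits))
```

One caution when pairing this with `alphafold_get_prediction`: AlphaFold model files store per-residue pLDDT in the B-factor column, where high values mean high confidence, so "high B-factor" as a flexibility signal does not carry over directly; low-pLDDT stretches are the closer analogue of flexible regions.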

Phase 3: Population Coverage

```python
# Search for epitopes restricted to common HLA alleles in the target population.
# NOTE: No HLA frequency tool exists in ToolUniverse. For population coverage:
#   1. Use the IEDB Analysis Resource (tools.iedb.org/population) to calculate coverage
#   2. Use the HLA supertype strategy (see above) to ensure broad coverage
#   3. Search PubMed for published HLA frequency data:
PubMed_search_articles(query="HLA allele frequency [population]")
```

**Population coverage targets**:

| Coverage Level | Interpretation | Action |
|---------------|---------------|--------|
| >90% | Excellent: epitopes predicted to be presented in most individuals | Proceed to development |
| 70-90% | Good: most people covered; some populations underserved | Add more epitopes for uncovered HLA types |
| 50-70% | Moderate: significant gaps | Redesign with broader HLA coverage |
| <50% | Poor: too many people left uncovered | Fundamental redesign needed |
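The IEDB population coverage tool is the reference route, but the core arithmetic is easy to reproduce as a sanity check. A simplified sketch, assuming Hardy-Weinberg equilibrium within each locus and independence between loci (no linkage disequilibrium): a person is covered at a locus if they carry at least one allele the epitope set binds, giving 1 - (1 - sum of allele frequencies)^2, and loci combine by multiplying their uncovered fractions. The frequencies below are placeholders, not real population data.

```python
def locus_coverage(allele_freqs):
    """Fraction of people with at least one covered allele at one locus,
    assuming Hardy-Weinberg equilibrium: 1 - (1 - sum(f))**2."""
    total = sum(allele_freqs)
    if not 0.0 <= total <= 1.0:
        raise ValueError("allele frequencies at one locus must sum to 0-1")
    return 1.0 - (1.0 - total) ** 2

def population_coverage(freqs_by_locus):
    """Combine loci assuming independence (no linkage disequilibrium):
    the uncovered fraction is the product of per-locus uncovered fractions."""
    uncovered = 1.0
    for freqs in freqs_by_locus.values():
        uncovered *= 1.0 - locus_coverage(freqs)
    return 1.0 - uncovered

def coverage_tier(coverage):
    """Map a 0-1 coverage onto the targets table above."""
    pct = coverage * 100
    if pct > 90:
        return "Excellent"
    if pct >= 70:
        return "Good"
    if pct >= 50:
        return "Moderate"
    return "Poor"

# Placeholder frequencies of the alleles your epitopes bind (NOT real data):
freqs = {"HLA-A": [0.30, 0.10], "HLA-B": [0.20]}
cov = population_coverage(freqs)
print(f"{cov:.1%}", coverage_tier(cov))
```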

Phase 4: Conservation Analysis

Check whether epitopes are conserved across pathogen strains/variants:

```python
# Search for protein variants across strains
PubMed_search_articles(query="[pathogen] [protein] sequence variation strains")

# Check specific mutations in epitope regions
EnsemblVEP_annotate_hgvs(hgvs_notation="[variant_in_epitope]")
```

**Conservation interpretation**:
- **100% conserved** across all known strains → ideal vaccine target
- **>95% conserved** → good target; monitor emerging variants
- **80-95% conserved** → may need strain-specific variants in construct
- **<80% conserved** → avoid; pathogen evolves to escape this epitope
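These percentages can be computed directly once you have collected strain sequences (for example from BVBRC or UniProt). A minimal sketch, similar in spirit to the IEDB Epitope Conservancy Analysis tool but not a reimplementation of it: for each strain, find the best ungapped same-length match to the epitope, then report the fraction of strains at or above an identity threshold. Ignoring gaps is a simplifying assumption.

```python
def max_identity(epitope, sequence):
    """Highest fraction of identical residues between the epitope and any
    same-length ungapped window of sequence (0.0 if sequence is shorter)."""
    k = len(epitope)
    best = 0.0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        same = sum(a == b for a, b in zip(epitope, window))
        best = max(best, same / k)
    return best

def conservancy(epitope, strain_sequences, identity=1.0):
    """Fraction of strains whose best-matching window reaches `identity`."""
    if not strain_sequences:
        raise ValueError("need at least one strain sequence")
    hits = sum(max_identity(epitope, s) >= identity for s in strain_sequences)
    return hits / len(strain_sequences)

def conservation_tier(fraction):
    """Map a 0-1 conservancy onto the interpretation list above."""
    if fraction >= 1.0:
        return "ideal target"
    if fraction * 100 > 95:
        return "good target; monitor emerging variants"
    if fraction * 100 >= 80:
        return "may need strain-specific variants in construct"
    return "avoid; likely escape"

strains = ["AASIINFEKLAA", "SIINFEKL", "SIINFEKA", "QQQQ"]  # toy sequences
frac = conservancy("SIINFEKL", strains)
print(frac, conservation_tier(frac))
```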

Phase 5: Candidate Assembly & Report

Multi-epitope construct design principles:
  1. Include 3-5 MHC-I epitopes (CD8+ response)
  2. Include 2-3 MHC-II epitopes (CD4+ helper response)
  3. Include 1-2 B-cell epitopes (antibody response)
  4. Connect with appropriate linkers (AAY for MHC-I, GPGPG for MHC-II)
  5. Add adjuvant sequence if needed (e.g., flagellin domain for TLR5)
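The construct principles above can be encoded as a small assembly helper. The AAY and GPGPG linkers and the epitope counts come from the list; the `spacer` used between blocks, between B-cell epitopes, and after the adjuvant, and the N-terminal placement of the adjuvant, are assumptions of this sketch to replace with whatever your design calls for.

```python
def assemble_construct(mhc1, mhc2, bcell, adjuvant="", spacer="GPGPG"):
    """Join epitopes into one multi-epitope sequence.

    MHC-I epitopes are joined with AAY and MHC-II epitopes with GPGPG, as in
    the principles above. `spacer` (between blocks, between B-cell epitopes,
    and after an N-terminal adjuvant) is an assumption of this sketch."""
    limits = (("MHC-I", mhc1, 3, 5), ("MHC-II", mhc2, 2, 3), ("B-cell", bcell, 1, 2))
    for name, items, low, high in limits:
        if not low <= len(items) <= high:
            raise ValueError(f"{name}: expected {low}-{high} epitopes, got {len(items)}")
    blocks = ["AAY".join(mhc1), "GPGPG".join(mhc2), spacer.join(bcell)]
    if adjuvant:
        blocks.insert(0, adjuvant)
    return spacer.join(blocks)

# Usage with placeholder epitope strings (not real epitopes):
construct = assemble_construct(
    mhc1=["CTLEPI1", "CTLEPI2", "CTLEPI3"],
    mhc2=["HTLEPI1", "HTLEPI2"],
    bcell=["BCEPI1"],
)
print(construct, len(construct))
```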
Report structure:
  1. Antigen Selection — rationale, conservation, essentiality
  2. Epitope Map — all predicted epitopes with binding affinities and HLA restrictions
  3. Top Epitopes — ranked by binding strength × conservation × population coverage
  4. Population Coverage — % coverage per major world population
  5. Conservation Analysis — strain coverage, escape risk assessment
  6. Construct Design — multi-epitope sequence with linkers
  7. Clinical Precedent — existing vaccines/trials for related antigens
  8. Limitations — predicted only (T4 evidence); needs experimental validation
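The composite ranking in report item 3 (binding strength × conservation × population coverage) needs binding strength on a 0-1 scale. This sketch assumes a linear mapping from percentile rank that reaches 0 at the 2% exclusion cutoff; that mapping and the dict keys are illustrative choices, not part of this workflow's tooling.

```python
def binding_score(percentile_rank, cutoff=2.0):
    """1.0 at rank 0, falling linearly to 0.0 at the exclusion cutoff (assumed mapping)."""
    return max(0.0, 1.0 - percentile_rank / cutoff)

def rank_epitopes(epitopes):
    """epitopes: hypothetical dicts with 'peptide', 'percentile_rank',
    'conservation' (0-1) and 'coverage' (0-1). Highest composite score first."""
    scored = [
        {**e, "score": binding_score(e["percentile_rank"]) * e["conservation"] * e["coverage"]}
        for e in epitopes
    ]
    return sorted(scored, key=lambda e: e["score"], reverse=True)

# Usage with made-up values:
epitopes = [
    {"peptide": "PEPTIDE01", "percentile_rank": 0.2, "conservation": 1.0, "coverage": 0.5},
    {"peptide": "PEPTIDE02", "percentile_rank": 1.0, "conservation": 0.9, "coverage": 0.9},
    {"peptide": "PEPTIDE03", "percentile_rank": 3.0, "conservation": 1.0, "coverage": 1.0},
]
for e in rank_epitopes(epitopes):
    print(e["peptide"], round(e["score"], 3))
```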

Limitations

  • All predictions are computational (T4 evidence) — experimental validation (binding assays, immunogenicity studies) is required before any clinical development
  • No immunogenicity guarantee — MHC binding ≠ immunogenicity; many good binders are not immunogenic in vivo
  • B-cell epitope prediction is less reliable than T-cell; conformational epitopes require accurate structures
  • No adjuvant optimization — adjuvant selection requires empirical testing
  • Pathogen evasion — rapidly evolving pathogens (HIV, influenza) may escape epitope-based vaccines