tooluniverse-vaccine-design
# Vaccine Design
Computational pipeline for designing peptide/subunit vaccine candidates through epitope prediction, population coverage optimization, and immunogenicity assessment.
## Reasoning Strategy
Vaccine design requires presenting the right epitopes to elicit protective immunity — not just any immune response, but one that is neutralizing, durable, and broadly applicable. For T-cell vaccines, the core tool is MHC binding prediction (IEDB tools): predict peptide-MHC affinity across multiple HLA alleles, then select epitopes with broad coverage of the target population. For antibody vaccines, prioritize surface-exposed conserved regions — a deeply buried or hypervariable region makes a poor antibody target. MHC binding does not equal immunogenicity; many good binders are not immunogenic in vivo due to tolerance, poor processing, or lack of T-cell help. A multi-epitope strategy (combining MHC-I for CD8+ CTL response, MHC-II for CD4+ helper response, and B-cell epitopes for antibody induction) is more robust than any single epitope. Conservation across pathogen strains is critical — an epitope that mutates under immune pressure (like HIV envelope hypervariable regions) is a poor vaccine target.
LOOK UP, DON'T GUESS: Do not predict MHC binding or population coverage from memory; use `IEDB_predict_mhci_binding` and `IEDB_predict_mhcii_binding` for predictions and `IEDB_search_epitopes` for validated experimental data. Do not assume what's on the pathogen surface; retrieve annotated sequences from UniProt or BVBRC.

Key principles:
- Epitope-driven — vaccines work by presenting epitopes to T/B cells; start with epitope prediction
- Population coverage matters — HLA diversity means no single epitope covers everyone; design for breadth
- Multi-epitope is better — combine CD8+ (MHC-I) and CD4+ (MHC-II) epitopes for robust immunity
- Conservation = broad protection — conserved epitopes across strains provide cross-protective immunity
- Evidence grading — T1: clinical trial data, T2: in-vivo immunogenicity, T3: in-vitro binding, T4: computational prediction only
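
The evidence tiers above form an ordered scale, so a candidate's grade is the strongest tier that any of its evidence supports. This is a minimal sketch. The evidence labels (`clinical_trial`, `in_vivo_immunogenicity`, and so on) are hypothetical names used only for illustration. They are not fields returned by any ToolUniverse tool.

```python
from enum import IntEnum


class EvidenceTier(IntEnum):
    """Evidence grades from the key principles; a lower value means stronger evidence."""
    T1 = 1  # clinical trial data
    T2 = 2  # in-vivo immunogenicity
    T3 = 3  # in-vitro binding
    T4 = 4  # computational prediction only


# Hypothetical evidence labels, used only for this illustration.
EVIDENCE_TO_TIER = {
    "clinical_trial": EvidenceTier.T1,
    "in_vivo_immunogenicity": EvidenceTier.T2,
    "in_vitro_binding": EvidenceTier.T3,
    "prediction": EvidenceTier.T4,
}


def best_tier(evidence_types):
    """Return the strongest tier supported by any recognized evidence, or None."""
    tiers = [EVIDENCE_TO_TIER[e] for e in evidence_types if e in EVIDENCE_TO_TIER]
    return min(tiers) if tiers else None


# An epitope that was predicted and then confirmed in a binding assay grades as T3.
print(best_tier(["prediction", "in_vitro_binding"]).name)  # T3
```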
## When to Use
- "Design a vaccine against [pathogen]"
- "Predict T-cell epitopes for [protein]"
- "What MHC-I epitopes does [protein] have?"
- "Assess population coverage of these epitopes"
- "Find conserved epitopes across [pathogen] strains"
Not this skill: For HLA typing or allele frequency only, use `tooluniverse-hla-immunogenomics`. For antibody engineering, use `tooluniverse-antibody-engineering`.
## Core Tools
| Tool | Use For |
|---|---|
| `IEDB_search_epitopes` | Search experimentally validated epitopes |
| | Get detailed epitope data (assay results, MHC restriction) |
| `iedb_search_mhc` | Search validated MHC binding assay data |
| `IEDB_predict_mhci_binding` | Predict MHC-I binding (NetMHCpan EL; rank < 0.5% = strong binder) |
| `IEDB_predict_mhcii_binding` | Predict MHC-II binding (NetMHCIIpan EL; CD4+ helper epitopes) |
| | Get antigen protein sequence |
| | Find pathogen protein sequences |
| `BVBRC_search_genome_features` | Search pathogen proteomes |
| `alphafold_get_prediction` | Get/predict antigen 3D structure |
| | Check epitope conservation across variants |
| `PubMed_search_articles` | Find published vaccine studies |
| | Find ongoing vaccine clinical trials |
## Workflow
- **Phase 0: Antigen Selection**: pathogen → essential surface proteins → sequence retrieval
- **Phase 1: T-Cell Epitope Prediction**: MHC-I (CD8+ CTL) and MHC-II (CD4+ helper) binding prediction
- **Phase 2: B-Cell Epitope Prediction**: linear and conformational B-cell epitopes for antibody response
- **Phase 3: Population Coverage**: HLA allele frequencies → design for target population
- **Phase 4: Conservation Analysis**: cross-strain epitope conservation → broad protection
- **Phase 5: Candidate Assembly & Report**: multi-epitope construct design → immunogenicity assessment
### Phase 0: Antigen Selection
Best antigens for vaccines: Surface-exposed, essential for pathogen function, conserved across strains.
```python
# Find pathogen surface proteins
UniProt_search(query="[organism] AND locations:(location:cell surface) AND reviewed:true")

# Or search BVBRC for annotated pathogen proteomes
BVBRC_search_genome_features(keyword="surface protein", genome_id="[taxon_id]")
```

**Antigen prioritization**: prefer surface-exposed (secreted/outer membrane) over cytoplasmic proteins; >95% conserved across strains; essential for pathogen viability; known immunogen in natural infection. Use UniProt subcellular location annotations and PubMed to verify these properties.
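
Once these properties have been looked up, the prioritization can be applied mechanically. This is a minimal sketch under one assumption: surface exposure and >95% conservation act as hard filters, while essentiality and known immunogenicity only break ties. The source lists all four as preferences, so adjust the split as needed. The location labels and antigen names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical location labels treated as "surface-exposed" for this sketch.
SURFACE_LOCATIONS = {"secreted", "outer membrane", "cell surface"}


@dataclass
class Antigen:
    name: str
    location: str          # subcellular location, e.g. from UniProt annotations
    conservation: float    # fraction of strains carrying a conserved copy (0-1)
    essential: bool        # essential for pathogen viability
    known_immunogen: bool  # immunogenic in natural infection (e.g. from PubMed)


def prioritize(antigens, min_conservation=0.95):
    """Drop non-surface or poorly conserved antigens, then rank the rest."""
    kept = [
        a for a in antigens
        if a.location in SURFACE_LOCATIONS and a.conservation > min_conservation
    ]
    # Essential antigens first, then known immunogens, then higher conservation.
    return sorted(
        kept,
        key=lambda a: (a.essential, a.known_immunogen, a.conservation),
        reverse=True,
    )


candidates = [
    Antigen("antigen_A", "outer membrane", 0.99, True, False),
    Antigen("antigen_B", "cytoplasm", 0.99, True, True),
    Antigen("antigen_C", "secreted", 0.97, True, True),
    Antigen("antigen_D", "secreted", 0.80, True, True),
]
print([a.name for a in prioritize(candidates)])  # ['antigen_C', 'antigen_A']
```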
### Phase 1: T-Cell Epitope Prediction
**MHC-I epitopes** (CD8+ cytotoxic T cells, which kill infected cells):
```python
# Option A: Search for KNOWN validated epitopes from IEDB
iedb_search_mhc(
    mhc_class="I",
    qualitative_measure="Positive",
    filters={"source_organism_iri": "eq.NCBITaxon:2697049"},  # SARS-CoV-2
    select=["linear_sequence", "mhc_restriction", "qualitative_measure"],
    limit=50
)

# Option B: PREDICT novel peptide binding (recommended for new proteins)
IEDB_predict_mhci_binding(
    sequence="YOUR_PROTEIN_SEQUENCE",  # full protein or peptide
    allele="HLA-A*02:01",              # or H-2-Kd for mouse
    method="netmhcpan_el",             # EL = eluted ligand (recommended)
    length=9                           # 8-11 for MHC-I
)
# Returns peptides ranked by percentile_rank:
#   < 0.5%  = strong binder (include in vaccine)
#   0.5-2%  = moderate binder (consider)
#   > 2%    = weak/non-binder (exclude)
```
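
The percentile-rank thresholds in the comments above can be applied to prediction output with a small filter. This is a minimal sketch. It assumes each prediction arrives as a dict with `peptide`, `allele`, and `percentile_rank` keys. That shape is an assumption, so check it against what `IEDB_predict_mhci_binding` actually returns.

```python
def classify_by_rank(percentile_rank):
    """Map a percentile rank (%) to the binder classes listed above."""
    if percentile_rank < 0.5:
        return "strong"
    if percentile_rank <= 2.0:
        return "moderate"
    return "weak/non-binder"


def select_binders(predictions, keep=("strong", "moderate")):
    """Keep predictions in the wanted classes, best (lowest) rank first.

    Each prediction is assumed to be a dict with "peptide", "allele" and
    "percentile_rank" keys; adapt the keys to the tool's actual output.
    """
    selected = []
    for pred in predictions:
        label = classify_by_rank(pred["percentile_rank"])
        if label in keep:
            selected.append({**pred, "class": label})
    return sorted(selected, key=lambda p: p["percentile_rank"])


# Placeholder peptides and ranks, not real predictions.
preds = [
    {"peptide": "pep1", "allele": "HLA-A*02:01", "percentile_rank": 1.2},
    {"peptide": "pep2", "allele": "HLA-A*02:01", "percentile_rank": 0.3},
    {"peptide": "pep3", "allele": "HLA-A*02:01", "percentile_rank": 7.5},
]
for p in select_binders(preds):
    print(p["peptide"], p["class"])
# pep2 strong
# pep1 moderate
```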
**MHC-II epitopes** (CD4+ helper T cells, which help activate B cells and CD8+ T cells):
```python
iedb_search_mhc(
    mhc_class="II",
    qualitative_measure="Positive",
    filters={"source_organism_iri": "eq.NCBITaxon:2697049"},  # SARS-CoV-2
    limit=50
)
```

Binding affinity interpretation (IC50 from affinity-based predictions or binding assays):
| IC50 (nM) | Classification | Vaccine Relevance |
|---|---|---|
| < 50 | Strong binder | Include — high presentation probability |
| 50-500 | Moderate binder | Consider — may contribute to response |
| 500-5000 | Weak binder | Exclude — unlikely to be presented |
| > 5000 | Non-binder | Exclude |
HLA supertype strategy: For broad coverage, predict against representative alleles of the main HLA supertypes:
- A2 supertype (`A*02:01`, `A*02:06`, `A*68:02`): covers ~40% globally
- A3 supertype (`A*03:01`, `A*11:01`, `A*31:01`): covers ~25%
- B7 supertype (`B*07:02`, `B*35:01`, `B*51:01`): covers ~25%
- A2 + A3 + B7 + B44 combined: covers >90% of most populations
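
Choosing epitopes so that every supertype representative has at least one binder is a set-cover problem. This is a minimal greedy sketch. It assumes per-allele predictions have already been reduced to a mapping from each peptide to the set of alleles it binds, and the peptide names below are placeholders. Greedy selection is a heuristic and does not guarantee the smallest possible epitope set.

```python
def greedy_allele_cover(binders_by_peptide, target_alleles):
    """Greedily pick peptides until every target allele has at least one binder.

    binders_by_peptide maps each peptide to the set of alleles it binds
    (e.g. percentile rank < 0.5%). Returns (chosen peptides, uncovered alleles).
    """
    uncovered = set(target_alleles)
    chosen = []
    while uncovered:
        # Peptide covering the most still-uncovered alleles (first one wins ties).
        best = max(
            binders_by_peptide,
            key=lambda pep: len(binders_by_peptide[pep] & uncovered),
            default=None,
        )
        if best is None or not (binders_by_peptide[best] & uncovered):
            break  # remaining alleles have no binder among the candidates
        chosen.append(best)
        uncovered -= binders_by_peptide[best]
    return chosen, uncovered


binders = {
    "pep1": {"A*02:01", "A*02:06"},
    "pep2": {"A*02:01", "A*03:01", "A*11:01"},
    "pep3": {"B*07:02"},
}
targets = ["A*02:01", "A*02:06", "A*03:01", "A*11:01", "B*07:02", "B*35:01"]
chosen, uncovered = greedy_allele_cover(binders, targets)
print(chosen, uncovered)  # ['pep2', 'pep1', 'pep3'] {'B*35:01'}
```

Any allele left in `uncovered` points to a gap where more candidate peptides need to be predicted.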
### Phase 2: B-Cell Epitope Prediction
B-cell epitopes trigger antibody production. Look for:
- Linear epitopes: Continuous peptide sequences (easier to synthesize)
- Conformational epitopes: 3D surface patches (requires structural data)
```python
# Check for known B-cell epitopes
IEDB_search_epitopes(query="[protein_name]", epitope_type="B cell")

# Get structure for conformational epitope prediction
alphafold_get_prediction(uniprot_id="[accession]")
```

**B-cell epitope criteria**: surface-exposed loops, hydrophilic regions, and flexible regions (high B-factor in experimental structures; AlphaFold models store pLDDT in the B-factor column, so flexible or disordered regions appear there as low pLDDT). Combine computational prediction with structural analysis.
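
Hydrophilic stretches can be flagged with a sliding-window hydropathy profile. This is a rough sketch based on the Kyte-Doolittle scale. It is a classical heuristic swapped in for illustration, not one of the ToolUniverse tools, and it is far weaker than dedicated B-cell epitope predictors. The window size and threshold are arbitrary defaults, so treat any hits only as candidates to check against the structure.

```python
# Kyte-Doolittle hydropathy values (negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Mean hydropathy of each full window; non-standard residues count as 0."""
    seq = seq.upper()
    return [
        sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def hydrophilic_segments(seq, window=7, threshold=-1.0):
    """Merge windows with mean hydropathy below threshold into (start, end) spans.

    Spans are 0-based and half-open, so seq[start:end] is the segment.
    """
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score >= threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # touches previous span: extend it
        else:
            segments.append((start, end))
    return segments


print(hydrophilic_segments("IIIDDDIII", window=3))  # [(3, 6)]
```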
### Phase 3: Population Coverage
```python
# Search for epitopes restricted to common HLA alleles in the target population.
# NOTE: No HLA frequency tool exists in ToolUniverse. For population coverage:
# 1. Use the IEDB Analysis Resource (tools.iedb.org/population) to calculate population coverage
# 2. Use the HLA supertype strategy (Phase 1) to ensure broad coverage
# 3. Search PubMed for published HLA frequency data:
PubMed_search_articles(query="HLA allele frequency [population]")
```
**Population coverage targets**:
| Coverage Level | Interpretation | Action |
|---------------|---------------|--------|
| >90% | Excellent: most individuals carry at least one allele that presents a vaccine epitope | Proceed to experimental validation |
| 70-90% | Good: most people covered, but some populations underserved | Add more epitopes for uncovered HLA types |
| 50-70% | Moderate: significant gaps | Redesign with broader HLA coverage |
| <50% | Poor: the design misses too many people | Fundamental redesign needed |
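
Coverage can be estimated from allele frequencies. This is a simplified sketch in the spirit of the IEDB population coverage calculation. It assumes Hardy-Weinberg equilibrium within each locus and independence between loci. Under those assumptions, a person is uncovered at a locus only if neither of their two alleles restricts a selected epitope. The frequencies in the example are made-up placeholders; real values should come from published HLA frequency data.

```python
from math import prod


def population_coverage(restricting_freqs_by_locus):
    """Fraction of people carrying at least one allele that presents a selected epitope.

    restricting_freqs_by_locus maps an HLA locus to the allele (gene) frequencies
    of the alleles at that locus that restrict at least one selected epitope.
    """
    # Probability that both copies at a locus are non-restricting: (1 - g)^2.
    uncovered = prod(
        (1 - sum(freqs)) ** 2 for freqs in restricting_freqs_by_locus.values()
    )
    return 1 - uncovered


def coverage_level(coverage):
    """Map a coverage fraction to the levels in the table above."""
    if coverage > 0.90:
        return "Excellent"
    if coverage >= 0.70:
        return "Good"
    if coverage >= 0.50:
        return "Moderate"
    return "Poor"


# Placeholder frequencies, not real population data.
cov = population_coverage({"HLA-A": [0.25, 0.10], "HLA-B": [0.10]})
print(f"{cov:.1%}", coverage_level(cov))  # 65.8% Moderate
```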
### Phase 4: Conservation Analysis
Check if epitopes are conserved across pathogen strains/variants:
```python
# Search for protein variants across strains
PubMed_search_articles(query="[pathogen] [protein] sequence variation strains")

# Check specific mutations in epitope regions.
# Ensembl VEP annotates variants against Ensembl reference genomes (most often human),
# so confirm that your pathogen is supported before relying on this call.
EnsemblVEP_annotate_hgvs(hgvs_notation="[variant_in_epitope]")
```

**Conservation interpretation**:
- **100% conserved** across all known strains → ideal vaccine target
- **>95% conserved** → good target; monitor emerging variants
- **80-95% conserved** → may need strain-specific variants in the construct
- **<80% conserved** → avoid; the pathogen is likely to escape this epitope
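
Once strain sequences for the antigen have been collected, per-epitope conservation can be estimated as the fraction of strains that contain the epitope unchanged. This is a minimal sketch. It assumes the sequences are already held in a dict (for example, gathered from UniProt or BVBRC). Exact substring matching is a simplification: any single substitution counts as loss of the epitope.

```python
def epitope_conservation(epitope, strain_sequences):
    """Fraction of strains whose protein sequence contains the epitope exactly."""
    if not strain_sequences:
        raise ValueError("need at least one strain sequence")
    hits = sum(epitope in seq for seq in strain_sequences.values())
    return hits / len(strain_sequences)


def conservation_call(fraction):
    """Map a conservation fraction to the interpretation list above."""
    if fraction == 1.0:
        return "ideal target"
    if fraction > 0.95:
        return "good target; monitor variants"
    if fraction >= 0.80:
        return "consider strain-specific variants"
    return "avoid"


# Toy sequences for illustration only.
strains = {
    "strain_1": "MKTAYIAKQR",
    "strain_2": "MKTAYIAKQR",
    "strain_3": "MKTAHIAKQR",  # Y->H substitution inside the epitope
    "strain_4": "MKTAYIAKQR",
}
frac = epitope_conservation("TAYIAK", strains)
print(frac, conservation_call(frac))  # 0.75 avoid
```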
### Phase 5: Candidate Assembly & Report
Multi-epitope construct design principles:
- Include 3-5 MHC-I epitopes (CD8+ response)
- Include 2-3 MHC-II epitopes (CD4+ helper response)
- Include 1-2 B-cell epitopes (antibody response)
- Connect with appropriate linkers (AAY for MHC-I, GPGPG for MHC-II)
- Add adjuvant sequence if needed (e.g., flagellin domain for TLR5)
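
The design principles above can be turned into a simple string-assembly step. This sketch rests on several assumptions. AAY and GPGPG come from the principles above. The `KK` linker for B-cell epitopes and the `EAAAK` linker before the adjuvant are choices often seen in published multi-epitope designs but are not specified here. The block order (adjuvant, MHC-I, MHC-II, B-cell) is also an assumption. Bracketed epitope names are placeholders.

```python
# Recommended epitope counts from the design principles above.
RECOMMENDED_COUNTS = {"MHC-I": (3, 5), "MHC-II": (2, 3), "B-cell": (1, 2)}


def check_counts(mhc1, mhc2, bcell):
    """Return a warning for each epitope class outside its recommended count."""
    counts = {"MHC-I": len(mhc1), "MHC-II": len(mhc2), "B-cell": len(bcell)}
    warnings = []
    for cls, n in counts.items():
        lo, hi = RECOMMENDED_COUNTS[cls]
        if not lo <= n <= hi:
            warnings.append(f"{cls}: {n} epitopes (recommended {lo}-{hi})")
    return warnings


def assemble_construct(mhc1, mhc2, bcell, adjuvant=None,
                       bcell_linker="KK", adjuvant_linker="EAAAK"):
    """Join each epitope class with its linker; the next block's linker joins blocks."""
    blocks = [(mhc1, "AAY"), (mhc2, "GPGPG"), (bcell, bcell_linker)]
    construct = ""
    for epitopes, linker in blocks:
        if not epitopes:
            continue
        block = linker.join(epitopes)
        construct = block if not construct else construct + linker + block
    if adjuvant:
        construct = adjuvant + adjuvant_linker + construct
    return construct


ctl = ["[CTL1]", "[CTL2]", "[CTL3]"]
htl = ["[HTL1]", "[HTL2]"]
bcl = ["[B1]"]
print(check_counts(ctl, htl, bcl))  # []
print(assemble_construct(ctl, htl, bcl, adjuvant="[ADJ]"))
# [ADJ]EAAAK[CTL1]AAY[CTL2]AAY[CTL3]GPGPG[HTL1]GPGPG[HTL2]KK[B1]
```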
Report structure:
- Antigen Selection — rationale, conservation, essentiality
- Epitope Map — all predicted epitopes with binding affinities and HLA restrictions
- Top Epitopes — ranked by binding strength × conservation × population coverage
- Population Coverage — % coverage per major world population
- Conservation Analysis — strain coverage, escape risk assessment
- Construct Design — multi-epitope sequence with linkers
- Clinical Precedent — existing vaccines/trials for related antigens
- Limitations — predicted only (T4 evidence); needs experimental validation
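
The "Top Epitopes" ranking multiplies binding strength, conservation, and population coverage. This is a minimal sketch. It assumes binding strength is derived from the percentile rank on a linear scale: 1 at rank 0, falling to 0 at the 2% moderate-binder cutoff. That normalization is an illustrative assumption, and the epitope values are placeholders.

```python
def binding_strength(percentile_rank, cutoff=2.0):
    """Hypothetical normalization: 1.0 at rank 0, linearly down to 0.0 at the cutoff."""
    return max(0.0, 1.0 - percentile_rank / cutoff)


def rank_epitopes(epitopes):
    """Score = binding strength x conservation x population coverage, highest first.

    Each epitope is a dict with "peptide", "percentile_rank", "conservation"
    (0-1) and "coverage" (0-1) keys.
    """
    scored = [
        {**e, "score": binding_strength(e["percentile_rank"])
                       * e["conservation"] * e["coverage"]}
        for e in epitopes
    ]
    return sorted(scored, key=lambda e: e["score"], reverse=True)


epitopes = [
    {"peptide": "pepA", "percentile_rank": 0.2, "conservation": 1.00, "coverage": 0.40},
    {"peptide": "pepB", "percentile_rank": 1.0, "conservation": 0.98, "coverage": 0.60},
    {"peptide": "pepC", "percentile_rank": 0.1, "conservation": 0.70, "coverage": 0.50},
]
for e in rank_epitopes(epitopes):
    print(e["peptide"], round(e["score"], 2))
# pepA 0.36
# pepC 0.33
# pepB 0.29
```

A multiplicative score means a single weak factor (for example, poor conservation) pulls an otherwise strong binder down the list. This matches the report's intent of favoring epitopes that are good on all three axes.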
## Limitations
- All predictions are computational (T4 evidence) — experimental validation (binding assays, immunogenicity studies) is required before any clinical development
- No immunogenicity guarantee — MHC binding ≠ immunogenicity; many good binders are not immunogenic in vivo
- B-cell epitope prediction is less reliable than T-cell; conformational epitopes require accurate structures
- No adjuvant optimization — adjuvant selection requires empirical testing
- Pathogen evasion — rapidly evolving pathogens (HIV, influenza) may escape epitope-based vaccines