tooluniverse-hla-immunogenomics
HLA & Immunogenomics Analysis
Pipeline for exploring HLA gene families, MHC-peptide binding, epitope associations, and their clinical implications in transplantation, vaccine development, and cancer immunotherapy. Bridges immunogenetic databases (IMGT, IEDB) with functional annotation (UniProt) and druggability data (DGIdb).
Reasoning Strategy
HLA analysis is fundamentally about peptide presentation: the polymorphism of HLA molecules determines which peptides are displayed to T cells, and that in turn shapes four clinical outcomes. Disease susceptibility: some autoimmune conditions track specific alleles (HLA-B27 and ankylosing spondylitis). Transplant rejection: HLA mismatch drives the alloresponse. Drug hypersensitivity: immunologically confirmed abacavir hypersensitivity occurs almost exclusively in HLA-B*57:01 carriers. Vaccine immunogenicity: an epitope must be presented by the recipient's HLA alleles to elicit a T-cell response. Class I and Class II HLA molecules have fundamentally different binding grooves, peptide lengths, and T-cell partners, so never conflate them. The absence of an epitope from IEDB means it has not been tested, not that it cannot bind.
LOOK UP, DON'T GUESS: Never assume an allele's binding properties or population frequency. Query IEDB for experimental binding data and IMGT for allele annotation, and take population allele frequencies from published data found via PubMed rather than from memory.
Guiding principles:
- HLA nomenclature precision -- HLA allele names follow strict conventions (e.g., HLA-A*02:01); get the resolution level right
- MHC class awareness -- Class I (A, B, C) and Class II (DR, DQ, DP) have different binding properties and clinical roles
- Species context -- most queries target human HLA, but MHC exists across vertebrates; confirm species early
- Evidence layering -- combine binding data (IEDB) with gene annotation (IMGT) and structural context (UniProt)
- Clinical translation -- connect molecular findings to transplant matching, vaccine targets, or immunotherapy response
- English-first queries -- use English terms in all tool calls; respond in the user's language
COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
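As an illustration of the kind of computation meant here, the sketch below estimates an odds ratio with a Woolf (log-normal) confidence interval for an allele-disease association. It uses only the standard library so that it stays self-contained; `scipy.stats.fisher_exact` would be a natural companion for a p-value. The function name and the carrier counts are hypothetical and are not taken from any study.

```python
import math

def odds_ratio_ci(case_pos, case_neg, ctrl_pos, ctrl_neg, z=1.96):
    """Odds ratio with a Woolf (log-normal) confidence interval.

    case_pos / case_neg: allele carriers / non-carriers among cases.
    ctrl_pos / ctrl_neg: allele carriers / non-carriers among controls.
    """
    cells = [case_pos, case_neg, ctrl_pos, ctrl_neg]
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only (not real study data).
or_, lo, hi = odds_ratio_ci(case_pos=40, case_neg=60, ctrl_pos=10, ctrl_neg=90)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

In practice the counts would come from retrieved data, and the report would state the actual numbers rather than describe the method.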
When to Use
Typical triggers:
- "Look up HLA-A*02:01 binding peptides"
- "What epitopes are presented by MHC class I for [pathogen]?"
- "Find HLA gene information for [allele]"
- "What MHC molecules bind [peptide/antigen]?"
- "Assess HLA associations for [disease]"
- "Find immunogenic epitopes for [virus/protein]"
- "What drugs target HLA-related pathways?"
Not this skill: For full neoantigen prediction pipelines, use `tooluniverse-immunotherapy-response-prediction`. For general gene function lookup, use `tooluniverse-drug-target-validation`.
Core Databases
| Database | Scope | Best For |
|---|---|---|
| IMGT | International ImMunoGeneTics; HLA/MHC gene nomenclature and sequences | Authoritative HLA gene info, allele nomenclature, sequence data |
| IEDB | Immune Epitope Database; experimentally validated epitope-MHC data | Epitope binding, MHC restriction, T-cell assay results |
| BVBRC | BV-BRC (formerly PATRIC/IRD); pathogen epitopes | Pathogen-derived epitopes with host MHC context |
| UniProt | Protein function and structure annotations | HLA protein features, domains, variants |
| DGIdb | Drug-Gene Interaction Database | Druggability of HLA-pathway genes |
| PubMed | Biomedical literature | Clinical HLA studies, transplant outcomes |
Workflow Overview
Phase 0: Query Parsing & HLA Disambiguation
Resolve allele names, identify MHC class, confirm species
|
Phase 1: HLA Gene Lookup
IMGT gene info, allele details, sequence data
|
Phase 2: MHC Binding & Restriction
IEDB MHC binding data, allele-specific peptide repertoire
|
Phase 3: Epitope-MHC Associations
IEDB/BVBRC epitope search, pathogen-specific epitopes
|
Phase 4: Functional Annotation
UniProt protein features, structural domains
|
Phase 5: Clinical & Therapeutic Context
DGIdb druggability, PubMed clinical evidence
|
Phase 6: Report Synthesis
Integrated immunogenomics report
Phase Details
Phase 0: Query Parsing & HLA Disambiguation
Parse the user's input to identify:
- HLA allele (e.g., HLA-A*02:01, HLA-DRB1*04:01) -- note resolution level (2-digit vs 4-digit, i.e., one-field such as A*02 vs two-field such as A*02:01)
- MHC class (I or II) -- determines binding groove structure and peptide length
- Pathogen or antigen (e.g., SARS-CoV-2 spike, influenza HA)
- Clinical context (transplant, vaccine, autoimmunity, cancer)
HLA nomenclature quick reference:
- `HLA-A*02:01` = gene A, allele group 02, specific protein 01
- Class I: HLA-A, HLA-B, HLA-C (present to CD8+ T cells, peptides 8-11 aa)
- Class II: HLA-DR, HLA-DQ, HLA-DP (present to CD4+ T cells, peptides 13-25 aa)
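The quick reference above can be turned into a small parser. This is a minimal sketch, not part of the skill's tooling: the `parse_hla` name, the regular expression, and the returned dictionary layout are assumptions made for illustration. It accepts one to four colon-separated fields plus an optional expression suffix, and reports the MHC class only for the genes the quick reference lists.

```python
import re

# Optional "HLA-" prefix, gene (A, B, DRB1, ...), "*", one to four
# colon-separated fields, and an optional expression suffix (N, L, S, C, A, Q).
ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z]+\d*)\*(\d{2,3}(?::\d{2,3}){0,3})([NLSCAQ]?)$")

CLASS_I_GENES = {"A", "B", "C"}
CLASS_II_PREFIXES = ("DR", "DQ", "DP")
RESOLUTION = {1: "one-field", 2: "two-field", 3: "three-field", 4: "four-field"}

def parse_hla(name):
    """Split an HLA allele name into gene, fields, MHC class, and resolution."""
    match = ALLELE_RE.match(name.strip().upper())
    if match is None:
        # Serological-style names such as "HLA-A2" land here on purpose:
        # they are ambiguous and should be clarified with the user.
        raise ValueError(f"not a field-delimited HLA allele name: {name!r}")
    gene, fields, suffix = match.group(1), match.group(2).split(":"), match.group(3)
    if gene in CLASS_I_GENES:
        mhc_class = "I"
    elif gene.startswith(CLASS_II_PREFIXES):
        mhc_class = "II"
    else:
        mhc_class = None  # gene not covered by the quick reference
    return {
        "gene": gene,
        "fields": fields,
        "suffix": suffix or None,
        "mhc_class": mhc_class,
        "resolution": RESOLUTION[len(fields)],
    }

print(parse_hla("HLA-A*02:01"))
print(parse_hla("HLA-DRB1*04:01"))
```

Rejecting names without a `*` separator keeps ambiguous inputs from being silently promoted to a specific allele.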
Phase 1: HLA Gene Lookup
Objective: Get authoritative gene and allele information from IMGT.
Tools:
- `IMGT_search_genes` -- search for HLA/MHC genes
  - Input: `query` (gene name or keyword), optional `species`, `locus`
  - Output: gene list with nomenclature, locus, species
- `IMGT_get_gene_info` -- get detailed gene/allele information
  - Input: `gene_name` (IMGT gene name)
  - Output: allele sequences, functional status, reference sequences
Workflow:
- Search IMGT for the target HLA gene or allele
- Retrieve full gene details including functional status and sequence
- Note the number of known alleles (HLA-A has >7,000; HLA-B has >8,000; counts rise with each IMGT release, so report the figure actually retrieved)
- Identify whether the allele is commonly studied or rare
If allele not found: Check nomenclature -- older names may have been reassigned. Try searching by the gene name alone (e.g., "HLA-A") and filtering results.
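The fallback described above (drop to a broader name when the exact allele is not found) can be sketched as follows. The helper names are hypothetical, and `search` stands in for whatever callable wraps `IMGT_search_genes`; its return shape (a possibly empty list) is an assumption, not the documented tool output.

```python
def fallback_queries(allele):
    """Return search terms ordered from most to least specific."""
    name = allele.strip()
    queries = [name]
    gene, sep, fields = name.partition("*")
    if sep:
        parts = fields.split(":")
        # Drop trailing fields one at a time: A*02:01 -> A*02.
        for n in range(len(parts) - 1, 0, -1):
            queries.append(f"{gene}*{':'.join(parts[:n])}")
        # Finally fall back to the gene name alone.
        queries.append(gene)
    return queries

def first_hit(search, allele):
    """Try each fallback query with `search` and return the first non-empty result."""
    for query in fallback_queries(allele):
        results = search(query)
        if results:
            return query, results
    return None, []

print(fallback_queries("HLA-A*02:01"))
```

Returning the query that actually matched lets the report state at which resolution the lookup succeeded.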
Phase 2: MHC Binding & Restriction
Objective: Find what peptides bind to a specific MHC molecule, or what MHC molecules present a given peptide.
Tools:
- `iedb_search_mhc` -- search for MHC molecules in IEDB
  - Input: `mhc_restriction` (allele name), optional `mhc_class`
  - Output: MHC molecules with binding data counts
- `iedb_get_epitope_mhc` -- get MHC binding details for an epitope
  - Input: `epitope_id` (IEDB epitope ID)
  - Output: MHC restriction data, binding assay results, IC50 values
Workflow:
- Search IEDB for the target MHC allele to see available binding data
- For specific epitope-MHC pairs, retrieve binding assay details
- Note binding affinity (IC50 < 500 nM is typically considered a binder for class I)
- Distinguish between binding assays (in vitro) and T-cell assays (functional)
Binding affinity interpretation (Class I):
- Strong binder: IC50 < 50 nM
- Moderate binder: IC50 50-500 nM
- Weak binder: IC50 500-5000 nM
- Non-binder: IC50 > 5000 nM
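The Class I thresholds above map directly onto a small classifier. This is a sketch under one stated assumption: the listed ranges share their endpoints, so the code assigns exactly 500 nM to "weak" (consistent with "IC50 < 500 nM" being the usual binder cutoff) and exactly 5000 nM to "weak" as well. The function name is hypothetical.

```python
def classify_ic50(ic50_nm):
    """Classify a Class I binding measurement (IC50 in nM) into the four tiers."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration in nM")
    if ic50_nm < 50:
        return "strong"
    if ic50_nm < 500:
        return "moderate"
    if ic50_nm <= 5000:
        return "weak"
    return "non-binder"

# Illustrative values only, not measurements from IEDB.
for value in (12, 50, 499, 500, 5000, 20000):
    print(value, classify_ic50(value))
```

These cutoffs apply to Class I binding assays only; T-cell assay results need separate handling.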
Phase 3: Epitope-MHC Associations
Objective: Find epitopes from specific pathogens or antigens and their MHC restriction.
Tools:
- `iedb_search_epitopes` -- search for experimentally validated epitopes
  - Input: `organism_name` (source organism), `source_antigen_name` (protein name)
  - Output: epitope list with sequence, MHC restriction, assay results
- `BVBRC_search_epitopes` -- search pathogen-derived epitopes
  - Input: `query` (pathogen or antigen keyword), optional `host`, `limit`
  - Output: epitopes with host MHC context, assay type
Workflow:
- Search IEDB for epitopes from the target pathogen/antigen
- Supplement with BVBRC for additional pathogen-specific epitopes
- Filter by the MHC allele of interest if specified
- Categorize by assay type: binding assay, T-cell assay (IFN-gamma, cytotoxicity), MHC multimer
Important: IEDB records are experimentally tested, not predicted, and include negative as well as positive assay results. The absence of an epitope does not mean it won't bind -- it may simply be untested.
Population coverage for vaccine design: When selecting epitopes for a vaccine, check how common the restricting HLA allele is in the target population. An epitope restricted to HLA-A*02:01 covers ~50% of Europeans but <15% of some African populations. For broad population coverage, select epitopes across multiple HLA supertypes (A2, A3, B7, and B44 together are often cited as covering >95% of most populations; confirm the figure against published frequency data for the target population).
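Coverage can be estimated from allele frequencies rather than guessed. The sketch below assumes Hardy-Weinberg proportions within each locus and independence between loci; the second assumption is only a rough approximation, because HLA loci are in linkage disequilibrium. The function name is hypothetical and the frequency is a placeholder, not a looked-up value.

```python
from collections import defaultdict

def phenotype_coverage(allele_freqs):
    """Fraction of individuals carrying at least one of the given alleles.

    allele_freqs maps allele names (e.g. "HLA-A*02:01") to population allele
    frequencies. Per locus, P(no covered allele) = (1 - sum of freqs) ** 2.
    """
    by_locus = defaultdict(float)
    for allele, freq in allele_freqs.items():
        locus = allele.split("*")[0]
        by_locus[locus] += freq
    uncovered = 1.0
    for locus, total in by_locus.items():
        if not 0.0 <= total <= 1.0:
            raise ValueError(f"frequencies for {locus} sum to {total}, outside [0, 1]")
        uncovered *= (1.0 - total) ** 2
    return 1.0 - uncovered

# Placeholder frequency for illustration; look up real values before reporting.
print(round(phenotype_coverage({"HLA-A*02:01": 0.27}), 4))
```

An allele frequency of 0.27 corresponds to roughly 47% of individuals carrying the allele, which is why carrier coverage is much higher than the allele frequency itself.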
Phase 4: Functional Annotation
Objective: Get protein-level features for HLA molecules and related proteins.
Tools:
- `UniProt_search` -- search for HLA protein entries
  - Input: `query` (protein/gene name), optional `organism`, `limit`
  - Output: protein entries with accession, function, features
Workflow:
- Search UniProt for the HLA protein (e.g., "HLA-A human")
- Extract functional features: signal peptide, extracellular alpha domains (alpha-1, alpha-2, alpha-3 for Class I heavy chains), transmembrane region
- Note polymorphic positions that define allele specificity
- Check for structural data (PDB cross-references)
Phase 5: Clinical & Therapeutic Context
Objective: Connect HLA findings to drug interactions and clinical evidence.
Tools:
- `DGIdb_get_drug_gene_interactions` -- find drugs targeting HLA-pathway genes
  - Input: `genes` (list of gene names, e.g., ["HLA-A", "B2M"])
  - Output: drug-gene interactions, interaction types, sources
- `PubMed_search_articles` -- find clinical HLA studies
  - Input: `query` (search term), optional `limit`
  - Output: articles with title, abstract, PMID
Workflow:
- Query DGIdb for drug interactions with relevant HLA genes
- Search PubMed for clinical studies (transplant outcomes, pharmacogenomics, disease associations)
- For transplant queries, look for HLA matching guidelines and outcomes data
- For pharmacogenomics, note HLA alleles linked to drug hypersensitivity (e.g., HLA-B*57:01 and abacavir)
Well-known HLA-drug associations (for context, always verify with current data):
- HLA-B*57:01: abacavir hypersensitivity
- HLA-B*15:02: carbamazepine SJS/TEN (Southeast Asian populations)
- HLA-B*58:01: allopurinol hypersensitivity
- HLA-A*31:01: carbamazepine hypersensitivity reactions (reported in European and Japanese populations)
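The associations listed above can serve as a screening table for typed alleles. This is a minimal sketch: the dictionary holds only the four entries from the list, the `flag_alleles` helper is hypothetical, and higher-resolution typings are truncated to two fields before matching. It provides context for a report and does not replace verification against current data.

```python
# The four associations listed above, keyed at two-field resolution.
KNOWN_ASSOCIATIONS = {
    "HLA-B*57:01": "abacavir hypersensitivity",
    "HLA-B*15:02": "carbamazepine SJS/TEN",
    "HLA-B*58:01": "allopurinol hypersensitivity",
    "HLA-A*31:01": "carbamazepine drug reaction",
}

def flag_alleles(typed_alleles):
    """Return (allele, association) pairs for alleles found in the table."""
    flags = []
    for allele in typed_alleles:
        # Reduce e.g. "HLA-B*57:01:01" to its two-field form "HLA-B*57:01".
        two_field = ":".join(allele.split(":")[:2])
        if two_field in KNOWN_ASSOCIATIONS:
            flags.append((allele, KNOWN_ASSOCIATIONS[two_field]))
    return flags

# Hypothetical typing result for illustration.
print(flag_alleles(["HLA-A*02:01", "HLA-B*57:01:01"]))
```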
Phase 6: Report Synthesis
Structure the report as:
- HLA Context -- gene/allele identification, MHC class, population frequency if available
- Binding Profile -- peptide repertoire, binding affinity distribution
- Epitope Landscape -- pathogen-specific epitopes, assay evidence
- Protein Features -- structural domains, polymorphic sites
- Clinical Relevance -- transplant implications, drug associations, disease links
- Evidence Summary -- graded by source (IEDB experimental > computational prediction > literature mention)
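The grading order in the Evidence Summary can be applied mechanically when assembling the report. In this sketch the tier labels and the record layout are hypothetical names chosen for illustration; only the ordering (IEDB experimental, then computational prediction, then literature mention) comes from the text above.

```python
# Lower rank means stronger evidence.
EVIDENCE_RANK = {
    "iedb_experimental": 0,
    "computational_prediction": 1,
    "literature_mention": 2,
}

def grade_findings(findings):
    """Sort findings from strongest to weakest evidence tier.

    sorted() is stable, so findings within a tier keep their input order.
    """
    return sorted(findings, key=lambda f: EVIDENCE_RANK[f["source"]])

# Placeholder records for illustration only.
findings = [
    {"claim": "finding C", "source": "literature_mention"},
    {"claim": "finding A", "source": "iedb_experimental"},
    {"claim": "finding B", "source": "computational_prediction"},
]
for item in grade_findings(findings):
    print(item["source"], "-", item["claim"])
```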
Edge Cases & Fallbacks
- Ambiguous allele name: Ask user for resolution level. "HLA-A2" could mean HLA-A*02:01 or the broader A*02 group
- No IEDB data for allele: Common for rare alleles. Note the gap; suggest computational prediction tools
- Cross-species MHC: IMGT covers multiple species. Confirm species context for non-human queries (e.g., H-2 for mouse)
- BVBRC empty results: Try broader organism name or use IEDB as primary source
Limitations
- No binding prediction: This skill queries experimental databases, not prediction algorithms (NetMHCpan, MHCflurry). It tells you what has been measured, not what might bind
- Population frequency gaps: HLA allele frequencies vary dramatically by ethnicity; databases may not cover all populations equally
- Class II complexity: Class II molecules are heterodimers (alpha + beta chains); binding prediction and data are less mature than for Class I
- Epitope completeness: IEDB coverage is biased toward well-studied pathogens (HIV, influenza, SARS-CoV-2) and common HLA alleles