tooluniverse-hla-immunogenomics


HLA & Immunogenomics Analysis

Pipeline for exploring HLA gene families, MHC-peptide binding, epitope associations, and their clinical implications in transplantation, vaccine development, and cancer immunotherapy. Bridges immunogenetic databases (IMGT, IEDB) with functional annotation (UniProt) and druggability data (DGIdb).

Reasoning Strategy

HLA analysis is fundamentally about peptide presentation: the polymorphism of HLA molecules determines which peptides are displayed to T cells, which in turn shapes disease susceptibility (e.g., HLA-B27 and ankylosing spondylitis), transplant rejection (HLA mismatch drives the alloresponse), drug hypersensitivity (abacavir hypersensitivity occurs almost exclusively in HLA-B*57:01 carriers), and vaccine immunogenicity (an epitope must be presented by the recipient's HLA alleles to elicit a T-cell response). Class I and Class II HLA molecules differ in binding groove, peptide length, and T-cell partner; never conflate them. The absence of an epitope from IEDB means it has not been tested, not that it cannot bind.
LOOK UP, DON'T GUESS: Never assume an allele's binding properties or population frequency. Query IEDB for experimental binding data and IMGT for allele annotation. Do not guess which HLA alleles are common in a population; look up published frequency data via PubMed.
Guiding principles:
  1. HLA nomenclature precision -- HLA allele names follow strict conventions (e.g., HLA-A*02:01); get the resolution level right
  2. MHC class awareness -- Class I (A, B, C) and Class II (DR, DQ, DP) have different binding properties and clinical roles
  3. Species context -- most queries target human HLA, but MHC exists across vertebrates; confirm species early
  4. Evidence layering -- combine binding data (IEDB) with gene annotation (IMGT) and structural context (UniProt)
  5. Clinical translation -- connect molecular findings to transplant matching, vaccine targets, or immunotherapy response
  6. English-first queries -- use English terms in all tool calls; respond in the user's language

COMPUTE, DON'T DESCRIBE

When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
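As an illustration of computing rather than describing, the following minimal sketch tests whether carriage of an HLA allele is associated with a disease in a 2x2 case-control table. It uses only the standard library so it runs anywhere; with scipy installed, scipy.stats.fisher_exact is the usual route. The function names and the example counts are assumptions made for illustration, not data from any study.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]].

    a = carrier cases, b = non-carrier cases,
    c = carrier controls, d = non-carrier controls.
    """
    if b == 0 or c == 0:
        raise ValueError("zero cell in denominator; odds ratio is undefined")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the same 2x2 table.

    Sums the hypergeometric probabilities of every table with the observed
    margins that is no more probable than the observed table.
    """
    cases, controls = a + b, c + d
    carriers = a + c
    total_tables = comb(cases + controls, carriers)

    def prob(x):
        # Probability that exactly x of the carriers are cases.
        return comb(cases, x) * comb(controls, carriers - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, carriers - controls)
    highest = min(cases, carriers)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    table = (30, 70, 10, 90)
    print("OR =", odds_ratio(*table))
    print("p  =", fisher_exact_two_sided(*table))
```

Report the numbers the code actually prints, together with the table they were computed from, rather than a verbal summary of the expected direction of effect.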

When to Use

Typical triggers:
  • "Look up HLA-A*02:01 binding peptides"
  • "What epitopes are presented by MHC class I for [pathogen]?"
  • "Find HLA gene information for [allele]"
  • "What MHC molecules bind [peptide/antigen]?"
  • "Assess HLA associations for [disease]"
  • "Find immunogenic epitopes for [virus/protein]"
  • "What drugs target HLA-related pathways?"
Not this skill: For full neoantigen prediction pipelines, use tooluniverse-immunotherapy-response-prediction. For general gene function lookup, use tooluniverse-drug-target-validation.

Core Databases

| Database | Scope | Best For |
|----------|-------|----------|
| IMGT | International ImMunoGeneTics; HLA/MHC gene nomenclature and sequences | Authoritative HLA gene info, allele nomenclature, sequence data |
| IEDB | Immune Epitope Database; experimentally validated epitope-MHC data | Epitope binding, MHC restriction, T-cell assay results |
| BVBRC | BV-BRC (formerly PATRIC/IRD); pathogen epitopes | Pathogen-derived epitopes with host MHC context |
| UniProt | Protein function and structure annotations | HLA protein features, domains, variants |
| DGIdb | Drug-Gene Interaction Database | Druggability of HLA-pathway genes |
| PubMed | Biomedical literature | Clinical HLA studies, transplant outcomes |

Workflow Overview

Phase 0: Query Parsing & HLA Disambiguation
  Resolve allele names, identify MHC class, confirm species
    |
Phase 1: HLA Gene Lookup
  IMGT gene info, allele details, sequence data
    |
Phase 2: MHC Binding & Restriction
  IEDB MHC binding data, allele-specific peptide repertoire
    |
Phase 3: Epitope-MHC Associations
  IEDB/BVBRC epitope search, pathogen-specific epitopes
    |
Phase 4: Functional Annotation
  UniProt protein features, structural domains
    |
Phase 5: Clinical & Therapeutic Context
  DGIdb druggability, PubMed clinical evidence
    |
Phase 6: Report Synthesis
  Integrated immunogenomics report

Phase Details

Phase 0: Query Parsing & HLA Disambiguation

Parse the user's input to identify:
  • HLA allele (e.g., HLA-A*02:01, HLA-DRB1*04:01) -- note resolution level (2-digit vs 4-digit, i.e., one-field vs two-field)
  • MHC class (I or II) -- determines binding groove structure and peptide length
  • Pathogen or antigen (e.g., SARS-CoV-2 spike, influenza HA)
  • Clinical context (transplant, vaccine, autoimmunity, cancer)
HLA nomenclature quick reference:
  • HLA-A*02:01 = gene A, allele group 02, specific protein 01
  • Class I: HLA-A, HLA-B, HLA-C (present to CD8+ T cells, peptides 8-11 aa)
  • Class II: HLA-DR, HLA-DQ, HLA-DP (present to CD4+ T cells, peptides 13-25 aa)
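The parsing step above can be sketched as a small helper that validates an allele name, reports its resolution, and assigns the MHC class. This is a minimal sketch, not part of any ToolUniverse API: parse_hla_allele, HlaAllele, and the regular expression are hypothetical names introduced here, and only the classical genes listed above are classified (anything else gets mhc_class None). Serological names such as "HLA-A2", or names that lost their asterisk, are rejected so the caller can ask the user for clarification.

```python
import re
from typing import NamedTuple, Optional, Tuple

CLASS_I_GENES = {"A", "B", "C"}
CLASS_II_PREFIXES = ("DR", "DQ", "DP")

# Optional "HLA-" prefix, gene, "*", one to four colon-separated fields,
# and an optional expression suffix letter (e.g. N for null alleles).
ALLELE_RE = re.compile(
    r"^(?:HLA-)?(?P<gene>[A-Z]+\d*)\*"
    r"(?P<fields>\d{2,3}(?::\d{2,3}){0,3})"
    r"(?P<suffix>[NLSCAQ])?$"
)


class HlaAllele(NamedTuple):
    gene: str
    fields: Tuple[str, ...]
    mhc_class: Optional[str]  # "I", "II", or None if not classified here


def parse_hla_allele(name: str) -> HlaAllele:
    """Parse names such as 'HLA-A*02:01' or 'DRB1*04:01'."""
    match = ALLELE_RE.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a molecular HLA allele name: {name!r}")
    gene = match.group("gene")
    if gene in CLASS_I_GENES:
        mhc_class = "I"
    elif gene.startswith(CLASS_II_PREFIXES):
        mhc_class = "II"
    else:
        mhc_class = None
    return HlaAllele(gene, tuple(match.group("fields").split(":")), mhc_class)


if __name__ == "__main__":
    allele = parse_hla_allele("HLA-A*02:01")
    # The number of fields is the resolution level (two fields = "4-digit").
    print(allele, "fields:", len(allele.fields))
```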

Phase 1: HLA Gene Lookup

Objective: Get authoritative gene and allele information from IMGT.
Tools:
  • IMGT_search_genes -- search for HLA/MHC genes
    • Input: query (gene name or keyword), optional species, locus
    • Output: gene list with nomenclature, locus, species
  • IMGT_get_gene_info -- get detailed gene/allele information
    • Input: gene_name (IMGT gene name)
    • Output: allele sequences, functional status, reference sequences
Workflow:
  1. Search IMGT for the target HLA gene or allele
  2. Retrieve full gene details including functional status and sequence
  3. Note the number of known alleles (HLA-A has >7,000; HLA-B has >8,000; these counts rise with each database release)
  4. Identify whether the allele is commonly studied or rare
If allele not found: Check nomenclature -- older names may have been reassigned. Try searching by the gene name alone (e.g., "HLA-A") and filtering results.
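The fallback described above can be sketched as a helper that produces progressively broader query strings, from the full allele name down to the bare gene name. The function name imgt_fallback_queries is hypothetical, and how each string is passed to IMGT_search_genes is deliberately left out, since the exact call convention is not specified here.

```python
from typing import List


def imgt_fallback_queries(allele: str) -> List[str]:
    """Return search strings from most to least specific.

    'HLA-A*02:01' yields the full name, then 'HLA-A*02', then 'HLA-A'.
    A name without '*' is returned unchanged as the only query.
    """
    name = allele.strip()
    queries = [name]
    gene_part, star, field_part = name.partition("*")
    if star:
        fields = field_part.split(":")
        # Drop one trailing field at a time.
        for keep in range(len(fields) - 1, 0, -1):
            queries.append(gene_part + "*" + ":".join(fields[:keep]))
        queries.append(gene_part)
    return queries


if __name__ == "__main__":
    for query in imgt_fallback_queries("HLA-A*02:01"):
        print(query)
```

Trying the queries in order and stopping at the first non-empty result keeps the lookup as specific as the database allows.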

Phase 2: MHC Binding & Restriction

Objective: Find what peptides bind to a specific MHC molecule, or what MHC molecules present a given peptide.
Tools:
  • iedb_search_mhc -- search for MHC molecules in IEDB
    • Input: mhc_restriction (allele name), optional mhc_class
    • Output: MHC molecules with binding data counts
  • iedb_get_epitope_mhc -- get MHC binding details for an epitope
    • Input: epitope_id (IEDB epitope ID)
    • Output: MHC restriction data, binding assay results, IC50 values
Workflow:
  1. Search IEDB for the target MHC allele to see available binding data
  2. For specific epitope-MHC pairs, retrieve binding assay details
  3. Note binding affinity (IC50 < 500 nM is typically considered a binder for class I)
  4. Distinguish between binding assays (in vitro) and T-cell assays (functional)
Binding affinity interpretation (Class I):
  • Strong binder: IC50 < 50 nM
  • Moderate binder: IC50 50-500 nM
  • Weak binder: IC50 500-5000 nM
  • Non-binder: IC50 > 5000 nM
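The Class I thresholds above translate directly into a small classifier. This is a minimal sketch; the function name is hypothetical, and the handling of the exact boundary values (50 nM counted as moderate, 500 nM and 5000 nM counted as weak) is an assumption chosen to agree with the "IC50 < 500 nM is a binder" rule of thumb.

```python
def classify_class_i_binder(ic50_nm: float) -> str:
    """Map a measured Class I IC50 (in nM) to a binding category."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration in nM")
    if ic50_nm < 50:
        return "strong"
    if ic50_nm < 500:
        return "moderate"
    if ic50_nm <= 5000:
        return "weak"
    return "non-binder"


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    for value in (12.0, 230.0, 1800.0, 20000.0):
        print(value, "nM ->", classify_class_i_binder(value))
```

These cut-offs apply to Class I binding assays only; they should not be reused for Class II data or for T-cell assay readouts.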

Phase 3: Epitope-MHC Associations

Objective: Find epitopes from specific pathogens or antigens and their MHC restriction.
Tools:
  • iedb_search_epitopes -- search for experimentally validated epitopes
    • Input: organism_name (source organism), source_antigen_name (protein name)
    • Output: epitope list with sequence, MHC restriction, assay results
  • BVBRC_search_epitopes -- search pathogen-derived epitopes
    • Input: query (pathogen or antigen keyword), optional host, limit
    • Output: epitopes with host MHC context, assay type
Workflow:
  1. Search IEDB for epitopes from the target pathogen/antigen
  2. Supplement with BVBRC for additional pathogen-specific epitopes
  3. Filter by the MHC allele of interest if specified
  4. Categorize by assay type: binding assay, T-cell assay (IFN-gamma, cytotoxicity), MHC multimer
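Steps 3 and 4 can be sketched as a filter-then-group pass over the retrieved epitope records. The record layout below (keys sequence, mhc_restriction, assay_type) is an assumption made for illustration; the real tool output may use different field names, and the sequences shown are placeholders, not real epitopes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def group_by_assay(records: Iterable[dict],
                   allele: Optional[str] = None) -> Dict[str, List[str]]:
    """Group epitope sequences by assay type, optionally for one allele."""
    groups = defaultdict(list)
    for record in records:
        # Skip records restricted to a different allele when one is given.
        if allele is not None and record.get("mhc_restriction") != allele:
            continue
        groups[record.get("assay_type", "unknown")].append(record["sequence"])
    return dict(groups)


if __name__ == "__main__":
    # Placeholder records with hypothetical field names.
    records = [
        {"sequence": "PEP1", "mhc_restriction": "HLA-A*02:01", "assay_type": "binding"},
        {"sequence": "PEP2", "mhc_restriction": "HLA-A*02:01", "assay_type": "T-cell"},
        {"sequence": "PEP3", "mhc_restriction": "HLA-B*07:02", "assay_type": "binding"},
    ]
    print(group_by_assay(records, allele="HLA-A*02:01"))
```

Keeping binding-assay and T-cell-assay evidence in separate groups makes it harder to report an in vitro binder as a functionally confirmed epitope.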
Important: IEDB epitopes are experimentally validated, not predicted. The absence of an epitope does not mean it won't bind -- it may simply be untested.
Population coverage for vaccine design: When selecting epitopes for a vaccine, check how common the restricting HLA allele is in the target population. An epitope restricted to HLA-A*02:01 covers ~50% of Europeans but <15% of some African populations. For broad population coverage, select epitopes across multiple HLA supertypes (A2, A3, B7, B44 cover >95% of most populations).
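The coverage arithmetic behind this paragraph can be sketched as follows, assuming Hardy-Weinberg proportions within a locus and independence between loci (which ignores linkage disequilibrium, so results are approximate). The function names are hypothetical and the frequencies in the example are placeholders, not looked-up values; real inputs should come from published frequency data.

```python
from typing import Dict, Iterable


def carrier_fraction(allele_freqs: Iterable[float]) -> float:
    """Fraction of people carrying at least one listed allele at ONE locus.

    Under Hardy-Weinberg proportions this is 1 - (1 - sum(f))^2.
    """
    total = sum(allele_freqs)
    if not 0.0 <= total <= 1.0 + 1e-9:
        raise ValueError("allele frequencies at one locus must sum to at most 1")
    return 1.0 - (1.0 - total) ** 2


def population_coverage(freqs_by_locus: Dict[str, Iterable[float]]) -> float:
    """Fraction of people covered by at least one allele at any locus.

    Assumes loci are independent; linkage disequilibrium is ignored.
    """
    uncovered = 1.0
    for freqs in freqs_by_locus.values():
        uncovered *= 1.0 - carrier_fraction(freqs)
    return 1.0 - uncovered


if __name__ == "__main__":
    # Placeholder frequencies for illustration only.
    print(carrier_fraction([0.29]))
    print(population_coverage({"A": [0.29], "B": [0.10]}))
```

Note the difference between allele frequency and carrier frequency: under these assumptions a placeholder allele frequency of 0.29 corresponds to roughly half of individuals carrying the allele.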

Phase 4: Functional Annotation

Objective: Get protein-level features for HLA molecules and related proteins.
Tools:
  • UniProt_search -- search for HLA protein entries
    • Input: query (protein/gene name), optional organism, limit
    • Output: protein entries with accession, function, features
Workflow:
  1. Search UniProt for the HLA protein (e.g., "HLA-A human")
  2. Extract functional domains: signal peptide, extracellular alpha domains (alpha-1, alpha-2, alpha-3 for Class I heavy chains), transmembrane region
  3. Note polymorphic positions that define allele specificity
  4. Check for structural data (PDB cross-references)

Phase 5: Clinical & Therapeutic Context

Objective: Connect HLA findings to drug interactions and clinical evidence.
Tools:
  • DGIdb_get_drug_gene_interactions -- find drugs targeting HLA-pathway genes
    • Input: genes (list of gene names, e.g., ["HLA-A", "B2M"])
    • Output: drug-gene interactions, interaction types, sources
  • PubMed_search_articles -- find clinical HLA studies
    • Input: query (search term), optional limit
    • Output: articles with title, abstract, PMID
Workflow:
  1. Query DGIdb for drug interactions with relevant HLA genes
  2. Search PubMed for clinical studies (transplant outcomes, pharmacogenomics, disease associations)
  3. For transplant queries, look for HLA matching guidelines and outcomes data
  4. For pharmacogenomics, note HLA alleles linked to drug hypersensitivity (e.g., HLA-B*57:01 and abacavir)
Well-known HLA-drug associations (for context, always verify with current data):
  • HLA-B*57:01: abacavir hypersensitivity
  • HLA-B*15:02: carbamazepine SJS/TEN (Southeast Asian populations)
  • HLA-B*58:01: allopurinol hypersensitivity
  • HLA-A*31:01: carbamazepine hypersensitivity reactions (reported in European and Japanese populations)
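For the pharmacogenomics step, the associations listed above can be held in a small lookup table and matched against a typing result at two-field resolution. This is a minimal sketch for orientation only: the names HLA_DRUG_FLAGS and flag_drug_risks are hypothetical, the table contains nothing beyond the list above, and any flag it raises still needs to be verified against current data.

```python
from typing import Iterable, List, Tuple

# Copied from the list above; context only, not a clinical reference.
HLA_DRUG_FLAGS = {
    "HLA-B*57:01": ("abacavir", "hypersensitivity"),
    "HLA-B*15:02": ("carbamazepine", "SJS/TEN"),
    "HLA-B*58:01": ("allopurinol", "hypersensitivity"),
    "HLA-A*31:01": ("carbamazepine", "hypersensitivity"),
}


def flag_drug_risks(alleles: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (allele, drug, reaction) for each allele found in the table.

    Alleles typed at higher resolution (e.g. 'HLA-B*57:01:01') are
    truncated to two fields before the lookup.
    """
    flags = []
    for allele in alleles:
        two_field = ":".join(allele.strip().split(":")[:2])
        if two_field in HLA_DRUG_FLAGS:
            drug, reaction = HLA_DRUG_FLAGS[two_field]
            flags.append((allele, drug, reaction))
    return flags


if __name__ == "__main__":
    # Hypothetical typing result.
    print(flag_drug_risks(["HLA-A*02:01", "HLA-B*57:01:01"]))
```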

Phase 6: Report Synthesis

Structure the report as:
  1. HLA Context -- gene/allele identification, MHC class, population frequency if available
  2. Binding Profile -- peptide repertoire, binding affinity distribution
  3. Epitope Landscape -- pathogen-specific epitopes, assay evidence
  4. Protein Features -- structural domains, polymorphic sites
  5. Clinical Relevance -- transplant implications, drug associations, disease links
  6. Evidence Summary -- graded by source (IEDB experimental > computational prediction > literature mention)
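The grading in item 6 can be sketched as a stable sort over the collected findings. The source labels and the finding layout (a dict with a source key) are hypothetical conventions chosen for this example; only the ordering itself comes from the list above.

```python
from typing import Iterable, List

# Lower rank means stronger evidence, following the order in item 6.
EVIDENCE_RANK = {
    "iedb_experimental": 0,
    "computational_prediction": 1,
    "literature_mention": 2,
}


def sort_by_evidence(findings: Iterable[dict]) -> List[dict]:
    """Order findings from strongest to weakest evidence source.

    Unknown sources sort last; ties keep their original order because
    Python's sort is stable.
    """
    return sorted(
        findings,
        key=lambda finding: EVIDENCE_RANK.get(finding["source"], len(EVIDENCE_RANK)),
    )


if __name__ == "__main__":
    # Placeholder findings for illustration.
    findings = [
        {"claim": "placeholder claim 1", "source": "literature_mention"},
        {"claim": "placeholder claim 2", "source": "iedb_experimental"},
    ]
    for finding in sort_by_evidence(findings):
        print(finding["source"], "-", finding["claim"])
```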

Edge Cases & Fallbacks

  • Ambiguous allele name: Ask user for resolution level. "HLA-A2" could mean HLA-A*02:01 or the broader A*02 group
  • No IEDB data for allele: Common for rare alleles. Note the gap; suggest computational prediction tools
  • Cross-species MHC: IMGT covers multiple species. Confirm species context for non-human queries (e.g., H-2 for mouse)
  • BVBRC empty results: Try broader organism name or use IEDB as primary source

Limitations

  • No binding prediction: This skill queries experimental databases, not prediction algorithms (NetMHCpan, MHCflurry). It tells you what has been measured, not what might bind
  • Population frequency gaps: HLA allele frequencies vary dramatically by ethnicity; databases may not cover all populations equally
  • Class II complexity: Class II molecules are heterodimers (alpha + beta chains); binding prediction and data are less mature than for Class I
  • Epitope completeness: IEDB coverage is biased toward well-studied pathogens (HIV, influenza, SARS-CoV-2) and common HLA alleles