tooluniverse-regulatory-variant-analysis
COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Regulatory Variant Analysis Skill
Systematic regulatory variant interpretation: discover trait associations from GWAS, map eQTL effects, annotate chromatin context, assess regulatory element overlap, and produce evidence-graded functional impact predictions for non-coding variants.
When to Use
- "What GWAS associations exist for rs12913832?"
- "Find eQTLs for the APOE locus in brain tissue"
- "What regulatory elements overlap this variant region?"
- "Which SNPs are associated with type 2 diabetes from GWAS?"
- "Is this intronic variant in an active enhancer?"
- "What is the RegulomeDB score for rs429358?"
- "Find ENCODE histone marks at the BRCA1 promoter region"
- "Map trait ontology terms for 'blood pressure' to EFO IDs"
NOT for (use other skills instead):
- Coding variant pathogenicity -> Use tooluniverse-variant-interpretation
- Full clinical variant classification (ACMG) -> Use tooluniverse-variant-interpretation
- Gene-disease associations (not variant-specific) -> Use tooluniverse-gene-disease-association
- Pharmacogenomic variant annotation -> Use tooluniverse-pharmacogenomics
- Epigenomics data processing (BED/narrowPeak files) -> Use tooluniverse-epigenomics
Non-Coding Variant Impact Reasoning
When evaluating a non-coding variant, build evidence across four questions:
1. Is the variant in a regulatory element?
Use RegulomeDB to assess whether the variant overlaps TF binding sites, chromatin accessibility peaks, or known regulatory annotations. A low RegulomeDB score (categories 1a-2a) indicates strong evidence that the position is functionally active. Confirm with ENCODE histone marks: H3K27ac signals active enhancers and active promoters; H3K4me1 alone marks poised enhancers; H3K4me3 marks active promoters; H3K27me3 marks silenced regions.
2. Does it alter a transcription factor binding site?
Check RegulomeDB's TF binding evidence and ENCODE TF ChIP-seq experiments. A variant that falls within a TF footprint and disrupts the consensus motif is mechanistically actionable, especially if the TF is known to be relevant in the disease tissue.
3. Is there eQTL evidence linking it to a gene?
Query GTEx to determine whether the variant (or variants in tight LD) modulates expression of a nearby gene in a tissue-specific or ubiquitous manner. A tissue-specific eQTL suggests cell-type-specific regulation; a ubiquitous eQTL suggests a core regulatory element. Both the sign of the normalized effect size (NES; positive = alternative allele increases expression, negative = decreases) and its magnitude matter for interpretation.
4. Is there GWAS evidence for trait association?
Search the GWAS Catalog for the rsID or the surrounding locus. Genome-wide significant associations (p < 5×10⁻⁸) in relevant traits anchor the variant's biological importance. Cross-reference with OpenTargets for locus-to-gene mapping from multiple GWAS studies.
Synthesizing the evidence: Build a multi-layer case. A variant with GWAS significance + eQTL evidence + RegulomeDB score 1a-2a + active chromatin (H3K27ac) in the relevant tissue represents high-confidence regulatory impact. Two or three converging lines of evidence (e.g., eQTL plus active enhancer) constitute moderate confidence. A single line, or a variant only in a poised but not active regulatory context, represents lower confidence.
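The histone-mark rules from question 1 can be sketched as a small lookup. This is a minimal illustration, not part of any ToolUniverse tool: the function name, the state labels, and the precedence used when several marks co-occur are assumptions made for this example.

```python
def chromatin_state(marks):
    """Map the histone marks observed at a locus to a coarse state label.

    Rules taken from the text: H3K27ac -> active enhancer or promoter,
    H3K4me3 -> active promoter, H3K4me1 alone -> poised enhancer,
    H3K27me3 -> silenced. The precedence between marks is an assumption.
    """
    marks = set(marks)
    if "H3K27ac" in marks:
        # Acetylation plus H3K4me3 points at a promoter, otherwise an enhancer.
        return "active promoter" if "H3K4me3" in marks else "active enhancer"
    if "H3K4me3" in marks:
        return "active promoter"
    if "H3K4me1" in marks:
        # H3K4me1 without H3K27ac: poised rather than active.
        return "poised enhancer"
    if "H3K27me3" in marks:
        return "silenced"
    return "no annotation"
```

A call such as chromatin_state({"H3K4me1", "H3K27ac"}) separates an active enhancer from the poised case, which is the distinction the confidence tiers above depend on.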
评估非编码变异时,围绕四个问题构建证据链:
1. 变异是否位于调控元件中?
使用RegulomeDB评估变异是否重叠转录因子结合位点、染色质可及性峰或已知调控注释。较低的RegulomeDB评分(类别1a-2a)表明该位置具有强功能活性证据。结合ENCODE组蛋白标记确认:H3K27ac标记活跃增强子和活跃启动子;仅H3K4me1标记待激活增强子;H3K4me3标记活跃启动子;H3K27me3标记沉默区域。
2. 它是否改变转录因子结合位点?
检查RegulomeDB的转录因子结合证据和ENCODE转录因子ChIP-seq实验。落在转录因子足迹内并破坏共有基序的变异具有机制层面的可操作性,尤其是当该转录因子与疾病组织相关时。
3. 是否存在将其与基因关联的eQTL证据?
查询GTEx以确定该变异(或紧密连锁不平衡的变异)是否以组织特异性或普遍方式调控邻近基因的表达。组织特异性eQTL提示细胞类型特异性调控;普遍存在的eQTL提示核心调控元件。NES的方向(正=替代等位基因增加表达,负=降低表达)和效应量对解读至关重要。
4. 是否存在性状关联的GWAS证据?
在GWAS Catalog中搜索rsID或周围基因座。全基因组显著关联(p < 5×10⁻⁸)的相关性状确立了变异的生物学重要性。结合OpenTargets的多GWAS研究基因座-基因映射数据进行交叉验证。
证据合成:构建多层证据链。具有GWAS显著性 + eQTL证据 + RegulomeDB评分1a-2a + 相关组织中活跃染色质(H3K27ac)的变异代表高置信度调控影响。两到三条一致的证据链(如eQTL加活跃增强子)构成中等置信度。单一证据链,或仅位于待激活而非活跃调控背景中的变异,代表低置信度。
Workflow Overview
工作流程概述
Input (rsID, genomic coordinates, trait/disease, gene)
|
v
Phase 0: Variant/Trait Resolution
Resolve rsIDs, map trait names to EFO/MONDO IDs via OLS
|
v
Phase 1: GWAS Association Lookup
GWAS Catalog associations, p-values, effect sizes, study metadata
|
v
Phase 2: eQTL Analysis
GTEx tissue-specific eQTLs, target gene identification
|
v
Phase 3: Regulatory Element Annotation
ENCODE histone marks, RegulomeDB scores, chromatin state
|
v
Phase 4: OpenTargets GWAS Integration
OpenTargets GWAS study aggregation, locus-to-gene mapping
|
v
Phase 5: Functional Impact Synthesis
Integrate all evidence, assign regulatory impact level
|
v
Phase 6: Report
Evidence-graded regulatory variant report
Phase 0: Variant/Trait Resolution
Use ols_search_terms to resolve trait names to ontology IDs before GWAS queries. Restrict to ontology="efo" for GWAS traits; OpenTargets prefers MONDO IDs (e.g., MONDO_0005148 for type 2 diabetes rather than EFO_0001360). Use EnsemblVEP_annotate_rsid (param is variant_id, not rsid) for initial consequence annotation and nearest gene identification.
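Ontology lookups often report IDs in CURIE form (EFO:0001360), while the IDs quoted above use underscores (EFO_0001360). A minimal sketch of normalizing IDs and preferring MONDO for OpenTargets follows; both helper names are hypothetical, and the accepted ID pattern is an assumption.

```python
import re

# Assumed shape of an ontology ID: letters, a ':' or '_' separator, digits.
_ID_PATTERN = re.compile(r"^([A-Za-z]+)[:_](\d+)$")


def normalize_ontology_id(raw):
    """Convert 'EFO:0001360' or 'EFO_0001360' to the underscore form."""
    match = _ID_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized ontology ID: {raw!r}")
    prefix, number = match.groups()
    return f"{prefix}_{number}"


def pick_id_for_opentargets(candidates):
    """Prefer a MONDO ID when one is available, else fall back to the first ID."""
    normalized = [normalize_ontology_id(c) for c in candidates]
    for ontology_id in normalized:
        if ontology_id.upper().startswith("MONDO_"):
            return ontology_id
    return normalized[0] if normalized else None
```

For type 2 diabetes, passing both EFO:0001360 and MONDO:0005148 selects MONDO_0005148, matching the preference described above.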
Phase 1: GWAS Association Lookup
Tools: gwas_search_associations (parameters: disease_trait, efo_id, rs_id, p_value; e.g., p_value=5e-8), gwas_get_variants_for_trait, gwas_get_snps_for_gene.
Reasoning tip: When GWAS Catalog returns empty for a free-text trait, switch to the efo_id parameter; the catalog uses controlled vocabulary and free-text matching is imprecise.
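Once associations are retrieved, the genome-wide significance cutoff can be applied locally. The sketch below assumes each association is a dict with a "p_value" key; that record layout is hypothetical and should be adapted to whatever the tool actually returns.

```python
GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def significant_associations(records, threshold=GENOME_WIDE_P):
    """Keep records with p_value strictly below the threshold, smallest first.

    Records without a usable p_value are skipped rather than guessed at.
    """
    hits = []
    for record in records:
        p_value = record.get("p_value")
        if p_value is None:
            continue
        if float(p_value) < threshold:
            hits.append(record)
    return sorted(hits, key=lambda r: float(r["p_value"]))
```

Sorting by p-value puts the strongest signals first, which makes it easier to see which traits anchor the locus.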
Phase 2: eQTL Analysis
Primary tool: GTEx_query_eqtl.
When interpreting results, ask: does the eQTL effect occur in the tissue most relevant to the disease? A brain-specific eQTL for a neurodegenerative disease variant is more compelling than a ubiquitous one. Use GTEx_get_median_gene_expression to confirm that the target gene is actually expressed in the relevant tissue before placing weight on eQTL evidence.
Note: GTEx API uses v8 data; gtex_v10 endpoints may return empty for some queries.
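The expression check described above can be sketched as a filter over eQTL rows. The row keys ("tissue", "nes"), the median-expression mapping, and the 1.0 TPM cutoff are all assumptions for illustration, not values taken from GTEx or ToolUniverse.

```python
def describe_nes(nes):
    """Translate the sign of the normalized effect size into plain language."""
    if nes > 0:
        return "alternative allele increases expression"
    if nes < 0:
        return "alternative allele decreases expression"
    return "no directional effect"


def supported_eqtls(eqtls, median_tpm, min_tpm=1.0):
    """Keep eQTLs only in tissues where the gene is expressed.

    eqtls: list of dicts with 'tissue' and 'nes' keys (hypothetical layout).
    median_tpm: mapping of tissue name to median expression of the target gene.
    min_tpm: assumed expression cutoff; tune it for the gene and question.
    """
    kept = []
    for row in eqtls:
        tpm = median_tpm.get(row["tissue"], 0.0)
        if tpm >= min_tpm:
            kept.append({**row, "median_tpm": tpm,
                         "direction": describe_nes(row["nes"])})
    return kept
```

Tissues missing from the expression mapping are treated as not expressed, which errs toward discarding an eQTL over trusting one that cannot be checked.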
Phase 3: Regulatory Element Annotation
Tools: RegulomeDB_query_variant (rsid); ENCODE_search_histone_experiments (histone_mark, biosample_term_name); assay_title="TF ChIP-seq" for TF binding experiments.
Reasoning tip: RegulomeDB aggregates ENCODE, Roadmap, and other data. If ENCODE doesn't have the specific biosample, RegulomeDB may still have aggregate evidence from related cell types.
Phase 4: OpenTargets GWAS Integration
Tools: OpenTargets_search_gwas_studies_by_disease (diseaseIds), OpenTargets_multi_entity_search, OpenTargets_get_disease_id_description_by_name.
Phase 5: Functional Impact Synthesis
After collecting evidence, reason through the layers:
- High impact: GWAS genome-wide significant + eQTL with meaningful NES + RegulomeDB score ≤ 2 + active chromatin (H3K27ac) in relevant tissue. Multiple independent lines converge on the same locus and gene.
- Moderate impact: Two to three lines of evidence (e.g., eQTL + active enhancer overlap, or GWAS significant + RegulomeDB ≤ 3) without full convergence.
- Low impact: Single line of evidence, or only computational annotation (VEP consequence category) without functional data.
- No evidence: No regulatory annotations in any source; the variant may be in a non-functional region or the relevant cell type is not represented in available datasets.
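The tiers above can be sketched as a small classifier. This is one possible reading of the rules, not a definitive scoring scheme: the function names are hypothetical, RegulomeDB ranks are assumed to be categorical strings such as "1a" or "2b", and counting a rank of 3 or better as one evidence line is an assumption based on the moderate-impact example.

```python
def regulome_rank_number(rank):
    """Return the leading digit of a categorical rank ('1a' -> 1), or None."""
    text = str(rank).strip() if rank is not None else ""
    return int(text[0]) if text and text[0].isdigit() else None


def classify_impact(gwas_significant, eqtl, regulome_rank, active_chromatin,
                    vep_annotation_only=False):
    """Assign high / moderate / low / no evidence from four evidence layers."""
    number = regulome_rank_number(regulome_rank)
    strong_regulome = number is not None and number <= 2
    moderate_regulome = number is not None and number <= 3

    # High impact requires every layer, with the stricter RegulomeDB cutoff.
    if gwas_significant and eqtl and strong_regulome and active_chromatin:
        return "high"

    lines = sum([bool(gwas_significant), bool(eqtl),
                 moderate_regulome, bool(active_chromatin)])
    if lines >= 2:
        return "moderate"
    if lines == 1 or vep_annotation_only:
        return "low"
    return "no evidence"
```

A "no evidence" result is still worth reporting with the caveat from the last bullet: the relevant cell type may simply be absent from the available datasets.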
Fallback Strategies
- GWAS Catalog returns empty: Switch from free-text disease_trait to efo_id; broaden the trait term.
- GTEx eQTL empty for gene: Verify gene symbol spelling; try Ensembl ID; increase size parameter.
- RegulomeDB returns no data: Query ENCODE directly; the variant may lack regulatory annotations in available data.
- OpenTargets GWAS returns None: Verify MONDO/EFO ID format; try OpenTargets_multi_entity_search first to confirm the correct ID.
- ENCODE tissue not found: ENCODE uses specific biosample names; RegulomeDB aggregates data from many cell types and may cover the gap.
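Each fallback above follows the same pattern: try the preferred query, and move to the next option when it comes back empty. A generic sketch of that pattern follows; the helper name is hypothetical, and treating any exception as an empty result is a simplification made for this example.

```python
def first_non_empty(attempts):
    """Run (label, callable) pairs in order and return the first non-empty result.

    Returns (label, result) for the first attempt that yields a truthy value,
    or (None, None) when every attempt is empty or raises.
    """
    for label, attempt in attempts:
        try:
            result = attempt()
        except Exception:
            # Simplification: a failed query is treated like an empty one.
            continue
        if result:
            return label, result
    return None, None
```

In use, each callable would wrap one query variant, for example a free-text disease_trait search followed by an efo_id search. Recording the returned label in the report shows which fallback produced the evidence.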
Example Workflows
GWAS Variant Functional Annotation (rs429358 / APOE)
Step 1: gwas_search_associations(rs_id="rs429358")
-> All trait associations (Alzheimer's disease, LDL cholesterol, etc.)
Step 2: GTEx_query_eqtl(gene_symbol="APOE")
-> Tissue-specific eQTL evidence; note effect in brain vs liver
Step 3: RegulomeDB_query_variant(rsid="rs429358")
-> Regulatory score and TF binding annotations
Step 4: ENCODE_search_histone_experiments(histone_mark="H3K27ac", biosample_term_name="brain")
-> Active enhancer context near the variant
Step 5: Synthesize: does GWAS significance + eQTL + active chromatin converge on one gene?
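The steps above can be sketched as one function that gathers all four evidence layers. The tool names and parameters are the ones used in the steps; call_tool is a hypothetical adapter (any callable taking a tool name plus keyword arguments), since the actual client interface is not shown here.

```python
def annotate_gwas_variant(call_tool, rsid, gene_symbol, biosample):
    """Collect the four evidence layers for one variant, in the order of steps 1-4.

    call_tool: hypothetical adapter, called as call_tool(tool_name, **params).
    """
    return {
        "gwas": call_tool("gwas_search_associations", rs_id=rsid),
        "eqtl": call_tool("GTEx_query_eqtl", gene_symbol=gene_symbol),
        "regulome": call_tool("RegulomeDB_query_variant", rsid=rsid),
        "histone": call_tool(
            "ENCODE_search_histone_experiments",
            histone_mark="H3K27ac",
            biosample_term_name=biosample,
        ),
    }


def non_empty_layers(evidence):
    """List the layers that returned any data, as input to the step 5 synthesis."""
    return [layer for layer, result in evidence.items() if result]
```

Injecting call_tool keeps the sketch independent of any particular client and lets the sequence be exercised with a stub before real queries are made.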
Non-Coding Variant Assessment (Intronic/UTR Variant)
Step 1: EnsemblVEP_annotate_rsid(variant_id="rs12345678")
-> Confirm non-coding consequence, identify nearest gene
Step 2: RegulomeDB_query_variant(rsid="rs12345678")
-> Is this position in a regulatory context?
Step 3: gwas_search_associations(rs_id="rs12345678")
-> Any GWAS associations in relevant traits?
Step 4: GTEx_query_eqtl(gene_symbol=nearest_gene)
-> Does this variant or nearby variants modulate expression?
Step 5: ENCODE_search_histone_experiments(histone_mark="H3K27ac", biosample_term_name=relevant_tissue)
-> Active chromatin confirmation
Step 6: Classify impact based on convergence of evidence lines
Limitations
- GWAS Catalog covers published GWAS only; unpublished studies are not included.
- GTEx eQTL data is from v8; v10 endpoints may return empty.
- RegulomeDB annotations depend on available ENCODE/Roadmap data for the specific cell type.
- eQTL analysis identifies correlation, not causation; fine-mapping is needed to identify causal variants.
- RegulomeDB scores are heuristic; a score of 1a does not guarantee functional impact.
- GWAS associations are population-level; individual variant effects depend on genetic background.