tooluniverse-epigenomics

Epigenomics & Gene Regulation Analysis

Comprehensive analysis of the regulatory genome integrating functional genomics experiments, transcription factor binding data, cis-regulatory element catalogs, chromatin conformation, and variant regulatory scoring. Generates structured regulatory landscape reports with evidence grading.

When to Use This Skill

Triggers:
  • "What regulates [gene]?" / "Show the regulatory landscape of [gene]"
  • "What transcription factors bind to [gene/region]?"
  • "Find enhancers near [gene]"
  • "What is the regulatory impact of variant [rsID]?"
  • "Find ENCODE experiments for [histone mark/TF] in [cell type]"
  • "What is the chromatin structure around [gene/region]?"
  • "Analyze the epigenetic regulation of [gene]"
  • "Find transcription factor binding motifs for [TF]"
  • "Regulatory element analysis for [genomic region]"
Use Cases:
  1. Gene Regulatory Landscape: Comprehensive view of all regulatory elements, TF binding, and chromatin around a gene
  2. Transcription Factor Profiling: TF binding motifs (JASPAR), binding sites (ReMap), and target gene identification
  3. Regulatory Variant Interpretation: Assess non-coding variant impact using RegulomeDB, SCREEN, and ENCODE
  4. Functional Genomics Data Discovery: Find ChIP-seq, ATAC-seq, Hi-C experiments from ENCODE and 4DN
  5. Enhancer/Promoter Cataloging: Identify and characterize cis-regulatory elements using SCREEN
  6. Chromatin Conformation: 3D genome organization from 4D Nucleome Hi-C data
  7. Epigenetic Profiling: Histone modification patterns, DNA methylation, chromatin accessibility

KEY PRINCIPLES

  1. Report-first approach - Create report file FIRST, then populate progressively
  2. Tool parameter verification - Verify params via `get_tool_info` before calling unfamiliar tools
  3. Evidence grading - Grade all regulatory findings by evidence strength (T1-T4)
  4. Citation requirements - Every finding must have inline source attribution (database, experiment ID)
  5. Mandatory completeness - All sections must exist with data minimums or explicit "No data" notes
  6. Gene disambiguation first - Resolve gene symbol/coordinates before analysis
  7. Cell-type context - Always note cell type specificity of regulatory data
  8. Negative results documented - "No enhancers found in region" is data; empty sections are failures
  9. English-first queries - Always use English gene names and standard nomenclature in tool calls

Evidence Grading System (MANDATORY)

Grade every regulatory finding by evidence strength:
| Tier | Symbol | Criteria | Examples |
|------|--------|----------|----------|
| T1 | [T1] | Direct experimental validation, functional assay | CRISPR-validated enhancer, reporter assay, luciferase |
| T2 | [T2] | High-quality experimental data, curated | ENCODE ChIP-seq peak, SCREEN cCRE, ReMap binding site |
| T3 | [T3] | Computational prediction, motif match | JASPAR motif score, RegulomeDB score, Ensembl regulatory prediction |
| T4 | [T4] | Association, text-mined, low confidence | Literature mention, low-score motif match, inferred regulation |
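
One way to apply the grading consistently is to attach the tier and the source attribution at the moment a finding is recorded. The sketch below is illustrative only: `DEFAULT_TIER` and `tag_finding` are hypothetical names, the per-source defaults follow the grades this document assigns in its phase sections, and falling back to T4 for an unlisted source is an assumption. RegulomeDB is left out because its tier depends on the score.

```python
from enum import IntEnum


class Tier(IntEnum):
    T1 = 1  # direct experimental validation, functional assay
    T2 = 2  # high-quality experimental data, curated
    T3 = 3  # computational prediction, motif match
    T4 = 4  # association, text-mined, low confidence


# Default tier per data source, taken from the phase descriptions in this document.
DEFAULT_TIER = {
    "SCREEN": Tier.T2,
    "ReMap": Tier.T2,
    "ENCODE": Tier.T2,
    "4DN": Tier.T2,
    "JASPAR": Tier.T3,
    "Ensembl": Tier.T3,
}


def tag_finding(text: str, source: str, record_id: str) -> str:
    """Return the finding with a tier tag and inline source attribution."""
    # Assumption: a source without a listed default is treated as T4.
    tier = DEFAULT_TIER.get(source, Tier.T4)
    return f"{text} [{tier.name}] ({source}: {record_id})"


print(tag_finding("Enhancer upstream of TP53", "SCREEN", "EH38E1234567"))
```

A finding backed by a functional assay would be upgraded to T1 by hand; the defaults only cover what each database can support on its own.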

Core Strategy: 7 Research Dimensions

Gene / Region / Variant Query
|
+-- PHASE 0: Gene/Region Resolution (ALWAYS FIRST)
|   +-- Resolve gene symbol -> Ensembl ID, coordinates, aliases
|   +-- Define genomic region of interest (+/- 500kb flanking)
|
+-- PHASE 1: Cis-Regulatory Elements (SCREEN)
|   +-- Candidate enhancers, promoters, insulators
|   +-- cCRE activity by cell type
|   +-- CTCF binding sites
|
+-- PHASE 2: Transcription Factor Binding
|   +-- JASPAR: TF binding motifs and PWMs
|   +-- ReMap: ChIP-seq validated TF binding sites
|   +-- ENCODE: TF ChIP-seq experiments
|
+-- PHASE 3: Regulatory Variant Scoring
|   +-- RegulomeDB: Variant regulatory evidence score
|   +-- Functional annotations from multiple data types
|
+-- PHASE 4: ENCODE Functional Genomics
|   +-- Histone modification ChIP-seq
|   +-- ATAC-seq / DNase-seq accessibility
|   +-- RNA-seq expression context
|   +-- Available experiments and datasets
|
+-- PHASE 5: Chromatin Conformation (4D Nucleome)
|   +-- Hi-C contact maps
|   +-- TAD boundaries
|   +-- Chromatin loops and compartments
|
+-- PHASE 6: Ensembl Regulatory Annotation
|   +-- Regulatory build features
|   +-- Promoter/enhancer/CTCF site annotations
|   +-- Activity states across cell types
|
+-- SYNTHESIS: Integrated Regulatory Model
    +-- Aggregate regulatory evidence
    +-- Build gene regulation model
    +-- Identify key regulatory elements and TFs
    +-- Data gaps and experimental recommendations

Phase 0: Gene/Region Resolution (ALWAYS FIRST)

CRITICAL: Resolve gene identity and genomic coordinates before any analysis.

Input Types Handled

| Input Format | Resolution Strategy |
|--------------|---------------------|
| Gene symbol (e.g., "BRCA1") | Ensembl lookup -> coordinates, Ensembl ID |
| Genomic region (e.g., "chr17:43044295-43170245") | Use directly; identify overlapping genes |
| Ensembl ID (e.g., "ENSG00000012048") | Ensembl lookup -> symbol, coordinates |
| rsID (e.g., "rs12345") | RegulomeDB/Ensembl -> coordinates, nearby genes |

Resolution Tools

| Tool | Purpose | Parameters |
|------|---------|------------|
| `ensembl_lookup_gene` | Gene symbol to Ensembl ID + coordinates | `id`: str, `species`: str |
| `HGNC_get_gene_info` | Official gene symbol, aliases | `symbol`: str |
| `ensembl_get_xrefs` | Cross-references to external databases | `id`: str |

Disambiguation Output

Gene Identity

| Property | Value |
|----------|-------|
| Gene Symbol | TP53 |
| Ensembl ID | ENSG00000141510 |
| Chromosome | 17 |
| Start | 7661779 |
| End | 7687550 |
| Strand | - |
| Region of Interest | 17:7161779-8187550 (+/- 500kb) |
| Aliases | p53, TRP53, LFS1 |
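
The region of interest in the example is the gene span padded by 500 kb on each side. A minimal sketch of that arithmetic follows; `flank_region` is a hypothetical helper, and it clamps only the lower bound at 1, since chromosome lengths are not available here.

```python
def flank_region(chrom: str, start: int, end: int, flank: int = 500_000) -> str:
    """Pad a gene span by `flank` bases on both sides (1-based coordinates)."""
    lower = max(1, start - flank)  # never go below the first base
    upper = end + flank            # assumption: not clamped to chromosome length
    return f"{chrom}:{lower}-{upper}"


# TP53 coordinates from the example table above.
print(flank_region("17", 7661779, 7687550))  # 17:7161779-8187550
```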

---

Phase 1: Cis-Regulatory Elements (SCREEN)

When: Gene name or genomic region available
Objective: Catalog candidate cis-regulatory elements (cCREs) from the ENCODE SCREEN database

Tools Used

| Tool | Function | Parameters |
|------|----------|------------|
| `SCREEN_get_regulatory_elements` | Get cCREs for a gene | `gene_name`: str, `element_type`: str, `limit`: int |

Workflow

  1. Query enhancers: `SCREEN_get_regulatory_elements(gene_name=gene, element_type="enhancer", limit=20)`
  2. Query promoters: `SCREEN_get_regulatory_elements(gene_name=gene, element_type="promoter", limit=20)`
  3. Query insulators: `SCREEN_get_regulatory_elements(gene_name=gene, element_type="insulator", limit=10)`
  4. For each element: extract coordinates, activity scores, cell type specificity
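
The three queries differ only in `element_type` and `limit`, so they can be driven from a small plan. This is a sketch under assumptions: `call_tool(tool_name, **params)` stands in for whatever dispatcher the runtime provides and is not part of the documented tool set; the tool name and parameters are the ones listed above.

```python
from typing import Any, Callable, Dict


def query_ccres(call_tool: Callable[..., Any], gene: str,
                include_insulators: bool = True) -> Dict[str, Any]:
    """Query SCREEN once per element type and keep results keyed by type."""
    # Enhancers and promoters are always queried; insulators are optional.
    plan = [("enhancer", 20), ("promoter", 20)]
    if include_insulators:
        plan.append(("insulator", 10))

    results: Dict[str, Any] = {}
    for element_type, limit in plan:
        results[element_type] = call_tool(
            "SCREEN_get_regulatory_elements",
            gene_name=gene,
            element_type=element_type,
            limit=limit,
        )
    return results
```

Keeping the raw response per element type makes it straightforward to report counts, including zero, in the output section.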

Decision Logic

  • Multiple element types: Always query enhancers AND promoters (insulators optional)
  • Empty or sparse results: Some genes have few annotated cCREs; report the counts, including zero
  • Cell type specificity: SCREEN data is cell-type annotated; report top active cell types
  • All findings graded [T2]: SCREEN cCREs are experimentally derived from ENCODE data

Output Format

Cis-Regulatory Elements (SCREEN) [T2]

Enhancers (15 found)

| Element ID | Coordinates | Activity Score | Top Cell Types |
|------------|-------------|----------------|----------------|
| EH38E1234567 | chr17:7650000-7651000 | 0.95 | HepG2, K562 |
| ... | ... | ... | ... |

Promoters (3 found)

| Element ID | Coordinates | Activity Score | Top Cell Types |
|------------|-------------|----------------|----------------|
| EH38E9876543 | chr17:7687000-7688000 | 0.99 | Ubiquitous |
| ... | ... | ... | ... |

Insulators (2 found)

| Element ID | Coordinates | CTCF Binding |
|------------|-------------|--------------|
| EH38E5555555 | chr17:7700000-7701000 | Yes |

---

Phase 2: Transcription Factor Binding

When: Gene symbol available
Objective: Identify transcription factors that regulate the gene through motif analysis and ChIP-seq binding data

Tools Used

JASPAR - TF Binding Motifs

| Tool | Function | Parameters |
|------|----------|------------|
| `jaspar_search_matrices` | Search TF binding motifs | `search`: str, `collection`: str, `tax_group`: str, `species`: str |
| `jaspar_get_matrix` | Get PWM for specific TF | `matrix_id`: str |
| `JASPAR_get_transcription_factors` | List TFs in collection | `collection`: str, `page`: int, `page_size`: int |

ReMap - Validated TF Binding Sites

| Tool | Function | Parameters |
|------|----------|------------|
| `ReMap_get_transcription_factor_binding` | Get TF binding sites near gene | `gene_name`: str, `cell_type`: str, `limit`: int |

ENCODE - ChIP-seq Experiments

| Tool | Function | Parameters |
|------|----------|------------|
| `ENCODE_search_experiments` | Search TF ChIP-seq experiments | `assay_title`: str, `target`: str, `organism`: str, `limit`: int |

Workflow

  1. JASPAR motif search: Search for known TF binding motifs
    • jaspar_search_matrices(search=gene_symbol, collection="CORE", species="9606")
    • If gene IS a TF: get its PWM binding motif
    • If gene is NOT a TF: identify TFs known to bind its promoter
  2. ReMap binding data: Get experimentally validated TF binding sites
    • ReMap_get_transcription_factor_binding(gene_name=gene, cell_type="HepG2", limit=20)
    • Try multiple cell types: "HepG2", "K562", "MCF-7", "GM12878"
  3. ENCODE ChIP-seq: Find available ChIP-seq experiments for key TFs
    • ENCODE_search_experiments(assay_title="ChIP-seq", target=top_tf, organism="Homo sapiens", limit=5)
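
Step 2 asks for the same ReMap query across several cell types. A possible shape for that loop is sketched below; `call_tool(tool_name, **params)` is a hypothetical dispatcher, and recording failures as an `error` entry instead of raising is a design choice made here so that negative results can still be written into the report.

```python
from typing import Any, Callable, Dict, Sequence

# Cell types suggested in the workflow above.
DEFAULT_CELL_TYPES = ("HepG2", "K562", "MCF-7", "GM12878")


def remap_across_cell_types(call_tool: Callable[..., Any], gene: str,
                            cell_types: Sequence[str] = DEFAULT_CELL_TYPES,
                            limit: int = 20) -> Dict[str, Any]:
    """Run the ReMap binding query once per cell type, keyed by cell type."""
    by_cell_type: Dict[str, Any] = {}
    for cell_type in cell_types:
        try:
            by_cell_type[cell_type] = call_tool(
                "ReMap_get_transcription_factor_binding",
                gene_name=gene,
                cell_type=cell_type,
                limit=limit,
            )
        except Exception as exc:
            # Keep the failure so the report can state "no data" explicitly.
            by_cell_type[cell_type] = {"error": str(exc)}
    return by_cell_type
```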

Decision Logic

  • Gene is a TF: Show its binding motif (JASPAR PWM) + target genes + ENCODE ChIP-seq experiments
  • Gene is NOT a TF: Show TFs that bind its promoter/enhancers (ReMap) + relevant motifs
  • Multiple cell types for ReMap: Query at least 2-3 common cell types
  • JASPAR grades [T3]: Motif predictions are computational
  • ReMap grades [T2]: Based on experimental ChIP-seq data
  • ENCODE grades [T2]: Direct experimental data

Output Format

Transcription Factor Binding

JASPAR Binding Motifs [T3]

| Matrix ID | TF Name | Score | Sequence Logo |
|-----------|---------|-------|---------------|
| MA0106.3 | TP53 | 0.92 | RRRCWWGYYY |
| ... | ... | ... | ... |

ReMap ChIP-seq Validated Binding [T2]

| Transcription Factor | Cell Type | Binding Score | Coordinates |
|----------------------|-----------|---------------|-------------|
| SP1 | HepG2 | 850 | chr17:7687200-7687500 |
| CTCF | K562 | 920 | chr17:7700100-7700400 |
| ... | ... | ... | ... |

ENCODE ChIP-seq Experiments Available [T2]

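| Experiment | Target | Cell Type | Files | Status |
|------------|--------|-----------|-------|--------|
| ENCSR000BNT | TP53 | HepG2 | 12 | released |
| ... | ... | ... | ... | ... |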

---

Phase 3: Regulatory Variant Scoring

When: rsID or variant provided, OR gene has known regulatory variants
Objective: Assess the regulatory impact of genetic variants in the region

Tools Used

| Tool | Function | Parameters |
|------|----------|------------|
| `RegulomeDB_query_variant` | Get regulatory evidence score for variant | `rsid`: str |

Workflow

  1. If rsID provided: Query RegulomeDB directly
    • RegulomeDB_query_variant(rsid=rsid)
  2. Parse RegulomeDB score (1a-7): lower = more regulatory evidence
  3. Extract supporting evidence types (eQTL, TF binding, chromatin state, etc.)
  4. Cross-reference with SCREEN and ENCODE data from other phases

RegulomeDB Score Interpretation

| Score | Meaning | Evidence Level |
|-------|---------|----------------|
| 1a | eQTL + TF binding + DNase + motif | Very likely regulatory [T2] |
| 1b | eQTL + TF binding + DNase | Likely regulatory [T2] |
| 1c | eQTL + TF binding + motif | Likely regulatory [T2] |
| 1d | eQTL + TF binding | Likely regulatory [T2] |
| 1e | eQTL + DNase | Likely regulatory [T3] |
| 1f | eQTL only | Possible regulatory [T3] |
| 2a-2c | TF binding + DNase/motif | Likely affects TF binding [T3] |
| 3a-3b | DNase or ChIP-seq evidence | Some evidence [T3] |
| 4-7 | Minimal or no evidence | Limited evidence [T4] |
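
The table can be encoded directly as a lookup so that every variant row in the report gets the same wording and tier. The sketch below mirrors the table as written in this document; `interpret_regulomedb` is a hypothetical helper, and it assumes the score arrives as a string such as "1b" or "4".

```python
from typing import Tuple

# (interpretation, evidence tier) per score, copied from the table above.
_SCORE_TABLE = {
    "1a": ("Very likely regulatory", "T2"),
    "1b": ("Likely regulatory", "T2"),
    "1c": ("Likely regulatory", "T2"),
    "1d": ("Likely regulatory", "T2"),
    "1e": ("Likely regulatory", "T3"),
    "1f": ("Possible regulatory", "T3"),
}
for _s in ("2a", "2b", "2c"):
    _SCORE_TABLE[_s] = ("Likely affects TF binding", "T3")
for _s in ("3a", "3b"):
    _SCORE_TABLE[_s] = ("Some evidence", "T3")
for _s in ("4", "5", "6", "7"):
    _SCORE_TABLE[_s] = ("Limited evidence", "T4")


def interpret_regulomedb(score: str) -> Tuple[str, str]:
    """Map a RegulomeDB score to (interpretation, tier); reject unknown scores."""
    key = score.strip().lower()
    if key not in _SCORE_TABLE:
        raise ValueError(f"Unrecognized RegulomeDB score: {score!r}")
    return _SCORE_TABLE[key]


print(interpret_regulomedb("1b"))  # ('Likely regulatory', 'T2')
```

Raising on an unrecognized score, instead of defaulting to T4, keeps a malformed response from being reported as weak evidence.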

Decision Logic

  • Score 1a-1d: Flag as likely functional regulatory variant; high confidence
  • Score 2a-3b: Moderate evidence; recommend experimental validation
  • Score 4-7: Low regulatory evidence; a regulatory effect is unlikely on current data, but absence of evidence does not prove the variant is non-functional
  • No rsID provided: Skip this phase gracefully; note "no variant specified"

Output Format

Regulatory Variant Impact [T2/T3]

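| Variant | RegulomeDB Score | Interpretation | Evidence Types |
|---------|------------------|----------------|----------------|
| rs12345 | 1b | Likely regulatory | eQTL, TF binding, DNase |
| rs67890 | 3a | Some evidence | DNase peak |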

---

Phase 4: ENCODE Functional Genomics

When: Gene or region available
Objective: Discover functional genomics experiments and datasets from ENCODE

Tools Used

| Tool | Function | Parameters |
|------|----------|------------|
| `ENCODE_search_experiments` | Search experiments by assay/target | `assay_title`, `target`, `organism`, `status`, `limit` |
| `ENCODE_get_experiment` | Get detailed experiment metadata | `accession`: str |
| `ENCODE_list_files` | List available data files | `file_type`, `assay_title`, `limit` |
| `ENCODE_search_biosamples` | Search available cell types | `organism`, `biosample_type`, `treatment`, `limit` |

Workflow

  1. Histone marks: Search for H3K4me3 (promoter), H3K27ac (enhancer), H3K4me1 (enhancer), H3K27me3 (repressive)
    • ENCODE_search_experiments(assay_title="ChIP-seq", target="H3K27ac", organism="Homo sapiens", limit=5)
  2. Chromatin accessibility: Search ATAC-seq and DNase-seq
    • ENCODE_search_experiments(assay_title="ATAC-seq", organism="Homo sapiens", limit=5)
  3. If gene is a TF: Search for ChIP-seq of that TF
    • ENCODE_search_experiments(assay_title="ChIP-seq", target=gene, organism="Homo sapiens", limit=5)
  4. RNA-seq context: Search for expression experiments
    • ENCODE_search_experiments(assay_title="RNA-seq", organism="Homo sapiens", limit=5)
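
Step 1 names four histone marks but shows the call for only one of them. A sketch of running all four follows, keeping the role each mark is given in the list above next to its results. `call_tool(tool_name, **params)` is a hypothetical dispatcher; the tool name and parameters are those documented in this phase.

```python
from typing import Any, Callable, Dict

# Mark -> regulatory role, as labeled in the workflow above.
HISTONE_MARKS = {
    "H3K4me3": "promoter",
    "H3K27ac": "enhancer",
    "H3K4me1": "enhancer",
    "H3K27me3": "repressive",
}


def search_histone_marks(call_tool: Callable[..., Any],
                         organism: str = "Homo sapiens",
                         limit: int = 5) -> Dict[str, Dict[str, Any]]:
    """Search ENCODE ChIP-seq experiments once per histone mark."""
    found: Dict[str, Dict[str, Any]] = {}
    for mark, role in HISTONE_MARKS.items():
        experiments = call_tool(
            "ENCODE_search_experiments",
            assay_title="ChIP-seq",
            target=mark,
            organism=organism,
            limit=limit,
        )
        found[mark] = {"role": role, "experiments": experiments}
    return found
```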

Decision Logic

  • Prioritize by relevance: Histone marks and accessibility most informative for regulatory analysis
  • Cell type matching: When possible, focus on cell types relevant to user's question
  • Experiment quality: Prefer "released" status and recent experiments
  • Data volume: ENCODE has thousands of experiments; limit results and highlight most relevant
  • All ENCODE data graded [T2]: High-quality experimental data

Output Format

ENCODE Functional Genomics [T2]

Histone Modification Experiments

| Experiment | Mark | Cell Type | Status | Files |
|------------|------|-----------|--------|-------|
| ENCSR000AKP | H3K27ac | HepG2 | released | 8 |
| ENCSR000ALA | H3K4me3 | K562 | released | 6 |

Chromatin Accessibility

| Experiment | Assay | Cell Type | Status |
|------------|-------|-----------|--------|
| ENCSR889WQX | ATAC-seq | GM12878 | released |

TF ChIP-seq (for [gene] if TF)

| Experiment | Target | Cell Type | Status |
|------------|--------|-----------|--------|
| ENCSR000BNT | TP53 | HepG2 | released |

---

Phase 5: Chromatin Conformation (4D Nucleome)

When: Gene or region available
Objective: Explore 3D genome organization data from the 4D Nucleome project

Tools Used

| Tool | Function | Parameters |
|------|----------|------------|
| `FourDN_search_data` | Search Hi-C data | `operation`: "search_data", `assay_title`, `biosource_name`, `limit` |
| `FourDN_get_experiment_metadata` | Get experiment details | `operation`: "get_experiment_metadata", `experiment_accession`: str |

Workflow

  1. Search Hi-C experiments: `FourDN_search_data(operation="search_data", assay_title="Hi-C", limit=10)`
  2. Search Micro-C data: `FourDN_search_data(operation="search_data", assay_title="Micro-C", limit=5)`
  3. For relevant experiments: get metadata for top results
  4. Note available cell types and data types

Decision Logic

  • IMPORTANT: 4DN tools require the `operation` parameter - This is a SOAP-style tool
  • Hi-C vs Micro-C: Micro-C has higher resolution for local interactions
  • Cell type matching: Note which cell types have chromatin data
  • Data availability: 4DN may not cover all cell types of interest
  • Grade [T2]: High-quality experimental chromatin conformation data

Output Format

Chromatin Conformation (4D Nucleome) [T2]

Available Hi-C Datasets

| Experiment | Cell Type | Assay | Resolution | Status |
|------------|-----------|-------|------------|--------|
| 4DNESXXXXXXX | H1-hESC | Hi-C | 10kb | released |
| 4DNESYYYYYYY | GM12878 | Micro-C | 1kb | released |

Chromatin Organization Context

  • TAD: Gene located within TAD spanning chr17:7.1-8.2Mb
  • Compartment: A compartment (active)
  • Nearby CTCF sites: 3 CTCF sites within 100kb (from SCREEN Phase 1)

---

Phase 6: Ensembl Regulatory Annotation

When: Genomic region coordinates available
Objective: Get regulatory feature annotations from the Ensembl Regulatory Build

Tools Used

| Tool | Function | Parameters |
|------|----------|------------|
| `ensembl_get_regulatory_features` | Get regulatory features in region | `region`: str (chr:start-end), `feature`: str, `species`: str |

Workflow

  1. Get regulatory features: `ensembl_get_regulatory_features(region="17:7661779-7687550", feature="regulatory", species="human")`
  2. Parse feature types: promoter, enhancer, CTCF_binding_site, TF_binding_site, open_chromatin_region
  3. Note activity states across cell types when available

Decision Logic

  • Region format: Use chromosome:start-end without "chr" prefix
  • Feature parameter: Must be "regulatory" for this endpoint
  • Cross-reference with SCREEN: Compare Ensembl regulatory build with SCREEN cCREs
  • Grade [T3]: Ensembl regulatory build is computationally derived
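
Because SCREEN and ReMap report coordinates with a "chr" prefix while this endpoint expects none, a small normalizer avoids a common source of empty results. The helper below is a sketch: `to_ensembl_region` is a hypothetical name, and mapping "M" to "MT" and accepting thousands separators are assumptions made for convenience.

```python
import re

# Accepts "chr17:7661779-7687550", "17:7661779-7687550" or "17:7,661,779-7,687,550".
_REGION = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|MT|M):(\d[\d,]*)-(\d[\d,]*)$", re.IGNORECASE
)


def to_ensembl_region(region: str) -> str:
    """Return the region as chromosome:start-end without a 'chr' prefix."""
    match = _REGION.match(region.strip())
    if match is None:
        raise ValueError(f"Unrecognized region: {region!r}")
    chrom = match.group(1).upper()
    if chrom == "M":
        chrom = "MT"  # assumption: mitochondrial chromosome is named MT
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"Start is greater than end in {region!r}")
    return f"{chrom}:{start}-{end}"


print(to_ensembl_region("chr17:7661779-7687550"))  # 17:7661779-7687550
```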

Output Format

Ensembl Regulatory Build [T3]

| Feature ID | Type | Coordinates | Activity State |
|------------|------|-------------|----------------|
| ENSR00000123456 | Promoter | 17:7687200-7688000 | Active (most cell types) |
| ENSR00000789012 | Enhancer | 17:7650000-7651500 | Active (liver, lung) |
| ENSR00000345678 | CTCF_binding_site | 17:7700000-7700500 | Active |

---

Synthesis: Integrated Regulatory Model (MANDATORY)

Always the final section. Integrates all evidence into a coherent regulatory model.

Synthesis Template

Integrated Regulatory Model

Regulatory Architecture Summary

Gene: [GENE] ([Ensembl ID]) Region analyzed: [coordinates] ([size]kb)

Key Regulatory Elements

  1. Proximal promoter [T2/T3]: Located at [coords], active in [cell types]
    • TFs binding: SP1, CTCF, [others from ReMap]
    • Histone marks: H3K4me3 (ENCODE), H3K27ac (ENCODE)
    • SCREEN cCRE: [element ID]
  2. Distal enhancer 1 [T2]: Located at [coords], [distance] from TSS
    • Active in [cell types] (SCREEN)
    • TF binding: [TFs from ReMap/ENCODE]
    • Hi-C contact with promoter: [Yes/No/Unknown]
  3. CTCF insulator [T2]: Located at [coords]
    • Defines TAD boundary
    • CTCF motif score: [from JASPAR]

Transcription Factor Regulatory Network

| TF | Binding Evidence | Motif Match | Cell Types | Role |
|----|------------------|-------------|------------|------|
| SP1 | ReMap ChIP-seq [T2] | JASPAR 0.92 [T3] | HepG2, K562 | Activator |
| CTCF | ENCODE ChIP-seq [T2] | JASPAR 0.98 [T3] | Ubiquitous | Insulator |

Regulatory Variants (if applicable)

| Variant | RegulomeDB Score | Regulatory Impact | Affected Element |
|---------|------------------|-------------------|------------------|
| rs12345 | 1b | Disrupts SP1 binding | Proximal promoter |

Evidence Quality Assessment

| Dimension | Data Available | Evidence Tier | Confidence |
|-----------|----------------|---------------|------------|
| cCREs (SCREEN) | 15 enhancers, 3 promoters | [T2] | High |
| TF Binding (ReMap) | 8 TFs validated | [T2] | High |
| Motifs (JASPAR) | 12 motif matches | [T3] | Medium |
| ENCODE experiments | 25 relevant datasets | [T2] | High |
| Chromatin (4DN) | Hi-C in 3 cell types | [T2] | Medium |
| Regulatory Build | 5 features annotated | [T3] | Medium |

Data Gaps

  • No single-cell ATAC-seq data available for this region
  • Chromatin conformation data limited to 3 cell types
  • No CRISPR-validated enhancers (would be needed for [T1])
  • Regulatory variant impact is predictive (needs experimental validation)

Experimental Recommendations

  1. Validate key enhancers: CRISPR deletion or reporter assays for top 3 enhancers
  2. Confirm TF binding: ChIP-qPCR for SP1, CTCF at predicted sites
  3. Test regulatory variants: Allele-specific reporter assays for rs12345

---

Mandatory Completeness Checklist

Before finalizing any report, verify:
  • Phase 0: Gene/region fully resolved (symbol, Ensembl ID, coordinates)
  • Phase 1: SCREEN queried for enhancers AND promoters (counts reported)
  • Phase 2: At least 2 TF data sources queried (JASPAR + ReMap or ENCODE)
  • Phase 3: RegulomeDB queried for variants OR "no variant specified" noted
  • Phase 4: At least 2 ENCODE assay types searched (histone marks + accessibility)
  • Phase 5: 4DN queried for Hi-C/Micro-C data OR "no chromatin data" noted
  • Phase 6: Ensembl regulatory build queried OR "no regulatory features" noted
  • Synthesis: Regulatory model provided with element catalog and TF network
  • Evidence Grading: All findings have [T1]-[T4] annotations
  • Cell-type context: Cell type specificity noted for all binding/activity data
  • Data gaps: Explicitly listed in synthesis section
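
Part of this checklist can be automated before a report is finalized. The sketch below checks only two items: that each section heading from the output templates in this document is present, and that at least one tier tag appears. `missing_sections` and `has_tier_tags` are hypothetical helpers, and a plain substring match on headings is a simplifying assumption.

```python
import re
from typing import List

# Section headings taken from the output templates in this document.
REQUIRED_SECTIONS = [
    "Gene Identity",
    "Cis-Regulatory Elements (SCREEN)",
    "Transcription Factor Binding",
    "Regulatory Variant Impact",
    "ENCODE Functional Genomics",
    "Chromatin Conformation (4D Nucleome)",
    "Ensembl Regulatory Build",
    "Integrated Regulatory Model",
]

TIER_TAG = re.compile(r"\[T[1-4]\]")


def missing_sections(report: str) -> List[str]:
    """List required section headings that do not appear in the report text."""
    return [name for name in REQUIRED_SECTIONS if name not in report]


def has_tier_tags(report: str) -> bool:
    """True if at least one [T1]-[T4] annotation is present."""
    return bool(TIER_TAG.search(report))
```

A section that holds only an explicit "No data" note still passes this check, which matches the rule that empty sections are failures but documented negatives are not.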

Tool Parameter Reference

Critical Parameter Notes (verified from source code):
| Tool | Parameter Name | Type | Notes |
|------|----------------|------|-------|
| `SCREEN_get_regulatory_elements` | `gene_name`, `element_type`, `limit` | str, str, int | element_type: "enhancer", "promoter", "insulator" |
| `ReMap_get_transcription_factor_binding` | `gene_name`, `cell_type`, `limit` | str, str, int | cell_type default: "HepG2" |
| `RegulomeDB_query_variant` | `rsid` | str | rsID format (e.g., "rs12345") |
| `jaspar_search_matrices` | `search`, `name`, `collection`, `tax_group`, `species` | str (all optional) | species="9606" for human |
| `jaspar_get_matrix` | `matrix_id` | str | JASPAR matrix ID (e.g., "MA0106.3") |
| `JASPAR_get_transcription_factors` | `collection`, `page`, `page_size` | str, int, int | collection="CORE" default |
| `ENCODE_search_experiments` | `assay_title`, `target`, `organism`, `status`, `limit` | str (all optional) | status="released" default |
| `ENCODE_get_experiment` | `accession` | str | ENCODE accession (e.g., "ENCSR000BNT") |
| `ENCODE_list_files` | `file_type`, `assay_title`, `limit` | str, str, int | All optional |
| `ENCODE_search_biosamples` | `organism`, `biosample_type`, `treatment`, `limit` | str (all optional) | |
| `FourDN_search_data` | `operation`, `query`, `item_type`, `assay_title`, `biosource_name`, `limit` | operation REQUIRED | operation="search_data" |
| `FourDN_get_experiment_metadata` | `operation`, `experiment_accession` | operation REQUIRED | operation="get_experiment_metadata" |
| `ensembl_get_regulatory_features` | `region`, `feature`, `species` | str, str, str | feature="regulatory", region="17:start-end" |

CRITICAL: SOAP-style Tools

The following tools require an operation parameter:
  • FourDN_search_data: operation="search_data"
  • FourDN_get_experiment_metadata: operation="get_experiment_metadata"
  • FourDN_get_file_metadata: operation="get_file_metadata"
  • FourDN_get_download_url: operation="get_download_url"
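
One way to avoid forgetting the operation value is to fill it in automatically from the tool name. This is a minimal sketch under that assumption: the mapping is taken from the list above, and `with_operation` is a hypothetical helper, not a function provided by the tools.

```python
# Required operation value for each 4DN tool, as listed above.
FOURDN_OPERATIONS = {
    "FourDN_search_data": "search_data",
    "FourDN_get_experiment_metadata": "get_experiment_metadata",
    "FourDN_get_file_metadata": "get_file_metadata",
    "FourDN_get_download_url": "get_download_url",
}


def with_operation(tool_name, arguments):
    """Return a copy of arguments with the required operation filled in.

    Tools outside the 4DN list are returned unchanged. A conflicting
    operation value is treated as a caller error.
    """
    required = FOURDN_OPERATIONS.get(tool_name)
    if required is None:
        return dict(arguments)
    supplied = arguments.get("operation", required)
    if supplied != required:
        raise ValueError(
            f"{tool_name} expects operation={required!r}, got {supplied!r}"
        )
    return {**arguments, "operation": required}


args = with_operation("FourDN_search_data", {"query": "Hi-C", "limit": 5})
```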

Response Format Notes (verified from testing)

  • SCREEN: Returns dict with @context, @graph, @id, @type, all keys (JSON-LD format)
  • ReMap: Returns dict with TF binding records
  • RegulomeDB: Returns {status, data, url} with regulatory score and evidence in data
  • JASPAR search: Returns {count, next, previous, results} with matrix objects in results
  • JASPAR get_matrix: Returns dict with matrix details (name, PFM, sequence logo)
  • ENCODE: Returns dict with experiment/file objects (structure varies by endpoint)
  • 4DN: Returns dict with search results
  • Ensembl: Returns {status, data, url, content_type} with regulatory features in data
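
Since each source wraps its records under a different key, downstream code is simpler if the unwrapping happens in one place. The sketch below covers only the sources whose record key is stated above. Reading SCREEN records from `@graph` is an assumption based on JSON-LD convention, and `extract_records` is a hypothetical helper name.

```python
# Key that holds the records for each source, per the notes above.
# "@graph" for SCREEN is an assumption based on JSON-LD convention.
RECORD_KEY = {
    "SCREEN": "@graph",
    "RegulomeDB": "data",
    "JASPAR_search": "results",
    "Ensembl": "data",
}


def extract_records(source, response):
    """Return the records from a response as a list, or [] if absent."""
    key = RECORD_KEY.get(source)
    if key is None:
        # ReMap, ENCODE and 4DN layouts are not specified above.
        raise ValueError(f"response layout not documented for {source}")
    if not isinstance(response, dict):
        return []
    records = response.get(key)
    if records is None:
        return []
    # A source may return a single object rather than a list.
    return records if isinstance(records, list) else [records]
```

Sources whose layout is described only as "dict" (ReMap, ENCODE, 4DN) are deliberately rejected here so that their structure is inspected rather than guessed.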

Fallback Strategies

Regulatory Elements

  • Primary: SCREEN cCREs by gene name
  • Fallback: Ensembl Regulatory Build by coordinates
  • If both empty: Note "limited regulatory annotation in this region"

TF Binding

  • Primary: ReMap binding sites + JASPAR motifs
  • Fallback: ENCODE ChIP-seq experiments
  • If all empty: Gene may have limited TF binding data; note and continue

Chromatin Data

  • Primary: 4DN Hi-C experiments
  • Fallback: ENCODE Hi-C experiments
  • If empty: Note "no chromatin conformation data available for this region"

Variant Scoring

  • Primary: RegulomeDB for rsID
  • Fallback: SCREEN + ENCODE overlap analysis at variant position
  • If no variant: Skip gracefully
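
All four strategies above share the same shape: try the primary source, fall back to the next one, and record a note if everything comes back empty. A minimal sketch of that pattern follows. `first_non_empty` is a hypothetical helper, and the lambdas stand in for real tool calls.

```python
def first_non_empty(steps, empty_note):
    """Run (label, fetch) steps in order and return the first non-empty result.

    steps: list of (label, zero-argument callable) pairs.
    Returns (label, result) on success, or (None, empty_note) if every
    step is empty or raises.
    """
    for label, fetch in steps:
        try:
            result = fetch()
        except Exception:
            # Treat a failed query like an empty one and move on.
            continue
        if result:
            return label, result
    return None, empty_note


# Stand-in fetchers; real code would call the SCREEN and Ensembl tools here.
source, elements = first_non_empty(
    [
        ("SCREEN", lambda: []),                          # primary returns nothing
        ("Ensembl", lambda: [{"feature": "promoter"}]),  # fallback succeeds
    ],
    empty_note="limited regulatory annotation in this region",
)
```

Returning the label alongside the result keeps the provenance of each finding, which matters when the report has to state which source the data came from.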

Common Use Patterns

Pattern 1: Gene-Centric Regulatory Landscape

Input: Gene symbol (e.g., "TP53")
Workflow: All phases (0-6 + Synthesis)
Output: Complete regulatory atlas for the gene locus

Pattern 2: Transcription Factor Target Analysis

Input: TF name (e.g., "CTCF")
Workflow: Phase 0 -> Phase 2 (JASPAR motif + ENCODE ChIP-seq) -> Phase 1 (target gene cCREs)
Output: TF binding motif, genome-wide binding data, target gene catalog

Pattern 3: Non-Coding Variant Interpretation

Input: rsID (e.g., "rs6983267")
Workflow: Phase 0 -> Phase 3 (RegulomeDB) -> Phase 1 (nearby cCREs) -> Phase 2 (TF binding) -> Synthesis
Output: Regulatory impact assessment with functional context

Pattern 4: Cell-Type Specific Regulation

Input: Gene + cell type (e.g., "MYC in HepG2")
Workflow: Phase 0 -> Phase 1 (SCREEN) -> Phase 2 (ReMap in HepG2) -> Phase 4 (ENCODE in HepG2)
Output: Cell-type specific regulatory landscape

Pattern 5: Epigenetic Data Discovery

Input: Histone mark or assay type (e.g., "H3K27ac ChIP-seq in liver")
Workflow: Phase 4 (ENCODE search) -> Phase 5 (4DN chromatin) -> Summary
Output: Available datasets and download information
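
The five patterns can be summarized as ordered phase lists. The sketch below encodes the workflows exactly as written above. The pattern keys, `phases_for`, and `looks_like_rsid` are hypothetical names, and the rsID check (the letters "rs" followed by digits) is a simplifying assumption based on the example identifiers in this document.

```python
import re

# Phase order for each pattern, transcribed from the workflows above.
PATTERN_PHASES = {
    "gene_landscape": [0, 1, 2, 3, 4, 5, 6, "synthesis"],  # Pattern 1
    "tf_targets": [0, 2, 1],                                # Pattern 2
    "variant": [0, 3, 1, 2, "synthesis"],                   # Pattern 3
    "cell_type": [0, 1, 2, 4],                              # Pattern 4
    "data_discovery": [4, 5, "summary"],                    # Pattern 5
}


def phases_for(pattern):
    """Return the ordered phases for a named pattern."""
    try:
        return list(PATTERN_PHASES[pattern])
    except KeyError:
        raise ValueError(f"unknown pattern: {pattern}") from None


def looks_like_rsid(text):
    """Return True for inputs such as "rs6983267" (routes to Pattern 3)."""
    return re.fullmatch(r"rs\d+", text.strip()) is not None


query = "rs6983267"
plan = phases_for("variant") if looks_like_rsid(query) else phases_for("gene_landscape")
```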

Limitations & Known Issues

Database-Specific

  • SCREEN: Limited to ENCODE-defined cCREs; may miss tissue-specific regulatory elements
  • JASPAR: Motif-based predictions produce false positives; a predicted binding site does not imply regulatory function
  • ReMap: Coverage varies by TF and cell type; ~1000 TFs covered
  • RegulomeDB: Scoring based on available data; novel variants may lack evidence
  • ENCODE: Primarily human and mouse; limited other organisms
  • 4DN: Focused on chromatin conformation; limited cell type coverage
  • Ensembl: Regulatory build is computationally predicted; may miss novel elements

Analysis

  • Cell-type specificity: Regulatory elements are highly cell-type specific; data from one cell type may not generalize
  • Functional validation gap: Most findings are [T2]-[T3]; [T1] validation requires experimental follow-up
  • Non-coding complexity: Regulatory mechanisms are complex; catalog does not capture all interactions
  • 3D genome: TAD and loop data available for limited cell types

Technical

  • 4DN operation parameter: Must include operation for all 4DN tools (SOAP-style)
  • Region format: Ensembl uses "17:start-end" (no "chr" prefix); SCREEN/ENCODE may use "chr17:start-end"
  • Large gene loci: Genes spanning >1Mb may require multiple queries
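
The region-format and large-locus notes can both be handled with small string helpers. This is a sketch under stated assumptions: the helper names are hypothetical, coordinates are treated as inclusive, the 1 Mb window size is taken from the ">1Mb" note above rather than from any documented API limit, and the coordinates in the usage lines are illustrative only.

```python
def to_ensembl_region(region):
    """Drop a leading "chr" so the region matches the "17:start-end" form."""
    return region[3:] if region.lower().startswith("chr") else region


def to_chr_region(region):
    """Add a leading "chr" for tools that expect "chr17:start-end"."""
    return region if region.lower().startswith("chr") else f"chr{region}"


def split_region(chrom, start, end, window=1_000_000):
    """Split an inclusive range into consecutive regions of at most `window` bp."""
    if start > end:
        raise ValueError("start must not exceed end")
    chunks = []
    pos = start
    while pos <= end:
        stop = min(pos + window - 1, end)
        chunks.append(f"{chrom}:{pos}-{stop}")
        pos = stop + 1
    return chunks


ensembl_region = to_ensembl_region("chr17:1000-2000")
windows = split_region("8", 1, 2_500_000)
```

Each returned window can then be passed as a separate `region` value, and the results merged afterwards.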

Summary

The Epigenomics & Gene Regulation skill provides comprehensive regulatory landscape analysis by integrating:
  1. Cis-regulatory elements (SCREEN) - Enhancers, promoters, insulators from ENCODE cCRE catalog
  2. Transcription factor binding (JASPAR + ReMap + ENCODE) - Motifs, validated binding sites, ChIP-seq data
  3. Regulatory variant scoring (RegulomeDB) - Evidence-based variant regulatory impact
  4. Functional genomics (ENCODE) - Histone marks, chromatin accessibility, expression
  5. Chromatin conformation (4D Nucleome) - Hi-C, TADs, chromatin loops
  6. Regulatory annotation (Ensembl) - Computational regulatory build features
Outputs: Structured markdown report with regulatory element catalog, TF network, variant scoring, and integrated regulatory model
Best for: Gene regulation analysis, non-coding variant interpretation, enhancer/promoter identification, TF binding profiling, epigenetic data discovery
Total tools integrated: 21 tools across 7 databases