tooluniverse-epigenomics
Epigenomics & Gene Regulation Analysis
Comprehensive analysis of the regulatory genome integrating functional genomics experiments, transcription factor binding data, cis-regulatory element catalogs, chromatin conformation, and variant regulatory scoring. Generates structured regulatory landscape reports with evidence grading.
When to Use This Skill
Triggers:
- "What regulates [gene]?" / "Show the regulatory landscape of [gene]"
- "What transcription factors bind to [gene/region]?"
- "Find enhancers near [gene]"
- "What is the regulatory impact of variant [rsID]?"
- "Find ENCODE experiments for [histone mark/TF] in [cell type]"
- "What is the chromatin structure around [gene/region]?"
- "Analyze the epigenetic regulation of [gene]"
- "Find transcription factor binding motifs for [TF]"
- "Regulatory element analysis for [genomic region]"
Use Cases:
- Gene Regulatory Landscape: Comprehensive view of all regulatory elements, TF binding, and chromatin around a gene
- Transcription Factor Profiling: TF binding motifs (JASPAR), binding sites (ReMap), and target gene identification
- Regulatory Variant Interpretation: Assess non-coding variant impact using RegulomeDB, SCREEN, and ENCODE
- Functional Genomics Data Discovery: Find ChIP-seq, ATAC-seq, and Hi-C experiments in ENCODE and 4DN
- Enhancer/Promoter Cataloging: Identify and characterize cis-regulatory elements using SCREEN
- Chromatin Conformation: 3D genome organization from 4D Nucleome Hi-C data
- Epigenetic Profiling: Histone modification patterns, DNA methylation, chromatin accessibility
KEY PRINCIPLES
- Report-first approach - Create report file FIRST, then populate progressively
- Tool parameter verification - Verify params via `get_tool_info` before calling unfamiliar tools
- Evidence grading - Grade all regulatory findings by evidence strength (T1-T4)
- Citation requirements - Every finding must have inline source attribution (database, experiment ID)
- Mandatory completeness - All sections must exist with data minimums or explicit "No data" notes
- Gene disambiguation first - Resolve gene symbol/coordinates before analysis
- Cell-type context - Always note cell type specificity of regulatory data
- Negative results documented - "No enhancers found in region" is data; empty sections are failures
- English-first queries - Always use English gene names and standard nomenclature in tool calls
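The report-first and mandatory-completeness principles can be sketched as a skeleton generator. It writes every section up front with an explicit placeholder, so a section that never gets filled still says so instead of disappearing. This is a minimal sketch under assumptions. The markdown layout with `## ` headers, the section titles, and the helper names `create_report_skeleton` / `fill_section` are illustrative and are not a format the skill prescribes.

```python
from pathlib import Path

# Section titles mirror the phases of this skill (wording is an assumption).
SECTIONS = [
    "Gene Identity",
    "Cis-Regulatory Elements (SCREEN)",
    "Transcription Factor Binding",
    "Regulatory Variant Impact",
    "ENCODE Functional Genomics",
    "Chromatin Conformation (4D Nucleome)",
    "Ensembl Regulatory Build",
    "Integrated Regulatory Model",
]
PLACEHOLDER = "_No data yet: section pending._"


def create_report_skeleton(path: Path, gene: str) -> Path:
    """Write the report file FIRST, with every section present and marked pending."""
    lines = [f"# Regulatory Landscape Report: {gene}", ""]
    for title in SECTIONS:
        lines += [f"## {title}", "", PLACEHOLDER, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def fill_section(path: Path, title: str, body: str) -> None:
    """Replace one section's placeholder with findings or an explicit 'No data' note."""
    text = path.read_text(encoding="utf-8")
    pending = f"## {title}\n\n{PLACEHOLDER}"
    if pending not in text:
        raise KeyError(f"section {title!r} is missing or already filled")
    path.write_text(text.replace(pending, f"## {title}\n\n{body}", 1), encoding="utf-8")
```

Any placeholder still present at the end flags a section that was never populated.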
Evidence Grading System (MANDATORY)
Grade every regulatory finding by evidence strength:
| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | [T1] | Direct experimental validation, functional assay | CRISPR-validated enhancer, reporter assay (e.g., luciferase) |
| T2 | [T2] | High-quality experimental data, curated | ENCODE ChIP-seq peak, SCREEN cCRE, ReMap binding site |
| T3 | [T3] | Computational prediction, motif match | JASPAR motif score, RegulomeDB score, Ensembl regulatory prediction |
| T4 | [T4] | Association, text-mined, low confidence | Literature mention, low-score motif match, inferred regulation |
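One way to enforce the tier symbols and the inline-citation principle together is to have every finding carry its tier and source, and to render both in one place. The default tier per data source below restates the grades given in the phase decision-logic sections of this skill. The `Finding` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Default tier per source, restating the phase decision-logic sections.
# RegulomeDB is deliberately absent: its tier depends on the score.
DEFAULT_TIER = {
    "SCREEN": "T2",
    "ReMap": "T2",
    "ENCODE": "T2",
    "4DN": "T2",
    "JASPAR": "T3",
    "Ensembl Regulatory Build": "T3",
}
VALID_TIERS = {"T1", "T2", "T3", "T4"}


@dataclass
class Finding:
    statement: str
    source: str                 # database name, e.g. "ENCODE"
    record_id: str              # experiment or element ID, e.g. "ENCSR000AKP"
    tier: Optional[str] = None  # explicit tier; overrides the source default

    def render(self) -> str:
        """Format as '<statement> [Tn] (<source>: <record_id>)'."""
        tier = self.tier or DEFAULT_TIER.get(self.source)
        if tier not in VALID_TIERS:
            raise ValueError(f"no evidence tier for {self.source!r}; set one explicitly")
        return f"{self.statement} [{tier}] ({self.source}: {self.record_id})"
```

A source with no default tier forces an explicit grade, so no finding is left ungraded.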
Core Strategy: 7 Research Dimensions
Gene / Region / Variant Query
|
+-- PHASE 0: Gene/Region Resolution (ALWAYS FIRST)
| +-- Resolve gene symbol -> Ensembl ID, coordinates, aliases
| +-- Define genomic region of interest (+/- 500kb flanking)
|
+-- PHASE 1: Cis-Regulatory Elements (SCREEN)
| +-- Candidate enhancers, promoters, insulators
| +-- cCRE activity by cell type
| +-- CTCF binding sites
|
+-- PHASE 2: Transcription Factor Binding
| +-- JASPAR: TF binding motifs and PWMs
| +-- ReMap: ChIP-seq validated TF binding sites
| +-- ENCODE: TF ChIP-seq experiments
|
+-- PHASE 3: Regulatory Variant Scoring
| +-- RegulomeDB: Variant regulatory evidence score
| +-- Functional annotations from multiple data types
|
+-- PHASE 4: ENCODE Functional Genomics
| +-- Histone modification ChIP-seq
| +-- ATAC-seq / DNase-seq accessibility
| +-- RNA-seq expression context
| +-- Available experiments and datasets
|
+-- PHASE 5: Chromatin Conformation (4D Nucleome)
| +-- Hi-C contact maps
| +-- TAD boundaries
| +-- Chromatin loops and compartments
|
+-- PHASE 6: Ensembl Regulatory Annotation
| +-- Regulatory build features
| +-- Promoter/enhancer/CTCF site annotations
| +-- Activity states across cell types
|
+-- SYNTHESIS: Integrated Regulatory Model
+-- Aggregate regulatory evidence
+-- Build gene regulation model
+-- Identify key regulatory elements and TFs
+-- Data gaps and experimental recommendations
Phase 0: Gene/Region Resolution (ALWAYS FIRST)
CRITICAL: Resolve gene identity and genomic coordinates before any analysis.
Input Types Handled
| Input Format | Resolution Strategy |
|---|---|
| Gene symbol (e.g., "BRCA1") | Ensembl lookup -> coordinates, Ensembl ID |
| Genomic region (e.g., "chr17:43044295-43170245") | Use directly; identify overlapping genes |
| Ensembl ID (e.g., "ENSG00000012048") | Ensembl lookup -> symbol, coordinates |
| rsID (e.g., "rs12345") | RegulomeDB/Ensembl -> coordinates, nearby genes |
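The input table amounts to a small classifier. The sketch below recognizes the four formats with regular expressions. Once coordinates are known, it derives the +/- 500 kb region of interest, clamping the start at position 1; for TP53 this reproduces the window in the Gene Identity example. The patterns are simplifying assumptions:

- Regions are accepted as `chrN:start-end` or `N:start-end`, with optional thousands separators.
- Gene IDs must be `ENSG` IDs.
- rsIDs must use a lowercase `rs` prefix, so a symbol like `RS1` stays a gene.
- Anything else is treated as a gene symbol.

```python
import re

FLANK = 500_000  # +/- 500 kb, as in Phase 0

PATTERNS = [
    ("ensembl_id", re.compile(r"^ENSG\d{11}(\.\d+)?$")),
    ("rsid", re.compile(r"^rs\d+$")),  # case-sensitive on purpose
    ("region", re.compile(r"^(chr)?([0-9]{1,2}|X|Y|MT?):(\d+)-(\d+)$", re.IGNORECASE)),
]


def classify_input(query: str) -> str:
    """Return 'ensembl_id', 'rsid', 'region', or 'gene_symbol'."""
    q = query.strip().replace(",", "")
    for kind, pattern in PATTERNS:
        if pattern.match(q):
            return kind
    return "gene_symbol"


def region_of_interest(chrom: str, start: int, end: int, flank: int = FLANK) -> str:
    """Ensembl-style 'chrom:start-end' window around a feature, clamped at 1."""
    chrom = chrom.removeprefix("chr")
    return f"{chrom}:{max(1, start - flank)}-{end + flank}"
```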
Resolution Tools
| Tool | Purpose | Parameters |
|---|---|---|
| | Gene symbol to Ensembl ID + coordinates | |
| | Official gene symbol, aliases | |
| | Cross-references to external databases | |
Disambiguation Output
Gene Identity
| Property | Value |
|---|---|
| Gene Symbol | TP53 |
| Ensembl ID | ENSG00000141510 |
| Chromosome | 17 |
| Start | 7661779 |
| End | 7687550 |
| Strand | - |
| Region of Interest | 17:7161779-8187550 (+/- 500kb) |
| Aliases | p53, TRP53, LFS1 |
---
Phase 1: Cis-Regulatory Elements (SCREEN)
When: Gene name or genomic region available
Objective: Catalog candidate cis-regulatory elements (cCREs) from the ENCODE SCREEN database
Tools Used
| Tool | Function | Parameters |
|---|---|---|
| `SCREEN_get_regulatory_elements` | Get cCREs for a gene | gene_name, element_type, limit |
Workflow
- Query enhancers: `SCREEN_get_regulatory_elements(gene_name=gene, element_type="enhancer", limit=20)`
- Query promoters: `SCREEN_get_regulatory_elements(gene_name=gene, element_type="promoter", limit=20)`
- Query insulators: `SCREEN_get_regulatory_elements(gene_name=gene, element_type="insulator", limit=10)`
- For each element: extract coordinates, activity scores, and cell-type specificity
Decision Logic
- Multiple element types: Always query enhancers AND promoters (insulators optional)
- Empty results: Some genes have few annotated elements; report the counts, including zero
- Cell type specificity: SCREEN data is cell-type annotated; report top active cell types
- All findings graded [T2]: SCREEN cCREs are experimentally derived from ENCODE data
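The Phase 1 workflow and its decision logic can be sketched as a loop:

- Always query enhancers and promoters.
- Insulators are optional.
- Record counts even when they are zero.

`query_fn` is a placeholder for `SCREEN_get_regulatory_elements`, called however your runtime invokes tools. Treating its return value as a plain list is a simplifying assumption: the response notes later in this skill say SCREEN returns a JSON-LD dict, so adapt the extraction step to match.

```python
from typing import Callable

# element_type -> result limit, following the Phase 1 workflow
ELEMENT_QUERIES = {"enhancer": 20, "promoter": 20, "insulator": 10}
REQUIRED = {"enhancer", "promoter"}


def collect_ccres(gene: str, query_fn: Callable[..., list], include_insulators: bool = True) -> dict:
    """Query each element type and keep counts, including explicit zeros."""
    results = {}
    for element_type, limit in ELEMENT_QUERIES.items():
        if element_type not in REQUIRED and not include_insulators:
            continue
        elements = query_fn(gene_name=gene, element_type=element_type, limit=limit) or []
        results[element_type] = {"count": len(elements), "elements": elements}
    return results


def summary_lines(results: dict) -> list[str]:
    """Render counts; a zero count becomes an explicit 'No ... found' note."""
    lines = []
    for element_type, info in results.items():
        if info["count"]:
            lines.append(f"{element_type.capitalize()}s ({info['count']} found) [T2]")
        else:
            lines.append(f"No {element_type}s found in SCREEN for this gene")
    return lines
```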
Output Format
Cis-Regulatory Elements (SCREEN) [T2]
Enhancers (15 found)
| Element ID | Coordinates | Activity Score | Top Cell Types |
|---|---|---|---|
| EH38E1234567 | chr17:7650000-7651000 | 0.95 | HepG2, K562 |
| ... | ... | ... | ... |
Promoters (3 found)
| Element ID | Coordinates | Activity Score | Top Cell Types |
|---|---|---|---|
| EH38E9876543 | chr17:7687000-7688000 | 0.99 | Ubiquitous |
| ... | ... | ... | ... |
Insulators (2 found)
| Element ID | Coordinates | CTCF Binding |
|---|---|---|
| EH38E5555555 | chr17:7700000-7701000 | Yes |
---
Phase 2: Transcription Factor Binding
When: Gene symbol available
Objective: Identify transcription factors that regulate the gene through motif analysis and ChIP-seq binding data
Tools Used
JASPAR - TF Binding Motifs
| Tool | Function | Parameters |
|---|---|---|
| `jaspar_search_matrices` | Search TF binding motifs | search, collection, species |
| | Get PWM for specific TF | |
| | List TFs in collection | |
ReMap - Validated TF Binding Sites
| Tool | Function | Parameters |
|---|---|---|
| `ReMap_get_transcription_factor_binding` | Get TF binding sites near gene | gene_name, cell_type, limit |
ENCODE - ChIP-seq Experiments
| Tool | Function | Parameters |
|---|---|---|
| `ENCODE_search_experiments` | Search TF ChIP-seq experiments | assay_title, target, organism, limit |
Workflow
- JASPAR motif search: search for known TF binding motifs with `jaspar_search_matrices(search=gene_symbol, collection="CORE", species="9606")`
  - If the gene IS a TF: get its PWM binding motif
  - If the gene is NOT a TF: a name search typically returns no matrices for it; identify TFs that bind its promoter from ReMap (next step), then look up their motifs in JASPAR
- ReMap binding data: get experimentally validated TF binding sites with `ReMap_get_transcription_factor_binding(gene_name=gene, cell_type="HepG2", limit=20)`
  - Try multiple cell types: "HepG2", "K562", "MCF-7", "GM12878"
- ENCODE ChIP-seq: find available ChIP-seq experiments for key TFs with `ENCODE_search_experiments(assay_title="ChIP-seq", target=top_tf, organism="Homo sapiens", limit=5)`
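Querying ReMap in several cell types and merging the hits can be sketched as below. `query_fn` is a placeholder for `ReMap_get_transcription_factor_binding`. The per-record field `tf` is a hypothetical name, since the response notes say only that ReMap returns a dict of TF binding records. Sorting TFs by the number of cell types with binding is one reasonable ranking, not one the skill mandates.

```python
from collections import defaultdict
from typing import Callable, Iterable

DEFAULT_CELL_TYPES = ("HepG2", "K562", "MCF-7", "GM12878")


def tf_binding_across_cell_types(
    gene: str,
    query_fn: Callable[..., list],
    cell_types: Iterable[str] = DEFAULT_CELL_TYPES,
    limit: int = 20,
) -> dict:
    """Map each TF to the sorted cell types in which ReMap reports binding near the gene."""
    by_tf = defaultdict(set)
    for cell_type in cell_types:
        for record in query_fn(gene_name=gene, cell_type=cell_type, limit=limit) or []:
            by_tf[record["tf"]].add(cell_type)
    # TFs bound in more cell types first, then alphabetical
    ranked = sorted(((tf, sorted(cts)) for tf, cts in by_tf.items()),
                    key=lambda item: (-len(item[1]), item[0]))
    return dict(ranked)
```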
Decision Logic
- Gene is a TF: Show its binding motif (JASPAR PWM) + target genes + ENCODE ChIP-seq experiments
- Gene is NOT a TF: Show TFs that bind its promoter/enhancers (ReMap) + relevant motifs
- Multiple cell types for ReMap: Query at least 2-3 common cell types
- JASPAR grades [T3]: Motif predictions are computational
- ReMap grades [T2]: Based on experimental ChIP-seq data
- ENCODE grades [T2]: Direct experimental data
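The decision between the "gene is a TF" and "gene is NOT a TF" branches can be read off the JASPAR search response. According to the response notes in this skill, that response has the shape `{count, next, previous, results}`. A minimal sketch, assuming each result object carries `matrix_id`, `name`, and `collection` keys (as in the public JASPAR REST API listing); confirm the actual keys against a sample response. Dimer matrices such as `TAL1::TCF3` are split on `::` before matching.

```python
def parse_jaspar_search(response: dict, gene_symbol: str) -> tuple[bool, list[dict]]:
    """Return (gene_has_own_motif, rows) from a JASPAR search response."""
    rows = [
        {
            "matrix_id": hit.get("matrix_id"),
            "tf_name": hit.get("name"),
            "collection": hit.get("collection"),
            "tier": "T3",  # motif evidence is computational
        }
        for hit in response.get("results", [])
    ]
    target = gene_symbol.upper()
    is_tf = any(target in (row["tf_name"] or "").upper().split("::") for row in rows)
    return is_tf, rows
```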
Output Format
Transcription Factor Binding
JASPAR Binding Motifs [T3]
| Matrix ID | TF Name | Score | Consensus |
|---|---|---|---|
| MA0106.3 | TP53 | 0.92 | RRRCWWGYYY |
| ... | ... | ... | ... |
ReMap ChIP-seq Validated Binding [T2]
| Transcription Factor | Cell Type | Binding Score | Coordinates |
|---|---|---|---|
| SP1 | HepG2 | 850 | chr17:7687200-7687500 |
| CTCF | K562 | 920 | chr17:7700100-7700400 |
| ... | ... | ... | ... |
ENCODE ChIP-seq Experiments Available [T2]
| Experiment | Target | Cell Type | Files | Status |
|---|---|---|---|---|
| ENCSR000BNT | TP53 | HepG2 | 12 | released |
| ... | ... | ... | ... | ... |
---
Phase 3: Regulatory Variant Scoring
When: rsID or variant provided, OR gene has known regulatory variants
Objective: Assess the regulatory impact of genetic variants in the region
Tools Used
| Tool | Function | Parameters |
|---|---|---|
| `RegulomeDB_query_variant` | Get regulatory evidence score for variant | rsid |
Workflow
- If an rsID is provided: query RegulomeDB directly with `RegulomeDB_query_variant(rsid=rsid)`
- Parse RegulomeDB score (1a-7): lower = more regulatory evidence
- Extract supporting evidence types (eQTL, TF binding, chromatin state, etc.)
- Cross-reference with SCREEN and ENCODE data from other phases
RegulomeDB Score Interpretation
| Score | Meaning | Evidence Level |
|---|---|---|
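| 1a | eQTL + TF binding + matched TF motif + matched DNase footprint + DNase peak | Very likely regulatory [T2] |
| 1b | eQTL + TF binding + any motif + DNase footprint + DNase peak | Likely regulatory [T2] |
| 1c | eQTL + TF binding + matched TF motif + DNase peak | Likely regulatory [T2] |
| 1d | eQTL + TF binding + any motif + DNase peak | Likely regulatory [T2] |
| 1e | eQTL + TF binding + matched TF motif | Likely regulatory [T3] |
| 1f | eQTL + TF binding or DNase peak | Possible regulatory [T3] |
| 2a-2c | TF binding + motif + DNase peak (2a-2b also DNase footprint) | Likely affects TF binding [T3] |
| 3a-3b | TF binding + motif, with (3a) or without (3b) DNase peak | Some evidence [T3] |
| 4-7 | TF binding + DNase peak (4); TF binding or DNase peak (5); other, e.g. motif hit (6); no data (7) | Limited evidence [T4] |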
Decision Logic
- Score 1a-1d: Flag as likely functional regulatory variant; high confidence
- Score 2a-3b: Moderate evidence; recommend experimental validation
- Score 4-7: Low regulatory evidence; a regulatory effect is unlikely but not excluded
- No rsID provided: Skip this phase gracefully; note "no variant specified"
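The score table and the decision logic combine into one lookup. This sketch does three things:

- It validates a RegulomeDB category score such as `"1b"` or `"5"`.
- It returns the tier and the recommended action.
- It handles the no-variant case explicitly instead of leaving the section empty.

The decision logic does not cover 1e-1f. Treating them like 2a-3b, consistent with their [T3] grade in the table, is an assumption. The function name and return shape are illustrative.

```python
from typing import Optional, Tuple

# Every category score RegulomeDB defines: 1a-1f, 2a-2c, 3a-3b, 4-7.
VALID_SCORES = (
    {f"1{c}" for c in "abcdef"} | {f"2{c}" for c in "abc"} | {"3a", "3b", "4", "5", "6", "7"}
)


def interpret_regulomedb(score: Optional[str]) -> Tuple[Optional[str], str]:
    """Return (evidence tier, recommended action) for a RegulomeDB category score."""
    if score is None:
        return None, "No variant specified"
    s = score.strip().lower()
    if s not in VALID_SCORES:
        raise ValueError(f"unrecognized RegulomeDB score: {score!r}")
    if s in {"1a", "1b", "1c", "1d"}:
        return "T2", "Flag as likely functional regulatory variant"
    if s[0] in "123":
        # 2a-3b per the decision logic; 1e-1f handled the same way (assumption)
        return "T3", "Moderate evidence; recommend experimental validation"
    return "T4", "Low regulatory evidence"
```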
Output Format
Regulatory Variant Impact [T2/T3]
| Variant | RegulomeDB Score | Interpretation | Evidence Types |
|---|---|---|---|
| rs12345 | 1b | Likely regulatory | eQTL, TF binding, DNase |
| rs67890 | 3a | Some evidence | TF binding, motif, DNase peak |
---
Phase 4: ENCODE Functional Genomics
When: Gene or region available
Objective: Discover functional genomics experiments and datasets from ENCODE
Tools Used
| Tool | Function | Parameters |
|---|---|---|
| `ENCODE_search_experiments` | Search experiments by assay/target | assay_title, target, organism, status, limit |
| | Get detailed experiment metadata | ENCODE accession |
| | List available data files | |
| | Search available cell types | |
Workflow
- Histone marks: search for H3K4me3 (promoter), H3K27ac (active enhancer), H3K4me1 (enhancer), and H3K27me3 (repressive), e.g. `ENCODE_search_experiments(assay_title="ChIP-seq", target="H3K27ac", organism="Homo sapiens", limit=5)`
- Chromatin accessibility: search ATAC-seq and DNase-seq, e.g. `ENCODE_search_experiments(assay_title="ATAC-seq", organism="Homo sapiens", limit=5)`, then repeat with assay_title="DNase-seq"
- If the gene is a TF: search for ChIP-seq of that TF with `ENCODE_search_experiments(assay_title="ChIP-seq", target=gene, organism="Homo sapiens", limit=5)`
- RNA-seq context: search for expression experiments with `ENCODE_search_experiments(assay_title="RNA-seq", organism="Homo sapiens", limit=5)`
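The Phase 4 searches are all the same call with different arguments. It can therefore help to build the argument list first and run it in one loop. This sketch only constructs keyword-argument dicts for `ENCODE_search_experiments`, using the parameter names shown in the workflow. `build_encode_plan` is a hypothetical helper; how each call is dispatched depends on your runtime.

```python
HISTONE_MARKS = ("H3K4me3", "H3K27ac", "H3K4me1", "H3K27me3")
BASE = {"organism": "Homo sapiens", "limit": 5}


def build_encode_plan(gene: str, gene_is_tf: bool) -> list[dict]:
    """Keyword arguments for each ENCODE_search_experiments call in Phase 4."""
    plan = [{"assay_title": "ChIP-seq", "target": mark, **BASE} for mark in HISTONE_MARKS]
    plan += [{"assay_title": assay, **BASE} for assay in ("ATAC-seq", "DNase-seq")]
    if gene_is_tf:
        plan.append({"assay_title": "ChIP-seq", "target": gene, **BASE})
    plan.append({"assay_title": "RNA-seq", **BASE})
    return plan
```

Building the plan separately also makes it easy to log which searches ran, which the completeness checklist asks for.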
Decision Logic
- Prioritize by relevance: Histone marks and accessibility most informative for regulatory analysis
- Cell type matching: When possible, focus on cell types relevant to user's question
- Experiment quality: Prefer "released" status and recent experiments
- Data volume: ENCODE has thousands of experiments; limit results and highlight most relevant
- All ENCODE data graded [T2]: High-quality experimental data
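The prioritization rules above can be expressed as a filter followed by a sort:

- keep only released experiments;
- rank cell types relevant to the question first;
- within each group, put the most recent experiments first;
- cap the number of results.

The field names `status`, `biosample_summary`, and `date_released` are assumptions about ENCODE experiment objects; the tool notes say only that the structure varies by endpoint.

```python
from typing import Iterable


def prioritize_experiments(experiments: Iterable[dict], cell_types: Iterable[str] = (), top_n: int = 5) -> list[dict]:
    """Keep released experiments; matching cell types first, then newest first."""
    wanted = [c.lower() for c in cell_types]

    def matches(exp: dict) -> bool:
        summary = (exp.get("biosample_summary") or "").lower()
        return any(c in summary for c in wanted)

    released = [e for e in experiments if e.get("status") == "released"]
    # Newest first (ISO dates sort lexically); the second, stable sort then
    # moves matching cell types to the front without disturbing date order.
    released.sort(key=lambda e: e.get("date_released") or "", reverse=True)
    released.sort(key=matches, reverse=True)
    return released[:top_n]
```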
Output Format
ENCODE Functional Genomics [T2]
Histone Modification Experiments
| Experiment | Mark | Cell Type | Status | Files |
|---|---|---|---|---|
| ENCSR000AKP | H3K27ac | HepG2 | released | 8 |
| ENCSR000ALA | H3K4me3 | K562 | released | 6 |
Chromatin Accessibility
| Experiment | Assay | Cell Type | Status |
|---|---|---|---|
| ENCSR889WQX | ATAC-seq | GM12878 | released |
TF ChIP-seq (for [gene] if TF)
| Experiment | Target | Cell Type | Status |
|---|---|---|---|
| ENCSR000BNT | TP53 | HepG2 | released |
---
Phase 5: Chromatin Conformation (4D Nucleome)
When: Gene or region available
Objective: Explore 3D genome organization data from the 4D Nucleome project
Tools Used
| Tool | Function | Parameters |
|---|---|---|
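| `FourDN_search_data` | Search Hi-C data | operation="search_data", assay_title, limit |
| `FourDN_get_experiment_metadata` | Get experiment details | operation="get_experiment_metadata" |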
Workflow
- Search Hi-C experiments: `FourDN_search_data(operation="search_data", assay_title="Hi-C", limit=10)`
- Search Micro-C data: `FourDN_search_data(operation="search_data", assay_title="Micro-C", limit=5)`
- For relevant experiments: get metadata for the top results
- Note available cell types and data types
Decision Logic
- IMPORTANT: 4DN tools require the `operation` parameter (they are SOAP-style tools)
- Hi-C vs Micro-C: Micro-C has higher resolution for local interactions
- Cell type matching: Note which cell types have chromatin data
- Data availability: 4DN may not cover all cell types of interest
- Grade [T2]: High-quality experimental chromatin conformation data
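To record which cell types of interest actually have chromatin conformation data, the 4DN search hits can be grouped by cell type. For each requested cell type, the sketch picks the best available assay, preferring Micro-C for its finer local resolution. The record keys `biosource` and `assay` are hypothetical; map them to whatever fields `FourDN_search_data` actually returns.

```python
from typing import Iterable, Optional


def chromatin_coverage(hits: Iterable[dict], cell_types: Iterable[str]) -> dict[str, Optional[str]]:
    """Per requested cell type: best available assay (Micro-C preferred) or None."""
    assays: dict[str, set] = {}
    for hit in hits:
        assays.setdefault(hit.get("biosource", "").lower(), set()).add(hit.get("assay"))
    coverage = {}
    for cell_type in cell_types:
        available = assays.get(cell_type.lower(), set())
        if "Micro-C" in available:
            coverage[cell_type] = "Micro-C"  # finer resolution for local interactions
        elif "Hi-C" in available:
            coverage[cell_type] = "Hi-C"
        else:
            coverage[cell_type] = None  # report "no chromatin data" for this cell type
    return coverage
```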
Output Format
Chromatin Conformation (4D Nucleome) [T2]
Available Hi-C Datasets
| Experiment | Cell Type | Assay | Resolution | Status |
|---|---|---|---|---|
| 4DNESXXXXXXX | H1-hESC | Hi-C | 10kb | released |
| 4DNESYYYYYYY | GM12878 | Micro-C | 1kb | released |
Chromatin Organization Context
- TAD: Gene located within TAD spanning chr17:7.1-8.2Mb
- Compartment: A compartment (active)
- Nearby CTCF sites: 3 CTCF sites within 100kb (from SCREEN Phase 1)
---
Phase 6: Ensembl Regulatory Annotation
When: Genomic region coordinates available
Objective: Get regulatory feature annotations from the Ensembl Regulatory Build
Tools Used
| Tool | Function | Parameters |
|---|---|---|
| `ensembl_get_regulatory_features` | Get regulatory features in region | region, feature, species |
Workflow
- Get regulatory features: `ensembl_get_regulatory_features(region="17:7661779-7687550", feature="regulatory", species="human")`
- Parse feature types: promoter, enhancer, CTCF_binding_site, TF_binding_site, open_chromatin_region
- Note activity states across cell types when available
Decision Logic
- Region format: Use chromosome:start-end without "chr" prefix
- Feature parameter: Must be "regulatory" for this endpoint
- Cross-reference with SCREEN: Compare Ensembl regulatory build with SCREEN cCREs
- Grade [T3]: Ensembl regulatory build is computationally derived
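The Ensembl endpoint expects `chromosome:start-end` without a `chr` prefix. Coordinates taken from SCREEN or ReMap, which this skill shows as `chr17:...`, therefore need normalizing first. `to_ensembl_region` is a hypothetical helper. It also strips thousands separators, maps `chrM` to Ensembl's `MT`, and tolerates reversed start/end by swapping them; that last behavior is a design choice, and raising an error instead would be equally valid.

```python
import re

_REGION = re.compile(r"^(?:chr)?([0-9]+|X|Y|MT|M):(\d+)-(\d+)$", re.IGNORECASE)


def to_ensembl_region(region: str) -> str:
    """Convert 'chr17:7,661,779-7,687,550' style input to '17:7661779-7687550'."""
    match = _REGION.match(region.strip().replace(",", ""))
    if not match:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    chrom, start, end = match.group(1).upper(), int(match.group(2)), int(match.group(3))
    if chrom == "M":
        chrom = "MT"  # Ensembl names the mitochondrial chromosome MT
    if start > end:
        start, end = end, start  # tolerate reversed coordinates
    return f"{chrom}:{start}-{end}"
```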
Output Format
Ensembl Regulatory Build [T3]
| Feature ID | Type | Coordinates | Activity State |
|---|---|---|---|
| ENSR00000123456 | Promoter | 17:7687200-7688000 | Active (most cell types) |
| ENSR00000789012 | Enhancer | 17:7650000-7651500 | Active (liver, lung) |
| ENSR00000345678 | CTCF_binding_site | 17:7700000-7700500 | Active |
---
Synthesis: Integrated Regulatory Model (MANDATORY)
Always the final section. Integrates all evidence into a coherent regulatory model.
Synthesis Template
Integrated Regulatory Model
Regulatory Architecture Summary
Gene: [GENE] ([Ensembl ID])
Region analyzed: [coordinates] ([size]kb)
Key Regulatory Elements
- Proximal promoter [T2/T3]: Located at [coords], active in [cell types]
  - TFs binding: SP1, CTCF, [others from ReMap]
  - Histone marks: H3K4me3 (ENCODE), H3K27ac (ENCODE)
  - SCREEN cCRE: [element ID]
- Distal enhancer 1 [T2]: Located at [coords], [distance] from TSS
  - Active in [cell types] (SCREEN)
  - TF binding: [TFs from ReMap/ENCODE]
  - Hi-C contact with promoter: [Yes/No/Unknown]
- CTCF insulator [T2]: Located at [coords]
  - Defines TAD boundary
  - CTCF motif score: [from JASPAR]
Transcription Factor Regulatory Network
| TF | Binding Evidence | Motif Match | Cell Types | Role |
|---|---|---|---|---|
| SP1 | ReMap ChIP-seq [T2] | JASPAR 0.92 [T3] | HepG2, K562 | Activator |
| CTCF | ENCODE ChIP-seq [T2] | JASPAR 0.98 [T3] | Ubiquitous | Insulator |
Regulatory Variants (if applicable)
| Variant | RegulomeDB Score | Regulatory Impact | Affected Element |
|---|---|---|---|
| rs12345 | 1b | Disrupts SP1 binding | Proximal promoter |
Evidence Quality Assessment
| Dimension | Data Available | Evidence Tier | Confidence |
|---|---|---|---|
| cCREs (SCREEN) | 15 enhancers, 3 promoters | [T2] | High |
| TF Binding (ReMap) | 8 TFs validated | [T2] | High |
| Motifs (JASPAR) | 12 motif matches | [T3] | Medium |
| ENCODE experiments | 25 relevant datasets | [T2] | High |
| Chromatin (4DN) | Hi-C in 3 cell types | [T2] | Medium |
| Regulatory Build | 5 features annotated | [T3] | Medium |
Data Gaps
- No single-cell ATAC-seq data available for this region
- Chromatin conformation data limited to 3 cell types
- No CRISPR-validated enhancers (would be needed for [T1])
- Regulatory variant impact is predictive (needs experimental validation)
Experimental Recommendations
- Validate key enhancers: CRISPR deletion or reporter assays for top 3 enhancers
- Confirm TF binding: ChIP-qPCR for SP1, CTCF at predicted sites
- Test regulatory variants: Allele-specific reporter assays for rs12345
---
Mandatory Completeness Checklist
Before finalizing any report, verify:
- Phase 0: Gene/region fully resolved (symbol, Ensembl ID, coordinates)
- Phase 1: SCREEN queried for enhancers AND promoters (counts reported)
- Phase 2: At least 2 TF data sources queried (JASPAR + ReMap or ENCODE)
- Phase 3: RegulomeDB queried for variants OR "no variant specified" noted
- Phase 4: At least 2 ENCODE assay types searched (histone marks + accessibility)
- Phase 5: 4DN queried for Hi-C/Micro-C data OR "no chromatin data" noted
- Phase 6: Ensembl regulatory build queried OR "no regulatory features" noted
- Synthesis: Regulatory model provided with element catalog and TF network
- Evidence Grading: All findings have [T1]-[T4] annotations
- Cell-type context: Cell type specificity noted for all binding/activity data
- Data gaps: Explicitly listed in synthesis section
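Part of the checklist can be verified mechanically before a report is finalized. Every required section must be present and non-empty. Every graded section must carry either a tier tag (in its title or body) or an explicit "No ..." note. This sketch assumes the report has already been split into a dict of section title to body text. The title prefixes and the "No ..." phrasing it accepts are assumptions that mirror the templates in this skill.

```python
import re

REQUIRED_SECTIONS = (
    "Gene Identity",
    "Cis-Regulatory Elements",
    "Transcription Factor Binding",
    "Regulatory Variant Impact",
    "ENCODE Functional Genomics",
    "Chromatin Conformation",
    "Ensembl Regulatory Build",
    "Integrated Regulatory Model",
)
TIER_TAG = re.compile(r"\[T[1-4](?:/T[1-4])?\]")
NO_DATA = re.compile(r"\bno\b.*\b(?:data|found|specified|features)\b", re.IGNORECASE)
UNGRADED = {"Gene Identity"}  # identity facts are not regulatory findings


def check_report(sections: dict[str, str]) -> list[str]:
    """Return checklist problems; an empty list means the mechanical checks pass."""
    problems = []
    for required in REQUIRED_SECTIONS:
        title = next((t for t in sections if t.startswith(required)), None)
        if title is None:
            problems.append(f"missing section: {required}")
            continue
        body = sections[title]
        text = f"{title}\n{body}"
        if not body.strip():
            problems.append(f"empty section: {title}")
        elif required not in UNGRADED and not (TIER_TAG.search(text) or NO_DATA.search(text)):
            problems.append(f"no evidence tier or 'No data' note: {title}")
    return problems
```

Items such as cell-type context and the quality of the data-gap list still need a human read.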
Tool Parameter Reference
Critical Parameter Notes (verified from source code):
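| Tool | Parameter Names | Type | Notes |
|---|---|---|---|
| `SCREEN_get_regulatory_elements` | gene_name, element_type, limit | str, str, int | element_type: "enhancer", "promoter", "insulator" |
| `ReMap_get_transcription_factor_binding` | gene_name, cell_type, limit | str, str, int | cell_type default: "HepG2" |
| `RegulomeDB_query_variant` | rsid | str | rsID format (e.g., "rs12345") |
| `jaspar_search_matrices` | search, collection, species | str (all optional) | species="9606" for human |
| | | str | JASPAR matrix ID (e.g., "MA0106.3") |
| | | str, int, int | collection="CORE" default |
| `ENCODE_search_experiments` | assay_title, target, organism, status, limit | str (all optional); limit int | status="released" default |
| | | str | ENCODE accession (e.g., "ENCSR000BNT") |
| | | str, str, int | All optional |
| | | str (all optional) | |
| `FourDN_search_data` | operation | operation REQUIRED | operation="search_data" |
| `FourDN_get_experiment_metadata` | operation | operation REQUIRED | operation="get_experiment_metadata" |
| `ensembl_get_regulatory_features` | region, feature, species | str, str, str | feature="regulatory", region="17:start-end" |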
CRITICAL: SOAP-style Tools
The following tools require an `operation` parameter:
- `FourDN_search_data`: operation="search_data"
- `FourDN_get_experiment_metadata`: operation="get_experiment_metadata"
- `FourDN_get_file_metadata`: operation="get_file_metadata"
- `FourDN_get_download_url`: operation="get_download_url"
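A thin guard can catch a missing or mismatched `operation` before a 4DN call is sent. This sketch encodes only the four tool/operation pairs listed above. `with_operation` is a hypothetical helper that returns the argument dict to pass to the tool. Arguments for any tool not in the list are returned unchanged.

```python
FOURDN_OPERATIONS = {
    "FourDN_search_data": "search_data",
    "FourDN_get_experiment_metadata": "get_experiment_metadata",
    "FourDN_get_file_metadata": "get_file_metadata",
    "FourDN_get_download_url": "get_download_url",
}


def with_operation(tool_name: str, arguments: dict) -> dict:
    """Return a copy of arguments with the required 'operation' filled in and checked."""
    expected = FOURDN_OPERATIONS.get(tool_name)
    if expected is None:
        return dict(arguments)  # not a SOAP-style 4DN tool
    given = arguments.get("operation", expected)
    if given != expected:
        raise ValueError(f"{tool_name} requires operation={expected!r}, got {given!r}")
    return {**arguments, "operation": expected}
```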
Response Format Notes (verified from testing)
- SCREEN: Returns dict with @context, @graph, @id, @type keys (JSON-LD format)
- ReMap: Returns dict with TF binding records
- RegulomeDB: Returns {status, data, url} with regulatory score and evidence in data
- JASPAR search: Returns {count, next, previous, results} with matrix objects in results
- JASPAR get_matrix: Returns dict with matrix details (name, PFM, sequence logo)
- ENCODE: Returns dict with experiment/file objects (structure varies by endpoint)
- 4DN: Returns dict with search results
- Ensembl: Returns {status, data, url, content_type} with regulatory features in data
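Because each source wraps its payload differently, downstream report code is simpler if one function pulls out the records. Below is a minimal Python sketch based only on the shapes listed above. It assumes SCREEN records sit under `@graph`, which is the JSON-LD convention. The `source` labels and the name `extract_records` are hypothetical.

```python
from typing import Any, List


def _as_list(value: Any) -> List[Any]:
    """Wrap a single object in a list; map None to an empty list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def extract_records(source: str, response: Any) -> List[Any]:
    """Pull the payload out of a tool response, following the shapes noted above.

    The `source` labels are hypothetical. Sources without a known envelope
    return the whole response as a single record.
    """
    if not isinstance(response, dict):
        return []
    if source == "SCREEN":
        # JSON-LD: assume the records live under "@graph"
        return _as_list(response.get("@graph"))
    if source == "JASPAR_search":
        # Paginated envelope {count, next, previous, results}
        return _as_list(response.get("results"))
    if source in ("RegulomeDB", "Ensembl"):
        # Envelope {status, data, url[, content_type]}; payload is in "data"
        return _as_list(response.get("data"))
    # ReMap, JASPAR get_matrix, ENCODE, 4DN: structure varies, keep the dict whole
    return [response]
```

For JASPAR searches, the sketch reads only the current page. Results past the first page would have to be fetched by following `next`.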
Fallback Strategies
Regulatory Elements
- Primary: SCREEN cCREs by gene name
- Fallback: Ensembl Regulatory Build by coordinates
- If both empty: Note "limited regulatory annotation in this region"
TF Binding
- Primary: ReMap binding sites + JASPAR motifs
- Fallback: ENCODE ChIP-seq experiments
- If all empty: Gene may have limited TF binding data; note and continue
Chromatin Data
- Primary: 4DN Hi-C experiments
- Fallback: ENCODE Hi-C experiments
- If empty: Note "no chromatin conformation data available for this region"
Variant Scoring
- Primary: RegulomeDB for rsID
- Fallback: SCREEN + ENCODE overlap analysis at variant position
- If no variant: Skip gracefully
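All four fallback chains above share one shape. Query sources in priority order, keep the first non-empty answer, and otherwise record a fixed note and continue. Below is a minimal Python sketch. Each source is modeled as a zero-argument callable standing in for a tool call. `run_with_fallback` is a hypothetical name, and treating an exception as "no data" is a design assumption.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Query = Callable[[], Any]


def run_with_fallback(
    sources: Sequence[Tuple[str, Query]], empty_note: str
) -> Dict[str, Any]:
    """Try each (name, query) in order and keep the first non-empty result.

    A query that raises is recorded and treated as "no data", so the chain
    moves on instead of aborting the whole report.
    """
    errors: List[str] = []
    for name, query in sources:
        try:
            result = query()
        except Exception as exc:  # broad on purpose: any failure means "try the next source"
            errors.append(f"{name}: {exc}")
            continue
        if result:  # None, {}, [] and "" all count as empty
            return {"source": name, "data": result, "note": None, "errors": errors}
    return {"source": None, "data": None, "note": empty_note, "errors": errors}
```

For the regulatory-element chain, `sources` would hold a SCREEN query followed by an Ensembl query, and `empty_note` would be "limited regulatory annotation in this region". Returning the source name lets the report say which database the evidence came from.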
Common Use Patterns
Pattern 1: Gene-Centric Regulatory Landscape
Input: Gene symbol (e.g., "TP53")
Workflow: All phases (0-6 + Synthesis)
Output: Complete regulatory atlas for the gene locus
Pattern 2: Transcription Factor Target Analysis
Input: TF name (e.g., "CTCF")
Workflow: Phase 0 -> Phase 2 (JASPAR motif + ENCODE ChIP-seq) -> Phase 1 (target gene cCREs)
Output: TF binding motif, genome-wide binding data, target gene catalog
Pattern 3: Non-Coding Variant Interpretation
Input: rsID (e.g., "rs6983267")
Workflow: Phase 0 -> Phase 3 (RegulomeDB) -> Phase 1 (nearby cCREs) -> Phase 2 (TF binding) -> Synthesis
Output: Regulatory impact assessment with functional context
Pattern 4: Cell-Type Specific Regulation
Input: Gene + cell type (e.g., "MYC in HepG2")
Workflow: Phase 0 -> Phase 1 (SCREEN) -> Phase 2 (ReMap in HepG2) -> Phase 4 (ENCODE in HepG2)
Output: Cell-type specific regulatory landscape
Pattern 5: Epigenetic Data Discovery
Input: Histone mark or assay type (e.g., "H3K27ac ChIP-seq in liver")
Workflow: Phase 4 (ENCODE search) -> Phase 5 (4DN chromatin) -> Summary
Output: Available datasets and download information
Limitations & Known Issues
Database-Specific
- SCREEN: Limited to ENCODE-defined cCREs; may miss tissue-specific regulatory elements
- JASPAR: Motif predictions carry a false-positive rate; binding does not imply function
- ReMap: Coverage varies by TF and cell type; ~1000 TFs covered
- RegulomeDB: Scoring based on available data; novel variants may lack evidence
- ENCODE: Primarily human and mouse; limited data for other organisms
- 4DN: Focused on chromatin conformation; limited cell type coverage
- Ensembl: Regulatory build is computationally predicted; may miss novel elements
Analysis
- Cell-type specificity: Regulatory elements are highly cell-type specific; data from one cell type may not generalize
- Functional validation gap: Most findings are [T2]-[T3]; [T1] validation requires experimental follow-up
- Non-coding complexity: Regulatory mechanisms are complex; catalog does not capture all interactions
- 3D genome: TAD and loop data available for limited cell types
Technical
- 4DN operation parameter: Must include operation for all 4DN tools (SOAP-style)
- Region format: Ensembl uses "17:start-end" (no "chr" prefix); SCREEN/ENCODE may use "chr17:start-end"
- Large gene loci: Genes spanning >1Mb may require multiple queries
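The region-format and large-locus notes can both be handled before any query is sent. Below is a minimal Python sketch. It assumes nuclear chromosomes only (numbered chromosomes, X, Y) and 1-based inclusive coordinates. It also assumes a 1 Mb default window, taken from the ">1Mb" note rather than from any documented tool limit. All function names are hypothetical.

```python
import re
from typing import List, Tuple

# Numbered chromosomes, X and Y; an optional "chr" prefix is accepted.
_REGION_RE = re.compile(r"(?:chr)?(\d{1,2}|X|Y):(\d+)-(\d+)", re.IGNORECASE)


def parse_region(region: str) -> Tuple[str, int, int]:
    """Parse "17:100-200" or "chr17:100-200" (commas allowed) into (chrom, start, end)."""
    match = _REGION_RE.fullmatch(region.strip().replace(",", ""))
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom = match.group(1).upper()
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start > end in region: {region!r}")
    return chrom, start, end


def to_ensembl(region: str) -> str:
    """Ensembl style: no "chr" prefix, e.g. "17:100-200"."""
    chrom, start, end = parse_region(region)
    return f"{chrom}:{start}-{end}"


def to_ucsc(region: str) -> str:
    """"chr"-prefixed style, as SCREEN/ENCODE may expect."""
    chrom, start, end = parse_region(region)
    return f"chr{chrom}:{start}-{end}"


def split_region(region: str, max_span: int = 1_000_000) -> List[str]:
    """Split a region into consecutive Ensembl-style windows of at most max_span bp.

    Coordinates are treated as 1-based and inclusive, so each window covers
    stop - pos + 1 <= max_span bases.
    """
    if max_span <= 0:
        raise ValueError("max_span must be positive")
    chrom, start, end = parse_region(region)
    windows: List[str] = []
    pos = start
    while pos <= end:
        stop = min(pos + max_span - 1, end)
        windows.append(f"{chrom}:{pos}-{stop}")
        pos = stop + 1
    return windows
```

For example, `split_region("17:1-2500000")` yields three windows: "17:1-1000000", "17:1000001-2000000" and "17:2000001-2500000". Each window can be passed to `to_ucsc` before it goes to a tool that expects the "chr" prefix.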
Summary
Epigenomics & Gene Regulation Skill provides comprehensive regulatory landscape analysis by integrating:
- Cis-regulatory elements (SCREEN) - Enhancers, promoters, insulators from ENCODE cCRE catalog
- Transcription factor binding (JASPAR + ReMap + ENCODE) - Motifs, validated binding sites, ChIP-seq data
- Regulatory variant scoring (RegulomeDB) - Evidence-based variant regulatory impact
- Functional genomics (ENCODE) - Histone marks, chromatin accessibility, expression
- Chromatin conformation (4D Nucleome) - Hi-C, TADs, chromatin loops
- Regulatory annotation (Ensembl) - Computational regulatory build features
Outputs: Structured markdown report with regulatory element catalog, TF network, variant scoring, and integrated regulatory model
Best for: Gene regulation analysis, non-coding variant interpretation, enhancer/promoter identification, TF binding profiling, epigenetic data discovery
Total tools integrated: 21 tools across 7 databases