tooluniverse-regulatory-genomics

Regulatory Genomics Research Skill

Systematic investigation of gene regulation through transcription factor binding, chromatin state, and regulatory element annotation. Integrates JASPAR (TF motifs), ENCODE (functional genomics experiments), RegulomeDB (regulatory variant scoring), and UCSC cCREs.

Domain Reasoning

Regulatory element identification requires converging lines of evidence: sequence conservation alone is insufficient (many conserved sequences are not regulatory), chromatin accessibility is necessary but not sufficient (open chromatin can be structural), TF binding peaks require motif validation, and eQTL evidence ties the element to a transcriptional outcome. No single data type is sufficient. A high-confidence regulatory element requires at least two independent evidence types, and ideally all four.
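
The convergence rule above can be written down as a small check. This is only a sketch: the `Evidence` fields and the tier labels are hypothetical names chosen for illustration, and the thresholds simply restate the rule in the text (one evidence type is insufficient, two or more is high confidence, all four is ideal).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Which independent evidence types support a candidate element."""
    conserved: bool = False            # sequence conservation
    accessible: bool = False           # open chromatin (ATAC-seq / DNase-seq)
    tf_bound_with_motif: bool = False  # ChIP-seq peak validated by a motif
    eqtl: bool = False                 # eQTL tying the element to expression


def confidence_tier(ev: Evidence) -> str:
    """Map the number of supporting evidence types to a coarse tier."""
    n = sum([ev.conserved, ev.accessible, ev.tf_bound_with_motif, ev.eqtl])
    if n == 4:
        return "high (all four)"
    if n >= 2:
        return "high"
    if n == 1:
        return "insufficient (single evidence type)"
    return "no evidence"


print(confidence_tier(Evidence(accessible=True, eqtl=True)))
```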

LOOK UP DON'T GUESS

  • TF binding motifs: retrieve from jaspar_search_matrices and jaspar_get_matrix; do not describe motifs from memory.
  • Experimental ChIP-seq data: search ENCODE_search_experiments; do not assume a TF has been profiled in a given cell type.
  • cCRE annotations for a genomic region: call UCSC_get_encode_cCREs with exact coordinates; do not guess element types.
  • Regulatory impact of a variant: query RegulomeDB_query_variant; never estimate regulatory importance from position alone.

KEY PRINCIPLES:
  1. English-first queries - Use English gene/TF names in all tool calls; respond in user's language
  2. Evidence layering - Combine motif (JASPAR) + experimental (ENCODE ChIP-seq) + variant (RegulomeDB) evidence
  3. Coordinate precision - Genome coordinates must specify assembly (GRCh38 preferred)
  4. Negative results documented - Report when a TF has no ChIP-seq data in ENCODE

COMPUTE, DON'T DESCRIBE

When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
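
As one illustration of the kind of computation meant here, the sketch below runs a one-sided hypergeometric enrichment test using only the standard library. The function name and the counts in the usage line are made up for the example; real counts would come from tool results.

```python
from math import comb


def hypergeom_enrichment_p(overlap: int, set_size: int, annotated: int, universe: int) -> float:
    """P(X >= overlap) when drawing set_size items without replacement from a
    universe that contains `annotated` annotated items (one-sided test)."""
    total = comb(universe, set_size)
    upper = min(set_size, annotated)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Placeholder counts: 8 of 20 variants fall in annotated elements,
# with 500 annotated positions in a universe of 10,000.
p_value = hypergeom_enrichment_p(overlap=8, set_size=20, annotated=500, universe=10_000)
print(f"enrichment p-value: {p_value:.3g}")
```

With SciPy available, `scipy.stats.hypergeom.sf(overlap - 1, universe, annotated, set_size)` computes the same tail probability.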

When to Use

  • "What transcription factors bind near gene X?"
  • "Does this SNP affect a regulatory element?"
  • "Find CTCF binding sites in liver tissue"
  • "What are the enhancers active in this cell type?"
  • "Show me ChIP-seq experiments for H3K27ac in T cells"
  • "Is rs1234567 in a regulatory region?"
  • "What TF motifs overlap this genomic region?"
  • "Find ENCODE experiments for ATAC-seq in cancer cell lines"

Key Tools

| Tool | Purpose | Key Params |
|------|---------|------------|
| `jaspar_search_matrices` | Find TF binding motifs by TF name or organism | `name`, `species`, `collection`, `tax_id` |
| `jaspar_get_matrix` | Get full PWM/PFM for a specific JASPAR matrix | `matrix_id` (e.g., "MA0139.1") |
| `JASPAR_get_transcription_factors` | List all TF matrices (paginated) | `page`, `page_size` |
| `ENCODE_search_experiments` | Search ENCODE ChIP-seq/ATAC-seq/WGBS experiments | `assay_title`, `target`, `biosample_term_name`, `limit` |
| `ENCODE_search_histone_experiments` | Search histone mark ChIP-seq specifically | `histone_mark`, `biosample_term_name`, `limit` |
| `ENCODE_search_chromatin_accessibility` | Search ATAC-seq/DNase-seq experiments | `biosample_term_name`, `limit` |
| `ENCODE_get_experiment` | Get full metadata for a specific ENCODE experiment | `accession` (e.g., "ENCSR000EGM") |
| `ENCODE_search_annotations` | Search ENCODE cCRE and chromatin state annotations | `annotation_type`, `biosample_term_name`, `limit` |
| `ENCODE_get_chromatin_state` | Search ChromHMM segmentation data | `biosample_term_name`, `limit` |
| `UCSC_get_encode_cCREs` | Get cCREs overlapping a genomic region | `chrom`, `start`, `end` |
| `RegulomeDB_query_variant` | Score regulatory impact of a variant | `rsid` (e.g., "rs4994") |
| `ENCODE_search_biosamples` | Find available cell lines/tissues in ENCODE | `term_name`, `biosample_type`, `limit` |

Workflow

Phase 1: TF Motif Discovery (JASPAR)

When asked about TF binding motifs or what TFs might regulate a gene:
1. jaspar_search_matrices(name="TF_NAME", species="Homo sapiens")
   -> Returns list of matrices with matrix_id, collection, version

2. jaspar_get_matrix(matrix_id="MA0139.1")
   -> Returns full PFM/PWM matrix, sequence logo URL, binding sites URL

3. For broad TF family search:
   jaspar_search_matrices(species="Homo sapiens", collection="CORE")
   -> Filter by TF family name in results
JASPAR Collections:
  • CORE: High-quality, non-redundant matrices (best for most use cases)
  • CNE: Conserved non-coding elements
  • POLII: RNA Pol II binding sites
Key Response Fields:
  • matrix_id: Versioned ID (e.g., "MA0139.1"); use for jaspar_get_matrix
  • name: TF gene symbol
  • sequence_logo: URL to binding site logo PNG/SVG
  • collection: Which JASPAR collection
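
Once a matrix has been retrieved, a common next step is to turn the position frequency matrix (PFM) into log-odds weights and scan a sequence with it. The sketch below assumes the PFM is available as a dict of per-base count lists; the actual shape returned by jaspar_get_matrix is not specified here. The pseudocount of 0.8, the uniform 0.25 background, and the toy matrix are assumptions for illustration, not real JASPAR data.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Convert {base: [counts per position]} into log2-odds weights,
    spreading the pseudocount evenly over the four bases."""
    length = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for i in range(length):
        col_total = sum(pfm[b][i] for b in BASES) + pseudocount
        for b in BASES:
            freq = (pfm[b][i] + pseudocount / 4) / col_total
            pwm[b].append(math.log2(freq / background))
    return pwm


def best_hit(pwm, sequence):
    """Slide the PWM along the forward strand and return (score, offset)
    of the best-scoring window. Assumes len(sequence) >= motif length."""
    length = len(pwm["A"])
    scores = [
        (sum(pwm[sequence[i + j]][j] for j in range(length)), i)
        for i in range(len(sequence) - length + 1)
    ]
    return max(scores)


# Toy 3-bp matrix with consensus "ACG" (not a real JASPAR entry)
toy_pfm = {"A": [10, 0, 0], "C": [0, 10, 0], "G": [0, 0, 10], "T": [0, 0, 0]}
toy_pwm = pfm_to_pwm(toy_pfm)
score, offset = best_hit(toy_pwm, "TTACGTT")
print(f"best window starts at offset {offset} with score {score:.2f}")
```

A full scan would also score the reverse complement; that is left out to keep the sketch short.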

Phase 2: ENCODE Experiment Search

When looking for ChIP-seq, ATAC-seq, or other functional genomics data:
For TF ChIP-seq:
ENCODE_search_experiments(
    assay_title="TF ChIP-seq",
    target="CTCF",              # TF gene name
    biosample_term_name="HepG2", # Cell line or tissue
    limit=10
)
For histone marks:
ENCODE_search_histone_experiments(
    histone_mark="H3K27ac",         # or H3K4me3, H3K27me3, H3K36me3
    biosample_term_name="liver",
    limit=10
)
For chromatin accessibility:
ENCODE_search_chromatin_accessibility(
    biosample_term_name="T cell",
    limit=10
)
For regulatory annotations (cCREs, ChromHMM):
ENCODE_search_annotations(
    annotation_type="candidate Cis-Regulatory Elements",
    biosample_term_name="K562",
    limit=10
)
Common assay_title values:
  • "TF ChIP-seq" - Transcription factor binding
  • "Histone ChIP-seq" - Histone modification
  • "ATAC-seq" - Chromatin accessibility
  • "DNase-seq" - Open chromatin (older method)
  • "WGBS" - DNA methylation
Note: ENCODE_search_experiments returns experiment metadata only (accession, biosample, status). Use ENCODE_get_experiment(accession) to get file download links and detailed metadata.
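
Because assay_title has to match the controlled vocabulary exactly, it can help to validate it before a call is made. The helper below is a hypothetical sketch: `build_experiment_query` is not part of any tool, and the allowed set contains only the values listed above, not the full ENCODE vocabulary.

```python
# Only the assay titles listed in this document; ENCODE defines more.
ASSAY_TITLES = {"TF ChIP-seq", "Histone ChIP-seq", "ATAC-seq", "DNase-seq", "WGBS"}


def build_experiment_query(assay_title, target=None, biosample_term_name=None, limit=10):
    """Return keyword arguments for ENCODE_search_experiments,
    rejecting assay titles outside the known list."""
    if assay_title not in ASSAY_TITLES:
        raise ValueError(
            f"Unknown assay_title {assay_title!r}; expected one of {sorted(ASSAY_TITLES)}"
        )
    query = {"assay_title": assay_title, "limit": limit}
    if target is not None:
        query["target"] = target
    if biosample_term_name is not None:
        query["biosample_term_name"] = biosample_term_name
    return query


query = build_experiment_query("TF ChIP-seq", target="CTCF", biosample_term_name="HepG2")
print(query)
```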

Phase 3: cCRE Annotation (UCSC + ENCODE)

When annotating a specific genomic region:
UCSC_get_encode_cCREs(
    chrom="chr8",       # Chromosome (GRCh38)
    start=37966000,     # Start coordinate
    end=37967000        # End coordinate
)

Returns cCREs with type: pELS (proximal enhancer), dELS (distal enhancer), PLS (promoter-like), CTCF-only, DNase-H3K4me3

**cCRE Types**:
- **PLS** (Promoter-like): High DNase + H3K4me3 signal within 200 bp of an annotated TSS
- **pELS** (Proximal Enhancer): High DNase + H3K27ac, within 2kb of TSS
- **dELS** (Distal Enhancer): High DNase + H3K27ac, >2kb from TSS
- **CTCF-only**: High DNase + CTCF signal with low H3K4me3 and H3K27ac
- **DNase-H3K4me3**: High DNase + H3K4me3 but not within 200 bp of an annotated TSS
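
The proximal/distal split comes down to distance from the nearest TSS. A minimal sketch of that rule, assuming element centers and TSS positions are plain integers on the same chromosome and assembly, and treating exactly 2 kb as proximal (the boundary handling is an assumption):

```python
def enhancer_subtype(element_center, tss_positions, proximal_cutoff=2000):
    """Label an enhancer-like element pELS or dELS by distance to the nearest TSS."""
    nearest = min(abs(element_center - tss) for tss in tss_positions)
    return "pELS" if nearest <= proximal_cutoff else "dELS"


# Made-up coordinates for illustration
print(enhancer_subtype(101_500, [100_000, 250_000]))
print(enhancer_subtype(180_000, [100_000, 250_000]))
```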

Phase 4: Regulatory Variant Scoring (RegulomeDB)

When assessing regulatory impact of a variant:
RegulomeDB_query_variant(rsid="rs4994")

Returns:
  regulome_score.ranking: "1a"-"7" (1a = highest regulatory evidence)
  regulome_score.probability: 0-1 continuous score
  tissue_specific_scores: dict of tissue -> score
  overlapping features: eQTLs, TF binding, DNase peaks, motifs

**RegulomeDB Score Interpretation**:
| Rank | Meaning |
|------|---------|
| 1a | eQTL + TF binding + matched TF motif + matched DNase footprint + DNase peak |
| 1b | eQTL + TF binding + any motif + DNase footprint + DNase peak |
| 1c | eQTL + TF binding + matched TF motif + DNase peak |
| 1d | eQTL + TF binding + any motif + DNase peak |
| 1e | eQTL + TF binding + matched TF motif |
| 1f | eQTL + TF binding or DNase peak |
| 2a | TF binding + matched TF motif + matched DNase footprint + DNase peak |
| 2b | TF binding + any motif + DNase footprint + DNase peak |
| 2c | TF binding + matched TF motif + DNase peak |
| 3a | TF binding + any motif + DNase peak |
| 3b | TF binding + matched TF motif |
| 4 | TF binding + DNase peak |
| 5 | TF binding or DNase peak |
| 6 | Motif hit |
| 7 | Other (no supporting evidence) |

Variants with rank 1a-2b are most likely to affect gene regulation.
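
Rankings are strings rather than numbers, so a threshold such as "1a-2b" is easiest to apply against an explicit ordering. A small sketch follows; `RANK_ORDER` and `rank_at_least` are illustrative names, and the order is the rank column of the table above.

```python
# Strongest evidence first, following the rank column of the table
RANK_ORDER = ["1a", "1b", "1c", "1d", "1e", "1f",
              "2a", "2b", "2c", "3a", "3b", "4", "5", "6", "7"]


def rank_at_least(ranking, threshold="2b"):
    """True if `ranking` is as strong as or stronger than `threshold`.
    Raises ValueError for a ranking outside the known scheme."""
    return RANK_ORDER.index(ranking) <= RANK_ORDER.index(threshold)


for r in ("1f", "2b", "2c", "7"):
    print(r, rank_at_least(r))
```

Using an explicit list keeps the intent visible and rejects unexpected values instead of silently comparing them.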

---

Tool Parameter Reference

| Tool | Required Params | Optional Params | Notes |
|------|-----------------|-----------------|-------|
| `jaspar_search_matrices` | (none; returns all if empty) | `name`, `species`, `collection`, `tax_id`, `page`, `page_size` | Use `name` for TF name search |
| `jaspar_get_matrix` | `matrix_id` | | Full version required: "MA0139.1" not "MA0139" |
| `JASPAR_get_transcription_factors` | (none) | `page`, `page_size` | Paginated; default page_size=10 |
| `jaspar_get_matrix_versions` | `base_id` | | base_id is unversioned (e.g., "MA0139") |
| `ENCODE_search_experiments` | (none; returns all if empty) | `assay_title`, `target`, `biosample_term_name`, `limit` | assay_title must match ENCODE vocabulary exactly |
| `ENCODE_search_histone_experiments` | (none) | `histone_mark`, `biosample_term_name`, `limit` | histone_mark: "H3K27ac", "H3K4me3", etc. |
| `ENCODE_search_chromatin_accessibility` | (none) | `biosample_term_name`, `limit` | Returns ATAC-seq and DNase-seq |
| `ENCODE_get_experiment` | `accession` | | accession: "ENCSR..." format |
| `ENCODE_search_annotations` | (none) | `annotation_type`, `biosample_term_name`, `limit` | annotation_type: "candidate Cis-Regulatory Elements" |
| `ENCODE_get_chromatin_state` | (none) | `biosample_term_name`, `limit` | Returns ChromHMM segmentation |
| `ENCODE_search_biosamples` | (none) | `term_name`, `biosample_type`, `limit` | biosample_type: "cell line", "tissue", "primary cell" |
| `UCSC_get_encode_cCREs` | `chrom`, `start`, `end` | | Coordinates in GRCh38; chrom format: "chr1" |
| `RegulomeDB_query_variant` | `rsid` | | rsid format: "rs4994" (with rs prefix) |
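
Several of the format notes above can be checked before a call is made. The patterns below are a sketch inferred from the examples in this document ("MA0139.1", "rs4994", "ENCSR000EGM", "chr1"); they are assumptions about the identifier shapes, not validation rules published by the tools.

```python
import re

# Patterns inferred from the examples in this document
PATTERNS = {
    "matrix_id": re.compile(r"^[A-Z]+\d+\.\d+$"),      # versioned, e.g. MA0139.1
    "base_id": re.compile(r"^[A-Z]+\d+$"),             # unversioned, e.g. MA0139
    "rsid": re.compile(r"^rs\d+$"),                    # rs prefix required
    "accession": re.compile(r"^ENCSR\d{3}[A-Z]{3}$"),  # e.g. ENCSR000EGM
    "chrom": re.compile(r"^chr(\d{1,2}|X|Y|M)$"),      # e.g. chr1, chrX
}


def is_valid(kind, value):
    """Check `value` against the pattern registered for `kind`."""
    return bool(PATTERNS[kind].match(value))


print(is_valid("matrix_id", "MA0139.1"), is_valid("matrix_id", "MA0139"))
```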

Common Patterns

Pattern 1: TF Binding Site Investigation

Goal: Find where TF X binds and what motif it recognizes
Flow:
  1. jaspar_search_matrices(name="CTCF") -> get matrix_id
  2. jaspar_get_matrix(matrix_id) -> get full PWM, logo URL
  3. ENCODE_search_experiments(assay_title="TF ChIP-seq", target="CTCF") -> experimental binding data
  4. For specific tissue: add biosample_term_name="HepG2"
Output: Motif logo + experimental binding evidence

Pattern 2: Regulatory Variant Interpretation

Goal: Assess if variant rs1234567 affects gene regulation
Flow:
  1. RegulomeDB_query_variant(rsid="rs1234567") -> score + overlapping features
  2. If ranking is between 1a and 2b: ENCODE_search_experiments(target=overlapping_TF) -> experimental evidence
  3. UCSC_get_encode_cCREs(chrom, start, end) -> check if variant in known cCRE
Output: Regulatory score + supporting evidence + cCRE context
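
For step 3, deciding whether the variant sits inside a returned cCRE is a simple interval check. The record shape used here (dicts with "start", "end", "type") is hypothetical, since the tool's response format is not shown in this document, and the sketch assumes the variant position and the intervals share one coordinate convention.

```python
def ccres_containing(position, ccres):
    """Return the cCRE records whose [start, end] interval contains `position`.

    Assumes each record is a dict with integer "start" and "end" keys
    (hypothetical shape) in the same coordinate system as `position`.
    """
    return [c for c in ccres if c["start"] <= position <= c["end"]]


# Made-up records for illustration only
example_ccres = [
    {"start": 1000, "end": 1350, "type": "dELS"},
    {"start": 2000, "end": 2200, "type": "PLS"},
]
hits = ccres_containing(1200, example_ccres)
print([c["type"] for c in hits])
```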

Pattern 3: Cell-Type Regulatory Landscape

Goal: Characterize active enhancers in a cell type
Flow:
  1. ENCODE_search_histone_experiments(histone_mark="H3K27ac", biosample_term_name="K562") -> active enhancers
  2. ENCODE_search_chromatin_accessibility(biosample_term_name="K562") -> open chromatin
  3. ENCODE_search_annotations(annotation_type="candidate Cis-Regulatory Elements", biosample_term_name="K562")
  4. ENCODE_get_chromatin_state(biosample_term_name="K562") -> ChromHMM states
Output: Active regulatory elements specific to the cell type

Pattern 4: Gene Regulatory Region Mapping

Goal: Find all regulatory elements near a gene
Flow:
  1. Get gene coordinates from MyGene_query_genes or ensembl_lookup_gene
  2. UCSC_get_encode_cCREs(chrom, start-50000, end+50000) -> nearby cCREs
  3. ENCODE_search_experiments(target=TF_OF_INTEREST) -> TF binding data
  4. jaspar_search_matrices(name=TF_NAME) -> motif for TF
Output: Map of regulatory elements around gene with evidence types
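
The 50 kb flank in step 2 can push the start below zero for genes near a chromosome end. A minimal sketch of a clamped window; `flanked_window` and the optional `chrom_length` argument are illustrative additions, not tool parameters:

```python
def flanked_window(start, end, flank=50_000, chrom_length=None):
    """Extend [start, end] by `flank` on both sides, clamping the start at 0
    and, when the chromosome length is supplied, the end at that length."""
    lo = max(0, start - flank)
    hi = end + flank
    if chrom_length is not None:
        hi = min(hi, chrom_length)
    return lo, hi


# A gene close to the chromosome start (made-up coordinates)
print(flanked_window(20_000, 30_000))
```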

Fallback Strategies

| Primary Tool | Fallback | When |
|--------------|----------|------|
| `ENCODE_search_experiments` with specific biosample | Remove `biosample_term_name` filter | No results for specific tissue |
| `jaspar_search_matrices(name=TF)` | `jaspar_search_matrices(name=TF_family)` | TF not found by exact name |
| `UCSC_get_encode_cCREs` | `ENCODE_search_annotations` without coordinates | If coordinates unknown |
| `RegulomeDB_query_variant(rsid)` | Use `ENCODE_search_experiments` + `JASPAR` to manually assess overlap | rsid not in RegulomeDB |
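
The first fallback row (retry without the biosample filter) can be sketched as a small wrapper. Here `search` stands for any callable that takes keyword filters and returns a list; `fake_search` and its placeholder accession are stand-ins for a real tool call, used only so the example runs.

```python
def search_with_fallback(search, filters, relax_order=("biosample_term_name",)):
    """Call search(**filters); while it returns nothing, drop the filters named
    in relax_order one at a time and retry. Returns (results, dropped_filters)."""
    attempt = dict(filters)
    dropped = []
    results = search(**attempt)
    for key in relax_order:
        if results:
            break
        if key in attempt:
            del attempt[key]
            dropped.append(key)
            results = search(**attempt)
    return results, dropped


def fake_search(**kwargs):
    """Stand-in for a real tool call: pretends there are no HepG2-specific hits."""
    if kwargs.get("biosample_term_name") == "HepG2":
        return []
    return [{"accession": "ENCSR_PLACEHOLDER"}]


results, dropped = search_with_fallback(
    fake_search,
    {"assay_title": "TF ChIP-seq", "target": "CTCF", "biosample_term_name": "HepG2"},
)
print(f"{len(results)} result(s) after dropping filters: {dropped}")
```

Returning the dropped filters alongside the results makes it straightforward to report that the original, more specific query came back empty.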

Limitations

  • ENCODE TF ChIP-seq: assay_title="TF ChIP-seq" uses ENCODE's exact controlled vocabulary; avoid "ChIP-seq" (too general)
  • UCSC cCREs: Coordinates must be in GRCh38 (hg38); liftOver required for hg19 variants
  • RegulomeDB: Only scores variants with known rsIDs; novel variants not supported
  • JASPAR: Provides motif matrices only, not genomic binding locations; combine with ENCODE for experimental evidence
  • ENCODE experiment results: The @graph field may be empty if query filters are too restrictive; relax filters and retry