tooluniverse-rare-disease-genomics


COMPUTE, DON'T DESCRIBE

When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.

Rare Disease Genomics Research

Rare Disease Investigation Strategy

The order of investigation matters: phenotype -> disease -> gene -> variant, never variant first. When the starting point is a gene rather than a patient, adapt the order: gene -> diseases -> expected phenotypes -> does the patient match?
Resist the urge to skip to ClinVar immediately. A "Pathogenic" ClinVar entry is only meaningful if the gene is actually causative for the disease in question with the right inheritance mode.

Variant Prioritization Reasoning (CRITICAL)

LOOK UP DON'T GUESS -- when uncertain about any gene, variant, or disease association, search the database. Do not rely on memory.

How to filter thousands of variants down to one causal variant

  1. Inheritance pattern first -- Check Orphanet_get_natural_history for inheritance mode. This determines your filtering strategy:
    • Autosomal dominant: look for heterozygous variants in ONE copy; de novo if unaffected parents
    • Autosomal recessive: need TWO hits (homozygous or compound heterozygous); check carrier parents
    • X-linked recessive: hemizygous males affected; carrier females usually unaffected
    • Mitochondrial: maternal inheritance only; heteroplasmy complicates penetrance
  2. Allele frequency filter -- Rare disease variants should be RARE in population:
    • Dominant diseases: allele frequency < 0.001 (1 in 1,000) in gnomAD
    • Recessive diseases: allele frequency < 0.01 (1 in 100) for carriers
    • Use gnomAD via Ensembl VEP annotation or OpenTargets variant info to check frequency
    • LOOK UP the actual frequency -- do not assume a variant is rare
  3. Consequence hierarchy -- Prioritize by predicted impact:
    • Loss-of-function (frameshift, nonsense, splice-site): strongest candidates
    • Missense in conserved domain: strong if in known functional domain
    • Synonymous / intronic: usually benign unless at splice junction
  4. ClinVar vs OMIM vs gnomAD -- when to check each:
    • ClinVar: "Is this specific variant known to be pathogenic?" Check review stars (>=2 stars = reliable)
    • OMIM (via Orphanet/GenCC): "Is this gene known to cause this disease?" Check BEFORE ClinVar
    • gnomAD (via VEP/OpenTargets): "Is this variant too common to cause a rare disease?" Check allele frequency
  5. Phenotype-genotype correlation -- After identifying a candidate gene:
    • Get HPO phenotypes for the associated disease (Orphanet_get_phenotypes)
    • Check: do the patient's features match the "Very frequent" phenotypes?
    • Mismatches in core features argue AGAINST the gene being causative
    • GenCC validity level tells you how strong the gene-disease link is overall
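The first three filtering steps above can be sketched in Python. This is a minimal illustration on hand-made records, not real gnomAD or VEP output: the field names (af, consequence), the variant IDs, and the filter_candidates helper are assumptions made for this example. The frequency cut-offs are the ones listed in step 2.

```python
# Allele-frequency ceilings from step 2 of the list above.
AF_THRESHOLDS = {"dominant": 0.001, "recessive": 0.01}

# Lower rank = stronger candidate (step 3). The numeric ranks are illustrative.
CONSEQUENCE_RANK = {
    "frameshift": 0, "nonsense": 0, "splice_site": 0,
    "missense": 1,
    "synonymous": 2, "intronic": 2,
}


def filter_candidates(variants, inheritance):
    """Keep variants rare enough for the inheritance mode, strongest consequence first."""
    max_af = AF_THRESHOLDS[inheritance]
    kept = [v for v in variants if v["af"] < max_af]
    # sorted() is stable, so ties keep their input order.
    return sorted(kept, key=lambda v: CONSEQUENCE_RANK.get(v["consequence"], 3))


# Hypothetical annotated variants; real frequencies must be looked up, not assumed.
variants = [
    {"id": "var_a", "af": 0.02, "consequence": "missense"},
    {"id": "var_b", "af": 0.0002, "consequence": "synonymous"},
    {"id": "var_c", "af": 0.00001, "consequence": "frameshift"},
    {"id": "var_d", "af": 0.005, "consequence": "missense"},
]

dominant_hits = filter_candidates(variants, "dominant")
recessive_hits = filter_candidates(variants, "recessive")
print([v["id"] for v in dominant_hits])
print([v["id"] for v in recessive_hits])
```

The sketch covers only frequency and consequence. Zygosity checks, ClinVar review status, and phenotype matching (steps 1, 4, and 5) still have to be applied to whatever survives.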

When to Use

  • "What is the genetic cause of Marfan syndrome?"
  • "Find HPO phenotypes associated with cystic fibrosis"
  • "What is the prevalence of Ehlers-Danlos syndrome?"
  • "Which genes are linked to this rare disease?"
  • "Is FBN1 definitively associated with Marfan syndrome?"
  • "Find pathogenic variants in CFTR"
  • "Are there clinical trials for Gaucher disease?"
  • "What diseases are associated with gene FBN1?"

NOT for (use other skills instead)

  • Common disease genomics (type 2 diabetes, hypertension) -> Use tooluniverse-disease-research
  • Cancer variant interpretation -> Use tooluniverse-cancer-variant-interpretation
  • GWAS-based variant interpretation -> Use tooluniverse-gwas-snp-interpretation
  • Pharmacogenomics / drug-gene interactions -> Use tooluniverse-pharmacogenomics
  • Differential diagnosis from symptoms -> Use tooluniverse-rare-disease-diagnosis

Workflow Overview

Phase 0: Disambiguation (resolve to ORPHA code / HGNC symbol) -> Phase 1: Disease Characterization -> Phase 2: Phenotype Mapping (HPO) -> Phase 3: Causative Genes -> Phase 4: Gene-Disease Validity (GenCC) -> Phase 5: Pathogenic Variants (ClinVar) -> Phase 6: Epidemiology -> Phase 7: Clinical Trials -> Phase 8: Literature -> Phase 9: Report

Phase 0: Disambiguation

Resolve user input to canonical Orphanet identifiers before doing anything else. Many disease names have subtypes or umbrella syndromes that will produce misleading results if you pick the wrong one.
Orphanet_Orphanet_search_diseases: name (string REQUIRED, e.g., "Marfan syndrome"), exact (bool, default False), lang (string, default "en"). Primary tool for name-to-ORPHA-code resolution. The parameter is name (NOT query). Returns multiple matches; select the exact disease, not subtypes or umbrella syndromes. "Marfan syndrome" should resolve to ORPHAcode 558, not 284993 ("Marfan syndrome and Marfan-related disorders").
Orphanet_search_diseases: query (string REQUIRED). Fallback if the primary tool returns no results.
Orphanet_get_gene_diseases: gene_symbol (string REQUIRED, e.g., "FBN1"). Use when starting from a gene. Returns all diseases associated with the gene, including association type.
Key identifier formats: disease codes are ORPHAcode integers (e.g., 558 for Marfan syndrome); gene identifiers are HGNC symbols (e.g., FBN1); phenotypes use HPO CURIE format (e.g., HP:0001519).
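Choosing the exact disease from several search matches can be sketched as below. The record layout (orpha_code and name keys) and the pick_exact_disease helper are assumptions for illustration; the real tool response may be shaped differently. The two example entries are the Marfan codes quoted above.

```python
def pick_exact_disease(matches, disease_name):
    """Return the single match whose name equals the query (case-insensitive), else None."""
    wanted = disease_name.strip().lower()
    exact = [m for m in matches if m["name"].strip().lower() == wanted]
    # Zero or several exact hits both need a human decision, so do not guess.
    return exact[0] if len(exact) == 1 else None


# Hypothetical search output, with the umbrella entry deliberately listed first.
matches = [
    {"orpha_code": 284993, "name": "Marfan syndrome and Marfan-related disorders"},
    {"orpha_code": 558, "name": "Marfan syndrome"},
]

chosen = pick_exact_disease(matches, "Marfan syndrome")
print(chosen["orpha_code"])
```

Returning None instead of the first hit forces a review step whenever the name is ambiguous, which is the failure mode this phase is meant to prevent.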

Phase 1: Disease Characterization

Orphanet_get_disease: orpha_code (string REQUIRED, e.g., "558"). Returns the official Orphanet definition and synonym list.
Orphanet_get_classification: orpha_code (string REQUIRED). Shows which disease hierarchies include this condition (e.g., "rare genetic diseases", "rare ophthalmic disorders"). Useful for understanding what kind of rare disease this is.
Orphanet_get_natural_history: orpha_code (string REQUIRED). Returns average_age_of_onset and type_of_inheritance. Inheritance mode (autosomal dominant, X-linked recessive, etc.) is critical context for interpreting variant pathogenicity and family risk.
Orphanet_get_icd_mapping: orpha_code (string REQUIRED). Maps to ICD-10/ICD-11 for clinical coding contexts.

Phase 2: Phenotype Mapping (HPO)

Orphanet_get_phenotypes: orpha_code (string REQUIRED). Returns HPO phenotypes with frequency labels and whether each is a formal diagnostic criterion.
Frequency should guide your interpretation: phenotypes marked "Very frequent (99-80%)" are core features present in nearly all patients and should be weighted heavily in differential diagnosis. "Frequent (79-30%)" are supporting features. "Occasional (29-5%)" reflect variable presentations. "Excluded (0%)" are active rule-out criteria; their presence argues against the diagnosis.
When a phenotype is marked diagnostic_criteria: "Diagnostic criterion", it belongs to the formal diagnostic framework for that disease, not just a statistical association.
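One way to turn the frequency labels into a patient-match check is sketched below. The numeric weights, the record keys (hpo_id, frequency), and the HP:demo-* identifiers are illustrative assumptions, not Orphanet values; only HP:0001519 and the frequency labels come from the text above.

```python
# Illustrative weights: core features count most, occasional ones least.
FREQUENCY_WEIGHT = {
    "Very frequent (99-80%)": 3,
    "Frequent (79-30%)": 2,
    "Occasional (29-5%)": 1,
}


def score_phenotype_match(disease_phenotypes, patient_hpo_ids):
    """Score overlap and flag missing core features and present excluded features."""
    patient = set(patient_hpo_ids)
    score = 0
    missing_core = []
    excluded_present = []
    for p in disease_phenotypes:
        present = p["hpo_id"] in patient
        if p["frequency"] == "Excluded (0%)":
            if present:
                excluded_present.append(p["hpo_id"])  # argues against the diagnosis
        elif present:
            score += FREQUENCY_WEIGHT.get(p["frequency"], 0)
        elif p["frequency"] == "Very frequent (99-80%)":
            missing_core.append(p["hpo_id"])  # core feature the patient lacks
    return {"score": score, "missing_core": missing_core,
            "excluded_present": excluded_present}


# Hypothetical disease profile; the HP:demo-* IDs are placeholders, not real HPO terms.
disease_phenotypes = [
    {"hpo_id": "HP:0001519", "frequency": "Very frequent (99-80%)"},
    {"hpo_id": "HP:demo-core", "frequency": "Very frequent (99-80%)"},
    {"hpo_id": "HP:demo-support", "frequency": "Frequent (79-30%)"},
    {"hpo_id": "HP:demo-excluded", "frequency": "Excluded (0%)"},
]
patient = ["HP:0001519", "HP:demo-support", "HP:demo-excluded"]

result = score_phenotype_match(disease_phenotypes, patient)
print(result)
```

The score alone is not a verdict: a non-empty missing_core or excluded_present list should be reported alongside it, since either one argues against the candidate disease.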

OLS for ontology lookups

When you need to look up an HPO term by description or resolve a CURIE to a label, use the OLS tools. Pass ontology="hp" to scope to HPO, ontology="ordo" for Orphanet terms, ontology="mondo" for MONDO disease terms.
ols_search_terms: query (string REQUIRED), ontology (string, optional), rows (int, alias size, default 10), exact_match (bool, default False).
ols_get_term_info: term_id (CURIE, e.g., "HP:0001519") OR term_iri. Prefix-based ontology inference works automatically: "HP:" routes to hp, "MONDO:" to mondo, "ORDO:" to ordo.
ols_get_term_children / ols_get_term_ancestors: term_id or term_iri, ontology (optional). Useful for finding parent HPO categories or broadening/narrowing a phenotype search.

Phase 3: Causative Gene Discovery

Orphanet_get_genes: orpha_code (string REQUIRED, alias: disease_id). Returns genes with their association types and loci.
The association type is crucial. "Disease-causing germline mutation(s) in" means the gene is a confirmed cause — this is the primary diagnostic target. "Major susceptibility factor in" means risk factor with incomplete penetrance. "Candidate gene tested in" means preliminary and unconfirmed — do not report this as a causative gene without additional validation from GenCC or literature. "Modifying germline mutation in" means the gene modifies severity but does not cause the disease alone.
Do not treat all Orphanet gene associations equally. Always note the association type when reporting.
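A small sketch of grouping genes by what their association type implies. The record keys (symbol, association_type), the role labels, and the GENE_* symbols are assumptions for illustration; the association-type strings are the ones quoted above.

```python
# Map each Orphanet association type to the role described in the text.
ASSOCIATION_ROLE = {
    "Disease-causing germline mutation(s) in": "confirmed cause",
    "Major susceptibility factor in": "risk factor",
    "Candidate gene tested in": "unconfirmed candidate",
    "Modifying germline mutation in": "modifier",
}


def triage_genes(gene_records):
    """Group gene symbols by the role implied by their association type."""
    grouped = {}
    for rec in gene_records:
        role = ASSOCIATION_ROLE.get(rec["association_type"], "unrecognized type")
        grouped.setdefault(role, []).append(rec["symbol"])
    return grouped


# Hypothetical records with placeholder gene symbols.
gene_records = [
    {"symbol": "GENE_A", "association_type": "Disease-causing germline mutation(s) in"},
    {"symbol": "GENE_B", "association_type": "Candidate gene tested in"},
    {"symbol": "GENE_C", "association_type": "Modifying germline mutation in"},
]

grouped = triage_genes(gene_records)
print(grouped)
```

Only the "confirmed cause" group should be reported as causative without further checks; an "unrecognized type" bucket is kept so unexpected labels are surfaced rather than silently dropped.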

Phase 4: Gene-Disease Validity Assessment

GenCC aggregates independent assessments from multiple clinical labs and curation groups. The key insight is that consensus across submitters matters more than any single classification. A single submitter at "Definitive" is weaker than three independent submitters agreeing at "Strong."
GenCC_search_gene: gene_symbol (string REQUIRED, e.g., "FBN1"). Returns all disease associations with classifications and submitters.
GenCC_search_disease: disease (string REQUIRED, e.g., "Marfan syndrome"). Note: the parameter is disease (NOT disease_title). Returns all gene associations for the disease with validity levels.
Classification levels from strongest to weakest: Definitive > Strong > Moderate > Limited > No Known Disease Relationship > Disputed > Refuted > Animal Model Only. "Disputed" means conflicting evidence exists — do not report this as a valid association. "Refuted" means a previously claimed association was disproven.
When reporting GenCC results, always note: (1) the highest classification, (2) how many submitters agree, and (3) whether any submitters disagree. Three or more submitters at "Definitive" is very high confidence. A single submitter should always be flagged as requiring independent validation.
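The three reporting points can be computed from a list of submissions roughly as follows. The record keys (submitter, classification), the summarize_gencc helper, and the Lab_* names are assumptions for this sketch; the ordering of levels is the one given above.

```python
from collections import Counter

# Strongest to weakest, as listed in the text above.
LEVELS = ["Definitive", "Strong", "Moderate", "Limited",
          "No Known Disease Relationship", "Disputed", "Refuted",
          "Animal Model Only"]


def summarize_gencc(submissions):
    """Report highest classification, how many submitters share it, and who differs."""
    if not submissions:
        return None
    counts = Counter(s["classification"] for s in submissions)
    # Raises ValueError on a label outside LEVELS, which is safer than guessing a rank.
    highest = min(counts, key=LEVELS.index)
    at_highest = counts[highest]
    return {
        "highest": highest,
        "submitters_at_highest": at_highest,
        "other_classifications": len(submissions) - at_highest,
        "needs_independent_validation": len(submissions) == 1,
    }


# Hypothetical submissions with placeholder submitter names.
submissions = [
    {"submitter": "Lab_1", "classification": "Definitive"},
    {"submitter": "Lab_2", "classification": "Definitive"},
    {"submitter": "Lab_3", "classification": "Definitive"},
    {"submitter": "Lab_4", "classification": "Strong"},
]

summary = summarize_gencc(submissions)
print(summary)
```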

Phase 5: Pathogenic Variant Lookup

ClinVar_search_variants: gene (string, gene symbol), condition (string, disease name), variant_id (string), clinical_significance (string), max_results (int, default 20, alias limit). At least one of gene, condition, or variant_id is required. The primary parameter is gene (NOT query).
Combine gene + condition for disease-specific variant lookup. This narrows results to variants classified in the context of the specific disease, which matters for genes associated with multiple conditions.
Review status reflects confidence in the classification. "Practice guideline" (4 stars) and "reviewed by expert panel" (3 stars) represent the highest-confidence assertions. "Criteria provided, multiple submitters, no conflicts" (2 stars) is good. "Criteria provided, single submitter" (1 star) is moderate. "No assertion criteria provided" (0 stars) should be treated with caution.
Do not report VUS (Variant of Uncertain Significance) as disease-causing. VUS means the evidence is insufficient to classify; it is not "probably pathogenic." The default returns 20 variants; check total_count to understand the full scope of pathogenic variants in the gene.
ClinVar_get_variant_details: variant_id (REQUIRED). Retrieves full details for a specific ClinVar variant.
ClinVar_get_clinical_significance: variant_id (REQUIRED). Returns the clinical significance summary with submitter count.
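The review-status stars can be applied as a filter along these lines. The record keys (variant_id, clinical_significance, review_status), the demo-* IDs, and the reliable_pathogenic helper are assumptions for illustration; the star mapping is the one described above.

```python
# Review status -> star rating, lower-cased for tolerant matching.
REVIEW_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, single submitter": 1,
    "no assertion criteria provided": 0,
}


def reliable_pathogenic(variants, min_stars=2):
    """Return (variant_id, stars) for Pathogenic variants at or above min_stars."""
    kept = []
    for v in variants:
        # Unknown review statuses are treated as 0 stars (conservative assumption).
        stars = REVIEW_STARS.get(v["review_status"].lower(), 0)
        if v["clinical_significance"] == "Pathogenic" and stars >= min_stars:
            kept.append((v["variant_id"], stars))
    return kept


# Hypothetical search results with placeholder IDs.
variants = [
    {"variant_id": "demo-1", "clinical_significance": "Pathogenic",
     "review_status": "Reviewed by expert panel"},
    {"variant_id": "demo-2", "clinical_significance": "Pathogenic",
     "review_status": "Criteria provided, single submitter"},
    {"variant_id": "demo-3", "clinical_significance": "Uncertain significance",
     "review_status": "Criteria provided, multiple submitters, no conflicts"},
]

kept = reliable_pathogenic(variants)
print(kept)
```

The VUS entry is dropped even though it has two stars: review status measures confidence in the classification, not pathogenicity.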

Phase 6: Epidemiology

Orphanet_get_epidemiology: orpha_code (string REQUIRED). Returns prevalence estimates by type (point prevalence, annual incidence, birth prevalence), geographic region, and source.
Prevalence below 1 in 2,000 is the EU regulatory threshold for "rare disease"; the US Orphan Drug Act instead defines a rare disease as one affecting fewer than 200,000 people in the United States. Below 1 in 100,000 is uncommon. Below 1 in 1,000,000 is ultra-rare. These distinctions matter for clinical trial feasibility, natural history study design, and regulatory pathway discussions.
Prevalence data can vary significantly by geography (founder effects, consanguinity rates, ascertainment) and may be outdated. Always report the geographic scope and source year when citing prevalence figures.
Orphanet_get_natural_history (also useful in Phase 1): returns age of onset and inheritance pattern, which is essential context for patient counseling and family risk.
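The prevalence bands can be checked with exact arithmetic, as in this sketch. The prevalence_band helper and its return labels are assumptions; the cut-offs are the per-population ratios quoted above, applied with a strict "below" comparison as the text states.

```python
from fractions import Fraction


def prevalence_band(cases, population):
    """Classify a prevalence of `cases` per `population` using the cut-offs above."""
    p = Fraction(cases, population)  # exact ratio, no floating-point rounding
    if p < Fraction(1, 1_000_000):
        return "ultra-rare"
    if p < Fraction(1, 100_000):
        return "uncommon (below 1 in 100,000)"
    if p < Fraction(1, 2_000):
        return "rare"
    return "not rare"


print(prevalence_band(1, 10_000))
print(prevalence_band(1, 200_000))
print(prevalence_band(1, 2_000_000))
```

Using Fraction avoids borderline misclassification from float rounding. A figure of exactly 1 in 2,000 falls outside "rare" under the strict comparison used here; switch to <= if the threshold is meant to be inclusive.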

Phase 6b: Metabolite-Disease Context (IEM)

For inborn errors of metabolism (IEM), link metabolite accumulation to disease using HMDB.
HMDB_search: query (string REQUIRED, compound name or formula). Find HMDB IDs for metabolites.
HMDB_get_metabolite: hmdb_id (string) OR compound_name (string). Returns cross-database IDs (KEGG, ChEBI, PubChem) for downstream pathway analysis.
HMDB_get_diseases: hmdb_id (string) OR compound_name (string). Returns disease associations backed by CTD. Use to confirm which diseases are linked to metabolite accumulation.

Phase 7: Clinical Trials

search_clinical_trials: query_term (string REQUIRED), condition (string, optional), intervention (string, optional), pageSize (int, optional, default 10).
For rare diseases, even observational natural history studies are valuable because they characterize disease progression and identify biomarkers. Prioritize recruiting trials, then active-not-recruiting, then recently completed. Phase 2-3 trials are most clinically relevant. Check len(studies) > 0 rather than total_count, which can be None even when studies exist.
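A sketch of the presence check and status ordering described above. The response layout (a dict with studies and total_count), the status strings, and the TRIAL-* IDs are assumptions for illustration and may not match the real tool output.

```python
# Lower number = higher priority, following the order given in the text.
STATUS_PRIORITY = {"RECRUITING": 0, "ACTIVE_NOT_RECRUITING": 1, "COMPLETED": 2}


def rank_trials(response):
    """Return studies ordered by status priority; rely on the list, not total_count."""
    studies = response.get("studies") or []
    if len(studies) == 0:
        return []
    return sorted(studies, key=lambda s: STATUS_PRIORITY.get(s.get("status"), 3))


# Hypothetical response in which total_count is None although studies exist.
response = {
    "total_count": None,
    "studies": [
        {"id": "TRIAL-1", "status": "COMPLETED"},
        {"id": "TRIAL-2", "status": "RECRUITING"},
        {"id": "TRIAL-3", "status": "ACTIVE_NOT_RECRUITING"},
    ],
}

ranked = rank_trials(response)
print([s["id"] for s in ranked])
```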

Phase 8: Literature

EuropePMC_search_articles: query (string REQUIRED, e.g., "Marfan syndrome genetics"), limit (int, optional, default 10).
Use disease name + "genetics" or "gene" for genetic literature. For variant-specific evidence, add the gene symbol and variant. For genotype-phenotype correlations, add "genotype phenotype." Returns most recent articles first. HTML entities may appear in titles; decode or strip them before display.
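HTML entities in titles can be decoded with the Python standard library. The clean_title helper and the sample titles are illustrative assumptions; html.unescape is the standard-library call that does the decoding.

```python
import html


def clean_title(raw_title):
    """Decode HTML entities and collapse runs of whitespace for display."""
    text = html.unescape(raw_title)
    # split() with no argument also splits on non-breaking spaces from &nbsp;.
    return " ".join(text.split())


print(clean_title("Marfan &amp; FBN1"))
print(clean_title("TGF-&beta;   signalling"))
```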

Evidence Grading

When synthesizing across phases, grade your confidence:
Tier 1 (Definitive): GenCC Definitive from multiple submitters + ClinVar expert-reviewed pathogenic variants + Orphanet "Disease-causing germline mutation(s) in" assessed association. Example: FBN1 causing Marfan syndrome.
Tier 2 (Strong): GenCC Strong + ClinVar single-submitter pathogenic variants + Orphanet disease-causing. Strong but less replicated evidence.
Tier 3 (Moderate): GenCC Limited or Moderate + ClinVar VUS + Orphanet candidate gene. Emerging associations requiring further validation.
Tier 4 (Preliminary): Literature only, animal models, or no GenCC/ClinVar data. Genes from case studies without independent replication.
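The tiers can be encoded as a rough decision function. This is one possible reading, not a definitive rule set: the argument names, the clinvar_evidence labels, and the choice to drop to the more conservative tier for combinations the list does not spell out are all assumptions of this sketch.

```python
DISEASE_CAUSING = "Disease-causing germline mutation(s) in"
GENCC_SUPPORT = ("Definitive", "Strong", "Moderate", "Limited")
PATHOGENIC_EVIDENCE = ("expert_reviewed_pathogenic", "single_submitter_pathogenic")


def evidence_tier(gencc_level, gencc_submitters, clinvar_evidence, orphanet_type):
    """Map combined GenCC, ClinVar, and Orphanet evidence to T1-T4."""
    causal = orphanet_type == DISEASE_CAUSING
    if (gencc_level == "Definitive" and gencc_submitters >= 2
            and clinvar_evidence == "expert_reviewed_pathogenic" and causal):
        return "T1"
    if (gencc_level in ("Definitive", "Strong")
            and clinvar_evidence in PATHOGENIC_EVIDENCE and causal):
        return "T2"
    if gencc_level in GENCC_SUPPORT:
        # Some GenCC support, but the ClinVar or Orphanet evidence is weaker.
        return "T3"
    return "T4"  # literature only, animal models, or no GenCC data


print(evidence_tier("Definitive", 3, "expert_reviewed_pathogenic", DISEASE_CAUSING))
print(evidence_tier("Limited", 1, "vus", "Candidate gene tested in"))
```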

Fallback Strategies

When a primary tool fails or returns no results:
  • Disease lookup: try Orphanet_search_diseases if Orphanet_Orphanet_search_diseases fails
  • Gene → diseases: GenCC_search_gene has broader coverage than Orphanet_get_gene_diseases
  • Disease → genes: GenCC_search_disease as complement to Orphanet_get_genes
  • Gene-disease validity: Orphanet AssociationType + SourceOfValidation PMIDs if GenCC has no submissions
  • Pathogenic variants: EuropePMC literature search if ClinVar has no entries
  • Epidemiology: literature search for prevalence studies if Orphanet data is absent
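The fallback pattern above amounts to trying lookups in order until one returns data. In this sketch the first_nonempty helper and the two stub functions are hypothetical stand-ins; in practice each callable would wrap a real tool call.

```python
def first_nonempty(lookups):
    """Run (label, callable) pairs in order; return (label, result) for the first hit."""
    for label, lookup in lookups:
        try:
            result = lookup()
        except Exception:
            # A failing primary tool should not abort the chain; try the next one.
            continue
        if result:
            return label, result
    return None, None


# Stubs standing in for a primary tool that fails and a fallback that succeeds.
def primary_search():
    raise RuntimeError("primary tool unavailable")


def fallback_search():
    return [{"orpha_code": 558, "name": "Marfan syndrome"}]


source, hits = first_nonempty([
    ("Orphanet_Orphanet_search_diseases", primary_search),
    ("Orphanet_search_diseases", fallback_search),
])
print(source, hits)
```

Returning the label together with the result keeps the report honest about which source actually supplied the data.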

Example Workflows

Full Rare Disease Investigation (disease name input)

1. Orphanet_Orphanet_search_diseases(name="Marfan syndrome") -> ORPHAcode 558
2. Orphanet_get_disease(orpha_code="558") -> definition, synonyms
3. Orphanet_get_phenotypes(orpha_code="558") -> HPO phenotypes with frequencies
4. Orphanet_get_genes(orpha_code="558") -> FBN1 (disease-causing), TGFBR1, TGFBR2
5. GenCC_search_gene(gene_symbol="FBN1") -> Definitive from ClinGen, Ambry, Invitae
6. ClinVar_search_variants(gene="FBN1", clinical_significance="Pathogenic", max_results=50)
7. Orphanet_get_epidemiology(orpha_code="558") -> 1-5/10,000 worldwide
8. search_clinical_trials(query_term="Marfan syndrome", pageSize=10)
9. EuropePMC_search_articles(query="Marfan syndrome genetics", limit=5)

Gene-First Investigation (starting from a gene)

1. Orphanet_get_gene_diseases(gene_symbol="FBN1") -> all associated diseases
2. GenCC_search_gene(gene_symbol="FBN1") -> validity classifications per disease
3. For top disease: Orphanet_get_phenotypes + Orphanet_get_epidemiology
4. ClinVar_search_variants(gene="FBN1", clinical_significance="Pathogenic")

Common Mistakes to Avoid

  • Using disease_title in GenCC_search_disease: use disease instead
  • Using query in Orphanet_Orphanet_search_diseases: use name
  • Using query in ClinVar_search_variants: use gene, condition, or variant_id
  • Assuming the first Orphanet search result is the right disease: always check for subtypes
  • Treating ClinVar VUS as pathogenic evidence
  • Treating Orphanet "Candidate gene tested in" as a confirmed causative gene
  • Ignoring GenCC submitter count: single-submitter Definitive is weaker than multi-submitter consensus

Limitations

  • Orphanet covers rare diseases only; common diseases may have minimal entries
  • ClinVar returns up to 20 variants by default; paginated retrieval is limited
  • GenCC submissions may lag behind the latest literature
  • Some ultra-rare diseases have no GenCC submissions, no ClinVar variants, and no clinical trials

Completeness Checklist

  • Disease resolved to correct ORPHA code (not a subtype or umbrella)
  • Causative genes identified with association types; GenCC validity assessed
  • ClinVar variants checked with review status; VUS NOT reported as pathogenic
  • Inheritance pattern checked BEFORE interpreting variants
  • Epidemiology, clinical trials, and literature included
  • Evidence graded by tier (T1-T4)