tooluniverse-cancer-genomics-tcga

Cancer Genomics / TCGA Analysis

TCGA analysis starts with: what cancer type? what data type? Build your cohort FIRST (GDC filters), then analyze. Don't query mutations without defining the cohort: pan-cancer counts from `GDC_get_mutation_frequency` are uninformative without cancer-type context. A mutation frequency of 10% in one cancer type may be 0.5% in another; always specify `project_id`. Survival analysis (Kaplan-Meier) is hypothesis-generating in retrospective TCGA data: always report sample size and p-value, and note that TCGA cohorts are not treatment-stratified.

LOOK UP DON'T GUESS: never assume TCGA project IDs, NCIt codes, or gene coordinates. Use `GDC_list_projects` to confirm project IDs and `Progenetix_list_filtering_terms` for NCIt codes.

Systematic TCGA/GDC analysis: define cohorts, retrieve clinical data, profile somatic mutations, query copy number variations, run survival analysis, and interpret variants with OncoKB.

When to Use

  • "What is the mutation frequency of TP53 in TCGA-BRCA?"
  • "Get survival data for TCGA-LUAD patients"
  • "Find clinical data for breast cancer cases in GDC"
  • "Which TCGA projects have KRAS G12C mutations?"
  • "Show CNV amplifications of EGFR in glioblastoma"
  • "Annotate BRAF V600E for clinical significance in melanoma"

NOT for (use other skills instead)

  • Precision oncology treatment recommendations -> Use `tooluniverse-precision-oncology`
  • Rare disease gene discovery -> Use `tooluniverse-rare-disease-genomics`
  • GWAS variant interpretation -> Use `tooluniverse-gwas-snp-interpretation`

Workflow Overview

Input (cancer type / gene / TCGA project ID)
  |
  v
Phase 1: Study Selection  -- GDC_list_projects, GDC_search_cases
  |
  v
Phase 2: Clinical Data    -- GDC_get_clinical_data
  |
  v
Phase 3: Somatic Mutations -- GDC_get_ssm_by_gene, GDC_get_mutation_frequency
  |
  v
Phase 4: CNV Analysis     -- Progenetix_cnv_search, Progenetix_search_biosamples
  |
  v
Phase 5: Survival Analysis -- GDC_get_survival
  |
  v
Phase 6: Variant Interpretation -- OncoKB_annotate_variant

Key Identifiers

| Data Type | Format | Example |
|---|---|---|
| GDC project | TCGA-{ABBREV} | TCGA-BRCA, TCGA-LUAD, TCGA-SKCM |
| GDC case | UUID | 3c6ef4c1-... |
| NCIt cancer code | NCIT:C###### | NCIT:C4017 (breast), NCIT:C3058 (GBM) |
| RefSeq chromosome | refseq:NC_###### | refseq:NC_000007.14 (chr7) |

Common TCGA Project IDs

| Cancer | Project ID | NCIt Code |
|---|---|---|
| Breast | TCGA-BRCA | NCIT:C4017 |
| Lung adenocarcinoma | TCGA-LUAD | NCIT:C3512 |
| Glioblastoma | TCGA-GBM | NCIT:C3058 |
| Melanoma | TCGA-SKCM | NCIT:C3510 |
| Colorectal | TCGA-COAD | NCIT:C4349 |
| Ovarian | TCGA-OV | NCIT:C4908 |
| Prostate | TCGA-PRAD | NCIT:C7378 |

Phase 1: Study Selection

**GDC_list_projects**: No params required. Returns all GDC/TCGA projects with case counts.
  • Use to browse available projects and map cancer types to project IDs.

**GDC_search_cases**: `project_id` (string, e.g., "TCGA-BRCA"), `size` (int, default 10), `offset` (int). Returns case UUIDs and basic metadata.
  • Use to confirm a project exists and retrieve case counts before deeper queries.

Phase 2: Clinical Data

**GDC_get_clinical_data**: `project_id` (string), `primary_site` (string, e.g., "Breast"), `disease_type` (string), `vital_status` ("Alive" or "Dead"), `gender` ("female"/"male"), `size` (int, 1-100), `offset` (int). Returns `{status, data: [{case_id, demographics: {gender, race, ethnicity, vital_status, age_at_index}, diagnoses: [{primary_diagnosis, tumor_stage, age_at_diagnosis, days_to_last_follow_up}], treatments: [{therapeutic_agents, treatment_type}]}]}`.
  • Use `project_id` + optional filters to retrieve patient-level clinical attributes.
  • `age_at_diagnosis` is in days; divide by 365.25 for years.
  • Multiple diagnoses or treatments per case are possible.

```python
# Get clinical data for deceased BRCA patients
result = tu.tools.GDC_get_clinical_data(
    project_id="TCGA-BRCA",
    vital_status="Dead",
    size=50
)
```
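
Because `age_at_diagnosis` is reported in days and a case can carry several diagnoses, it can help to flatten the response before summarizing it. The following is a minimal sketch, assuming the response shape described above; `flatten_clinical` and the output column names are hypothetical helpers, not part of ToolUniverse.

```python
DAYS_PER_YEAR = 365.25  # conversion factor stated in the notes above


def flatten_clinical(records):
    """Return one row per (case, diagnosis), with age converted to years.

    `records` is assumed to be the `data` list of a GDC_get_clinical_data
    response; missing keys are tolerated and become None.
    """
    rows = []
    for case in records:
        demo = case.get("demographics") or {}
        # A case may have several diagnoses; keep a single empty row if none.
        for dx in case.get("diagnoses") or [{}]:
            age_days = dx.get("age_at_diagnosis")
            rows.append({
                "case_id": case.get("case_id"),
                "vital_status": demo.get("vital_status"),
                "primary_diagnosis": dx.get("primary_diagnosis"),
                "age_years": None if age_days is None
                else round(age_days / DAYS_PER_YEAR, 1),
            })
    return rows
```

With a live client this would typically be called as `flatten_clinical(result["data"])`, where `result` is the response from the call above.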

Phase 3: Somatic Mutations

**GDC_get_mutation_frequency**: `gene_symbol` (string REQUIRED, alias: `gene`). Returns pan-cancer SSM occurrence count.
  • Returns TOTAL count across all TCGA; no per-project breakdown.
  • For cancer-specific data, use `GDC_get_ssm_by_gene` with `project_id`.

**GDC_get_ssm_by_gene**: `gene_symbol` (string REQUIRED), `project_id` (string, optional), `size` (int, 1-100). Returns `{status, data: [{ssm_id, mutation_type, genomic_dna_change, aa_change, consequence_type}]}`.
  • `mutation_type`: "Single base substitution", "Insertion", "Deletion".
  • `aa_change`: amino acid change notation (e.g., "Val600Glu").

```python
# TP53 mutations in lung adenocarcinoma
mutations = tu.tools.GDC_get_ssm_by_gene(
    gene_symbol="TP53",
    project_id="TCGA-LUAD",
    size=50
)
```
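
A quick way to read a `GDC_get_ssm_by_gene` response is to tally the returned records by `mutation_type` and `consequence_type`. This is a sketch under the response shape listed above; `summarize_ssms` is a hypothetical helper. Note that it counts distinct mutation records, not patients, so it is not a mutation frequency.

```python
from collections import Counter


def summarize_ssms(ssm_response):
    """Tally SSM records by mutation type and consequence type.

    Counts are over distinct mutation records in the response, not over cases.
    """
    records = ssm_response.get("data") or []
    return {
        "n_records": len(records),
        "by_mutation_type": Counter(
            r.get("mutation_type") or "unknown" for r in records
        ),
        "by_consequence": Counter(
            r.get("consequence_type") or "unknown" for r in records
        ),
    }
```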

Phase 4: CNV Analysis (Progenetix)

**Progenetix_search_biosamples**: `filters` (string REQUIRED, NCIt code e.g., "NCIT:C4017"), `limit` (int), `skip` (int). Returns `{status, data: {biosamples: [{biosample_id, histological_diagnosis, pathological_stage, external_references}]}}`.
  • Use to find samples with CNV profiles for a given cancer type.

**Progenetix_cnv_search**: `reference_name` (string REQUIRED, RefSeq accession), `start` (int REQUIRED, GRCh38 1-based), `end` (int REQUIRED), `variant_type` ("DUP"/"DEL"), `filters` (string, NCIt code), `limit` (int). Returns biosamples with CNV in the specified genomic region.
  • `variant_type="DUP"` for amplification, `"DEL"` for deletion.
  • Use `filters` to restrict to a cancer type.

```python
# EGFR amplifications (chr7:55019017-55211628) in breast cancer
result = tu.tools.Progenetix_cnv_search(
    reference_name="refseq:NC_000007.14",
    start=55019017,
    end=55211628,
    variant_type="DUP",
    filters="NCIT:C4017",
    limit=10
)
```

**Progenetix_list_filtering_terms**: No params. Returns all available NCIt codes and labels.
  • Use when you need to find the NCIt code for a cancer type.

**Progenetix_list_cohorts**: No params. Returns named cohorts available in Progenetix.
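
Since `Progenetix_cnv_search` expects a RefSeq accession, GRCh38 coordinates, a `DUP`/`DEL` variant type, and an NCIt CURIE, it can be useful to validate these before sending a query. The sketch below is illustrative only: `build_cnv_query`, `GENE_REGIONS`, and the CURIE regex are assumptions, and the only region included is the EGFR interval from the example above.

```python
import re

# Assumed CURIE pattern: "NCIT:C" followed by digits (e.g., "NCIT:C4017").
NCIT_CURIE = re.compile(r"^NCIT:C\d+$")

# GRCh38 regions keyed by gene. Only EGFR (from the example above) is listed;
# look up coordinates for any other gene rather than guessing them.
GENE_REGIONS = {
    "EGFR": ("refseq:NC_000007.14", 55019017, 55211628),
}


def build_cnv_query(gene, ncit_code, variant_type="DUP", limit=10):
    """Return keyword arguments for Progenetix_cnv_search after basic checks."""
    if variant_type not in ("DUP", "DEL"):
        raise ValueError("variant_type must be 'DUP' or 'DEL'")
    if not NCIT_CURIE.match(ncit_code):
        raise ValueError("filters must be an NCIt CURIE such as 'NCIT:C4017'")
    if gene not in GENE_REGIONS:
        raise KeyError(f"no coordinates recorded for {gene}")
    reference_name, start, end = GENE_REGIONS[gene]
    return {
        "reference_name": reference_name,
        "start": start,
        "end": end,
        "variant_type": variant_type,
        "filters": ncit_code,
        "limit": limit,
    }
```

The result can be unpacked into the tool call, for example `tu.tools.Progenetix_cnv_search(**build_cnv_query("EGFR", "NCIT:C3058"))`.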

Phase 5: Survival Analysis

**GDC_get_survival**: `project_id` (string REQUIRED, e.g., "TCGA-BRCA"), `gene_symbol` (string, optional -- filters to mutated cases). Returns `{status, data: {donors: [{id, time, censored, survivalEstimate}], overallStats: {pValue}}}`.
  • Each donor has `time` (days), `censored` (bool: False=death event, True=censored), and `survivalEstimate`.
  • `overallStats.pValue`: log-rank p-value (present when `gene_symbol` splits cohort).
  • Without `gene_symbol`: returns full-cohort survival curve.
  • With `gene_symbol`: returns survival split by mutation status (mutated vs. wild-type).

```python
# Survival for TCGA-BRCA split by TP53 mutation
surv = tu.tools.GDC_get_survival(project_id="TCGA-BRCA", gene_symbol="TP53")
pval = surv["data"]["overallStats"]["pValue"]
```
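
To report sample size alongside the p-value, the donor array can be reduced to a few counts. This is a minimal sketch assuming the response shape above; `summarize_survival` is a hypothetical helper, and the default threshold of 20 is applied here to the whole donor array because the documented shape does not say which group each donor belongs to.

```python
def summarize_survival(surv_response, min_group_size=20):
    """Summarize a GDC_get_survival response: n, events, censored, p-value."""
    data = surv_response.get("data") or {}
    donors = data.get("donors") or []
    # censored=False marks a death event; censored=True marks a censored donor.
    events = sum(1 for d in donors if d.get("censored") is False)
    censored = sum(1 for d in donors if d.get("censored") is True)
    times = [d["time"] for d in donors if d.get("time") is not None]
    return {
        "n": len(donors),
        "events": events,
        "censored": censored,
        "max_follow_up_days": max(times, default=None),
        # pValue is only expected when gene_symbol splits the cohort.
        "p_value": (data.get("overallStats") or {}).get("pValue"),
        "small_cohort": len(donors) < min_group_size,
    }
```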

Phase 6: Variant Interpretation (OncoKB)

**OncoKB_annotate_variant**: `gene` (string, alias `gene_symbol`), `variant` (string, alias `alteration`, e.g., "V600E"), `tumor_type` (string, OncoTree code e.g., "MEL"). Returns `{status, data: {oncogenic, mutationEffect, highestSensitiveLevel, treatments: [{drugs, level, indication}]}}`.
  • `oncogenic`: "Oncogenic", "Likely Oncogenic", "Neutral", "Inconclusive", "Unknown".
  • `highestSensitiveLevel`: highest OncoKB therapeutic level of evidence for sensitivity ("LEVEL_1"=FDA-approved, "LEVEL_2"=standard of care, etc.).
  • Demo mode available for BRAF, TP53, ROS1 without API key.
  • Set ONCOKB_API_TOKEN for full access.

```python
# Annotate KRAS G12C in lung adenocarcinoma
result = tu.tools.OncoKB_annotate_variant(
    gene="KRAS",
    variant="G12C",
    tumor_type="LUAD"
)
```
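
The annotation response can be condensed into the fields a report usually needs. The sketch below assumes the response shape above; `summarize_annotation` is a hypothetical helper, and treating only "Oncogenic" and "Likely Oncogenic" as driver calls is an illustrative choice rather than an OncoKB rule.

```python
def summarize_annotation(annotation):
    """Condense an OncoKB_annotate_variant response into report fields."""
    data = annotation.get("data") or {}
    oncogenic = data.get("oncogenic") or "Unknown"
    return {
        "oncogenic": oncogenic,
        # Assumption: only these two labels are treated as driver calls.
        "is_driver_call": oncogenic in {"Oncogenic", "Likely Oncogenic"},
        "highest_level": data.get("highestSensitiveLevel"),
        # Keep (drugs, level) pairs as returned, without reinterpreting them.
        "therapies": [
            (t.get("drugs"), t.get("level"))
            for t in data.get("treatments") or []
        ],
    }
```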

Tool Quick Reference

| Tool | Key Params | Returns |
|---|---|---|
| `GDC_list_projects` | (none) | All TCGA/GDC projects with counts |
| `GDC_search_cases` | `project_id`, `size`, `offset` | Case UUIDs + metadata |
| `GDC_get_clinical_data` | `project_id`, `vital_status`, `gender`, `size` | Demographics + diagnoses + treatments |
| `GDC_get_mutation_frequency` | `gene_symbol` (alias: `gene`) | Pan-cancer SSM count |
| `GDC_get_ssm_by_gene` | `gene_symbol`, `project_id`, `size` | Per-mutation records with aa_change |
| `GDC_get_survival` | `project_id`, `gene_symbol` (optional) | Kaplan-Meier donor array + pValue |
| `Progenetix_search_biosamples` | `filters` (NCIt code), `limit` | Biosample records |
| `Progenetix_cnv_search` | `reference_name`, `start`, `end`, `variant_type`, `filters` | Biosamples with CNV in region |
| `Progenetix_list_filtering_terms` | (none) | All NCIt codes in Progenetix |
| `OncoKB_annotate_variant` | `gene`, `variant`, `tumor_type` | Oncogenicity + treatments |

Example Workflows

Workflow 1: Gene-Centric Mutation + Survival Analysis

1. GDC_get_mutation_frequency(gene_symbol="KRAS")
   -> Pan-cancer mutation count

2. GDC_get_ssm_by_gene(gene_symbol="KRAS", project_id="TCGA-LUAD", size=50)
   -> Specific amino acid changes in lung adenocarcinoma

3. GDC_get_survival(project_id="TCGA-LUAD", gene_symbol="KRAS")
   -> Survival split by KRAS mutation status + p-value

4. OncoKB_annotate_variant(gene="KRAS", variant="G12C", tumor_type="LUAD")
   -> Clinical significance + approved therapies (sotorasib)
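
The four steps of Workflow 1 can be chained in a small function. This is a sketch, assuming `tools` exposes the tool names used in this document as keyword-argument callables (for example `tu.tools`); `gene_centric_report` is a hypothetical helper and returns the raw responses without interpreting them.

```python
def gene_centric_report(tools, gene, project_id, variant, tumor_type):
    """Run the Workflow 1 calls in order and collect the raw responses."""
    return {
        # Step 1: pan-cancer count (context only; not cancer-specific).
        "pan_cancer": tools.GDC_get_mutation_frequency(gene_symbol=gene),
        # Step 2: mutation records restricted to the chosen project.
        "project_ssms": tools.GDC_get_ssm_by_gene(
            gene_symbol=gene, project_id=project_id, size=50
        ),
        # Step 3: survival split by mutation status of the gene.
        "survival": tools.GDC_get_survival(
            project_id=project_id, gene_symbol=gene
        ),
        # Step 4: clinical annotation of one specific variant.
        "annotation": tools.OncoKB_annotate_variant(
            gene=gene, variant=variant, tumor_type=tumor_type
        ),
    }
```

For the KRAS example this would be called as `gene_centric_report(tu.tools, "KRAS", "TCGA-LUAD", "G12C", "LUAD")`.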

Workflow 2: Cohort Clinical Summary

1. GDC_list_projects()  -> confirm TCGA-OV exists

2. GDC_get_clinical_data(project_id="TCGA-OV", size=100)
   -> Demographics, tumor stage, treatment history

3. GDC_get_survival(project_id="TCGA-OV")
   -> Baseline overall survival curve for the cohort

Workflow 3: CNV Analysis for a Gene

1. Progenetix_search_biosamples(filters="NCIT:C3058", limit=10)
   -> GBM biosamples with CNV data

2. Progenetix_cnv_search(
       reference_name="refseq:NC_000007.14",
       start=55019017, end=55211628,
       variant_type="DUP", filters="NCIT:C3058"
   )
   -> GBM samples with EGFR amplification

Reasoning Framework

Evidence Grading

| Tier | Description | Example |
|---|---|---|
| T1 | FDA-recognized biomarker with approved therapy | BRAF V600E in melanoma (vemurafenib) |
| T2 | Well-powered clinical study, standard-of-care relevance | KRAS G12C in NSCLC (sotorasib), OncoKB Level 2 |
| T3 | Preclinical/small cohort evidence, biological plausibility | Recurrent hotspot in TCGA but no approved therapy |
| T4 | Computational prediction or variant of unknown significance | Low-frequency mutation, no functional data |

Interpretation Guidance

Mutation frequency: A gene mutated in >10% of a TCGA cohort is likely a driver candidate (e.g., TP53 in 36% of all TCGA). Mutations at <1% frequency are typically passengers unless they occur at known hotspots. Always cross-reference with OncoKB oncogenicity annotation.
Survival analysis (Kaplan-Meier): A log-rank p-value < 0.05 suggests the gene mutation is associated with differential survival. Hazard ratio (HR) > 1 indicates worse prognosis for the mutated group. Interpret cautiously: TCGA cohorts are retrospective and not treatment-stratified. Small subgroups (n < 20) produce unreliable survival estimates.
Copy number variation: Focal amplifications (narrow peaks) of oncogenes (EGFR, MYC, ERBB2) are more likely functionally relevant than broad arm-level events. Homozygous deletions of tumor suppressors (CDKN2A, PTEN, RB1) are strong loss-of-function signals. DUP count from Progenetix reflects sample frequency, not copy number magnitude.
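
The mutation-frequency rule of thumb above can be written down as a small triage function. This is a sketch of those thresholds only (>10% and <1%); `triage_mutation_frequency` and its labels are hypothetical, and the result is a starting point to cross-reference with OncoKB, not a driver call.

```python
def triage_mutation_frequency(n_mutated, n_cohort, at_known_hotspot=False):
    """Apply the >10% / <1% rule of thumb to a within-cohort frequency.

    Returns (frequency, label). Frequencies between the two thresholds, and
    rare mutations at known hotspots, are left as "indeterminate".
    """
    if n_cohort <= 0:
        raise ValueError("n_cohort must be positive")
    freq = n_mutated / n_cohort
    if freq > 0.10:
        label = "driver candidate"
    elif freq < 0.01 and not at_known_hotspot:
        label = "likely passenger"
    else:
        label = "indeterminate"
    return freq, label
```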

Synthesis Questions

A complete cancer genomics report should answer:
  1. What are the most frequently mutated genes in this cancer type, and which are known drivers?
  2. Does mutation status of the queried gene associate with survival (p < 0.05)?
  3. Are recurrent CNV events (amplifications or deletions) present at known oncogene/tumor suppressor loci?
  4. What is the OncoKB clinical actionability level for identified variants?
  5. How does the mutation landscape compare across TCGA cancer types (pan-cancer context)?

Programmatic Access (Beyond Tools)

When ToolUniverse tools return truncated results or you need bulk data, use the GDC API directly:

```python
import requests
import pandas as pd

# Bulk clinical data for a TCGA project
filters = {"op": "and", "content": [
    {"op": "=", "content": {"field": "project.project_id", "value": "TCGA-BRCA"}}
]}
all_cases = []
offset = 0
while True:
    resp = requests.post("https://api.gdc.cancer.gov/cases", json={
        "filters": filters,
        "size": 500,
        "from": offset,
        "fields": "submitter_id,demographic.vital_status,demographic.days_to_death,diagnoses.tumor_stage"
    }).json()
    hits = resp["data"]["hits"]
    if not hits:
        break  # no more pages
    all_cases.extend(hits)
    offset += len(hits)
df = pd.json_normalize(all_cases)

# Download MAF mutation file by UUID
file_uuid = "abc123-..."  # from GDC_list_files result
url = f"https://api.gdc.cancer.gov/data/{file_uuid}"
content = requests.get(url, headers={"Content-Type": "application/json"}).content

# Gene expression: query files endpoint for HTSeq counts
expr_filters = {"op": "and", "content": [
    {"op": "=", "content": {"field": "cases.project.project_id", "value": "TCGA-BRCA"}},
    {"op": "=", "content": {"field": "data_type", "value": "Gene Expression Quantification"}}
]}
```

See `tooluniverse-data-wrangling` skill for pagination, error handling, and format parsing patterns.

Limitations

  • `GDC_get_survival` with `gene_symbol` splits on mutation presence only; no multi-gene or stage-based stratification.
  • `GDC_get_mutation_frequency` returns pan-cancer total only; per-cancer frequencies require `GDC_get_ssm_by_gene` per project.
  • `GDC_get_clinical_data` returns up to 100 cases per call; use `offset` for pagination.
  • Progenetix uses GRCh38 coordinates; provide GRCh38 positions for `Progenetix_cnv_search`.
  • `OncoKB_annotate_variant` without ONCOKB_API_TOKEN operates in demo mode (limited to BRAF, TP53, ROS1).
  • Progenetix `filters` param requires NCIt CURIE format (e.g., "NCIT:C4017"), not free text.
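
The 100-case limit on `GDC_get_clinical_data` can be worked around with a paging loop over `size` and `offset`. The sketch below assumes that `offset` counts records, that each response carries its records in a `data` list, and that a page shorter than `size` marks the end; `fetch_all` is a hypothetical helper, not a ToolUniverse function.

```python
def fetch_all(fetch_page, page_size=100, max_records=None, **filters):
    """Collect records from a size/offset-paginated tool.

    `fetch_page` is any callable accepting size=, offset= and filter keyword
    arguments and returning a dict with a `data` list (assumed shape).
    """
    records = []
    offset = 0
    while True:
        resp = fetch_page(size=page_size, offset=offset, **filters)
        page = resp.get("data") or []
        records.extend(page)
        if len(page) < page_size:
            break  # short or empty page: nothing left to fetch
        if max_records is not None and len(records) >= max_records:
            break  # caller-imposed cap reached
        offset += len(page)
    return records if max_records is None else records[:max_records]
```

A typical call would be `fetch_all(tu.tools.GDC_get_clinical_data, project_id="TCGA-OV")`; setting `max_records` keeps exploratory runs small.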