tooluniverse-cancer-genomics-tcga
Cancer Genomics / TCGA Analysis
TCGA analysis starts with two questions: which cancer type, and which data type? Build your cohort FIRST (GDC filters), then analyze. Don't query mutations without defining the cohort: pan-cancer counts from `GDC_get_mutation_frequency` are uninformative without cancer-type context. A mutation frequency of 10% in one cancer type may be 0.5% in another, so always specify `project_id`. Survival analysis (Kaplan-Meier) on retrospective TCGA data is hypothesis-generating: always report sample size and p-value, and note that TCGA cohorts are not treatment-stratified.
LOOK UP, DON'T GUESS: never assume TCGA project IDs, NCIt codes, or gene coordinates. Use `GDC_list_projects` to confirm project IDs and `Progenetix_list_filtering_terms` for NCIt codes.
Systematic TCGA/GDC analysis: define cohorts, retrieve clinical data, profile somatic mutations, query copy number variations, run survival analysis, and interpret variants with OncoKB.
When to Use
- "What is the mutation frequency of TP53 in TCGA-BRCA?"
- "Get survival data for TCGA-LUAD patients"
- "Find clinical data for breast cancer cases in GDC"
- "Which TCGA projects have KRAS G12C mutations?"
- "Show CNV amplifications of EGFR in glioblastoma"
- "Annotate BRAF V600E for clinical significance in melanoma"
NOT for (use other skills instead)
- Precision oncology treatment recommendations -> Use `tooluniverse-precision-oncology`
- Rare disease gene discovery -> Use `tooluniverse-rare-disease-genomics`
- GWAS variant interpretation -> Use `tooluniverse-gwas-snp-interpretation`
Workflow Overview
Input (cancer type / gene / TCGA project ID)
|
v
Phase 1: Study Selection -- GDC_list_projects, GDC_search_cases
|
v
Phase 2: Clinical Data -- GDC_get_clinical_data
|
v
Phase 3: Somatic Mutations -- GDC_get_ssm_by_gene, GDC_get_mutation_frequency
|
v
Phase 4: CNV Analysis -- Progenetix_cnv_search, Progenetix_search_biosamples
|
v
Phase 5: Survival Analysis -- GDC_get_survival
|
v
Phase 6: Variant Interpretation -- OncoKB_annotate_variant
Key Identifiers
| Data Type | Format | Example |
|---|---|---|
| GDC project | TCGA-{ABBREV} | TCGA-BRCA, TCGA-LUAD, TCGA-SKCM |
| GDC case | UUID | 3c6ef4c1-... |
| NCIt cancer code | NCIT:C###### | NCIT:C4017 (breast), NCIT:C3058 (GBM) |
| RefSeq chromosome | refseq:NC_###### | refseq:NC_000007.14 (chr7) |
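A cheap pre-flight check can catch malformed identifiers before they reach a tool. The sketch below is minimal and assumes the formats shown in the table above. The regular expressions are illustrative assumptions, not an official GDC, NCIt, or RefSeq grammar. A well-formed ID still has to be confirmed with `GDC_list_projects` or `Progenetix_list_filtering_terms`.

```python
import re

# Illustrative patterns (assumptions) for the identifier formats in the table above.
PATTERNS = {
    "gdc_project": re.compile(r"^TCGA-[A-Z0-9]+$"),
    "gdc_case": re.compile(
        r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
    ),
    "ncit": re.compile(r"^NCIT:C\d+$"),
    "refseq_chrom": re.compile(r"^refseq:NC_\d{6}\.\d+$"),
}


def check_identifier(kind: str, value: str) -> bool:
    """Return True if `value` is well-formed for `kind`.

    A match only checks the format; it does not prove that the project,
    case, or code actually exists.
    """
    pattern = PATTERNS.get(kind)
    if pattern is None:
        raise ValueError(f"unknown identifier kind: {kind!r}")
    return pattern.fullmatch(value.strip()) is not None


print(check_identifier("gdc_project", "TCGA-BRCA"))             # True
print(check_identifier("ncit", "C4017"))                        # False (missing "NCIT:" prefix)
print(check_identifier("refseq_chrom", "refseq:NC_000007.14"))  # True
```

The project pattern accepts only `TCGA-` prefixes, which matches this skill's scope. Widen it if you also query other GDC programs.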
Common TCGA Project IDs
| Cancer | Project ID | NCIt Code |
|---|---|---|
| Breast | TCGA-BRCA | NCIT:C4017 |
| Lung adenocarcinoma | TCGA-LUAD | NCIT:C3512 |
| Glioblastoma | TCGA-GBM | NCIT:C3058 |
| Melanoma | TCGA-SKCM | NCIT:C3510 |
| Colon adenocarcinoma | TCGA-COAD | NCIT:C4349 |
| Ovarian | TCGA-OV | NCIT:C4908 |
| Prostate | TCGA-PRAD | NCIT:C7378 |
Phase 1: Study Selection
GDC_list_projects: No params required. Returns all GDC/TCGA projects with case counts.
- Use to browse available projects and map cancer types to project IDs.
GDC_search_cases: `project_id` (string, e.g., "TCGA-BRCA"), `size` (int, default 10), `offset` (int).
Returns case UUIDs and basic metadata.
- Use to confirm a project exists and retrieve case counts before deeper queries.
Phase 2: Clinical Data
GDC_get_clinical_data: `project_id` (string), `primary_site` (string, e.g., "Breast"), `disease_type` (string), `vital_status` ("Alive" or "Dead"), `gender` ("female"/"male"), `size` (int, 1-100), `offset` (int).
Returns `{status, data: [{case_id, demographics: {gender, race, ethnicity, vital_status, age_at_index}, diagnoses: [{primary_diagnosis, tumor_stage, age_at_diagnosis, days_to_last_follow_up}], treatments: [{therapeutic_agents, treatment_type}]}]}`.
- Use `project_id` + optional filters to retrieve patient-level clinical attributes.
- `age_at_diagnosis` is in days; divide by 365.25 for years.
- Multiple diagnoses or treatments per case are possible.
```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Get clinical data for deceased BRCA patients
result = tu.tools.GDC_get_clinical_data(
    project_id="TCGA-BRCA", vital_status="Dead", size=50
)
```
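The nested clinical response is easier to summarize once it is flattened. The sketch below is minimal and assumes the response shape documented above. It emits one row per diagnosis, so a case with several diagnoses appears more than once. The helper name is hypothetical, and the toy record is invented for illustration.

```python
from collections import Counter

DAYS_PER_YEAR = 365.25


def flatten_clinical(result: dict) -> list[dict]:
    """Flatten a GDC_get_clinical_data response into one row per diagnosis."""
    rows = []
    for case in result.get("data") or []:
        demo = case.get("demographics") or {}
        # A case without diagnoses still yields one row with empty diagnosis fields.
        for dx in case.get("diagnoses") or [{}]:
            age_days = dx.get("age_at_diagnosis")
            rows.append({
                "case_id": case.get("case_id"),
                "gender": demo.get("gender"),
                "vital_status": demo.get("vital_status"),
                "primary_diagnosis": dx.get("primary_diagnosis"),
                "tumor_stage": dx.get("tumor_stage"),
                # age_at_diagnosis is in days; convert to years.
                "age_years": None if age_days is None else round(age_days / DAYS_PER_YEAR, 1),
            })
    return rows


# Toy input mirroring the documented shape (invented, not real TCGA data).
toy = {"data": [{
    "case_id": "case-1",
    "demographics": {"gender": "female", "vital_status": "Dead"},
    "diagnoses": [{"primary_diagnosis": "toy diagnosis", "age_at_diagnosis": 21915}],
}]}
rows = flatten_clinical(toy)
print(rows[0]["age_years"])  # 60.0
# Count vital status per case, not per row, so multi-diagnosis cases count once.
per_case = {r["case_id"]: r["vital_status"] for r in rows}
print(Counter(per_case.values()))
```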
---
Phase 3: Somatic Mutations
GDC_get_mutation_frequency: `gene_symbol` (string, REQUIRED; alias: `gene`). Returns the pan-cancer SSM occurrence count.
- Returns the TOTAL count across all of TCGA; there is no per-project breakdown.
- For cancer-specific data, use `GDC_get_ssm_by_gene` with `project_id`.
GDC_get_ssm_by_gene: `gene_symbol` (string, REQUIRED), `project_id` (string, optional), `size` (int, 1-100).
Returns `{status, data: [{ssm_id, mutation_type, genomic_dna_change, aa_change, consequence_type}]}`.
- `mutation_type`: "Single base substitution", "Insertion", "Deletion".
- `aa_change`: amino acid change notation (e.g., "Val600Glu").
```python
# TP53 mutations in lung adenocarcinoma
mutations = tu.tools.GDC_get_ssm_by_gene(
    gene_symbol="TP53", project_id="TCGA-LUAD", size=50
)
```
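A quick tally of the returned records shows which protein changes recur. The sketch below is minimal and assumes the documented record fields. It also assumes that each record is one distinct mutation (`ssm_id`) rather than one patient. Because `size` caps at 100, this is a tally of the records returned, not a cohort-wide frequency. The helper name is hypothetical, and the toy records are invented.

```python
from collections import Counter


def summarize_ssms(result: dict, top_n: int = 5) -> dict:
    """Tally mutation types and amino acid changes in a GDC_get_ssm_by_gene response."""
    records = result.get("data") or []
    by_type = Counter(r.get("mutation_type") or "unknown" for r in records)
    # Records without a protein change (e.g., non-coding variants) are skipped here.
    by_aa = Counter(r["aa_change"] for r in records if r.get("aa_change"))
    return {
        "n_records": len(records),
        "mutation_types": dict(by_type),
        "top_aa_changes": by_aa.most_common(top_n),
    }


# Toy records mirroring the documented fields (invented, not real TCGA output).
toy = {"data": [
    {"ssm_id": "a", "mutation_type": "Single base substitution", "aa_change": "Arg273His"},
    {"ssm_id": "b", "mutation_type": "Single base substitution", "aa_change": "Arg273Cys"},
    {"ssm_id": "c", "mutation_type": "Deletion", "aa_change": None},
]}
print(summarize_ssms(toy))
```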
---
Phase 4: CNV Analysis (Progenetix)
Progenetix_search_biosamples: `filters` (string, REQUIRED; NCIt code, e.g., "NCIT:C4017"), `limit` (int), `skip` (int).
Returns `{status, data: {biosamples: [{biosample_id, histological_diagnosis, pathological_stage, external_references}]}}`.
- Use to find samples with CNV profiles for a given cancer type.
Progenetix_cnv_search: `reference_name` (string, REQUIRED; RefSeq accession), `start` (int, REQUIRED; GRCh38, 1-based), `end` (int, REQUIRED), `variant_type` ("DUP"/"DEL"), `filters` (string, NCIt code), `limit` (int).
Returns biosamples with a CNV in the specified genomic region.
- Use `variant_type="DUP"` for amplification and `"DEL"` for deletion.
- Use `filters` to restrict results to a cancer type.
```python
# EGFR amplifications (chr7:55019017-55211628) in breast cancer
result = tu.tools.Progenetix_cnv_search(
    reference_name="refseq:NC_000007.14",
    start=55019017, end=55211628,
    variant_type="DUP", filters="NCIT:C4017", limit=10
)
```
**Progenetix_list_filtering_terms**: No params. Returns all available NCIt codes and labels.
- Use when you need to find the NCIt code for a cancer type.
**Progenetix_list_cohorts**: No params. Returns named cohorts available in Progenetix.
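Malformed region arguments are an easy way to get empty or misleading CNV results. The sketch below is minimal: it checks arguments against the constraints listed above before calling `Progenetix_cnv_search`. It assumes those constraints are complete. The helper name and its checks are hypothetical, and it only builds keyword arguments without calling the tool.

```python
import re
from typing import Optional

VALID_TYPES = {"DUP", "DEL"}


def build_cnv_query(reference_name: str, start: int, end: int, variant_type: str,
                    ncit: Optional[str] = None, limit: int = 10) -> dict:
    """Validate arguments and return kwargs for Progenetix_cnv_search (hypothetical helper)."""
    if not re.fullmatch(r"refseq:NC_\d{6}\.\d+", reference_name):
        raise ValueError("reference_name must look like 'refseq:NC_000007.14'")
    # GRCh38, 1-based coordinates: start must be >= 1 and must not exceed end.
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based region: {start}-{end}")
    if variant_type not in VALID_TYPES:
        raise ValueError(f"variant_type must be one of {sorted(VALID_TYPES)}")
    query = {"reference_name": reference_name, "start": start, "end": end,
             "variant_type": variant_type, "limit": limit}
    if ncit is not None:
        # filters takes an NCIt CURIE, not free text such as "breast cancer".
        if not re.fullmatch(r"NCIT:C\d+", ncit):
            raise ValueError("filters must be an NCIt CURIE like 'NCIT:C4017'")
        query["filters"] = ncit
    return query


# EGFR region (GRCh38) in breast cancer, matching the example above.
query = build_cnv_query("refseq:NC_000007.14", 55019017, 55211628, "DUP", "NCIT:C4017")
print(query)
# result = tu.tools.Progenetix_cnv_search(**query)
```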
---
Phase 5: Survival Analysis
GDC_get_survival: `project_id` (string, REQUIRED, e.g., "TCGA-BRCA"), `gene_symbol` (string, optional; splits the cohort by mutation status).
Returns `{status, data: {donors: [{id, time, censored, survivalEstimate}], overallStats: {pValue}}}`.
- Each donor has `time` (days), `censored` (bool: False = death event, True = censored), and `survivalEstimate`.
- `overallStats.pValue`: log-rank p-value (present when `gene_symbol` splits the cohort).
- Without `gene_symbol`: returns the full-cohort survival curve.
- With `gene_symbol`: returns survival split by mutation status (mutated vs. wild-type).
```python
# Survival for TCGA-BRCA split by TP53 mutation
surv = tu.tools.GDC_get_survival(project_id="TCGA-BRCA", gene_symbol="TP53")
pval = surv["data"]["overallStats"]["pValue"]
n_donors = len(surv["data"]["donors"])  # report the sample size alongside the p-value
```
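You may need the curve values or the number at risk yourself, for example to report sample size next to the p-value. The donor array is enough to recompute a Kaplan-Meier estimate. The sketch below is minimal and assumes each donor carries `time` in days and `censored` as documented (True = censored, False = death event). It is a plain product-limit estimator and is not claimed to reproduce GDC's `survivalEstimate` values exactly. The toy donors are invented.

```python
from itertools import groupby


def kaplan_meier(donors: list[dict]) -> list[tuple[float, float, int]]:
    """Product-limit estimate from donors with `time` (days) and `censored` (bool).

    Returns (time, survival, n_at_risk) at each time with at least one death.
    """
    points = sorted((d["time"], not d["censored"]) for d in donors)  # (time, event)
    n_at_risk = len(points)
    surv = 1.0
    curve = []
    for t, grp in groupby(points, key=lambda p: p[0]):
        grp = list(grp)
        deaths = sum(1 for _, event in grp if event)
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv, n_at_risk))
        # Deaths and censorings at time t both leave the risk set afterwards.
        n_at_risk -= len(grp)
    return curve


# Toy donors (invented): deaths at 100 and 300 days, censored at 200 and 400.
donors = [
    {"id": "d1", "time": 100, "censored": False},
    {"id": "d2", "time": 200, "censored": True},
    {"id": "d3", "time": 300, "censored": False},
    {"id": "d4", "time": 400, "censored": True},
]
for t, s, n in kaplan_meier(donors):
    print(f"day {t}: S = {s:.3f} (n at risk = {n})")
print("n =", len(donors))
```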
---
Phase 6: Variant Interpretation (OncoKB)
OncoKB_annotate_variant: `gene` (string, alias `gene_symbol`), `variant` (string, alias `alteration`, e.g., "V600E"), `tumor_type` (string, OncoTree code, e.g., "MEL").
Returns `{status, data: {oncogenic, mutationEffect, highestSensitiveLevel, treatments: [{drugs, level, indication}]}}`.
- `oncogenic`: "Oncogenic", "Likely Oncogenic", "Neutral", "Inconclusive", "Unknown".
- `highestSensitiveLevel`: the highest OncoKB therapeutic level of evidence for sensitivity ("LEVEL_1" = FDA-recognized biomarker for an FDA-approved drug in this indication, "LEVEL_2" = standard-care biomarker recommended by professional guidelines, etc.).
- Demo mode is available for BRAF, TP53, and ROS1 without an API key.
- Set `ONCOKB_API_TOKEN` for full access.
```python
# Annotate KRAS G12C in lung adenocarcinoma
result = tu.tools.OncoKB_annotate_variant(
    gene="KRAS", variant="G12C", tumor_type="LUAD"
)
```
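A small mapping connects an OncoKB annotation to the T1-T4 tiers in the Evidence Grading table of the Reasoning Framework. The sketch below is minimal and assumes the response shape documented above. The level-to-tier mapping is an assumption of this sketch, not an OncoKB or ToolUniverse convention. In particular, levels 3A, 3B, and 4 are lumped into T3.

```python
# Assumed mapping from OncoKB's highestSensitiveLevel to this skill's T1-T4 tiers.
LEVEL_TO_TIER = {
    "LEVEL_1": "T1",
    "LEVEL_2": "T2",
    "LEVEL_3A": "T3",
    "LEVEL_3B": "T3",
    "LEVEL_4": "T3",
}
ONCOGENIC_CALLS = {"Oncogenic", "Likely Oncogenic"}


def evidence_tier(result: dict) -> str:
    """Assign a T1-T4 tier from an OncoKB_annotate_variant response (heuristic)."""
    data = result.get("data") or {}
    level = data.get("highestSensitiveLevel")
    if level in LEVEL_TO_TIER:
        return LEVEL_TO_TIER[level]
    # No therapeutic level: an oncogenic call still gives biological support (T3);
    # anything else is treated as a variant of unknown significance (T4).
    return "T3" if data.get("oncogenic") in ONCOGENIC_CALLS else "T4"


print(evidence_tier({"data": {"oncogenic": "Oncogenic", "highestSensitiveLevel": "LEVEL_1"}}))  # T1
print(evidence_tier({"data": {"oncogenic": "Unknown", "highestSensitiveLevel": None}}))          # T4
```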
---
Tool Quick Reference
| Tool | Key Params | Returns |
|---|---|---|
| GDC_list_projects | (none) | All TCGA/GDC projects with counts |
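| GDC_search_cases | `project_id`, `size`, `offset` | Case UUIDs + metadata |
| GDC_get_clinical_data | `project_id`, optional filters, `size`, `offset` | Demographics + diagnoses + treatments |
| GDC_get_mutation_frequency | `gene_symbol` | Pan-cancer SSM count |
| GDC_get_ssm_by_gene | `gene_symbol`, `project_id`, `size` | Per-mutation records with aa_change |
| GDC_get_survival | `project_id`, `gene_symbol` | Kaplan-Meier donor array + pValue |
| Progenetix_search_biosamples | `filters`, `limit`, `skip` | Biosample records |
| Progenetix_cnv_search | `reference_name`, `start`, `end`, `variant_type`, `filters` | Biosamples with CNV in region |
| Progenetix_list_filtering_terms | (none) | All NCIt codes in Progenetix |
| OncoKB_annotate_variant | `gene`, `variant`, `tumor_type` | Oncogenicity + treatments |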
Example Workflows
Workflow 1: Gene-Centric Mutation + Survival Analysis
1. GDC_get_mutation_frequency(gene_symbol="KRAS")
-> Pan-cancer mutation count
2. GDC_get_ssm_by_gene(gene_symbol="KRAS", project_id="TCGA-LUAD", size=50)
-> Specific amino acid changes in lung adenocarcinoma
3. GDC_get_survival(project_id="TCGA-LUAD", gene_symbol="KRAS")
-> Survival split by KRAS mutation status + p-value
4. OncoKB_annotate_variant(gene="KRAS", variant="G12C", tumor_type="LUAD")
-> Clinical significance + approved therapies (sotorasib)
Workflow 2: Cohort Clinical Summary
1. GDC_list_projects() -> confirm TCGA-OV exists
2. GDC_get_clinical_data(project_id="TCGA-OV", size=100)
-> Demographics, tumor stage, treatment history
3. GDC_get_survival(project_id="TCGA-OV")
-> Baseline overall survival curve for the cohort
Workflow 3: CNV Analysis for a Gene
1. Progenetix_search_biosamples(filters="NCIT:C3058", limit=10)
-> GBM biosamples with CNV data
2. Progenetix_cnv_search(
reference_name="refseq:NC_000007.14",
start=55019017, end=55211628,
variant_type="DUP", filters="NCIT:C3058"
)
-> GBM samples with EGFR amplification
Reasoning Framework
Evidence Grading
| Tier | Description | Example |
|---|---|---|
| T1 | FDA-recognized biomarker with approved therapy | BRAF V600E in melanoma (vemurafenib); KRAS G12C in NSCLC (sotorasib), OncoKB Level 1 |
| T2 | Well-powered clinical study, standard-of-care relevance | Standard-care biomarker recommended by professional guidelines (e.g., NCCN), OncoKB Level 2 |
| T3 | Preclinical/small cohort evidence, biological plausibility | Recurrent hotspot in TCGA but no approved therapy |
| T4 | Computational prediction or variant of unknown significance | Low-frequency mutation, no functional data |
Interpretation Guidance
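Mutation frequency: A gene mutated in >10% of a TCGA cohort is a likely driver candidate (e.g., TP53 is mutated in 36% of all TCGA cases). Frequency alone is not proof, however, because long genes such as TTN and MUC16 accumulate passenger mutations at high rates. Mutations at <1% frequency are typically passengers unless they occur at known hotspots. Always cross-reference with OncoKB oncogenicity annotation.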
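The >10% and <1% rules of thumb can be applied mechanically once you have a per-cohort count. In this minimal sketch, frequency is mutated cases divided by cases in the same project. Both counts must come from one `project_id`, never from the pan-cancer total of `GDC_get_mutation_frequency`. The function name is hypothetical, and its labels are only triage hints to check against OncoKB. It does not correct for gene length.

```python
def classify_frequency(n_mutated: int, n_cases: int,
                       at_hotspot: bool = False) -> tuple[float, str]:
    """Return (frequency, triage label) using the >10% / <1% rules of thumb."""
    if n_cases <= 0 or not 0 <= n_mutated <= n_cases:
        raise ValueError("need 0 <= n_mutated <= n_cases and n_cases > 0")
    freq = n_mutated / n_cases
    if freq > 0.10:
        label = "driver candidate"
    elif freq < 0.01 and not at_hotspot:
        label = "likely passenger"
    else:
        label = "intermediate: check hotspots and OncoKB"
    return freq, label


print(classify_frequency(120, 500))                 # (0.24, 'driver candidate')
print(classify_frequency(3, 500))                   # (0.006, 'likely passenger')
print(classify_frequency(3, 500, at_hotspot=True))  # (0.006, 'intermediate: check hotspots and OncoKB')
```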
Survival analysis (Kaplan-Meier): A log-rank p-value < 0.05 suggests that mutation status is associated with differential survival. A hazard ratio (HR) > 1 indicates worse prognosis for the mutated group. `GDC_get_survival` returns only a p-value, so an HR must be estimated separately (e.g., with a Cox model) from donor-level data. Interpret cautiously: TCGA cohorts are retrospective and not treatment-stratified. Small subgroups (n < 20) produce unreliable survival estimates.
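`GDC_get_survival` already reports a log-rank p-value, but you will need to recompute it when you build custom groups from bulk GDC data. The sketch below is a minimal two-group log-rank test. It uses the standard hypergeometric variance and a chi-square with 1 degree of freedom, and it returns the group sizes so they can be reported with the p-value. Inputs are (time, event) pairs where event=True means death, the inverse of GDC's `censored` flag. The function names are hypothetical, and the toy groups are invented.

```python
import math


def logrank_test(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs, where event=True is a death and
    event=False is censored. Returns (chi2, p_value, n_a, n_b).
    """
    data = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    death_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected deaths in group A
    var = 0.0
    for t in death_times:
        at_risk = [(ti, e, g) for ti, e, g in data if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for ti, e, _ in at_risk if ti == t and e)
        d_a = sum(1 for ti, e, g in at_risk if ti == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of deaths in group A at time t.
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, len(group_a), len(group_b)


def to_pairs(donors):
    """Convert GDC donors (censored=True means no event) into (time, event) pairs."""
    return [(d["time"], not d["censored"]) for d in donors]


# Toy groups (invented, not real TCGA data).
mutated = [(100, True), (150, True), (200, True), (250, False)]
wild_type = [(300, True), (400, False), (500, False), (600, False)]
chi2, p, n_mut, n_wt = logrank_test(mutated, wild_type)
print(f"n = {n_mut} vs {n_wt}, chi2 = {chi2:.2f}, p = {p:.3f}")
if min(n_mut, n_wt) < 20:
    print("warning: a subgroup has n < 20; treat the estimate as unreliable")
```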
Copy number variation: Focal amplifications (narrow peaks) of oncogenes (EGFR, MYC, ERBB2) are more likely functionally relevant than broad arm-level events. Homozygous deletions of tumor suppressors (CDKN2A, PTEN, RB1) are strong loss-of-function signals. DUP count from Progenetix reflects sample frequency, not copy number magnitude.
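Whether an event is focal or arm-level can be approximated from segment length relative to the chromosome arm. The sketch below is minimal and assumes you already have segment coordinates and the arm length from another source, since neither Progenetix tool documented here returns arm lengths. The 0.5 arm-fraction cutoff is an arbitrary placeholder, not a standard value, so tune it to your analysis. The function names, toy segment, and toy arm length are hypothetical.

```python
def overlaps(seg_start: int, seg_end: int, gene_start: int, gene_end: int) -> bool:
    """True if two 1-based, closed intervals share at least one base."""
    return seg_start <= gene_end and gene_start <= seg_end


def classify_segment(seg_start: int, seg_end: int, arm_length: int,
                     broad_fraction: float = 0.5) -> str:
    """Label a CNV segment 'focal' or 'broad' by the fraction of its arm it spans.

    broad_fraction is an assumed cutoff, not a standard value.
    """
    length = seg_end - seg_start + 1  # closed, 1-based interval
    return "broad" if length / arm_length >= broad_fraction else "focal"


EGFR = (55019017, 55211628)  # GRCh38 coordinates used in Phase 4
seg = (54500000, 56500000)   # toy 2 Mb amplified segment
print(overlaps(*seg, *EGFR))                          # True
print(classify_segment(*seg, arm_length=60_000_000))  # focal (toy arm length)
```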
Synthesis Questions
A complete cancer genomics report should answer:
- What are the most frequently mutated genes in this cancer type, and which are known drivers?
- Is mutation status of the queried gene associated with survival (p < 0.05)?
- Are recurrent CNV events (amplifications or deletions) present at known oncogene/tumor suppressor loci?
- What is the OncoKB clinical actionability level for identified variants?
- How does the mutation landscape compare across TCGA cancer types (pan-cancer context)?
Programmatic Access (Beyond Tools)
When ToolUniverse tools return truncated results or you need bulk data, use the GDC API directly:
```python
import requests
import pandas as pd

GDC_API = "https://api.gdc.cancer.gov"

# Bulk clinical data for a TCGA project
filters = {"op": "and", "content": [
    {"op": "=", "content": {"field": "project.project_id", "value": "TCGA-BRCA"}}
]}
all_cases = []
offset = 0
while True:
    resp = requests.post(f"{GDC_API}/cases", json={
        "filters": filters, "size": 500, "from": offset,
        # TCGA stage lives in ajcc_pathologic_stage (tumor_stage is deprecated in GDC).
        "fields": "submitter_id,demographic.vital_status,demographic.days_to_death,"
                  "diagnoses.ajcc_pathologic_stage",
    }, timeout=60)
    resp.raise_for_status()
    hits = resp.json()["data"]["hits"]
    if not hits:
        break
    all_cases.extend(hits)
    offset += len(hits)
df = pd.json_normalize(all_cases)

# Download MAF mutation file by UUID
file_uuid = "abc123-..."  # from GDC_list_files result
resp = requests.get(f"{GDC_API}/data/{file_uuid}", timeout=300)
resp.raise_for_status()
content = resp.content  # raw bytes; MAF downloads are typically gzip-compressed

# Gene expression: query the files endpoint for STAR - Counts
# (GDC replaced HTSeq counts with STAR - Counts in its RNA-seq pipeline)
expr_filters = {"op": "and", "content": [
    {"op": "=", "content": {"field": "cases.project.project_id", "value": "TCGA-BRCA"}},
    {"op": "=", "content": {"field": "data_type", "value": "Gene Expression Quantification"}},
    {"op": "=", "content": {"field": "analysis.workflow_type", "value": "STAR - Counts"}},
]}
```
See `tooluniverse-data-wrangling` skill for pagination, error handling, and format parsing patterns.
---
Limitations
- `GDC_get_survival` with `gene_symbol` splits on mutation presence only; no multi-gene or stage-based stratification.
- `GDC_get_mutation_frequency` returns the pan-cancer total only; per-cancer frequencies require `GDC_get_ssm_by_gene` per project.
- `GDC_get_clinical_data` returns up to 100 cases per call; use `offset` for pagination.
- Progenetix uses GRCh38 coordinates; provide GRCh38 positions for `Progenetix_cnv_search`.
- `OncoKB_annotate_variant` without `ONCOKB_API_TOKEN` operates in demo mode (limited to BRAF, TP53, ROS1).
- The Progenetix `filters` param requires NCIt CURIE format (e.g., "NCIT:C4017"), not free text.
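Because `GDC_get_clinical_data` caps `size` at 100, larger cohorts need an `offset` loop. The sketch below is minimal and assumes three things: the tool accepts `size`/`offset` keyword arguments, `offset` is a record offset, and the tool returns `{status, data: [...]}` as documented. The stopping rule (a short or empty page) is also an assumption about how the tool signals the end. A hypothetical stand-in function replaces the real tool so the sketch runs offline.

```python
from typing import Callable


def fetch_all(tool: Callable[..., dict], page_size: int = 100,
              max_pages: int = 1000, **filters) -> list[dict]:
    """Page through a size/offset tool until a page comes back short or empty."""
    records: list[dict] = []
    for page in range(max_pages):  # max_pages guards against an endless loop
        result = tool(size=page_size, offset=page * page_size, **filters)
        batch = result.get("data") or []
        records.extend(batch)
        if len(batch) < page_size:
            break
    return records


# Offline stand-in for the real tool: 250 fake cases served in pages.
def fake_clinical(size: int, offset: int, project_id: str) -> dict:
    cases = [{"case_id": f"{project_id}-{i}"} for i in range(250)]
    return {"data": cases[offset:offset + size]}


all_cases = fetch_all(fake_clinical, project_id="TCGA-OV")
print(len(all_cases))  # 250
# Real use (assumed): fetch_all(tu.tools.GDC_get_clinical_data, project_id="TCGA-OV")
```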