tooluniverse-cancer-classification


Cancer Classification via OncoTree

Standardize cancer type nomenclature using the OncoTree ontology. Resolves free-text tumor descriptions to structured codes with UMLS/NCI cross-references, enabling downstream use in OncoKB variant annotation and GDC cohort selection.

When to Use

Apply when a researcher asks about:
  • "What is the OncoTree code for [tumor description]?"
  • "Find all subtypes of [cancer type]"
  • "What cancers originate in [tissue]?"
  • "I need the tumor type code for OncoKB annotation"
  • "What is the TCGA/COSMIC code for [cancer]?"
  • "List all CNS/Brain cancer subtypes"
  • "What NCI code corresponds to glioblastoma?"

Key Tools

| Tool | Purpose | Key Params |
|---|---|---|
| OncoTree_search | Free-text search for cancer types | query (tumor name or description) |
| OncoTree_get_type | Full details for a known OncoTree code | code (e.g., "LUAD", "AML") |
| OncoTree_list_tissues | List all 32 tissue categories | (no params) |
| OncoKB_annotate_variant | Variant annotation using OncoTree code | gene, variant, tumor_type |
| GDC_get_mutation_frequency | Pan-cancer mutation frequency (TCGA) | gene_symbol |

Workflow

Phase 1: Cancer Type Discovery

Start with a free-text search to find matching OncoTree codes:
OncoTree_search(query="breast cancer")
-> Returns list: code, name, main_type, tissue, parent, level, external_references
Key response fields:
  • code: OncoTree code (e.g., "BRCA", "IBC"); use this in OncoKB calls
  • level: hierarchy depth (1=tissue, 2=main type, 3-5=subtypes)
  • parent: parent node code for navigating the hierarchy
  • external_references.UMLS: UMLS CUI list
  • external_references.NCI: NCI Thesaurus code list
Search tips:
  • Broad terms ("lung cancer") return many results; narrow by tissue or level
  • Use tissue-specific terms ("invasive breast carcinoma") for precise matching
  • Acronyms work in search: query="GBM" finds glioblastoma, query="AML" finds leukemia types
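
The narrowing step in the search tips can be sketched as follows. This is a minimal sketch, assuming each search hit is a dict carrying the `code`, `tissue`, and `level` fields listed above; the helper `narrow_hits` and the sample records (including their level values) are hypothetical and only illustrate the filtering.

```python
from typing import Optional


def narrow_hits(hits: list[dict], tissue: Optional[str] = None,
                min_level: Optional[int] = None) -> list[dict]:
    """Filter search hits by tissue and minimum hierarchy depth (hypothetical helper)."""
    kept = []
    for hit in hits:
        if tissue is not None and hit.get("tissue") != tissue:
            continue
        if min_level is not None and hit.get("level", 0) < min_level:
            continue
        kept.append(hit)
    # Deeper levels are more specific, so list them first.
    return sorted(kept, key=lambda h: h.get("level", 0), reverse=True)


# Illustrative records only; real responses come from OncoTree_search.
sample_hits = [
    {"code": "LUNG", "tissue": "Lung", "level": 1},
    {"code": "NSCLC", "tissue": "Lung", "level": 2},
    {"code": "LUAD", "tissue": "Lung", "level": 3},
    {"code": "BRCA", "tissue": "Breast", "level": 2},
]
lung_subtypes = narrow_hits(sample_hits, tissue="Lung", min_level=2)
print([h["code"] for h in lung_subtypes])
```

Sorting deepest-first puts the most specific candidates at the top, which is usually what a variant-annotation request needs.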

Phase 2: Code Validation and Detail Retrieval

Once you have a candidate code, retrieve full details:
OncoTree_get_type(code="LUAD")
-> Returns: name, main_type, tissue, color, parent, level, history, external_references
Note: Not all codes are valid. "GBM" returns 404; the correct code is "GB" (Glioblastoma, IDH-Wildtype). Always validate via OncoTree_get_type before using a code in downstream tools.

Phase 3: Tissue-Level Exploration

When the user wants all cancers in a tissue category:
OncoTree_list_tissues()
-> Returns 32 tissue names: "Breast", "CNS/Brain", "Lung", "Myeloid", ...

OncoTree_search(query="CNS/Brain")
-> All cancer types with tissue="CNS/Brain"
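
For "find all subtypes" requests, the `parent` field makes it possible to walk the hierarchy once the records for a tissue are in hand. The sketch below assumes each record is a dict with `code` and `parent` keys, as described in Phase 1; the helper names and the placeholder codes (`TISSUE_A`, `MAIN_A`, ...) are hypothetical and are not real OncoTree entries.

```python
from collections import defaultdict


def index_children(records: list[dict]) -> dict[str, list[str]]:
    """Map each parent code to the codes that list it as their parent."""
    children = defaultdict(list)
    for rec in records:
        if rec.get("parent"):
            children[rec["parent"]].append(rec["code"])
    return children


def descendants(code: str, children: dict[str, list[str]]) -> list[str]:
    """Collect every code below `code`, at any depth, in sorted order."""
    found = set()
    stack = list(children.get(code, []))
    while stack:
        current = stack.pop()
        if current in found:
            continue  # guard against accidental cycles in the input
        found.add(current)
        stack.extend(children.get(current, []))
    return sorted(found)


# Placeholder records standing in for an OncoTree_search response.
records = [
    {"code": "MAIN_A", "parent": "TISSUE_A"},
    {"code": "SUB_A1", "parent": "MAIN_A"},
    {"code": "SUB_A2", "parent": "MAIN_A"},
    {"code": "SUB_A1X", "parent": "SUB_A1"},
    {"code": "MAIN_B", "parent": "TISSUE_B"},
]
tree = index_children(records)
print(descendants("MAIN_A", tree))
```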

Phase 4: Downstream Use in Variant Annotation

Pass the validated OncoTree code to OncoKB for cancer-type-specific therapeutic levels:
OncoKB_annotate_variant(gene="EGFR", variant="L858R", tumor_type="LUAD")
-> highestSensitiveLevel: "1" (FDA-approved therapy for this tumor+variant)
Without tumor_type, OncoKB returns pan-cancer levels, which may be less specific.

Tool Parameter Reference

| Tool | Required | Optional | Notes |
|---|---|---|---|
| OncoTree_search | query | | Free text; returns list sorted by relevance |
| OncoTree_get_type | code | | Case-sensitive; "BRCA" not "brca". Returns 404 for invalid codes |
| OncoTree_list_tissues | | | No params; returns list of 32 tissue strings |
| OncoKB_annotate_variant | gene, variant | tumor_type | tumor_type is OncoTree code; omit for pan-cancer |
| GDC_get_mutation_frequency | gene_symbol | | Pan-cancer TCGA only; no per-subtype breakdown |
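
The required/optional split for OncoKB_annotate_variant can be captured in a small argument builder. This is a sketch only: `build_oncokb_args` is a hypothetical helper, and upper-casing the code assumes OncoTree codes are upper-case, as every example in this document is.

```python
from typing import Optional


def build_oncokb_args(gene: str, variant: str,
                      tumor_type: Optional[str] = None) -> dict[str, str]:
    """Assemble keyword arguments for OncoKB_annotate_variant (hypothetical helper)."""
    args = {"gene": gene, "variant": variant}
    if tumor_type:
        # Assumption: codes are upper-case; lookups are case-sensitive.
        args["tumor_type"] = tumor_type.strip().upper()
    # With no tumor_type the key is omitted, which requests pan-cancer levels.
    return args


print(build_oncokb_args("EGFR", "L858R", "luad"))
print(build_oncokb_args("EGFR", "L858R"))
```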

Common OncoTree Codes (verified working)

| Code | Name | Tissue |
|---|---|---|
| BRCA | Invasive Breast Carcinoma | Breast |
| LUAD | Lung Adenocarcinoma | Lung |
| LUSC | Lung Squamous Cell Carcinoma | Lung |
| MEL | Melanoma | Skin |
| CRC | Colorectal Cancer | Bowel |
| PAAD | Pancreatic Adenocarcinoma | Pancreas |
| GBM | (invalid; use GB) | CNS/Brain |
| GB | Glioblastoma, IDH-Wildtype | CNS/Brain |
| AML | Acute Myeloid Leukemia | Myeloid |
| PRAD | Prostate Adenocarcinoma | Prostate |

Common Patterns

```python
# Pattern: Resolve free-text to OncoTree code
results = OncoTree_search(query="pancreatic ductal adenocarcinoma")
# Results are sorted by relevance, so take the top hit. If several codes
# match, prefer the deepest level (a higher level number is more specific).
code = results["data"][0]["code"]  # e.g., "PAAD"

# Pattern: Get all subtypes within a main type
results = OncoTree_search(query="Glioma")
subtypes = [r for r in results["data"] if r["main_type"] == "Glioma"]

# Pattern: Validate code before OncoKB call
detail = OncoTree_get_type(code="GB")
if detail["status"] == "success":
    OncoKB_annotate_variant(gene="IDH1", variant="R132H", tumor_type="GB")
```

Tumor Classification Reasoning (CRITICAL)

LOOK UP DON'T GUESS -- tumor classification determines treatment. Always verify codes and biomarker interpretation via tools rather than relying on memory.

Histological vs Molecular Classification

Tumors are classified on TWO axes -- both matter for treatment selection:
  • Histological (what it looks like under the microscope): adenocarcinoma, squamous, small cell, etc. This determines placement at OncoTree hierarchy level 3 and deeper.
  • Molecular (what mutations/alterations drive it): EGFR-mutant, HER2-amplified, MSI-high, etc. This determines OncoKB therapeutic levels.
A tumor can be histologically identical to another but molecularly different, requiring different treatment. Example: two lung adenocarcinomas (both LUAD) but one is EGFR-mutant (targeted therapy) and another is KRAS-mutant (different targeted therapy). Always check both axes.

Biomarker Interpretation Strategy

When interpreting cancer biomarkers, use OncoKB for actionability:
  • HER2: Positive = IHC 3+ or FISH-amplified. Use OncoKB_annotate_variant(gene="ERBB2", variant="Amplification", tumor_type="BRCA") for the therapeutic level
  • ER/PR: Positive = hormone-receptor-positive breast cancer. Changes treatment class (endocrine therapy)
  • Ki67: Proliferation index. High (>20%) suggests aggressive biology; used in breast cancer subtyping (Luminal A vs B)
  • TMB (Tumor Mutational Burden): High TMB (>10 mut/Mb) predicts immunotherapy response across tumor types. Use OncoKB_annotate_variant(gene="Other Biomarkers", variant="TMB-H")
  • MSI (Microsatellite Instability): MSI-High is an FDA-approved biomarker for pembrolizumab across tumor types. Use OncoKB_annotate_variant(gene="Other Biomarkers", variant="MSI-H")
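
The cut-offs in the list above can be turned into the OncoKB queries they imply. This is a sketch under stated assumptions: the thresholds are the ones quoted in this section, `biomarker_queries` and its parameters are hypothetical, and the returned dicts are only argument sets for OncoKB_annotate_variant, not results.

```python
from typing import Optional

# Cut-offs quoted in the list above; treat them as this document's values.
TMB_HIGH_CUTOFF = 10.0   # mutations per megabase
KI67_HIGH_CUTOFF = 20.0  # percent of positive cells


def her2_positive(ihc_score: str, fish_amplified: bool) -> bool:
    """HER2 is positive when IHC is 3+ or FISH shows amplification."""
    return ihc_score == "3+" or fish_amplified


def ki67_high(percent: float) -> bool:
    """Flag a high proliferation index."""
    return percent > KI67_HIGH_CUTOFF


def biomarker_queries(ihc_score: str, fish_amplified: bool, tmb: float,
                      msi_high: bool, tumor_type: Optional[str] = None) -> list[dict]:
    """Build argument dicts for the OncoKB calls suggested by the biomarkers."""
    queries = []
    if her2_positive(ihc_score, fish_amplified):
        queries.append({"gene": "ERBB2", "variant": "Amplification"})
    if tmb > TMB_HIGH_CUTOFF:
        queries.append({"gene": "Other Biomarkers", "variant": "TMB-H"})
    if msi_high:
        queries.append({"gene": "Other Biomarkers", "variant": "MSI-H"})
    if tumor_type:
        # Attach the validated OncoTree code so levels are tumor-specific.
        for query in queries:
            query["tumor_type"] = tumor_type
    return queries


queries = biomarker_queries("3+", False, 12.5, False, tumor_type="BRCA")
for query in queries:
    print(query)
```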

Staging vs Grading -- Different Concepts

  • Stage (TNM): How far has it spread? T=tumor size, N=lymph nodes, M=metastasis. Stage I-IV. Determines prognosis and surgery eligibility.
  • Grade: How abnormal do the cells look? Grade 1 (well-differentiated, slow) to Grade 3 (poorly-differentiated, aggressive). Determines aggressiveness.
  • A Stage I, Grade 3 tumor (small but aggressive) has different implications than Stage III, Grade 1 (spread but slow-growing).

Actionability Assessment

After classifying the tumor, assess whether findings are clinically actionable:
  1. Level 1 (FDA-approved, specific tumor type): Immediate treatment implication. Example: EGFR L858R in LUAD
  2. Level 2 (Standard care): Strong evidence but context-dependent
  3. Level 3 (Compelling evidence): Clinical trial candidates
  4. Level 4 (Biological evidence): Research-stage only
  5. Always provide the OncoTree code to OncoKB -- without it, you get pan-cancer levels which may understate or overstate actionability for the specific tumor type
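
When a tumor-specific call and a pan-cancer call return different levels, it helps to compare them explicitly. The sketch below assumes levels arrive as short strings such as "1" or "3B" (the form shown in this document); `level_sort_key` and `strongest_level` are hypothetical helpers, and the level descriptions are copied from the list above.

```python
from typing import Optional

# Descriptions taken from the list above, keyed by the leading digit.
LEVEL_MEANING = {
    "1": "FDA-approved, specific tumor type",
    "2": "Standard care",
    "3": "Compelling evidence",
    "4": "Biological evidence",
}


def level_sort_key(level: Optional[str]) -> tuple[int, str]:
    """Smaller keys mean stronger evidence; unknown or missing levels sort last."""
    if not level or not level[0].isdigit():
        return (99, level or "")
    return (int(level[0]), level[1:])


def strongest_level(levels: list[Optional[str]]) -> Optional[str]:
    """Return the strongest recognized level, or None if there is none."""
    known = [lvl for lvl in levels if lvl and lvl[0].isdigit()]
    return min(known, key=level_sort_key) if known else None


best = strongest_level(["3B", None, "1"])
print(best, "-", LEVEL_MEANING[best[0]])
```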

Reasoning Framework for Result Interpretation

Evidence Grading

| Grade | Criteria | Example |
|---|---|---|
| Confirmed | Exact OncoTree code validated via OncoTree_get_type, UMLS + NCI cross-refs present | LUAD: validated, UMLS C0152013, NCI C3512 |
| Probable | OncoTree search returns match, but code not yet validated or missing cross-refs | Search for "cholangiocarcinoma" returns CHOL with partial external refs |
| Ambiguous | Multiple OncoTree codes match the description at different hierarchy levels | "Breast cancer" matches BRCA (invasive), BREAST (tissue), IBC (inflammatory) |
| Unresolved | No OncoTree match; tumor type too rare or novel for the ontology | Ultra-rare sarcoma subtype not in OncoTree |
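
The grading table can be read as a small decision procedure. The sketch below is one possible reading, not a definitive rule set: `grade_evidence` and its inputs are hypothetical, and the order in which the conditions are checked (for example, treating several unvalidated matches as Ambiguous before Probable) is an assumption.

```python
def grade_evidence(match_count: int, validated: bool,
                   has_umls: bool, has_nci: bool) -> str:
    """Assign an evidence grade from the table above (hypothetical helper)."""
    if match_count == 0:
        return "Unresolved"
    if match_count > 1 and not validated:
        # Several candidate codes and none confirmed yet.
        return "Ambiguous"
    if validated and has_umls and has_nci:
        return "Confirmed"
    # A match exists but is unvalidated or lacks a cross-reference.
    return "Probable"


print(grade_evidence(1, validated=True, has_umls=True, has_nci=True))
print(grade_evidence(3, validated=False, has_umls=True, has_nci=True))
```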

Interpretation Guidance

  • OncoTree code confidence: Always validate candidate codes with OncoTree_get_type before downstream use. Some common acronyms (e.g., "GBM") are NOT valid OncoTree codes (correct code is "GB"). A validated code with UMLS and NCI cross-references is highest confidence.
  • UMLS/NCI cross-reference priority: For standardized reporting, NCI Thesaurus codes are preferred for cancer-specific contexts (used by caDSR, GDC). UMLS CUIs are broader (cross-disease) and useful for literature mining. When both are available, report both; when only one exists, NCI is preferred for oncology workflows.
  • Tissue hierarchy interpretation: OncoTree levels represent specificity: Level 1 = tissue of origin (e.g., "Lung"), Level 2 = main cancer type (e.g., "Non-Small Cell Lung Cancer"), Level 3+ = histological subtypes (e.g., "Lung Adenocarcinoma"). For OncoKB variant annotation, use the most specific (deepest) level that accurately describes the tumor. For cohort-level analysis (e.g., TCGA), the Level 2-3 code is typically appropriate.
  • OncoKB tumor type impact: Providing a tumor type code to OncoKB can change the therapeutic level (e.g., EGFR L858R is Level 1 in LUAD but Level 3B pan-cancer). Always use the validated OncoTree code for the patient's specific tumor type.
  • Deprecated or renamed codes: OncoTree evolves across versions. The history field in the OncoTree_get_type response shows prior names. Always use the current code.
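
The cross-reference priority described above can be sketched as a small selector. It assumes `external_references` is a dict with optional `NCI` and `UMLS` lists, as in the Phase 1 field list; `pick_cross_refs` is a hypothetical helper, and the sample identifiers are the LUAD values quoted in the evidence grading table.

```python
from typing import Optional


def pick_cross_refs(external_references: dict) -> dict:
    """Report both vocabularies and name the preferred one (hypothetical helper)."""
    nci = list(external_references.get("NCI") or [])
    umls = list(external_references.get("UMLS") or [])
    preferred: Optional[str] = None
    if nci:
        preferred = "NCI"   # preferred for oncology workflows
    elif umls:
        preferred = "UMLS"  # broader, cross-disease fallback
    return {"preferred": preferred, "NCI": nci, "UMLS": umls}


luad_refs = pick_cross_refs({"UMLS": ["C0152013"], "NCI": ["C3512"]})
print(luad_refs)
```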

Synthesis Questions

  1. Does the chosen OncoTree code represent the most specific histological subtype, or could a more precise code provide better therapeutic annotation in OncoKB?
  2. When the free-text tumor description maps to multiple OncoTree codes, which hierarchy level best balances specificity and coverage for the analysis goal (variant annotation vs cohort selection)?
  3. Are the UMLS/NCI cross-references consistent with external classifications (WHO, ICD-O), or are there discrepancies that need resolution?


Fallback Chains

| Primary | Fallback | When |
|---|---|---|
| OncoTree_get_type(code="GBM") | OncoTree_search(query="glioblastoma") | 404 for common aliases |
| OncoTree_search (no results) | OncoTree_list_tissues + tissue-level search | Very rare/novel tumor types |
| OncoTree code for OncoKB | Omit tumor_type param | Code not recognized by OncoKB |
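
The first fallback row can be sketched as a resolver that tries the candidate code and then falls back to search. The tools are passed in as callables so the sketch runs on its own; `resolve_code` and the `fake_*` stand-ins are hypothetical, and the `status` and `data` response keys are assumed from the Common Patterns section rather than confirmed.

```python
from typing import Callable, Optional


def resolve_code(candidate: str, description: str,
                 get_type: Callable[[str], dict],
                 search: Callable[[str], dict]) -> Optional[str]:
    """Validate `candidate`; on failure, search by description and validate hits."""
    if get_type(candidate).get("status") == "success":
        return candidate
    for hit in search(description).get("data", []):
        if get_type(hit["code"]).get("status") == "success":
            return hit["code"]
    return None  # caller may fall back to a tissue-level search


# Stand-ins for OncoTree_get_type and OncoTree_search, for illustration only.
VALID_CODES = {"GB", "LUAD"}


def fake_get_type(code: str) -> dict:
    return {"status": "success"} if code in VALID_CODES else {"status": "error"}


def fake_search(query: str) -> dict:
    if "glioblastoma" in query.lower():
        return {"data": [{"code": "GB"}]}
    return {"data": []}


print(resolve_code("GBM", "glioblastoma", fake_get_type, fake_search))
```

Returning None rather than raising leaves the choice of the next fallback (tissue-level search, or omitting tumor_type) to the caller.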