tooluniverse-plant-genomics

Plant Genomics & Biology

Pipeline for investigating plant genes, metabolic pathways, species taxonomy, and comparative plant biology using ToolUniverse tools.

Reasoning Strategy

Plant genomes are large (wheat is ~17 Gb, versus ~3 Gb for human) and often polyploid: wheat is hexaploid (AABBDD), so most genes exist as three homeologous copies. When comparing plant genes to Arabidopsis, always account for whole-genome duplications: a single Arabidopsis gene may correspond to 2–4 paralogous copies in a crop species, any of which may have diverged in function. Gene families such as receptor-like kinases, cytochrome P450s, and transcription factors are massively expanded in plants relative to animals, so a BLAST hit does not imply functional equivalence. Arabidopsis thaliana is the primary model, but it is a small, fast-cycling annual and lacks some traits (wood formation, nitrogen-fixation symbiosis, C4 photosynthesis) that must be studied in other species.
LOOK UP DON'T GUESS: Do not assume gene function by sequence similarity alone in polyploid species; look up functional validation evidence via UniProt (reviewed entries) or PlantReactome. Do not assume KEGG organism codes; use the organism-code table in Phase 3 or query `kegg_search_pathway` with the species name to confirm availability.
Key principles:
  1. Plant-specific pathways: photosynthesis, secondary metabolism, and plant hormone signaling have no counterpart in animal models and need plant-specific resources
  2. PlantReactome as foundation: curated plant pathway database with cross-species coverage (Oryza, Arabidopsis, Zea mays, etc.)
  3. Ensembl Plants for genomics: use Ensembl with plant species names for gene lookup and annotation
  4. KEGG for metabolism: KEGG has plant-specific organism codes (ath=Arabidopsis, osa=rice, zma=maize)
  5. Evidence grading: T1 = functional validation (mutant phenotype), T2 = expression/localization data, T3 = ortholog-based prediction, T4 = computational annotation only
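
The duplication caveat in the reasoning above can be made concrete with a small check that flags species where one query gene maps to several candidates. This is a minimal sketch: the record shape (dicts with `species` and `gene_id` keys) is an assumption for illustration, not the documented output of `EnsemblCompara_get_orthologues`, and the gene identifiers are placeholders.

```python
from collections import defaultdict


def group_orthologs_by_species(hits):
    """Group ortholog hits by target species.

    `hits` is a hypothetical list of dicts with "species" and "gene_id" keys;
    the real tool output may be shaped differently.
    """
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["species"]].append(hit["gene_id"])
    return dict(grouped)


def flag_multicopy(grouped):
    """Return only the species where the query gene has more than one candidate."""
    return {species: ids for species, ids in grouped.items() if len(ids) > 1}


# Placeholder identifiers, not real gene IDs.
hits = [
    {"species": "triticum_aestivum", "gene_id": "wheat_copy_A"},
    {"species": "triticum_aestivum", "gene_id": "wheat_copy_B"},
    {"species": "triticum_aestivum", "gene_id": "wheat_copy_D"},
    {"species": "oryza_sativa", "gene_id": "rice_copy_1"},
]
grouped = group_orthologs_by_species(hits)
multicopy = flag_multicopy(grouped)
print(multicopy)
```

Each flagged copy should then be checked for its own functional evidence rather than inheriting the annotation of the Arabidopsis gene.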

When to Use

  • "What pathway is [plant gene] involved in?"
  • "Find genes in the flavonoid biosynthesis pathway"
  • "Compare [gene] across Arabidopsis and rice"
  • "What species is [plant name]?"
  • "Plant hormone signaling pathways"
  • "Photosynthesis gene annotation"
Not this skill: For general pathway analysis (human/mouse), use `tooluniverse-systems-biology`. For phylogenetics, use `tooluniverse-phylogenetics`.

Core Tools

| Tool | Use For |
|---|---|
| `PlantReactome_search_pathways` | Search plant-specific pathways by keyword |
| `PlantReactome_get_pathway` | Get pathway details (genes, reactions, species) |
| `PlantReactome_list_species` | List all species covered by PlantReactome |
| `POWO_search_plants` | Search Plants of the World Online (taxonomy, distribution) |
| `ensembl_lookup_gene` | Gene lookup; use with plant species (e.g., `species="arabidopsis_thaliana"`) |
| `kegg_search_pathway` | Search KEGG pathways (use plant organism codes: ath, osa, zma) |
| `KEGG_get_pathway_genes` | Get genes in a plant pathway (e.g., `pathway_id="ath00941"` for flavonoid in Arabidopsis) |
| `UniProt_search` | Search plant protein sequences (add `taxonomy_id:3702` for Arabidopsis) |
| `UniProt_get_function_by_accession` | Get protein function annotation |
| `PubMed_search_articles` | Plant biology literature |
| `EnsemblCompara_get_orthologues` | Cross-species plant gene comparison |

Workflow

Phase 0: Species & Gene Identification
  Species name → POWO taxonomy; Gene symbol → Ensembl/UniProt IDs
    |
Phase 1: Gene Function & Annotation
  UniProt function, Ensembl annotation, InterPro domains
    |
Phase 2: Pathway Analysis
  PlantReactome → plant-specific pathways; KEGG → metabolism
    |
Phase 3: Cross-Species Comparison
  Ensembl Compara → orthologs in other plant species
    |
Phase 4: Literature & Report
  PubMed → published studies; synthesis
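
One way to drive the phases above in order is a small runner that records each phase's result and keeps going when a single lookup fails, so a partial report is still possible. This is only a sketch: `run_phases` is a hypothetical helper, and the two tool functions are local stubs standing in for the real ToolUniverse calls.

```python
def run_phases(phases):
    """Run (name, zero-argument callable) pairs in order.

    A failing phase is recorded with its error message and later phases
    still run, so one unavailable service does not discard earlier results.
    """
    report = {}
    for name, step in phases:
        try:
            report[name] = {"ok": True, "result": step()}
        except Exception as exc:
            report[name] = {"ok": False, "error": str(exc)}
    return report


# Stubs for illustration only; the real tools are provided by ToolUniverse.
def ensembl_lookup_gene(gene_symbol, species):
    return {"symbol": gene_symbol, "species": species}


def KEGG_get_pathway_genes(pathway_id):
    raise RuntimeError("service unavailable")


phases = [
    ("phase1_gene_function",
     lambda: ensembl_lookup_gene(gene_symbol="CHS", species="arabidopsis_thaliana")),
    ("phase2_pathways",
     lambda: KEGG_get_pathway_genes(pathway_id="ath00941")),
]
report = run_phases(phases)
for name, outcome in report.items():
    print(name, "ok" if outcome["ok"] else "failed: " + outcome["error"])
```

Recording failures instead of raising is a design choice; a stricter pipeline could stop at the first failed phase when later phases depend on its output.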

Phase 1: Gene Function

```python
# Look up an Arabidopsis gene
ensembl_lookup_gene(gene_symbol="CHS", species="arabidopsis_thaliana")

# Get protein function (reviewed entries only)
UniProt_search(query="CHS AND taxonomy_id:3702 AND reviewed:true")
```
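
When the same lookup is repeated for several genes, the query string can be assembled by a small helper instead of by hand. The sketch below only builds the string in the form used above; `build_uniprot_query` is a hypothetical helper, not part of ToolUniverse.

```python
def build_uniprot_query(gene_symbol, taxonomy_id, reviewed_only=True):
    """Assemble a query string in the same form as the example above."""
    parts = [gene_symbol, f"taxonomy_id:{taxonomy_id}"]
    if reviewed_only:
        # Restrict to reviewed entries, which carry curated function text.
        parts.append("reviewed:true")
    return " AND ".join(parts)


query = build_uniprot_query("CHS", 3702)
print(query)  # CHS AND taxonomy_id:3702 AND reviewed:true
```

Dropping the `reviewed:true` filter (`reviewed_only=False`) widens the search to unreviewed entries, which should then be graded as weaker evidence.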

Phase 2: Plant Pathway Analysis

Key plant-specific KEGG pathways:

| Pathway | KEGG ID (Arabidopsis) | Biological Significance |
|---|---|---|
| Photosynthesis | ath00195 | Light reactions, electron transport |
| Carbon fixation (Calvin cycle) | ath00710 | CO2 → sugar |
| Flavonoid biosynthesis | ath00941 | UV protection, pigmentation, defense |
| Carotenoid biosynthesis | ath00906 | Photoprotection, vitamin A precursors |
| Auxin signaling | ath04075 (plant hormone signal transduction) | Growth, tropisms |
| Brassinosteroid signaling | ath04075 (plant hormone signal transduction) | Cell elongation, stress response |
| Circadian rhythm (plant) | ath04712 | Photoperiod, flowering time |
| Terpenoid backbone | ath00900 | Secondary metabolite precursors |
| Starch/sucrose metabolism | ath00500 | Carbon partitioning |
| Nitrogen metabolism | ath00910 | Nitrogen assimilation |

```python
# Search PlantReactome for flavonoid pathway
PlantReactome_search_pathways(query="flavonoid")

# Get genes in Arabidopsis flavonoid biosynthesis
KEGG_get_pathway_genes(pathway_id="ath00941")
```
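
An organism-specific KEGG pathway ID such as `ath00941` is an organism code followed by a five-digit map number. Splitting the two parts before calling a tool catches malformed IDs early. This is a minimal sketch; the pattern (a lowercase code of three or four letters) is an assumption, and `split_pathway_id` is a hypothetical helper.

```python
import re

# Assumed shape: lowercase organism code (3-4 letters) + five-digit map number.
PATHWAY_ID = re.compile(r"([a-z]{3,4})(\d{5})")


def split_pathway_id(pathway_id):
    """Return (organism_code, map_number) or raise ValueError."""
    match = PATHWAY_ID.fullmatch(pathway_id)
    if match is None:
        raise ValueError(f"not an organism-specific KEGG pathway ID: {pathway_id!r}")
    return match.group(1), match.group(2)


code, map_number = split_pathway_id("ath00941")
print(code, map_number)  # ath 00941
```

The map number is kept as a string so that leading zeros survive.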

Phase 3: Species Comparison

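KEGG organism codes for major crops (confirm with `kegg_search_pathway` before use):

| Species | Code | Common Name |
|---|---|---|
| Arabidopsis thaliana | ath | Thale cress (model plant) |
| Oryza sativa | osa | Rice |
| Zea mays | zma | Maize/corn |
| Triticum aestivum | taes | Wheat |
| Glycine max | gmx | Soybean |
| Solanum lycopersicum | sly | Tomato |
| Nicotiana tabacum | nta | Tobacco |
| Medicago truncatula | mtr | Barrel medic (legume model) |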
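
For a cross-species comparison, the same map number can be paired with each organism code to produce candidate pathway IDs. The sketch below uses a subset of the codes listed above; `candidate_pathway_ids` is a hypothetical helper, and the IDs it returns are candidates only, since a given pathway may not be annotated for every species.

```python
# Subset of the organism codes listed above.
KEGG_CODES = {
    "Arabidopsis thaliana": "ath",
    "Oryza sativa": "osa",
    "Zea mays": "zma",
    "Glycine max": "gmx",
    "Solanum lycopersicum": "sly",
}


def candidate_pathway_ids(map_number, species_list):
    """Map each species to '<organism code><map number>'.

    Raises KeyError for a species that is not in the table, rather than
    guessing a code.
    """
    ids = {}
    for species in species_list:
        if species not in KEGG_CODES:
            raise KeyError(f"no KEGG code on record for {species!r}")
        ids[species] = KEGG_CODES[species] + map_number
    return ids


candidates = candidate_pathway_ids(
    "00941", ["Arabidopsis thaliana", "Oryza sativa", "Zea mays"]
)
print(candidates)
```

Each candidate ID should be confirmed with `kegg_search_pathway` before it is passed to `KEGG_get_pathway_genes`.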

Phase 4: Interpretation Framework

Evidence grading: T1 = mutant phenotype confirms function; T2 = expression/localization data; T3 = ortholog has validated function in model species; T4 = computational annotation only (domain/GO term). Prioritize T1/T2 evidence; treat T3/T4 as hypotheses requiring further validation.
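
The grading rule can be applied mechanically once each finding carries a tier label. The following is a sketch under the assumption that findings are plain dicts with a `tier` key; the record shape and the example claims are illustrative, not output from any tool.

```python
TIER_RANK = {"T1": 1, "T2": 2, "T3": 3, "T4": 4}


def rank_evidence(records):
    """Sort findings so the strongest tier (T1) comes first."""
    return sorted(records, key=lambda record: TIER_RANK[record["tier"]])


def split_by_strength(records):
    """Separate T1/T2 findings from T3/T4 hypotheses needing validation."""
    supported = [r for r in records if r["tier"] in ("T1", "T2")]
    hypotheses = [r for r in records if r["tier"] in ("T3", "T4")]
    return supported, hypotheses


# Illustrative records only.
records = [
    {"claim": "domain annotation", "tier": "T4"},
    {"claim": "mutant phenotype", "tier": "T1"},
    {"claim": "ortholog validated in model species", "tier": "T3"},
    {"claim": "expression in root tissue", "tier": "T2"},
]
ranked = rank_evidence(records)
supported, hypotheses = split_by_strength(records)
print([r["tier"] for r in ranked])  # ['T1', 'T2', 'T3', 'T4']
```

Reporting `supported` and `hypotheses` separately keeps ortholog-based and computational annotations from being read as validated function.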

Synthesis Questions

  1. Is the gene plant-specific or conserved? (Plant-specific genes often in secondary metabolism; conserved genes in primary metabolism)
  2. Which tissues/developmental stages express it? (Root vs shoot vs flower vs seed)
  3. Is there a crop improvement application? (Yield, stress tolerance, nutritional quality)
  4. What regulatory mechanisms control it? (Hormone-responsive, light-regulated, circadian)
  5. Are there natural variants with known phenotypes? (Accession diversity in Arabidopsis 1001 Genomes)

Limitations

  • No TAIR tool — The Arabidopsis Information Resource has no public REST API. Use Ensembl Plants and UniProt as alternatives for Arabidopsis gene data.
  • PlantReactome coverage — Focused on Oryza sativa (rice) with cross-references to Arabidopsis. Not all plant species equally covered.
  • No crop breeding tools — This skill covers gene/pathway analysis, not marker-assisted selection or breeding simulation.
  • POWO is taxonomy-focused — Plants of the World Online provides species identification and distribution, not genomics data.