tooluniverse-plant-genomics
Plant Genomics & Biology
Pipeline for investigating plant genes, metabolic pathways, species taxonomy, and comparative plant biology using ToolUniverse tools.
Reasoning Strategy
Plant genomes are large (wheat is ~17 Gb, vs. 3 Gb for human) and often polyploid — wheat is hexaploid (AABBDD), meaning there are three homeologous copies of most genes. When comparing plant genes to Arabidopsis, always account for whole-genome duplications: a single Arabidopsis gene may have 2–4 paralogs in a crop species, all potentially with diverged functions. Gene families are massively expanded in plants relative to animals (e.g., receptor-like kinases, cytochrome P450s, transcription factors) — a BLAST hit does not mean functional equivalence. Arabidopsis thaliana is the primary model, but its small genome and rapid life cycle mean some features (wood formation, nitrogen fixation symbiosis, C4 photosynthesis) are absent and must be studied in other species.
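The arithmetic behind "hexaploid means three homeologous copies" can be sketched as follows. This is a minimal illustration that assumes a genome formula written with one letter per chromosome set (as in AABBDD); the function names are hypothetical and not part of ToolUniverse. The result is only an expectation: gene loss and tandem duplication mean real copy numbers often differ.

```python
def ploidy(genome_formula: str) -> int:
    """Number of chromosome sets, e.g. 6 for AABBDD (hexaploid)."""
    return sum(1 for letter in genome_formula if letter.isalpha())


def subgenomes(genome_formula: str) -> list:
    """Distinct subgenome letters, in order of first appearance."""
    seen = []
    for letter in genome_formula:
        if letter.isalpha() and letter not in seen:
            seen.append(letter)
    return seen


def expected_homeolog_copies(genome_formula: str) -> int:
    """One homeolog is expected per subgenome (an expectation, not a guarantee)."""
    return len(subgenomes(genome_formula))


wheat = "AABBDD"
print(ploidy(wheat), subgenomes(wheat), expected_homeolog_copies(wheat))
```

For a diploid written as "AA", the same functions give two chromosome sets and a single expected copy, which is why one Arabidopsis gene can map to several wheat genes.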
LOOK UP, DON'T GUESS: Do not assume gene function by sequence similarity alone in polyploid species; look up functional validation evidence via UniProt (reviewed entries) or PlantReactome. Do not assume KEGG organism codes: use the table or query `kegg_search_pathway` with the species name to confirm availability.
Key principles:
- Plant-specific pathways — photosynthesis, secondary metabolism, hormone signaling are unique to plants
- PlantReactome as foundation — curated plant pathway database with cross-species coverage (Oryza, Arabidopsis, Zea mays, etc.)
- Ensembl Plants for genomics — use Ensembl with plant species names for gene lookup and annotation
- KEGG for metabolism — KEGG has plant-specific organism codes (ath=Arabidopsis, osa=rice, zma=maize)
- Evidence grading — T1: functional validation (mutant phenotype), T2: expression/localization data, T3: ortholog-based prediction, T4: computational annotation only
When to Use
- "What pathway is [plant gene] involved in?"
- "Find genes in the flavonoid biosynthesis pathway"
- "Compare [gene] across Arabidopsis and rice"
- "What species is [plant name]?"
- "Plant hormone signaling pathways"
- "Photosynthesis gene annotation"
Not this skill: For general pathway analysis (human/mouse), use `tooluniverse-systems-biology`. For phylogenetics, use `tooluniverse-phylogenetics`.
Core Tools
| Tool | Use For |
|---|---|
| `PlantReactome_search_pathways` | Search plant-specific pathways by keyword |
| PlantReactome | Get pathway details (genes, reactions, species) |
| PlantReactome | List all species covered by PlantReactome |
| POWO | Search Plants of the World Online (taxonomy, distribution) |
| `ensembl_lookup_gene` | Gene lookup; use with a plant species (e.g., …) |
| `kegg_search_pathway` | Search KEGG pathways (use plant organism codes: ath, osa, zma) |
| `KEGG_get_pathway_genes` | Get genes in a plant pathway (e.g., …) |
| `UniProt_search` | Search plant protein sequences (add …) |
| UniProt | Get protein function annotation |
| PubMed | Plant biology literature |
| Ensembl Compara | Cross-species plant gene comparison |
Workflow
Phase 0: Species & Gene Identification
Species name → POWO taxonomy; Gene symbol → Ensembl/UniProt IDs
|
Phase 1: Gene Function & Annotation
UniProt function, Ensembl annotation, InterPro domains
|
Phase 2: Pathway Analysis
PlantReactome → plant-specific pathways; KEGG → metabolism
|
Phase 3: Cross-Species Comparison
Ensembl Compara → orthologs in other plant species
|
Phase 4: Literature & Report
PubMed → published studies; synthesis
Phase 1: Gene Function
```python
# Look up an Arabidopsis gene
ensembl_lookup_gene(gene_symbol="CHS", species="arabidopsis_thaliana")

# Get protein function (reviewed entries only; taxonomy ID 3702 is Arabidopsis thaliana)
UniProt_search(query="CHS AND taxonomy_id:3702 AND reviewed:true")
```
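The UniProt query string used in Phase 1 follows a simple pattern that can be assembled programmatically. The helper below is a hypothetical sketch (its name and parameters are not part of ToolUniverse); it only builds the string and assumes the caller passes the result to `UniProt_search`.

```python
ARABIDOPSIS_TAXON_ID = 3702  # NCBI taxonomy ID used in the Phase 1 example


def build_uniprot_query(gene_symbol: str, taxonomy_id: int,
                        reviewed_only: bool = True) -> str:
    """Join a gene symbol, a taxonomy filter, and an optional reviewed filter."""
    parts = [gene_symbol, f"taxonomy_id:{taxonomy_id}"]
    if reviewed_only:
        # Reviewed (Swiss-Prot) entries carry curated functional annotation.
        parts.append("reviewed:true")
    return " AND ".join(parts)


query = build_uniprot_query("CHS", ARABIDOPSIS_TAXON_ID)
print(query)  # CHS AND taxonomy_id:3702 AND reviewed:true
```

Dropping the reviewed filter (`reviewed_only=False`) widens the search to unreviewed entries, which under the evidence grading used here should be treated as lower-tier annotation.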
Phase 2: Plant Pathway Analysis
Key plant-specific KEGG pathways:
| Pathway | KEGG ID (Arabidopsis) | Biological Significance |
|---|---|---|
| Photosynthesis | ath00195 | Light reactions, electron transport |
| Carbon fixation (Calvin cycle) | ath00710 | CO2 → sugar |
| Flavonoid biosynthesis | ath00941 | UV protection, pigmentation, defense |
| Carotenoid biosynthesis | ath00906 | Photoprotection, vitamin A precursors |
| Auxin signaling | ath04075 (plant hormone signal transduction) | Growth, tropisms |
| Brassinosteroid signaling | ath04075 (plant hormone signal transduction) | Cell elongation, stress response |
| Circadian rhythm (plant) | ath04712 | Photoperiod, flowering time |
| Terpenoid backbone | ath00900 | Secondary metabolite precursors |
| Starch/sucrose metabolism | ath00500 | Carbon partitioning |
| Nitrogen metabolism | ath00910 | Nitrogen assimilation |
```python
# Search PlantReactome for flavonoid pathway
PlantReactome_search_pathways(query="flavonoid")

# Get genes in Arabidopsis flavonoid biosynthesis
KEGG_get_pathway_genes(pathway_id="ath00941")
```
Phase 3: Species Comparison
KEGG organism codes for major crops:
| Species | Code | Common Name |
|---|---|---|
| Arabidopsis thaliana | ath | Thale cress (model plant) |
| Oryza sativa | osa | Rice |
| Zea mays | zma | Maize/corn |
| Triticum aestivum | tae | Wheat |
| Glycine max | gmx | Soybean |
| Solanum lycopersicum | sly | Tomato |
| Nicotiana tabacum | nta | Tobacco |
| Medicago truncatula | mtr | Barrel medic (legume model) |
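Organism-specific KEGG pathway IDs are formed by prefixing the five-digit pathway number with the organism code (for example, `ath` + `00941` gives `ath00941`). The sketch below shows that composition for the three codes stated in the key principles; the dictionary and function names are hypothetical, and any further codes should be confirmed against KEGG before being added.

```python
# Codes copied from this document; confirm others with KEGG before adding them.
KEGG_ORGANISM_CODES = {
    "Arabidopsis thaliana": "ath",
    "Oryza sativa": "osa",
    "Zea mays": "zma",
}


def kegg_pathway_id(species: str, pathway_number: str) -> str:
    """Build an organism-specific pathway ID such as 'osa00941'."""
    if species not in KEGG_ORGANISM_CODES:
        # Refuse to guess a code for a species that is not in the table.
        raise KeyError(f"No confirmed KEGG organism code for {species!r}")
    if len(pathway_number) != 5 or not pathway_number.isdigit():
        raise ValueError("pathway_number must be five digits, e.g. '00941'")
    return KEGG_ORGANISM_CODES[species] + pathway_number


# Flavonoid biosynthesis (00941) in two species for a cross-species comparison.
for name in ("Arabidopsis thaliana", "Oryza sativa"):
    print(name, kegg_pathway_id(name, "00941"))
```

Raising an error for an unlisted species, rather than deriving a code from the Latin name, keeps the helper in line with the rule not to assume organism codes.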
Phase 4: Interpretation Framework
Evidence grading: T1 = mutant phenotype confirms function; T2 = expression/localization data; T3 = ortholog has validated function in model species; T4 = computational annotation only (domain/GO term). Prioritize T1/T2 evidence; treat T3/T4 as hypotheses requiring further validation.
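One way to apply the grading consistently is to encode the tiers as an ordered enum and report the strongest tier found for a gene. This is a minimal sketch under that assumption; the class and function names are hypothetical, and the evidence items are illustrative placeholders rather than real findings.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Tier(IntEnum):
    T1 = 1  # mutant phenotype confirms function
    T2 = 2  # expression/localization data
    T3 = 3  # ortholog has validated function in a model species
    T4 = 4  # computational annotation only (domain/GO term)


@dataclass(frozen=True)
class Evidence:
    description: str
    tier: Tier


def best_tier(items: List[Evidence]) -> Optional[Tier]:
    """Strongest (lowest-numbered) tier present, or None if there is no evidence."""
    return min((item.tier for item in items), default=None)


def needs_validation(tier: Tier) -> bool:
    """T3 and T4 are treated as hypotheses requiring further validation."""
    return tier >= Tier.T3


# Placeholder evidence for illustration only.
evidence = [
    Evidence("domain match only", Tier.T4),
    Evidence("ortholog characterized in a model species", Tier.T3),
]
top = best_tier(evidence)
print(top.name, needs_validation(top))  # prints: T3 True
```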
Synthesis Questions
- Is the gene plant-specific or conserved? (Plant-specific genes often act in secondary metabolism; conserved genes tend to act in primary metabolism.)
- Which tissues/developmental stages express it? (Root vs shoot vs flower vs seed)
- Is there a crop improvement application? (Yield, stress tolerance, nutritional quality)
- What regulatory mechanisms control it? (Hormone-responsive, light-regulated, circadian)
- Are there natural variants with known phenotypes? (Accession diversity in Arabidopsis 1001 Genomes)
Limitations
- No TAIR tool — The Arabidopsis Information Resource has no public REST API. Use Ensembl Plants and UniProt as alternatives for Arabidopsis gene data.
- PlantReactome coverage — Focused on Oryza sativa (rice) with cross-references to Arabidopsis. Not all plant species equally covered.
- No crop breeding tools — This skill covers gene/pathway analysis, not marker-assisted selection or breeding simulation.
- POWO is taxonomy-focused — Plants of the World Online provides species identification and distribution, not genomics data.