kegg-database
KEGG Database
Overview

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive bioinformatics resource for biological pathway analysis and molecular interaction networks.

Important: The KEGG API is made available only for academic use by academic users.

When to Use This Skill

Use this skill when querying pathways, genes, compounds, enzymes, diseases, and drugs across multiple organisms through KEGG's REST API.
Quick Start

The skill provides:
- Python helper functions (`scripts/kegg_api.py`) for all KEGG REST API operations
- Comprehensive reference documentation (`references/kegg_reference.md`) with detailed API specifications

When users request KEGG data, determine which operation is needed and use the appropriate function from `scripts/kegg_api.py`.
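The contents of `scripts/kegg_api.py` are not shown here, so the following is only a minimal sketch of how such a helper might assemble a request URL. It assumes the public KEGG REST base URL `https://rest.kegg.jp` and the `/<operation>/<argument>/...` path layout; the name `build_kegg_url` is hypothetical and not part of the skill.

```python
from urllib.parse import quote

KEGG_BASE_URL = "https://rest.kegg.jp"  # public KEGG REST endpoint


def build_kegg_url(operation, *args):
    """Join an operation and its arguments into a KEGG REST URL.

    Hypothetical helper: a list argument is joined with '+', which is how
    KEGG separates multiple entries in a single request.
    """
    parts = [operation]
    for arg in args:
        if isinstance(arg, (list, tuple)):
            arg = "+".join(arg)
        # Keep ':' and '+' literal; percent-encode spaces and other characters.
        parts.append(quote(str(arg), safe=":+"))
    return KEGG_BASE_URL + "/" + "/".join(parts)


print(build_kegg_url("info", "pathway"))
print(build_kegg_url("get", ["hsa:10458", "hsa:10459"], "aaseq"))
```

Building the URL in one place keeps entry joining and escaping consistent across operations.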
Core Operations
1. Database Information (`kegg_info`)

Retrieve metadata and statistics about KEGG databases.

When to use: Understanding database structure, checking available data, getting release information.

Usage:
```python
from scripts.kegg_api import kegg_info

# Get pathway database info
info = kegg_info('pathway')

# Get organism-specific info
hsa_info = kegg_info('hsa')  # Human genome
```

**Common databases**: `kegg`, `pathway`, `module`, `brite`, `genes`, `genome`, `compound`, `glycan`, `reaction`, `enzyme`, `disease`, `drug`
2. Listing Entries (`kegg_list`)

List entry identifiers and names from KEGG databases.

When to use: Getting all pathways for an organism, listing genes, retrieving compound catalogs.

Usage:
```python
from scripts.kegg_api import kegg_list

# List all reference pathways
pathways = kegg_list('pathway')

# List human-specific pathways
hsa_pathways = kegg_list('pathway', 'hsa')

# List specific genes (max 10)
genes = kegg_list('hsa:10458+hsa:10459')
```

**Common organism codes**: `hsa` (human), `mmu` (mouse), `dme` (fruit fly), `sce` (yeast), `eco` (E. coli)
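The workflows in this document treat helper results as tab-delimited text, one entry per line. As a minimal sketch under that assumption, a listing can be split into (identifier, description) pairs as below; `parse_kegg_table` is a hypothetical name, and the sample string is illustrative rather than a live API response.

```python
def parse_kegg_table(text):
    """Split tab-delimited KEGG text into (identifier, description) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, including a trailing newline
        entry_id, _, description = line.partition("\t")
        rows.append((entry_id, description))
    return rows


# Illustrative sample in the two-column layout, not a live API response.
sample = (
    "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\n"
)
for pathway_id, name in parse_kegg_table(sample):
    print(pathway_id, "->", name)
```

Using `partition` instead of `split` keeps any further tabs inside the description intact.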
3. Searching (`kegg_find`)

Search KEGG databases by keywords or molecular properties.

When to use: Finding genes by name/description, searching compounds by formula or mass, discovering entries by keywords.

Usage:
```python
from scripts.kegg_api import kegg_find

# Keyword search
results = kegg_find('genes', 'p53')
shiga_toxin = kegg_find('genes', 'shiga toxin')

# Chemical formula search (exact match)
compounds = kegg_find('compound', 'C7H10N4O2', 'formula')

# Exact mass range search
drugs = kegg_find('drug', '300-310', 'exact_mass')
```

**Search options**: `formula` (exact match), `exact_mass` (range), `mol_weight` (range)
4. Retrieving Entries (`kegg_get`)

Get complete database entries or specific data formats.

When to use: Retrieving pathway details, getting gene/protein sequences, downloading pathway maps, accessing compound structures.

Usage:
```python
from scripts.kegg_api import kegg_get

# Get pathway entry
pathway = kegg_get('hsa00010')  # Glycolysis pathway

# Get multiple entries (max 10)
genes = kegg_get(['hsa:10458', 'hsa:10459'])

# Get protein sequence (FASTA)
sequence = kegg_get('hsa:10458', 'aaseq')

# Get nucleotide sequence
nt_seq = kegg_get('hsa:10458', 'ntseq')

# Get compound structure
mol_file = kegg_get('cpd:C00002', 'mol')  # ATP in MOL format

# Get pathway as JSON (single entry only)
pathway_json = kegg_get('hsa05130', 'json')

# Get pathway image (single entry only)
pathway_img = kegg_get('hsa05130', 'image')
```

**Output formats**: `aaseq` (protein FASTA), `ntseq` (nucleotide FASTA), `mol` (MOL format), `kcf` (KCF format), `image` (PNG), `kgml` (XML), `json` (pathway JSON)

**Important**: Image, KGML, and JSON formats allow only one entry at a time.
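The `aaseq` and `ntseq` formats return FASTA text. A minimal sketch of splitting such text into records follows, assuming the helper returns the response body as a string; `parse_fasta` is a hypothetical name, and the sample record is made up for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record for illustration only; real sequences come from kegg_get.
sample = ">example:1 hypothetical protein\nMKTAYIAK\nQRQISFVK\n"
for header, seq in parse_fasta(sample):
    print(header, len(seq))
```

For anything beyond quick inspection, an established parser such as Biopython's `SeqIO` is probably the better choice.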
5. ID Conversion (`kegg_conv`)

Convert identifiers between KEGG and external databases.

When to use: Integrating KEGG data with other databases, mapping gene IDs, converting compound identifiers.

Usage:
```python
from scripts.kegg_api import kegg_conv

# Convert all human genes to NCBI Gene IDs
conversions = kegg_conv('ncbi-geneid', 'hsa')

# Convert specific gene
gene_id = kegg_conv('ncbi-geneid', 'hsa:10458')

# Convert to UniProt
uniprot_id = kegg_conv('uniprot', 'hsa:10458')

# Convert compounds to PubChem
pubchem_ids = kegg_conv('pubchem', 'compound')

# Reverse conversion (NCBI Gene ID to KEGG)
kegg_id = kegg_conv('hsa', 'ncbi-geneid')
```

**Supported conversions**: `ncbi-geneid`, `ncbi-proteinid`, `uniprot`, `pubchem`, `chebi`
6. Cross-Referencing (`kegg_link`)

Find related entries within and between KEGG databases.

When to use: Finding pathways containing genes, getting genes in a pathway, mapping genes to KO groups, finding compounds in pathways.

Usage:
```python
from scripts.kegg_api import kegg_link

# Find pathways linked to human genes
pathways = kegg_link('pathway', 'hsa')

# Get genes in a specific pathway
genes = kegg_link('genes', 'hsa00010')  # Glycolysis genes

# Find pathways containing a specific gene
gene_pathways = kegg_link('pathway', 'hsa:10458')

# Find compounds in a pathway
compounds = kegg_link('compound', 'hsa00010')

# Map genes to KO (orthology) groups
ko_groups = kegg_link('ko', 'hsa:10458')
```

**Common links**: genes ↔ pathway, pathway ↔ compound, pathway ↔ enzyme, genes ↔ ko (orthology)
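Link results are one-to-many: a gene can appear in several pathways, and a pathway contains many genes. As a rough sketch, assuming two tab-separated columns per line (source ID, then linked ID), the output can be grouped into a dictionary of sets. `group_links` is a hypothetical name and the sample is illustrative, not a live response.

```python
from collections import defaultdict


def group_links(text):
    """Map each source ID to the set of target IDs it is linked to."""
    links = defaultdict(set)
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        source, target = fields
        links[source].add(target)
    return dict(links)


# Illustrative sample in the two-column link layout, not a live response.
sample = "hsa:7157\tpath:hsa04115\nhsa:7157\tpath:hsa05200\n"
print(group_links(sample))
```

Sets drop duplicate pairs and make later intersection steps straightforward.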
7. Drug-Drug Interactions (`kegg_ddi`)

Check for drug-drug interactions.

When to use: Analyzing drug combinations, checking for contraindications, pharmacological research.

Usage:
```python
from scripts.kegg_api import kegg_ddi

# Check single drug
interactions = kegg_ddi('D00001')

# Check multiple drugs (max 10)
interactions = kegg_ddi(['D00001', 'D00002', 'D00003'])
```
Common Analysis Workflows

Workflow 1: Gene to Pathway Mapping

Use case: Finding pathways associated with genes of interest (e.g., for pathway enrichment analysis).

```python
from scripts.kegg_api import kegg_find, kegg_link, kegg_get

# Step 1: Find gene ID by name
gene_results = kegg_find('genes', 'p53')

# Step 2: Link gene to pathways
pathways = kegg_link('pathway', 'hsa:7157')  # TP53 gene

# Step 3: Get detailed pathway information
for pathway_line in pathways.split('\n'):
    if pathway_line:
        pathway_id = pathway_line.split('\t')[1].replace('path:', '')
        pathway_info = kegg_get(pathway_id)
        # Process pathway information
```
Workflow 2: Pathway Enrichment Context

Use case: Getting all genes in an organism's pathways for enrichment analysis.

```python
from scripts.kegg_api import kegg_list, kegg_link

# Step 1: List all human pathways
pathways = kegg_list('pathway', 'hsa')

# Step 2: For each pathway, get associated genes
for pathway_line in pathways.split('\n'):
    if pathway_line:
        pathway_id = pathway_line.split('\t')[0]
        genes = kegg_link('genes', pathway_id)
        # Process genes for enrichment analysis
```
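The source leaves the enrichment step itself open. One common choice, shown here only as a sketch and not as this skill's method, is a one-sided hypergeometric test on the overlap between a query gene list and a pathway's gene set. The function names and toy gene labels are hypothetical.

```python
from math import comb


def pathway_overlap(pathway_genes, query_genes):
    """Genes present in both the pathway and the query list."""
    return set(pathway_genes) & set(query_genes)


def enrichment_p_value(overlap, pathway_size, query_size, background_size):
    """P(X >= overlap) when drawing query_size genes from the background.

    X follows a hypergeometric distribution with pathway_size 'successes'
    in a background of background_size genes.
    """
    upper = min(pathway_size, query_size)
    total = comb(background_size, query_size)
    hits = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    return hits / total


# Toy labels for illustration only; real IDs would look like 'hsa:7157'.
background = [f"g{i}" for i in range(1, 11)]  # 10 background genes
pathway = background[:5]                      # 5 genes in the pathway
query = background[:5]                        # query hits all 5 of them
k = len(pathway_overlap(pathway, query))
print(enrichment_p_value(k, len(pathway), len(query), len(background)))
```

In a real analysis the p-values would also need correcting for the number of pathways tested.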
Workflow 3: Compound to Pathway Analysis

Use case: Finding metabolic pathways containing compounds of interest.

```python
from scripts.kegg_api import kegg_find, kegg_link, kegg_get

# Step 1: Search for compound
compound_results = kegg_find('compound', 'glucose')

# Step 2: Link compound to reactions
reactions = kegg_link('reaction', 'cpd:C00031')  # Glucose

# Step 3: Link reactions to pathways
pathways = kegg_link('pathway', 'rn:R00299')  # Specific reaction

# Step 4: Get pathway details
pathway_info = kegg_get('map00010')  # Glycolysis
```
Workflow 4: Cross-Database Integration

Use case: Integrating KEGG data with UniProt, NCBI, or PubChem databases.

```python
from scripts.kegg_api import kegg_conv, kegg_get

# Step 1: Convert KEGG gene IDs to external database IDs
uniprot_map = kegg_conv('uniprot', 'hsa')
ncbi_map = kegg_conv('ncbi-geneid', 'hsa')

# Step 2: Parse conversion results
for line in uniprot_map.split('\n'):
    if line:
        kegg_id, uniprot_id = line.split('\t')
        # Use external IDs for integration

# Step 3: Get sequences using KEGG
sequence = kegg_get('hsa:10458', 'aaseq')
```
Workflow 5: Organism-Specific Pathway Analysis

Use case: Comparing pathways across different organisms.

```python
from scripts.kegg_api import kegg_list, kegg_get

# Step 1: List pathways for multiple organisms
human_pathways = kegg_list('pathway', 'hsa')
mouse_pathways = kegg_list('pathway', 'mmu')
yeast_pathways = kegg_list('pathway', 'sce')

# Step 2: Get reference pathway for comparison
ref_pathway = kegg_get('map00010')  # Reference glycolysis

# Step 3: Get organism-specific versions
hsa_glycolysis = kegg_get('hsa00010')
mmu_glycolysis = kegg_get('mmu00010')
```
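Organism-specific pathway IDs share their five-digit number with the reference pathway, so comparing organisms can reduce to set operations on those numbers. This is a minimal sketch, assuming list output with the pathway ID in the first tab-separated column; `pathway_numbers` is a hypothetical name and the sample strings are illustrative, not real listings.

```python
def pathway_numbers(list_output, organism):
    """Collect the five-digit pathway numbers from list-style output."""
    numbers = set()
    for line in list_output.splitlines():
        entry_id = line.split("\t")[0].replace("path:", "")
        if entry_id.startswith(organism):
            numbers.add(entry_id[len(organism):])
    return numbers


# Illustrative samples only; real listings come from kegg_list.
human_sample = "hsa00010\tGlycolysis\nhsa04115\tp53 signaling pathway\n"
yeast_sample = "sce00010\tGlycolysis\n"

human = pathway_numbers(human_sample, "hsa")
yeast = pathway_numbers(yeast_sample, "sce")

shared = human & yeast      # pathways listed for both organisms
human_only = human - yeast  # pathways listed only for human
print(sorted("map" + n for n in shared))
print(sorted("map" + n for n in human_only))
```

Prefixing a shared number with `map` gives the reference pathway ID to pass to `kegg_get`.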
Pathway Categories

KEGG organizes pathways into seven major categories. Use them when interpreting pathway IDs or recommending pathways to users:
- Metabolism (e.g., `map00010` - Glycolysis, `map00190` - Oxidative phosphorylation)
- Genetic Information Processing (e.g., `map03010` - Ribosome, `map03040` - Spliceosome)
- Environmental Information Processing (e.g., `map04010` - MAPK signaling, `map02010` - ABC transporters)
- Cellular Processes (e.g., `map04140` - Autophagy, `map04210` - Apoptosis)
- Organismal Systems (e.g., `map04610` - Complement cascade, `map04910` - Insulin signaling)
- Human Diseases (e.g., `map05200` - Pathways in cancer, `map05010` - Alzheimer disease)
- Drug Development (chronological and target-based classifications)

Refer to `references/kegg_reference.md` for detailed pathway lists and classifications.
Important Identifiers and Formats

Pathway IDs
- `map#####` - Reference pathway (generic, not organism-specific)
- `hsa#####` - Human pathway
- `mmu#####` - Mouse pathway

Gene IDs
- Format: `organism:gene_number` (e.g., `hsa:10458`)

Compound IDs
- Format: `cpd:C#####` (e.g., `cpd:C00002` for ATP)

Drug IDs
- Format: `dr:D#####` (e.g., `dr:D00001`)

Enzyme IDs
- Format: `ec:EC_number` (e.g., `ec:1.1.1.1`)

KO (KEGG Orthology) IDs
- Format: `ko:K#####` (e.g., `ko:K00001`)
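The pathway ID pattern above (a short lowercase prefix followed by five digits) can be checked with a regular expression. This is a minimal sketch: the two-to-four-letter prefix length is an assumption made for illustration, and the function names are hypothetical.

```python
import re

# Assumed pattern: optional 'path:' prefix, 2-4 lowercase letters, 5 digits.
PATHWAY_ID = re.compile(r"^(?:path:)?([a-z]{2,4})(\d{5})$")


def split_pathway_id(pathway_id):
    """Return (prefix, number), e.g. ('hsa', '00010')."""
    match = PATHWAY_ID.match(pathway_id)
    if match is None:
        raise ValueError(f"not a pathway ID: {pathway_id!r}")
    return match.group(1), match.group(2)


def to_reference_id(pathway_id):
    """Map an organism-specific pathway ID to its 'map' reference ID."""
    _, number = split_pathway_id(pathway_id)
    return "map" + number


print(split_pathway_id("path:mmu04115"))
print(to_reference_id("hsa00010"))
```

Gene IDs such as `hsa:10458` contain a colon after the organism code, so they fail this pattern and raise `ValueError`.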
API Limitations
Respect these constraints when using the KEGG API:
- Entry limits: Maximum 10 entries per operation (except image/kgml/json: 1 entry only)
- Academic use: API is for academic use only; commercial use requires licensing
- HTTP status codes: Check for 200 (success), 400 (bad request), 404 (not found)
- Rate limiting: No explicit limit, but avoid rapid-fire requests
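The entry limit and the advice against rapid-fire requests can be handled together by batching IDs and pausing between calls. The sketch below assumes a fetch function that accepts a list of IDs, as `kegg_get` does in the examples above. The helper names and the 0.5-second default delay are hypothetical choices, not values published by KEGG.

```python
import time


def chunked(ids, size=10):
    """Yield successive lists of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_in_batches(ids, fetch, size=10, delay=0.5):
    """Call fetch(batch) for each batch, pausing between requests."""
    results = []
    for index, batch in enumerate(chunked(ids, size)):
        if index:
            time.sleep(delay)  # assumed courtesy pause, not an official limit
        results.append(fetch(batch))
    return results


# Stand-in fetch function for illustration: reports each batch size.
print(fetch_in_batches(list(range(23)), len, delay=0))
```

With the real helper this would read `fetch_in_batches(gene_ids, kegg_get)`; formats limited to one entry would use `size=1`.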
Detailed Reference

For comprehensive API documentation, database specifications, organism codes, and advanced usage, refer to `references/kegg_reference.md`. This includes:
- Complete list of KEGG databases
- Detailed API operation syntax
- All organism codes
- HTTP status codes and error handling
- Integration with Biopython and R/Bioconductor
- Best practices for API usage
Troubleshooting
404 Not Found: Entry or database doesn't exist; verify IDs and organism codes
400 Bad Request: Syntax error in API call; check parameter formatting
Empty results: Search term may not match entries; try broader keywords
Image/KGML errors: These formats only work with single entries; remove batch processing
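These checks can be centralized in one place. The following is a sketch only, assuming the caller already has the HTTP status code and response body in hand; `explain_status` and `check_response` are hypothetical names, not functions from `scripts/kegg_api.py`.

```python
STATUS_HINTS = {
    200: "success",
    400: "bad request: check the syntax and parameter formatting of the call",
    404: "not found: verify the entry ID, database name, and organism code",
}


def explain_status(code):
    """Return a troubleshooting hint for an HTTP status code."""
    return STATUS_HINTS.get(code, f"unexpected HTTP status {code}")


def check_response(code, body):
    """Raise on a non-200 status; return None when the body is empty."""
    if code != 200:
        raise RuntimeError(explain_status(code))
    if not body.strip():
        return None  # empty result: the query matched nothing
    return body


print(explain_status(404))
print(check_response(200, "hsa:7157\tpath:hsa04115\n"))
```

Returning `None` for an empty body lets callers tell "no match" apart from a failed request and retry with broader keywords.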
Additional Tools
For interactive pathway visualization and annotation:
- KEGG Mapper: https://www.kegg.jp/kegg/mapper/
- BlastKOALA: Automated genome annotation
- GhostKOALA: Metagenome/metatranscriptome annotation