bioservices

BioServices

Overview

BioServices is a Python package providing programmatic access to approximately 40 bioinformatics web services and databases. Retrieve biological data, perform cross-database queries, map identifiers, analyze sequences, and integrate multiple biological resources in Python workflows. The package handles both REST and SOAP/WSDL protocols transparently.

When to Use This Skill

This skill should be used when:
  • Retrieving protein sequences, annotations, or structures from UniProt, PDB, Pfam
  • Analyzing metabolic pathways and gene functions via KEGG or Reactome
  • Searching compound databases (ChEBI, ChEMBL, PubChem) for chemical information
  • Converting identifiers between different biological databases (KEGG↔UniProt, compound IDs)
  • Running sequence similarity searches (BLAST) and multiple sequence alignments (MUSCLE)
  • Querying gene ontology terms (QuickGO, GO annotations)
  • Accessing protein-protein interaction data (PSICQUIC, IntactComplex)
  • Mining genomic data (BioMart, ArrayExpress, ENA)
  • Integrating data from multiple bioinformatics resources in a single workflow

Core Capabilities

1. Protein Analysis

Retrieve protein information, sequences, and functional annotations:
python
from bioservices import UniProt

u = UniProt(verbose=False)

# Search for protein by name
results = u.search("ZAP70_HUMAN", frmt="tab", columns="id,genes,organism")

# Retrieve FASTA sequence
sequence = u.retrieve("P43403", "fasta")

# Map identifiers between databases
kegg_ids = u.mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403")

**Key methods:**
- `search()`: Query UniProt with flexible search terms
- `retrieve()`: Get protein entries in various formats (FASTA, XML, tab)
- `mapping()`: Convert identifiers between databases

Reference: `references/services_reference.md` for complete UniProt API details.
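
Tabular search output is usually easier to work with once it is split into records. The sketch below assumes that `search()` returns a single tab-separated string whose first line is a header row; the helper name `parse_tabular` and the sample text are illustrative and were written by hand, not taken from a real response.

```python
import csv
import io


def parse_tabular(text):
    """Turn a tab-separated response (header row first) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Hand-written sample standing in for the string returned by u.search(...).
# The column headers here are an assumption and may differ between releases.
sample = "Entry\tGene names\tOrganism\nP43403\tZAP70\tHomo sapiens (Human)\n"

rows = parse_tabular(sample)
for row in rows:
    print(row["Entry"], row["Gene names"])
```

Keeping the parser separate from the service call makes it possible to test the downstream logic without network access.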

2. Pathway Discovery and Analysis

Access KEGG pathway information for genes and organisms:
python
from bioservices import KEGG

k = KEGG()
k.organism = "hsa"  # Set to human

# Search for organisms
k.lookfor_organism("droso")  # Find Drosophila species

# Find pathways by name
k.lookfor_pathway("B cell")  # Returns matching pathway IDs

# Get pathways containing specific genes
pathways = k.get_pathway_by_gene("7535", "hsa")  # ZAP70 gene

# Retrieve and parse pathway data
data = k.get("hsa04660")
parsed = k.parse(data)

# Extract pathway interactions
interactions = k.parse_kgml_pathway("hsa04660")
relations = interactions['relations']  # Protein-protein interactions

# Convert to Simple Interaction Format
sif_data = k.pathway2sif("hsa04660")

**Key methods:**
- `lookfor_organism()`, `lookfor_pathway()`: Search by name
- `get_pathway_by_gene()`: Find pathways containing genes
- `parse_kgml_pathway()`: Extract structured pathway data
- `pathway2sif()`: Get protein interaction networks

Reference: `references/workflow_patterns.md` for complete pathway analysis workflows.
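
Once the relations have been extracted, a quick tally of relation types gives a first impression of a pathway. This is a minimal sketch that assumes each item in `interactions['relations']` is a dictionary with a `name` key holding the relation type; that shape, the helper `count_relation_types`, and the sample list are assumptions made for illustration.

```python
from collections import Counter


def count_relation_types(relations):
    """Count how often each relation name occurs in a list of relation dicts."""
    return Counter(rel.get("name", "unknown") for rel in relations)


# Hand-written stand-in for interactions['relations']; the keys are assumed.
sample_relations = [
    {"entry1": "1", "entry2": "2", "name": "activation"},
    {"entry1": "2", "entry2": "3", "name": "inhibition"},
    {"entry1": "3", "entry2": "4", "name": "activation"},
]

counts = count_relation_types(sample_relations)
for name, total in counts.most_common():
    print(name, total)
```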

3. Compound Database Searches

Search and cross-reference compounds across multiple databases:
python
from bioservices import KEGG, UniChem

k = KEGG()

# Search compounds by name
results = k.find("compound", "Geldanamycin")  # Returns cpd:C11222

# Get compound information with database links
compound_info = k.get("cpd:C11222")  # Includes ChEBI links

# Cross-reference KEGG → ChEMBL using UniChem
u = UniChem()
chembl_id = u.get_compound_id_from_kegg("C11222")  # Returns CHEMBL278315

**Common workflow:**
1. Search compound by name in KEGG
2. Extract KEGG compound ID
3. Use UniChem for KEGG → ChEMBL mapping
4. Read ChEBI IDs from the KEGG entry, where they are often listed

Reference: `references/identifier_mapping.md` for complete cross-database mapping guide.
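
Step 2 of the workflow, extracting the KEGG compound ID, can be sketched as follows. The code assumes that `k.find()` returns plain text with one hit per line, the entry ID and the name list separated by a tab, as in the `cpd:C11222` example above; `parse_kegg_find` is a hypothetical helper name.

```python
def parse_kegg_find(text):
    """Map KEGG entry IDs to their name strings from tab-delimited 'find' output."""
    hits = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        entry_id, _, names = line.partition("\t")
        hits[entry_id.strip()] = names.strip()
    return hits


# Sample modeled on the result quoted above (cpd:C11222 for Geldanamycin).
sample = "cpd:C11222\tGeldanamycin\n"

hits = parse_kegg_find(sample)
# Strip the "cpd:" prefix to obtain the bare ID expected by the UniChem call.
compound_ids = [entry.split(":", 1)[-1] for entry in hits]
print(compound_ids)
```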

4. Sequence Analysis

Run BLAST searches and sequence alignments:
python
from bioservices import NCBIblast

s = NCBIblast(verbose=False)

# Run BLASTP against UniProtKB
# protein_sequence is an amino-acid sequence string obtained beforehand
jobid = s.run(
    program="blastp",
    sequence=protein_sequence,
    stype="protein",
    database="uniprotkb",
    email="your.email@example.com"  # Required by the service
)

# Check job status and retrieve results
s.getStatus(jobid)
results = s.getResult(jobid, "out")

**Note:** BLAST jobs are asynchronous. Check status before retrieving results.
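
Because jobs are asynchronous, a small polling helper is a common way to wait for completion. The sketch below assumes that `getStatus()` returns a status string and that `RUNNING`, `QUEUED`, and `PENDING` mean the job is still in progress; those strings, the helper `wait_for_job`, and the `FakeService` stand-in (used so the example runs offline) are assumptions.

```python
import time


def wait_for_job(service, jobid, poll_seconds=5.0, max_wait=600.0, sleep=time.sleep):
    """Poll service.getStatus(jobid) until the job is no longer in progress."""
    waited = 0.0
    while True:
        status = service.getStatus(jobid)
        if status not in ("RUNNING", "QUEUED", "PENDING"):  # assumed in-progress states
            return status
        if waited >= max_wait:
            raise TimeoutError(f"Job {jobid} still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


class FakeService:
    """Stand-in for an NCBIblast instance; replays a fixed list of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def getStatus(self, jobid):
        return next(self._statuses)


final = wait_for_job(
    FakeService(["RUNNING", "RUNNING", "FINISHED"]),
    "job-1",
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(final)
```

With a real service object, the call would be `wait_for_job(s, jobid)`, followed by `s.getResult(jobid, "out")` only when the returned status indicates success.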

5. Identifier Mapping

Convert identifiers between different biological databases:
python
from bioservices import UniProt, KEGG

# UniProt mapping (many database pairs supported)
u = UniProt()
results = u.mapping(
    fr="UniProtKB_AC-ID",  # Source database
    to="KEGG",             # Target database
    query="P43403"         # Identifier(s) to convert
)

# KEGG gene ID → UniProt
# "UniProtKB" is the target-side name; "UniProtKB_AC-ID" is the source-side name
kegg_to_uniprot = u.mapping(fr="KEGG", to="UniProtKB", query="hsa:7535")

# For compounds, use UniChem
from bioservices import UniChem
u = UniChem()
chembl_from_kegg = u.get_compound_id_from_kegg("C11222")

**Supported mappings (UniProt):**
- UniProtKB ↔ KEGG
- UniProtKB ↔ Ensembl
- UniProtKB ↔ PDB
- UniProtKB ↔ RefSeq
- And many more (see `references/identifier_mapping.md`)

6. Gene Ontology Queries

Access GO terms and annotations:
python
from bioservices import QuickGO

g = QuickGO(verbose=False)

# Retrieve GO term information
term_info = g.Term("GO:0003824", frmt="obo")

# Search annotations
annotations = g.Annotation(protein="P43403", format="tsv")

7. Protein-Protein Interactions

Query interaction databases via PSICQUIC:
python
from bioservices import PSICQUIC

s = PSICQUIC(verbose=False)

# Query specific database (e.g., MINT)
interactions = s.query("mint", "ZAP70 AND species:9606")

# List available interaction databases
databases = s.activeDBs

**Available databases:** MINT, IntAct, BioGRID, DIP, and 30+ others.
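
PSICQUIC services commonly answer in the tab-separated MITAB format, where the first two columns hold the identifiers of interactor A and interactor B. The sketch below pulls out those pairs; it accepts rows either as raw strings or as already-split lists, since the exact return type of `query()` is treated here as an assumption. The helper name and the sample row (including the placeholder partner ID) are hypothetical.

```python
def interactor_pairs(mitab_rows):
    """Extract (interactor A, interactor B) ID pairs from MITAB rows."""
    pairs = []
    for row in mitab_rows:
        fields = row.split("\t") if isinstance(row, str) else list(row)
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        pairs.append((fields[0], fields[1]))
    return pairs


# Hand-written sample row; PARTNER_ID is a placeholder, not a real accession.
sample = ["uniprotkb:P43403\tuniprotkb:PARTNER_ID\t-\t-"]

pairs = interactor_pairs(sample)
print(pairs)
```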

Multi-Service Integration Workflows

BioServices is well suited to combining multiple services in a single analysis. Common integration patterns:

Complete Protein Analysis Pipeline

Execute a full protein characterization workflow:
bash
python scripts/protein_analysis_workflow.py ZAP70_HUMAN your.email@example.com
This script demonstrates:
  1. UniProt search for protein entry
  2. FASTA sequence retrieval
  3. BLAST similarity search
  4. KEGG pathway discovery
  5. PSICQUIC interaction mapping

Pathway Network Analysis

Analyze all pathways for an organism:
bash
python scripts/pathway_analysis.py hsa output_directory/
The script extracts and analyzes the following, then exports the results to CSV and SIF formats:
  • All pathway IDs for the organism
  • Protein-protein interactions per pathway
  • Interaction type distributions
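
The export step can be pictured with a small sketch. It is not the code of `scripts/pathway_analysis.py`; it only shows one way to write (source, relation, target) triples as SIF lines and as CSV, using hypothetical helper names and placeholder node names.

```python
import csv
import io


def to_sif(triples):
    """Format (source, relation, target) triples as tab-separated SIF lines."""
    return "\n".join(f"{src}\t{rel}\t{dst}" for src, rel, dst in triples)


def to_csv(triples):
    """Render the same triples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "relation", "target"])
    writer.writerows(triples)
    return buffer.getvalue()


# Placeholder interactions for illustration only.
triples = [("geneA", "activation", "geneB"), ("geneB", "inhibition", "geneC")]

print(to_sif(triples))
print(to_csv(triples))
```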

Cross-Database Compound Search

Map compound identifiers across databases:
bash
python scripts/compound_cross_reference.py Geldanamycin
Retrieves:
  • KEGG compound ID
  • ChEBI identifier
  • ChEMBL identifier
  • Basic compound properties

Batch Identifier Conversion

Convert multiple identifiers at once:
bash
python scripts/batch_id_converter.py input_ids.txt --from UniProtKB_AC-ID --to KEGG
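
For large input files it can help to submit identifiers in batches rather than all at once. The following sketch is not the implementation of `batch_id_converter.py`; it shows one plausible approach with hypothetical helpers `read_ids` and `chunked`, and it assumes a one-identifier-per-line input format. How each batch is passed to `mapping()` (for example as a comma-separated string) is likewise an assumption.

```python
def read_ids(text):
    """Return identifiers from text with one ID per line, skipping blanks and # comments."""
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def chunked(items, size):
    """Yield consecutive slices of items with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder identifiers standing in for the contents of input_ids.txt.
sample_text = "# accessions\nID_1\nID_2\n\nID_3\n"

batches = list(chunked(read_ids(sample_text), 2))
queries = [",".join(batch) for batch in batches]
print(queries)
```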

Best Practices

Output Format Handling

Different services return data in various formats:
  • XML: Parse using BeautifulSoup (most SOAP services)
  • Tab-separated (TSV): Pandas DataFrames for tabular data
  • Dictionary/JSON: Direct Python manipulation
  • FASTA: BioPython integration for sequence analysis
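
As an illustration of the FASTA case, the sketch below splits FASTA text into (header, sequence) records using only the standard library. It deliberately swaps in a hand-rolled parser for BioPython so the example stays self-contained; for real work BioPython's parsers are the more robust choice. The function name and the sample records are invented for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples parsed from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up records; not real protein sequences.
sample = ">seq1 example record\nMPDPAAHL\nPFFYGS\n>seq2\nMKT\n"

records = parse_fasta(sample)
for header, sequence in records:
    print(header, len(sequence))
```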

Rate Limiting and Verbosity

Control API request behavior:
python
from bioservices import KEGG

k = KEGG(verbose=False)  # Suppress HTTP request details
k.TIMEOUT = 30  # Adjust timeout for slow connections
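
The settings above control verbosity and timeouts but do not space out requests. One client-side way to limit the request rate is a small throttle that enforces a minimum interval between calls. The `Throttle` class below is a hypothetical helper, not part of BioServices, and the interval value is arbitrary.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first call

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=0.05)
for pathway_id in ["hsa04660", "hsa04662"]:
    throttle.wait()  # pause here, then issue the service call, e.g. k.get(pathway_id)
    print("ready to request", pathway_id)
```

Injecting `clock` and `sleep` keeps the class easy to test without real delays.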

Error Handling

Wrap service calls in try-except blocks:
python
try:
    results = u.search("ambiguous_query")
    if results:
        # Process results
        pass
except Exception as e:
    print(f"Search failed: {e}")
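
For transient network failures, the try-except pattern can be extended with retries. The sketch below retries a callable with exponentially growing delays; `call_with_retry` and `flaky_search` are hypothetical names, and the retry counts and delays are arbitrary choices rather than BioServices defaults.

```python
import time


def call_with_retry(func, *args, attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func, retrying on any exception with exponentially growing delays."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)


# Stand-in for a service call that fails twice before succeeding.
calls = {"count": 0}


def flaky_search(query):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return f"results for {query}"


outcome = call_with_retry(flaky_search, "ZAP70_HUMAN", sleep=lambda seconds: None)
print(outcome)
```

With a real service, the call would look like `call_with_retry(u.search, "ZAP70_HUMAN")`. Catching a narrower exception type than `Exception` is preferable once the failure modes of the service are known.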

Organism Codes

Use standard organism abbreviations:
  • hsa: Homo sapiens (human)
  • mmu: Mus musculus (mouse)
  • dme: Drosophila melanogaster
  • sce: Saccharomyces cerevisiae (yeast)
List all organisms with k.list("organism") or k.organismIds.

Integration with Other Tools

BioServices works well with:
  • BioPython: Sequence analysis on retrieved FASTA data
  • Pandas: Tabular data manipulation
  • PyMOL: 3D structure visualization (retrieve PDB IDs)
  • NetworkX: Network analysis of pathway interactions
  • Galaxy: Custom tool wrappers for workflow platforms

Resources

scripts/

Executable Python scripts demonstrating complete workflows:
  • protein_analysis_workflow.py: End-to-end protein characterization
  • pathway_analysis.py: KEGG pathway discovery and network extraction
  • compound_cross_reference.py: Multi-database compound searching
  • batch_id_converter.py: Bulk identifier mapping utility
Scripts can be executed directly or adapted for specific use cases.

references/

Detailed documentation loaded as needed:
  • services_reference.md: Comprehensive list of all 40+ services with methods
  • workflow_patterns.md: Detailed multi-step analysis workflows
  • identifier_mapping.md: Complete guide to cross-database ID conversion
Load references when working with specific services or complex integration tasks.

Installation

bash
uv pip install bioservices
Dependencies are installed automatically. The package is tested on Python 3.9-3.12.

Additional Information

For detailed API documentation and advanced features, refer to: