BioServices

Overview

BioServices is a Python package that provides programmatic access to approximately 40 bioinformatics web services and databases. Use it to retrieve biological data, run cross-database queries, map identifiers, analyze sequences, and integrate multiple biological resources in Python workflows. The package handles both REST and SOAP/WSDL protocols transparently.

When to Use This Skill

Use this skill when:

- Retrieving protein sequences, annotations, or structures from UniProt, PDB, or Pfam
- Analyzing metabolic pathways and gene functions via KEGG or Reactome
- Searching compound databases (ChEBI, ChEMBL, PubChem) for chemical information
- Converting identifiers between biological databases (KEGG↔UniProt, compound IDs)
- Running sequence similarity searches (BLAST) or sequence alignments (MUSCLE)
- Querying Gene Ontology terms and annotations (QuickGO)
- Accessing protein-protein interaction data (PSICQUIC, IntactComplex)
- Mining genomic and functional-genomics data (BioMart, ArrayExpress, ENA)
- Integrating data from multiple bioinformatics resources in a single workflow

Core Capabilities

1. Protein Analysis

Retrieve protein information, sequences, and functional annotations:

```python
from bioservices import UniProt

u = UniProt(verbose=False)

# Search for protein by name
results = u.search("ZAP70_HUMAN", frmt="tab", columns="id,genes,organism")

# Retrieve FASTA sequence
sequence = u.retrieve("P43403", "fasta")

# Map identifiers between databases
kegg_ids = u.mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403")
```

**Key methods:**
- `search()`: Query UniProt with flexible search terms
- `retrieve()`: Get protein entries in various formats (FASTA, XML, tab)
- `mapping()`: Convert identifiers between databases

Reference: `references/services_reference.md` for complete UniProt API details.
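
`retrieve()` with `"fasta"` returns the record as plain text. The sketch below splits that text into header and sequence without extra dependencies. `parse_fasta` is a hypothetical helper, not part of BioServices, and it assumes standard FASTA input: `>` header lines, each followed by sequence lines. Biopython's `SeqIO` is the more complete option.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs.

    Headers keep everything after '>'; sequence lines are joined
    with surrounding whitespace removed. Lines before the first
    header are ignored.
    """
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Usage with the string returned by u.retrieve("P43403", "fasta"):
# header, seq = parse_fasta(sequence)[0]
```

Returning a list keeps multi-record FASTA (for example, several retrieved entries) in one structure.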

2. Pathway Discovery and Analysis

Access KEGG pathway information for genes and organisms:

```python
from bioservices import KEGG

k = KEGG()
k.organism = "hsa"  # Set to human

# Search for organisms
k.lookfor_organism("droso")  # Find Drosophila species

# Find pathways by name
k.lookfor_pathway("B cell")  # Returns matching pathway IDs

# Get pathways containing specific genes
pathways = k.get_pathway_by_gene("7535", "hsa")  # ZAP70 gene

# Retrieve and parse pathway data
data = k.get("hsa04660")
parsed = k.parse(data)

# Extract pathway interactions
interactions = k.parse_kgml_pathway("hsa04660")
relations = interactions["relations"]  # Protein-protein interactions

# Convert to Simple Interaction Format
sif_data = k.pathway2sif("hsa04660")
```

**Key methods:**
- `lookfor_organism()`, `lookfor_pathway()`: Search by name
- `get_pathway_by_gene()`: Find pathways containing genes
- `parse_kgml_pathway()`: Extract structured pathway data
- `pathway2sif()`: Get protein interaction networks

Reference: `references/workflow_patterns.md` for complete pathway analysis workflows.
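
SIF (Simple Interaction Format), as read by Cytoscape, stores one edge per line: a source node, an interaction type, and a target node. The sketch below writes edges as SIF and tallies interaction types. It assumes the edges are already a list of `(source, interaction, target)` tuples. The exact structure returned by `pathway2sif()` may differ, so inspect it first. `to_sif` and `interaction_counts` are hypothetical helpers, and the edges are placeholders.

```python
from collections import Counter


def to_sif(edges):
    """Render (source, interaction, target) tuples as tab-separated SIF lines."""
    return "\n".join(f"{src}\t{kind}\t{dst}" for src, kind, dst in edges)


def interaction_counts(edges):
    """Count how often each interaction type occurs."""
    return Counter(kind for _, kind, _ in edges)


# Placeholder edges for illustration only
edges = [
    ("geneA", "activation", "geneB"),
    ("geneB", "phosphorylation", "geneC"),
    ("geneA", "activation", "geneC"),
]

print(to_sif(edges))
print(interaction_counts(edges).most_common())
```

The same tuple list can also feed NetworkX (`nx.DiGraph()` plus `add_edge(src, dst, interaction=kind)`) for graph analysis.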

3. Compound Database Searches

Search and cross-reference compounds across multiple databases:

```python
from bioservices import KEGG, UniChem

k = KEGG()

# Search compounds by name
results = k.find("compound", "Geldanamycin")  # Returns cpd:C11222

# Get compound information with database links
compound_info = k.get("cpd:C11222")  # Includes ChEBI links

# Cross-reference KEGG → ChEMBL using UniChem
u = UniChem()
chembl_id = u.get_compound_id_from_kegg("C11222")  # Returns CHEMBL278315
```

**Common workflow:**
1. Search compound by name in KEGG
2. Extract KEGG compound ID
3. Use UniChem for KEGG → ChEMBL mapping
4. Read ChEBI IDs directly from the KEGG entry, which often lists them under DBLINKS

Reference: `references/identifier_mapping.md` for complete cross-database mapping guide.
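
Steps 2 and 4 come down to reading plain text. KEGG's `find` returns one hit per line: `cpd:<ID>`, a tab, then the names. Flat-file entries from `get()` keep field names in the first 12 columns, and continuation lines are indented. The sketch below relies on those two conventions. `kegg_ids_from_find` and `chebi_ids` are hypothetical helpers; `k.parse()` already offers structured parsing if you prefer it.

```python
def kegg_ids_from_find(find_output: str) -> list[str]:
    """Return entry IDs (e.g. 'C11222') from KEGG `find` text output."""
    ids = []
    for line in find_output.splitlines():
        if not line.strip():
            continue
        entry = line.split("\t", 1)[0]       # e.g. 'cpd:C11222'
        ids.append(entry.split(":", 1)[-1])  # drop the 'cpd:' prefix
    return ids


def chebi_ids(flat_entry: str) -> list[str]:
    """Collect ChEBI IDs from the DBLINKS section of a KEGG flat-file entry."""
    ids = []
    in_dblinks = False
    for line in flat_entry.splitlines():
        field = line[:12].strip()  # field names occupy the first 12 columns
        if field:                  # a new field starts; continuation lines keep the state
            in_dblinks = field == "DBLINKS"
        if in_dblinks:
            body = line[12:].strip()
            if body.startswith("ChEBI:"):
                ids.extend(body[len("ChEBI:"):].split())
    return ids


# Usage with the calls above:
# compound_id = kegg_ids_from_find(results)[0]
# chebi = chebi_ids(k.get(f"cpd:{compound_id}"))
```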

4. Sequence Analysis

Run BLAST searches and sequence alignments:

```python
from bioservices import NCBIblast, UniProt

# Query sequence: ZAP70 (P43403) from UniProt, with the FASTA header line removed
fasta = UniProt(verbose=False).retrieve("P43403", "fasta")
protein_sequence = "".join(fasta.splitlines()[1:])

s = NCBIblast(verbose=False)

# Run BLASTP against UniProtKB
jobid = s.run(
    program="blastp",
    sequence=protein_sequence,
    stype="protein",
    database="uniprotkb",
    email="your.email@example.com",  # Required by the EBI job dispatcher
)

# Check job status and retrieve results
s.getStatus(jobid)
results = s.getResult(jobid, "out")
```

**Note:** BLAST jobs are asynchronous. Check status before retrieving results.
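
Because the job runs asynchronously, a polling loop with a timeout avoids calling `getResult()` too early. The sketch below assumes `getStatus()` returns a status string such as `RUNNING` or `FINISHED`, the names used by the EBI job dispatcher that `NCBIblast` wraps. `wait_for_job` is a hypothetical helper. It takes the status function as an argument, so you can try it without a network connection.

```python
import time


def wait_for_job(get_status, job_id, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll get_status(job_id) until the job finishes, fails, or times out.

    Status names follow the EBI job dispatcher convention (an assumption;
    check them against the service you call).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(job_id)
        if status == "FINISHED":
            return status
        if status in {"ERROR", "FAILURE", "NOT_FOUND"}:
            raise RuntimeError(f"Job {job_id} ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)  # still queued or running: wait before asking again


# Usage with the BLAST job above:
# wait_for_job(s.getStatus, jobid)
# results = s.getResult(jobid, "out")
```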

5. Identifier Mapping

Convert identifiers between different biological databases:

```python
from bioservices import UniProt, UniChem

# UniProt mapping (many database pairs supported)
u = UniProt()
results = u.mapping(
    fr="UniProtKB_AC-ID",  # Source database
    to="KEGG",  # Target database
    query="P43403",  # Identifier(s) to convert
)

# KEGG gene ID → UniProt ("UniProtKB" is the target-side database name)
kegg_to_uniprot = u.mapping(fr="KEGG", to="UniProtKB", query="hsa:7535")

# For compounds, use UniChem
uc = UniChem()
chembl_from_kegg = uc.get_compound_id_from_kegg("C11222")
```

**Supported mappings (UniProt):**
- UniProtKB ↔ KEGG
- UniProtKB ↔ Ensembl
- UniProtKB ↔ PDB
- UniProtKB ↔ RefSeq
- And many more (see `references/identifier_mapping.md`)
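
`mapping()` returns the service's JSON response as a Python dictionary. The sketch below collapses it into `{source_id: [target_ids]}`. It assumes the UniProt ID-mapping layout: a `results` list of `from`/`to` pairs plus an optional `failedIds` list. `collapse_mapping` is a hypothetical helper. For targets such as KEGG, each `to` value is an ID string. Mappings to UniProtKB may return full entry records instead.

```python
def collapse_mapping(response):
    """Turn an ID-mapping response into ({source_id: [target_ids]}, failed_ids).

    Assumes {"results": [{"from": ..., "to": ...}, ...], "failedIds": [...]}.
    """
    mapped = {}
    for pair in response.get("results", []):
        # One source ID can map to several targets; keep them all in order
        mapped.setdefault(pair["from"], []).append(pair["to"])
    failed = list(response.get("failedIds", []))
    return mapped, failed


# Usage, assuming `results` from u.mapping(...) has this layout:
# mapped, failed = collapse_mapping(results)
```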

6. Gene Ontology Queries

Access GO terms and annotations:

```python
from bioservices import QuickGO

g = QuickGO(verbose=False)

# Retrieve GO term information
term_info = g.Term("GO:0003824", frmt="obo")

# Search annotations
annotations = g.Annotation(protein="P43403", format="tsv")
```
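
With `frmt="obo"`, a term comes back as OBO text: a `[Term]` header followed by `tag: value` lines, where tags such as `is_a` can repeat. The sketch below turns one such stanza into a dictionary. It assumes `term_info` is that plain string. `parse_obo_stanza` is a hypothetical helper.

```python
def parse_obo_stanza(text):
    """Parse 'tag: value' lines of an OBO stanza into {tag: [values]}.

    Repeated tags (e.g. is_a, synonym) accumulate in order; the '[Term]'
    header and blank lines are skipped; trailing '! comments' are kept.
    """
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue
        key, sep, value = line.partition(": ")
        if sep:  # ignore lines without a 'tag: value' separator
            tags.setdefault(key, []).append(value)
    return tags


# Usage: parse_obo_stanza(term_info)["name"][0]
```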

7. Protein-Protein Interactions

Query interaction databases via PSICQUIC:

```python
from bioservices import PSICQUIC

s = PSICQUIC(verbose=False)

# Query specific database (e.g., MINT)
interactions = s.query("mint", "ZAP70 AND species:9606")

# List available interaction databases
databases = s.activeDBs
```

**Available databases:** MINT, IntAct, BioGRID, DIP, and 30+ others.
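
PSICQUIC services return PSI-MI TAB (MITAB) records. These are tab-separated lines whose first two columns hold the interactor identifiers, possibly several joined by `|` (for example `uniprotkb:P43403`). The sketch below reduces such records to unique, undirected UniProt pairs. It assumes each record is a tab-separated string; if your BioServices version returns already-split rows, join each row with a tab first. `interactor_pairs` is a hypothetical helper.

```python
def _first_id(field, prefix):
    """Return the first '|'-separated identifier in a MITAB field with the prefix."""
    for ident in field.split("|"):
        if ident.startswith(prefix):
            return ident[len(prefix):]
    return None


def interactor_pairs(mitab_lines, prefix="uniprotkb:"):
    """Collect unique, undirected (A, B) identifier pairs from MITAB lines."""
    pairs = set()
    for line in mitab_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # not a MITAB record
        a = _first_id(cols[0], prefix)
        b = _first_id(cols[1], prefix)
        if a and b:
            pairs.add(tuple(sorted((a, b))))  # A-B and B-A count once
    return sorted(pairs)
```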

Multi-Service Integration Workflows

BioServices is well suited to combining several services in one analysis. Common integration patterns:

Complete Protein Analysis Pipeline

Execute a full protein characterization workflow:

```bash
python scripts/protein_analysis_workflow.py ZAP70_HUMAN your.email@example.com
```

This script demonstrates:

- UniProt search for the protein entry
- FASTA sequence retrieval
- BLAST similarity search
- KEGG pathway discovery
- PSICQUIC interaction mapping

Pathway Network Analysis

Analyze all pathways for an organism:

```bash
python scripts/pathway_analysis.py hsa output_directory/
```

The script:

- Extracts all pathway IDs for the organism
- Extracts protein-protein interactions for each pathway
- Summarizes interaction type distributions
- Exports the results in CSV and SIF formats

Cross-Database Compound Search

Map compound identifiers across databases:

```bash
python scripts/compound_cross_reference.py Geldanamycin
```

Retrieves:

- KEGG compound ID
- ChEBI identifier
- ChEMBL identifier
- Basic compound properties

Batch Identifier Conversion

Convert multiple identifiers at once:

```bash
python scripts/batch_id_converter.py input_ids.txt --from UniProtKB_AC-ID --to KEGG
```
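
How the script batches its input is not described here. The sketch below shows the same idea, assuming one identifier per line in the input file. `parse_ids` and `chunks` are hypothetical helpers. The batch size of 500 is an arbitrary choice, not a documented service limit.

```python
def parse_ids(lines):
    """Return identifiers from an iterable of lines in first-seen order.

    Blank lines, '#' comments, and duplicates are skipped.
    """
    unique = {}
    for line in lines:
        ident = line.strip()
        if ident and not ident.startswith("#"):
            unique.setdefault(ident, None)  # dict keeps insertion order
    return list(unique)


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage sketch (u = UniProt(); passing a comma-separated query is an assumption):
# with open("input_ids.txt") as fh:
#     ids = parse_ids(fh)
# for batch in chunks(ids, 500):
#     result = u.mapping(fr="UniProtKB_AC-ID", to="KEGG", query=",".join(batch))
```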

Best Practices

Output Format Handling

Different services return data in various formats:

- XML: parse with BeautifulSoup (most SOAP services)
- Tab-separated values (TSV): load into pandas DataFrames
- Dictionary/JSON: manipulate directly in Python
- FASTA: pass to Biopython for sequence analysis
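
For TSV output, the standard library is enough when pandas is not available. The sketch below assumes the text has a header row. `tsv_to_records` is a hypothetical helper; the commented lines show the pandas equivalent.

```python
import csv
import io


def tsv_to_records(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


# With pandas installed, the equivalent DataFrame is:
# import pandas as pd
# df = pd.read_csv(io.StringIO(text), sep="\t")
```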

Rate Limiting and Verbosity

Control API request behavior:

```python
from bioservices import KEGG

k = KEGG(verbose=False)  # Suppress HTTP request details
k.TIMEOUT = 30  # Adjust timeout for slow connections
```
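
The snippet above covers verbosity and timeouts. To stay polite to public services inside loops, you can also space out requests on the client side. The sketch below shows one way to do that. `Throttle` is a hypothetical helper, and any interval you pick is your own choice, not a documented limit of a specific service.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls.

    Call wait() before each request; it sleeps only as long as needed.
    """

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Usage sketch (gene_ids is your own list of NCBI Gene IDs):
# throttle = Throttle(0.5)
# for gene_id in gene_ids:
#     throttle.wait()
#     k.get_pathway_by_gene(gene_id, "hsa")
```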

Error Handling

Wrap service calls in try-except blocks:

```python
from bioservices import UniProt

u = UniProt(verbose=False)

try:
    results = u.search("ambiguous_query")
    if results:
        # Process results
        pass
except Exception as e:
    print(f"Search failed: {e}")
```
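
Transient network failures are common with remote services, so retrying a few times with growing delays often helps. The sketch below implements exponential backoff. `call_with_retry` is a hypothetical helper that retries on any exception; you may want to narrow that to network-related errors so that genuine bugs fail fast.

```python
import time


def call_with_retry(func, *args, attempts=3, base_delay=1.0, **kwargs):
    """Call func(*args, **kwargs); on an exception, wait and retry.

    Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
    The last exception is re-raised when all attempts fail.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


# Usage sketch:
# results = call_with_retry(u.search, "ZAP70_HUMAN", attempts=4, base_delay=2.0)
```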

Organism Codes

Use standard KEGG organism codes:

- `hsa`: Homo sapiens (human)
- `mmu`: Mus musculus (mouse)
- `dme`: Drosophila melanogaster (fruit fly)
- `sce`: Saccharomyces cerevisiae (yeast)

List all organisms with `k.list("organism")` or the `k.organismIds` attribute.

Integration with Other Tools

BioServices works well with:

- Biopython: sequence analysis on retrieved FASTA data
- pandas: tabular data manipulation
- PyMOL: 3D structure visualization (using retrieved PDB IDs)
- NetworkX: network analysis of pathway interactions
- Galaxy: custom tool wrappers for workflow platforms

Resources

scripts/

Executable Python scripts demonstrating complete workflows:

- `protein_analysis_workflow.py`: End-to-end protein characterization
- `pathway_analysis.py`: KEGG pathway discovery and network extraction
- `compound_cross_reference.py`: Multi-database compound searching
- `batch_id_converter.py`: Bulk identifier mapping utility

Scripts can be executed directly or adapted for specific use cases.

references/

Detailed documentation loaded as needed:

- `services_reference.md`: Comprehensive list of all 40+ services with methods
- `workflow_patterns.md`: Detailed multi-step analysis workflows
- `identifier_mapping.md`: Complete guide to cross-database ID conversion

Load references when working with specific services or complex integration tasks.

Installation

```bash
uv pip install bioservices
```

Dependencies are installed automatically. The package is tested on Python 3.9-3.12.

Additional Information

For detailed API documentation and advanced features, refer to:

- Official documentation: https://bioservices.readthedocs.io/
- Source code: https://github.com/cokelaer/bioservices
- Service-specific references in `references/services_reference.md`
