BioServices
Overview
BioServices is a Python package providing programmatic access to approximately 40 bioinformatics web services and databases. Retrieve biological data, perform cross-database queries, map identifiers, analyze sequences, and integrate multiple biological resources in Python workflows. The package handles both REST and SOAP/WSDL protocols transparently.
When to Use This Skill
This skill should be used when:
- Retrieving protein sequences, annotations, or structures from UniProt, PDB, Pfam
- Analyzing metabolic pathways and gene functions via KEGG or Reactome
- Searching compound databases (ChEBI, ChEMBL, PubChem) for chemical information
- Converting identifiers between different biological databases (KEGG↔UniProt, compound IDs)
- Running sequence similarity searches (BLAST, MUSCLE alignment)
- Querying gene ontology terms (QuickGO, GO annotations)
- Accessing protein-protein interaction data (PSICQUIC, IntactComplex)
- Mining genomic data (BioMart, ArrayExpress, ENA)
- Integrating data from multiple bioinformatics resources in a single workflow
Core Capabilities
1. Protein Analysis
Retrieve protein information, sequences, and functional annotations:
```python
from bioservices import UniProt

u = UniProt(verbose=False)

# Search for protein by name
# (accepted format and column names depend on the installed bioservices version)
results = u.search("ZAP70_HUMAN", frmt="tab", columns="id,genes,organism")

# Retrieve FASTA sequence
sequence = u.retrieve("P43403", "fasta")

# Map identifiers between databases
kegg_ids = u.mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403")
```
**Key methods:**
- `search()`: Query UniProt with flexible search terms
- `retrieve()`: Get protein entries in various formats (FASTA, XML, tab)
- `mapping()`: Convert identifiers between databases
Reference: `references/services_reference.md` for complete UniProt API details.
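The `search()` call above returns tab-separated text. A minimal sketch of turning such text into a list of dictionaries with the standard library is shown here; the helper name `parse_tabular` is hypothetical, and the sample string (including its column headers) is an illustrative assumption rather than captured output.

```python
import csv
import io


def parse_tabular(text):
    """Parse tab-separated text (header row first) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Illustrative sample in the shape of a tabular search result (assumed headers).
sample = "Entry\tGene Names\tOrganism\nP43403\tZAP70 SRK\tHomo sapiens (Human)\n"
rows = parse_tabular(sample)
for row in rows:
    print(row["Entry"], row["Gene Names"])
```

In a real workflow, `parse_tabular(results)` would be applied to the string returned by `u.search(...)`, provided the call succeeded and returned text.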
2. Pathway Discovery and Analysis
Access KEGG pathway information for genes and organisms:
```python
from bioservices import KEGG

k = KEGG()
k.organism = "hsa"  # Set to human

# Search for organisms
k.lookfor_organism("droso")  # Find Drosophila species

# Find pathways by name
k.lookfor_pathway("B cell")  # Returns matching pathway IDs

# Get pathways containing specific genes
pathways = k.get_pathway_by_gene("7535", "hsa")  # ZAP70 gene

# Retrieve and parse pathway data
data = k.get("hsa04660")
parsed = k.parse(data)

# Extract pathway interactions
interactions = k.parse_kgml_pathway("hsa04660")
relations = interactions['relations']  # Protein-protein interactions

# Convert to Simple Interaction Format
sif_data = k.pathway2sif("hsa04660")
```
**Key methods:**
- `lookfor_organism()`, `lookfor_pathway()`: Search by name
- `get_pathway_by_gene()`: Find pathways containing genes
- `parse_kgml_pathway()`: Extract structured pathway data
- `pathway2sif()`: Get protein interaction networks
Reference: `references/workflow_patterns.md` for complete pathway analysis workflows.
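Once the `relations` list has been extracted, it can be reduced to an edge list and a count of interaction types. The sketch below assumes each relation is a dictionary with `entry1`, `entry2`, and `name` keys; that layout is an assumption about the parsed KGML output, `summarize_relations` is a hypothetical helper, and the sample records are invented for illustration.

```python
from collections import Counter


def summarize_relations(relations):
    """Return (edges, type_counts) from a list of relation dicts."""
    edges = [(r["entry1"], r["entry2"]) for r in relations]
    type_counts = Counter(r.get("name", "unknown") for r in relations)
    return edges, type_counts


# Hypothetical relations in the assumed shape (not real pathway output).
sample_relations = [
    {"entry1": "A", "entry2": "B", "name": "activation"},
    {"entry1": "B", "entry2": "C", "name": "inhibition"},
    {"entry1": "A", "entry2": "C", "name": "activation"},
]
edges, counts = summarize_relations(sample_relations)
print(edges)
print(counts.most_common())
```

Inspect one real relation entry first and adjust the key names if they differ.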
3. Compound Database Searches
Search and cross-reference compounds across multiple databases:
```python
from bioservices import KEGG, UniChem

k = KEGG()

# Search compounds by name
results = k.find("compound", "Geldanamycin")  # Returns cpd:C11222

# Get compound information with database links
compound_info = k.get("cpd:C11222")  # Includes ChEBI links

# Cross-reference KEGG → ChEMBL using UniChem
u = UniChem()
chembl_id = u.get_compound_id_from_kegg("C11222")  # Returns CHEMBL278315
```
**Common workflow:**
1. Search compound by name in KEGG
2. Extract KEGG compound ID
3. Use UniChem for KEGG → ChEMBL mapping
4. Read ChEBI IDs from the KEGG entry, which often lists them
Reference: `references/identifier_mapping.md` for complete cross-database mapping guide.
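Step 2 of the workflow above (extracting the KEGG compound ID) can be sketched as follows. The sketch assumes the `find` result is plain text with one hit per line, the identifier and a semicolon-separated name list divided by a tab; `parse_kegg_find` is a hypothetical helper, and the sample string only mirrors the `cpd:C11222` hit mentioned above.

```python
def parse_kegg_find(text):
    """Map each KEGG identifier to its list of names from find-style text."""
    hits = {}
    for line in text.strip().splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        kegg_id, names = line.split("\t", 1)
        hits[kegg_id] = [n.strip() for n in names.split(";")]
    return hits


# Sample in the assumed layout, mirroring the hit named in the text.
sample = "cpd:C11222\tGeldanamycin\n"
hits = parse_kegg_find(sample)

# Drop the "cpd:" prefix to get the bare compound ID.
compound_id = next(iter(hits)).split(":", 1)[1]
print(compound_id)
```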
4. Sequence Analysis
Run BLAST searches and sequence alignments:
```python
from bioservices import NCBIblast

s = NCBIblast(verbose=False)

# protein_sequence must already hold the query amino-acid sequence as a string

# Run BLASTP against UniProtKB
jobid = s.run(
    program="blastp",
    sequence=protein_sequence,
    stype="protein",
    database="uniprotkb",
    email="your.email@example.com"  # Required by the EBI-hosted BLAST service
)

# Check job status and retrieve results
s.getStatus(jobid)
results = s.getResult(jobid, "out")
```
**Note:** BLAST jobs are asynchronous. Check status before retrieving results.
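Because the job is asynchronous, a polling loop is one way to wait for completion before fetching results. The sketch below is a generic helper, not part of BioServices: `wait_for_job` is a hypothetical name, and the status strings treated as "still running" (`RUNNING`, `PENDING`, `QUEUED`) are assumptions to check against the values the service actually reports.

```python
import time


def wait_for_job(get_status, jobid, interval=5.0, max_wait=600.0, sleep=time.sleep):
    """Poll get_status(jobid) until the job leaves the running states or time runs out."""
    waited = 0.0
    while True:
        status = get_status(jobid)
        if status not in ("RUNNING", "PENDING", "QUEUED"):
            return status  # finished, failed, or some other terminal state
        if waited >= max_wait:
            raise TimeoutError(f"job {jobid} still {status} after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Demonstration with a stub in place of s.getStatus (no network needed).
_states = iter(["RUNNING", "FINISHED"])
final = wait_for_job(lambda _jobid: next(_states), "demo-job", interval=0.0)
print(final)
```

With a live job this would be called as `wait_for_job(s.getStatus, jobid)`, and `s.getResult(jobid, "out")` would be fetched only if the returned status indicates success.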
5. Identifier Mapping
Convert identifiers between different biological databases:
```python
from bioservices import UniProt, UniChem

# UniProt mapping (many database pairs supported)
u = UniProt()
results = u.mapping(
    fr="UniProtKB_AC-ID",  # Source database
    to="KEGG",             # Target database
    query="P43403"         # Identifier(s) to convert
)

# KEGG gene ID → UniProt
kegg_to_uniprot = u.mapping(fr="KEGG", to="UniProtKB_AC-ID", query="hsa:7535")

# For compounds, use UniChem (separate name so the UniProt client is not shadowed)
uc = UniChem()
chembl_from_kegg = uc.get_compound_id_from_kegg("C11222")
```
**Supported mappings (UniProt):**
- UniProtKB ↔ KEGG
- UniProtKB ↔ Ensembl
- UniProtKB ↔ PDB
- UniProtKB ↔ RefSeq
- And many more (see `references/identifier_mapping.md`)
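One identifier can map to several targets, so it is often convenient to group mapping output by source ID. The sketch below assumes the mapping pairs are available as a list of dictionaries with `from` and `to` keys; that shape is an assumption that may not match every BioServices version, and `group_mapping` is a hypothetical helper.

```python
from collections import defaultdict


def group_mapping(pairs):
    """Collapse a list of {'from': ..., 'to': ...} pairs into {source: [targets]}."""
    grouped = defaultdict(list)
    for pair in pairs:
        grouped[pair["from"]].append(pair["to"])
    return dict(grouped)


# Pairs in the assumed shape, using the identifiers from the examples above.
sample_pairs = [{"from": "P43403", "to": "hsa:7535"}]
print(group_mapping(sample_pairs))
```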
6. Gene Ontology Queries
Access GO terms and annotations:
```python
from bioservices import QuickGO

g = QuickGO(verbose=False)

# Retrieve GO term information
term_info = g.Term("GO:0003824", frmt="obo")

# Search annotations
annotations = g.Annotation(protein="P43403", format="tsv")
```
7. Protein-Protein Interactions
Query interaction databases via PSICQUIC:
```python
from bioservices import PSICQUIC

s = PSICQUIC(verbose=False)

# Query specific database (e.g., MINT)
interactions = s.query("mint", "ZAP70 AND species:9606")

# List available interaction databases
databases = s.activeDBs
```
**Available databases:** MINT, IntAct, BioGRID, DIP, and 30+ others.
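PSICQUIC services typically answer in the tab-delimited MITAB format, where the first two columns identify the interacting partners. The sketch below pulls those two columns out of each row; whether `query()` returns rows as strings or as pre-split lists is an assumption handled by accepting both, `interactor_pairs` is a hypothetical helper, and the partner IDs in the sample are placeholders.

```python
def interactor_pairs(mitab_rows):
    """Extract (interactor A, interactor B) from MITAB rows given as strings or lists."""
    pairs = []
    for row in mitab_rows:
        cols = row.split("\t") if isinstance(row, str) else list(row)
        if len(cols) < 2:
            continue  # not enough columns to describe an interaction
        # Strip the "database:" prefix, e.g. "uniprotkb:P43403" -> "P43403".
        a = cols[0].split(":", 1)[-1]
        b = cols[1].split(":", 1)[-1]
        pairs.append((a, b))
    return pairs


# Placeholder rows for illustration (EXAMPLE1/EXAMPLE2 are not real accessions).
sample = [
    "uniprotkb:P43403\tuniprotkb:EXAMPLE1\t-",
    ["uniprotkb:P43403", "uniprotkb:EXAMPLE2"],
]
pairs = interactor_pairs(sample)
print(pairs)
```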
Multi-Service Integration Workflows
BioServices excels at combining multiple services for comprehensive analysis. Common integration patterns:
Complete Protein Analysis Pipeline
Execute a full protein characterization workflow:
```bash
python scripts/protein_analysis_workflow.py ZAP70_HUMAN your.email@example.com
```
This script demonstrates:
- UniProt search for protein entry
- FASTA sequence retrieval
- BLAST similarity search
- KEGG pathway discovery
- PSICQUIC interaction mapping
Pathway Network Analysis
Analyze all pathways for an organism:
```bash
python scripts/pathway_analysis.py hsa output_directory/
```
Extracts and analyzes:
- All pathway IDs for organism
- Protein-protein interactions per pathway
- Interaction type distributions
- Exports to CSV/SIF formats
Cross-Database Compound Search
Map compound identifiers across databases:
```bash
python scripts/compound_cross_reference.py Geldanamycin
```
Retrieves:
- KEGG compound ID
- ChEBI identifier
- ChEMBL identifier
- Basic compound properties
Batch Identifier Conversion
Convert multiple identifiers at once:
```bash
python scripts/batch_id_converter.py input_ids.txt --from UniProtKB_AC-ID --to KEGG
```
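The internals of `batch_id_converter.py` are not shown here. As a rough illustration of the preprocessing such a converter might do, the sketch below deduplicates identifiers read from a file and splits them into fixed-size batches; both helper names and the comment-line convention (`#`) are hypothetical, and the batch size is an arbitrary choice.

```python
def read_ids(lines):
    """Return unique, non-empty identifiers in first-seen order, skipping '#' comments."""
    seen = {}
    for line in lines:
        ident = line.strip()
        if ident and not ident.startswith("#"):
            seen.setdefault(ident, None)  # dict keeps insertion order
    return list(seen)


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Lines as they might be read from input_ids.txt (illustrative content).
ids = read_ids(["P43403\n", "\n", "# comment\n", "P43403\n", "P12345\n"])
for batch in chunked(ids, 100):
    print(len(batch), batch)
```

Each batch could then be handed to a mapping call; the accepted query format for multiple identifiers depends on the installed BioServices version.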
Best Practices
Output Format Handling
Different services return data in various formats:
- XML: Parse using BeautifulSoup (most SOAP services)
- Tab-separated (TSV): Load into Pandas DataFrames for tabular data
- Dictionary/JSON: Direct Python manipulation
- FASTA: BioPython integration for sequence analysis
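For quick inspection of FASTA text without installing BioPython, a small standard-library parser can be enough. The sketch below is a stand-in for that case only, not a replacement for BioPython; `parse_fasta` is a hypothetical helper and the sample record is made up.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record for illustration (not a real sequence).
sample = ">example|ID1 demo protein\nMKTAY\nIAKQR\n"
for name, seq in parse_fasta(sample):
    print(name, len(seq))
```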
Rate Limiting and Verbosity
Control API request behavior:
```python
from bioservices import KEGG

k = KEGG(verbose=False)  # Suppress HTTP request details
k.TIMEOUT = 30  # Adjust timeout for slow connections
```
Error Handling
Wrap service calls in try-except blocks:
```python
try:
    results = u.search("ambiguous_query")
    if results:
        # Process results
        pass
except Exception as e:
    print(f"Search failed: {e}")
```
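Transient network failures are common with remote services, so a retry with growing delays is a frequent companion to the try-except pattern. The sketch below is a generic helper and not a BioServices feature: `call_with_retry` is a hypothetical name, the defaults are arbitrary, and it only handles failures that surface as exceptions (some calls may instead return an error value, which would need a separate check).

```python
import time


def call_with_retry(func, *args, attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func, retrying on any exception with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... base_delay


# Demonstration with a stub that fails twice before succeeding.
_calls = {"n": 0}


def _flaky():
    _calls["n"] += 1
    if _calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


outcome = call_with_retry(_flaky, base_delay=0.0)
print(outcome)
```

A service call would be wrapped as `call_with_retry(u.search, "ZAP70_HUMAN")`, with `u` being the client created earlier.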
Organism Codes
Use standard organism abbreviations:
- `hsa`: Homo sapiens (human)
- `mmu`: Mus musculus (mouse)
- `dme`: Drosophila melanogaster
- `sce`: Saccharomyces cerevisiae (yeast)
List all organisms: `k.list("organism")` or `k.organismIds`
Integration with Other Tools
BioServices works well with:
- BioPython: Sequence analysis on retrieved FASTA data
- Pandas: Tabular data manipulation
- PyMOL: 3D structure visualization (retrieve PDB IDs)
- NetworkX: Network analysis of pathway interactions
- Galaxy: Custom tool wrappers for workflow platforms
Resources
scripts/
Executable Python scripts demonstrating complete workflows:
- `protein_analysis_workflow.py`: End-to-end protein characterization
- `pathway_analysis.py`: KEGG pathway discovery and network extraction
- `compound_cross_reference.py`: Multi-database compound searching
- `batch_id_converter.py`: Bulk identifier mapping utility
Scripts can be executed directly or adapted for specific use cases.
references/
Detailed documentation loaded as needed:
- `services_reference.md`: Comprehensive list of all 40+ services with methods
- `workflow_patterns.md`: Detailed multi-step analysis workflows
- `identifier_mapping.md`: Complete guide to cross-database ID conversion
Load references when working with specific services or complex integration tasks.
Installation
```bash
uv pip install bioservices
```
Dependencies are managed automatically. The package is tested on Python 3.9-3.12.
Additional Information
For detailed API documentation and advanced features, refer to:
- Official documentation: https://bioservices.readthedocs.io/
- Source code: https://github.com/cokelaer/bioservices
- Service-specific references in `references/services_reference.md`