chembl-database


ChEMBL Database

Overview

ChEMBL is a manually curated database of bioactive molecules maintained by the European Bioinformatics Institute (EBI), containing over 2 million compounds, 19 million bioactivity measurements, 13,000+ drug targets, and data on approved drugs and clinical candidates. Access and query this data programmatically using the ChEMBL Python client for drug discovery and medicinal chemistry research.

When to Use This Skill

Use this skill for:
  • Compound searches: Finding molecules by name, structure, or properties
  • Target information: Retrieving data about proteins, enzymes, or biological targets
  • Bioactivity data: Querying IC50, Ki, EC50, or other activity measurements
  • Drug information: Looking up approved drugs, mechanisms, or indications
  • Structure searches: Performing similarity or substructure searches
  • Cheminformatics: Analyzing molecular properties and drug-likeness
  • Target-ligand relationships: Exploring compound-target interactions
  • Drug discovery: Identifying inhibitors, agonists, or bioactive molecules

Installation and Setup

Python Client

The ChEMBL Python client is required for programmatic access:
```bash
uv pip install chembl_webresource_client
```

Basic Usage Pattern

```python
from chembl_webresource_client.new_client import new_client

# Access different endpoints
molecule = new_client.molecule
target = new_client.target
activity = new_client.activity
drug = new_client.drug
```

Core Capabilities

1. Molecule Queries

**Retrieve by ChEMBL ID:**
```python
molecule = new_client.molecule
aspirin = molecule.get('CHEMBL25')
```

**Search by name:**
```python
results = molecule.filter(pref_name__icontains='aspirin')
```

**Filter by properties:**
```python
# Find small molecules (MW <= 500) with favorable LogP
results = molecule.filter(
    molecule_properties__mw_freebase__lte=500,
    molecule_properties__alogp__lte=5
)
```

2. Target Queries

**Retrieve target information:**
```python
target = new_client.target
egfr = target.get('CHEMBL203')
```

**Search for specific target types:**
```python
# Find all kinase targets
kinases = target.filter(
    target_type='SINGLE PROTEIN',
    pref_name__icontains='kinase'
)
```

3. Bioactivity Data

**Query activities for a target:**
```python
activity = new_client.activity

# Find potent EGFR inhibitors
results = activity.filter(
    target_chembl_id='CHEMBL203',
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)
```

**Get all activities for a compound:**
```python
compound_activities = activity.filter(
    molecule_chembl_id='CHEMBL25',
    pchembl_value__isnull=False
)
```

4. Structure-Based Searches

**Similarity search:**
```python
similarity = new_client.similarity

# Find compounds similar to aspirin
similar = similarity.filter(
    smiles='CC(=O)Oc1ccccc1C(=O)O',
    similarity=85  # 85% similarity threshold
)
```

**Substructure search:**
```python
substructure = new_client.substructure

# Find compounds containing benzene ring
results = substructure.filter(smiles='c1ccccc1')
```

5. Drug Information

**Retrieve drug data:**
```python
drug = new_client.drug
drug_info = drug.get('CHEMBL25')
```

**Get mechanisms of action:**
```python
mechanism = new_client.mechanism
mechanisms = mechanism.filter(molecule_chembl_id='CHEMBL25')
```

**Query drug indications:**
```python
drug_indication = new_client.drug_indication
indications = drug_indication.filter(molecule_chembl_id='CHEMBL25')
```

Query Workflow

Workflow 1: Finding Inhibitors for a Target

  1. Identify the target by searching by name:
    ```python
    targets = new_client.target.filter(pref_name__icontains='EGFR')
    target_id = targets[0]['target_chembl_id']
    ```
  2. Query bioactivity data for that target:
    ```python
    activities = new_client.activity.filter(
        target_chembl_id=target_id,
        standard_type='IC50',
        standard_value__lte=100,
        standard_units='nM'  # make the 100 threshold unambiguous
    )
    ```
  3. Extract compound IDs and retrieve details:
    ```python
    compound_ids = [act['molecule_chembl_id'] for act in activities]
    compounds = [new_client.molecule.get(cid) for cid in compound_ids]
    ```
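
One compound often has several activity records against the same target, so the ID list in step 3 can contain repeats and trigger redundant lookups. A minimal sketch of removing repeats first; `unique_molecule_ids` is a hypothetical helper (not part of the client), and the sample records use placeholder IDs.

```python
def unique_molecule_ids(activities):
    """Return molecule ChEMBL IDs in first-seen order, without repeats."""
    seen = set()
    ordered = []
    for act in activities:
        cid = act.get('molecule_chembl_id')
        if cid and cid not in seen:
            seen.add(cid)
            ordered.append(cid)
    return ordered


# Placeholder records shaped like activity results (IDs are made up)
sample = [
    {'molecule_chembl_id': 'CHEMBL_X1'},
    {'molecule_chembl_id': 'CHEMBL_X2'},
    {'molecule_chembl_id': 'CHEMBL_X1'},
]
print(unique_molecule_ids(sample))  # ['CHEMBL_X1', 'CHEMBL_X2']
```

The de-duplicated list can then be passed to the molecule lookups in place of `compound_ids`.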

Workflow 2: Analyzing a Known Drug

  1. Get drug information:
    ```python
    drug_info = new_client.drug.get('CHEMBL1234')
    ```
  2. Retrieve mechanisms:
    ```python
    mechanisms = new_client.mechanism.filter(molecule_chembl_id='CHEMBL1234')
    ```
  3. Find all bioactivities:
    ```python
    activities = new_client.activity.filter(molecule_chembl_id='CHEMBL1234')
    ```

Workflow 3: Structure-Activity Relationship (SAR) Study

  1. Find similar compounds:
    ```python
    similar = new_client.similarity.filter(smiles='query_smiles', similarity=80)
    ```
  2. Get activities for each compound:
    ```python
    for compound in similar:
        activities = new_client.activity.filter(
            molecule_chembl_id=compound['molecule_chembl_id']
        )
    ```
  3. Analyze property-activity relationships using molecular properties from results.
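
As one possible starting point for step 3, activity records can be reduced to a single potency summary per compound before comparing against molecular properties. This is a sketch under assumptions: `median_pchembl_by_compound` is a hypothetical helper, the records are plain dicts with `molecule_chembl_id` and `pchembl_value` keys, and potency values may arrive as strings or `None`. The sample IDs and values are made up.

```python
from collections import defaultdict
from statistics import median


def median_pchembl_by_compound(activities):
    """Group activity records by molecule and return the median pChEMBL of each."""
    grouped = defaultdict(list)
    for act in activities:
        value = act.get('pchembl_value')
        if value is None:
            continue  # record carries no normalized potency
        grouped[act['molecule_chembl_id']].append(float(value))
    return {cid: median(values) for cid, values in grouped.items()}


# Made-up records for illustration only
sample = [
    {'molecule_chembl_id': 'CPD_A', 'pchembl_value': '7.0'},
    {'molecule_chembl_id': 'CPD_A', 'pchembl_value': '8.0'},
    {'molecule_chembl_id': 'CPD_B', 'pchembl_value': None},
    {'molecule_chembl_id': 'CPD_B', 'pchembl_value': '6.5'},
]
print(median_pchembl_by_compound(sample))  # {'CPD_A': 7.5, 'CPD_B': 6.5}
```

The median is used here because it is less sensitive to a single outlying measurement than the mean; other summaries may suit a given study better.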

Filter Operators

ChEMBL supports Django-style query filters:
  • __exact - Exact match
  • __iexact - Case-insensitive exact match
  • __contains / __icontains - Substring matching
  • __startswith / __endswith - Prefix/suffix matching
  • __gt, __gte, __lt, __lte - Numeric comparisons
  • __range - Value in range
  • __in - Value in list
  • __isnull - Null/not null check
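
To make the operator semantics concrete without calling the API, the sketch below applies a subset of these lookups to an in-memory dict. It is a local illustration only, not how the client or server evaluates filters: `LOOKUPS` and `matches` are hypothetical names, only flat (non-nested) fields are handled, numeric values are assumed to be numbers already, and `range` is treated as inclusive as in Django.

```python
# Each lookup takes (record_value, filter_value) and returns True on a match.
LOOKUPS = {
    'exact': lambda v, x: v == x,
    'iexact': lambda v, x: isinstance(v, str) and v.lower() == x.lower(),
    'contains': lambda v, x: isinstance(v, str) and x in v,
    'icontains': lambda v, x: isinstance(v, str) and x.lower() in v.lower(),
    'gt': lambda v, x: v is not None and v > x,
    'gte': lambda v, x: v is not None and v >= x,
    'lt': lambda v, x: v is not None and v < x,
    'lte': lambda v, x: v is not None and v <= x,
    'range': lambda v, x: v is not None and x[0] <= v <= x[1],
    'in': lambda v, x: v in x,
    'isnull': lambda v, x: (v is None) == x,
}


def matches(record, **filters):
    """Return True if the record satisfies every field__operator filter."""
    for key, expected in filters.items():
        field, sep, op = key.rpartition('__')
        if not sep or op not in LOOKUPS:
            field, op = key, 'exact'  # bare field name means exact match
        if not LOOKUPS[op](record.get(field), expected):
            return False
    return True


# Made-up record for illustration
record = {
    'standard_type': 'IC50',
    'standard_value': 42.0,
    'pchembl_value': None,
    'target_pref_name': 'Tyrosine-protein kinase',
}
print(matches(record, standard_type='IC50', standard_value__lte=100))  # True
print(matches(record, pchembl_value__isnull=False))                    # False
```

Multiple keyword filters combine with AND, which mirrors how several arguments to a single `filter()` call narrow the result set.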

Data Export and Analysis

Convert results to pandas DataFrame for analysis:
```python
import pandas as pd

activities = new_client.activity.filter(target_chembl_id='CHEMBL203')
df = pd.DataFrame(list(activities))

# Numeric fields may arrive as strings; coerce before computing statistics
df['standard_value'] = pd.to_numeric(df['standard_value'], errors='coerce')

# Analyze results
print(df['standard_value'].describe())
print(df.groupby('standard_type').size())
```
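
When only a few columns are needed, or pandas is not installed, the same records can be written out with the standard library. A minimal sketch: `records_to_csv` is a hypothetical helper, and the sample records are made up.

```python
import csv
import io


def records_to_csv(records, fields):
    """Serialize the selected fields of dict records to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator='\n')
    writer.writeheader()
    for record in records:
        # Keep only the requested fields; missing or None values become empty cells
        writer.writerow({name: record.get(name) for name in fields})
    return buffer.getvalue()


# Made-up records for illustration
sample = [
    {'molecule_chembl_id': 'CPD_A', 'standard_type': 'IC50', 'standard_value': '12.5', 'extra': 1},
    {'molecule_chembl_id': 'CPD_B', 'standard_type': 'Ki', 'standard_value': None},
]
fields = ['molecule_chembl_id', 'standard_type', 'standard_value']
print(records_to_csv(sample, fields))
```

The returned text can be written to a file with an ordinary `open(path, 'w')` call.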

Performance Optimization

Caching

The client automatically caches results for 24 hours. Configure caching:
```python
from chembl_webresource_client.settings import Settings

# Disable caching
Settings.Instance().CACHING = False

# Adjust cache expiration (seconds)
Settings.Instance().CACHE_EXPIRE = 86400
```

Lazy Evaluation

Queries execute only when data is accessed. Convert to list to force execution:
```python
# Query is not executed yet
results = molecule.filter(pref_name__icontains='aspirin')

# Force execution
results_list = list(results)
```

Pagination

Results are paginated automatically. Iterate through all results:
```python
for activity in new_client.activity.filter(target_chembl_id='CHEMBL203'):
    # Process each activity
    print(activity['molecule_chembl_id'])
```
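
For large result sets it is often enough to look at the first few records. The sketch below caps iteration with `itertools.islice`; `take` and `fake_results` are hypothetical names, and the generator merely stands in for a paginated query. Whether the client skips fetching the remaining pages once iteration stops is an assumption worth verifying for your version.

```python
from itertools import islice


def take(results, limit):
    """Return at most `limit` items from any iterable, stopping early."""
    return list(islice(results, limit))


def fake_results():
    # Stand-in for a lazily paginated query result
    for i in range(1000):
        yield {'molecule_chembl_id': f'CPD_{i}'}


first = take(fake_results(), 3)
print([r['molecule_chembl_id'] for r in first])  # ['CPD_0', 'CPD_1', 'CPD_2']
```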

Common Use Cases

Find Kinase Inhibitors

```python
# Identify kinase targets
kinases = new_client.target.filter(
    target_type='SINGLE PROTEIN',
    pref_name__icontains='kinase'
)

# Get potent inhibitors
for kinase in kinases[:5]:  # First 5 kinases
    activities = new_client.activity.filter(
        target_chembl_id=kinase['target_chembl_id'],
        standard_type='IC50',
        standard_value__lte=50,
        standard_units='nM'  # make the 50 threshold unambiguous
    )
```

Explore Drug Repurposing

```python
# Get approved drugs
drugs = new_client.drug.filter()

# For each drug, find all targets
for drug in drugs[:10]:
    mechanisms = new_client.mechanism.filter(
        molecule_chembl_id=drug['molecule_chembl_id']
    )
```

Virtual Screening

```python
# Find compounds with desired properties
candidates = new_client.molecule.filter(
    molecule_properties__mw_freebase__range=[300, 500],
    molecule_properties__alogp__lte=5,
    molecule_properties__hba__lte=10,
    molecule_properties__hbd__lte=5
)
```
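
The upper bounds in this filter correspond to Lipinski's rule of five (MW <= 500, LogP <= 5, H-bond acceptors <= 10, H-bond donors <= 5). The same check can be applied locally to records that have already been retrieved. This is a sketch under assumptions: `passes_rule_of_five` is a hypothetical helper, the input is a dict shaped like a molecule's `molecule_properties`, values may be strings, and the sample numbers are illustrative.

```python
def passes_rule_of_five(props, max_violations=0):
    """Check a molecule_properties-style dict against Lipinski's thresholds."""
    limits = {'mw_freebase': 500, 'alogp': 5, 'hba': 10, 'hbd': 5}
    violations = 0
    for name, limit in limits.items():
        raw = props.get(name)
        if raw is None:
            return False  # missing data: drug-likeness cannot be confirmed
        if float(raw) > limit:
            violations += 1
    return violations <= max_violations


# Illustrative property sets (not taken from a live query)
small = {'mw_freebase': '180.16', 'alogp': '1.31', 'hba': 3, 'hbd': 1}
borderline = {'mw_freebase': '510', 'alogp': '3', 'hba': 5, 'hbd': 2}
print(passes_rule_of_five(small))                         # True
print(passes_rule_of_five(borderline))                    # False
print(passes_rule_of_five(borderline, max_violations=1))  # True
```

Allowing one violation is a common relaxation; the strict default here matches the server-side filter above, which permits none.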

Resources

scripts/example_queries.py

Ready-to-use Python functions demonstrating common ChEMBL query patterns:
  • get_molecule_info() - Retrieve molecule details by ID
  • search_molecules_by_name() - Name-based molecule search
  • find_molecules_by_properties() - Property-based filtering
  • get_bioactivity_data() - Query bioactivities for targets
  • find_similar_compounds() - Similarity searching
  • substructure_search() - Substructure matching
  • get_drug_info() - Retrieve drug information
  • find_kinase_inhibitors() - Specialized kinase inhibitor search
  • export_to_dataframe() - Convert results to pandas DataFrame
Consult this script for implementation details and usage examples.

references/api_reference.md

Comprehensive API documentation including:
  • Complete endpoint listing (molecule, target, activity, assay, drug, etc.)
  • All filter operators and query patterns
  • Molecular properties and bioactivity fields
  • Advanced query examples
  • Configuration and performance tuning
  • Error handling and rate limiting
Refer to this document when detailed API information is needed or when troubleshooting queries.

Important Notes

Data Reliability

  • ChEMBL data is manually curated but may contain inconsistencies
  • Always check the data_validity_comment field in activity records
  • Be aware of potential_duplicate flags
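
A conservative way to act on these two fields is to drop any flagged record before analysis. The sketch below assumes activity records are plain dicts and that `potential_duplicate` is set with `1`, `True`, or `'1'`; `keep_reliable` is a hypothetical helper and the sample records are made up. Some validity comments may mark a value as checked rather than suspect, so inspect the comments in your data before discarding everything that has one.

```python
def keep_reliable(activities):
    """Drop records that carry a validity comment or a duplicate flag."""
    kept = []
    for act in activities:
        if act.get('data_validity_comment'):
            continue  # curator left a note about this value
        if act.get('potential_duplicate') in (1, '1'):  # True also equals 1
            continue  # value may repeat an earlier publication
        kept.append(act)
    return kept


# Made-up records for illustration
sample = [
    {'activity_id': 1, 'data_validity_comment': None, 'potential_duplicate': 0},
    {'activity_id': 2, 'data_validity_comment': 'Outside typical range', 'potential_duplicate': 0},
    {'activity_id': 3, 'data_validity_comment': None, 'potential_duplicate': 1},
]
print([act['activity_id'] for act in keep_reliable(sample)])  # [1]
```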

Units and Standards

  • Bioactivity values use standard units (nM, uM, etc.)
  • pchembl_value provides normalized activity on a negative log10 molar scale
  • Check standard_type to understand the measurement type (IC50, Ki, EC50, etc.)
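
The relationship between a concentration and its negative-log value can be worked through directly. Assuming the value is reported in nM, the negative log10 of the molar concentration is 9 - log10(value in nM), so 100 nM corresponds to 7.0. `nm_to_pvalue` is a hypothetical helper for illustration; it does not reproduce any additional criteria ChEMBL applies when assigning `pchembl_value`.

```python
import math


def nm_to_pvalue(value_nm):
    """Convert a concentration in nM to its negative log10 molar value."""
    if value_nm <= 0:
        raise ValueError('concentration must be positive')
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


print(round(nm_to_pvalue(100), 2))  # 7.0
print(round(nm_to_pvalue(1), 2))    # 9.0
```

Higher values mean greater potency: a tenfold drop in IC50 raises the result by one unit.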

Rate Limiting

  • Respect ChEMBL's fair usage policies
  • Use caching to minimize repeated requests
  • Consider bulk downloads for large datasets
  • Avoid hammering the API with rapid consecutive requests
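
One simple way to avoid rapid consecutive requests is to space out per-item lookups. The generator below is a sketch: `throttled` is a hypothetical helper, and the default 0.2 second interval is an arbitrary illustration, not a documented ChEMBL limit.

```python
import time


def throttled(items, min_interval=0.2, sleep=time.sleep, clock=time.monotonic):
    """Yield items no faster than one per `min_interval` seconds."""
    last = None
    for item in items:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)  # pause only for the remaining part of the interval
        last = clock()
        yield item


# With a zero interval the helper never sleeps and simply passes items through
print(list(throttled(['CPD_A', 'CPD_B'], min_interval=0)))  # ['CPD_A', 'CPD_B']
```

In a loop that calls the API once per ID, wrapping the ID list as `throttled(compound_ids)` spaces the calls without changing the loop body. The `sleep` and `clock` parameters exist so the pacing can be tested without real delays.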

Chemical Structure Formats

  • SMILES strings are the primary structure format
  • InChI keys are available for compounds
  • SVG images can be generated via the image endpoint

Additional Resources
