ChEMBL Database

Overview

ChEMBL is a manually curated database of bioactive molecules maintained by the European Bioinformatics Institute (EBI), containing over 2 million compounds, 19 million bioactivity measurements, 13,000+ drug targets, and data on approved drugs and clinical candidates. Access and query this data programmatically using the ChEMBL Python client for drug discovery and medicinal chemistry research.

When to Use This Skill

Use this skill for the following tasks:

- Compound searches: Finding molecules by name, structure, or properties
- Target information: Retrieving data about proteins, enzymes, or biological targets
- Bioactivity data: Querying IC50, Ki, EC50, or other activity measurements
- Drug information: Looking up approved drugs, mechanisms, or indications
- Structure searches: Performing similarity or substructure searches
- Cheminformatics: Analyzing molecular properties and drug-likeness
- Target-ligand relationships: Exploring compound-target interactions
- Drug discovery: Identifying inhibitors, agonists, or bioactive molecules

Installation and Setup

Python Client

The ChEMBL Python client is required for programmatic access:

```bash
uv pip install chembl_webresource_client
```

Basic Usage Pattern

```python
from chembl_webresource_client.new_client import new_client

# Access different endpoints
molecule = new_client.molecule
target = new_client.target
activity = new_client.activity
drug = new_client.drug
```

Core Capabilities

1. Molecule Queries

**Retrieve by ChEMBL ID:**
```python
molecule = new_client.molecule
aspirin = molecule.get('CHEMBL25')
```

**Search by name:**
```python
results = molecule.filter(pref_name__icontains='aspirin')
```

**Filter by properties:**
```python
# Find small molecules (MW <= 500) with favorable LogP
results = molecule.filter(
    molecule_properties__mw_freebase__lte=500,
    molecule_properties__alogp__lte=5
)
```

2. Target Queries

**Retrieve target information:**
```python
target = new_client.target
egfr = target.get('CHEMBL203')
```

**Search for specific target types:**
```python
# Find single-protein targets whose preferred name contains "kinase"
kinases = target.filter(
    target_type='SINGLE PROTEIN',
    pref_name__icontains='kinase'
)
```

3. Bioactivity Data

**Query activities for a target:**
```python
activity = new_client.activity

# Find potent EGFR inhibitors
results = activity.filter(
    target_chembl_id='CHEMBL203',
    standard_type='IC50',
    standard_value__lte=100,
    standard_units='nM'
)
```

**Get all activities for a compound that have a pChEMBL value:**
```python
compound_activities = activity.filter(
    molecule_chembl_id='CHEMBL25',
    pchembl_value__isnull=False
)
```

4. Structure-Based Searches

**Similarity search:**
```python
similarity = new_client.similarity

# Find compounds similar to aspirin
similar = similarity.filter(
    smiles='CC(=O)Oc1ccccc1C(=O)O',
    similarity=85  # 85% similarity threshold
)
```

**Substructure search:**
```python
substructure = new_client.substructure

# Find compounds containing benzene ring
results = substructure.filter(smiles='c1ccccc1')
```

5. Drug Information

**Retrieve drug data:**
```python
drug = new_client.drug
drug_info = drug.get('CHEMBL25')
```

**Get mechanisms of action:**
```python
mechanism = new_client.mechanism
mechanisms = mechanism.filter(molecule_chembl_id='CHEMBL25')
```

**Query drug indications:**
```python
drug_indication = new_client.drug_indication
indications = drug_indication.filter(molecule_chembl_id='CHEMBL25')
```

Query Workflow

Workflow 1: Finding Inhibitors for a Target

Identify the target by searching by name:

```python
targets = new_client.target.filter(pref_name__icontains='EGFR')
target_id = targets[0]['target_chembl_id']
```

Query bioactivity data for that target:

```python
activities = new_client.activity.filter(
    target_chembl_id=target_id,
    standard_type='IC50',
    standard_value__lte=100
)
```

Extract compound IDs and retrieve details:

```python
compound_ids = [act['molecule_chembl_id'] for act in activities]
compounds = [new_client.molecule.get(cid) for cid in compound_ids]
```
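
The extracted ID list can contain the same compound many times, since each measurement is a separate activity record, so looking up every entry individually may repeat requests. A minimal sketch of deduplicating the IDs and splitting them into batches is shown here. The helper names `unique_in_order` and `chunked`, the batch size, and the sample IDs are illustrative assumptions, not part of the client.

```python
from typing import Iterator, List, Sequence


def unique_in_order(ids: Sequence[str]) -> List[str]:
    """Drop repeated IDs while keeping first-seen order."""
    return list(dict.fromkeys(ids))


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Illustrative IDs, shaped like the molecule_chembl_id values in activity records.
compound_ids = ["CHEMBL939", "CHEMBL553", "CHEMBL939", "CHEMBL554", "CHEMBL553"]
batches = list(chunked(unique_in_order(compound_ids), size=2))
print(batches)

# Each batch could then be requested in one query using the __in operator, e.g.
# new_client.molecule.filter(molecule_chembl_id__in=batch)
```

Batching with `__in` trades many single lookups for fewer, larger requests; a suitable batch size depends on how long the resulting request becomes.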

Workflow 2: Analyzing a Known Drug

Get drug information:

```python
drug_info = new_client.drug.get('CHEMBL1234')
```

Retrieve mechanisms:

```python
mechanisms = new_client.mechanism.filter(molecule_chembl_id='CHEMBL1234')
```

Find all bioactivities:

```python
activities = new_client.activity.filter(molecule_chembl_id='CHEMBL1234')
```

Workflow 3: Structure-Activity Relationship (SAR) Study

Find similar compounds:

```python
similar = new_client.similarity.filter(smiles='query_smiles', similarity=80)
```

Get activities for each compound:

```python
for compound in similar:
    activities = new_client.activity.filter(
        molecule_chembl_id=compound['molecule_chembl_id']
    )
```

Analyze property-activity relationships using molecular properties from results.
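
One possible starting point for that analysis is to reduce each compound's measurements to a single potency figure. The sketch below groups activity records by compound and takes the median `pchembl_value`. The function name and the sample records are made up for illustration, and it assumes the value may arrive as a string or be missing.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Mapping


def median_pchembl_by_compound(records: Iterable[Mapping]) -> Dict[str, float]:
    """Group activity records by compound and take the median pChEMBL value."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for rec in records:
        value = rec.get("pchembl_value")
        if value is None:
            continue  # no comparable potency recorded for this measurement
        grouped[rec["molecule_chembl_id"]].append(float(value))
    return {cid: median(vals) for cid, vals in grouped.items()}


# Made-up records for illustration only; real ones would come from activity.filter(...).
sample = [
    {"molecule_chembl_id": "CHEMBL_A", "pchembl_value": "6.0"},
    {"molecule_chembl_id": "CHEMBL_A", "pchembl_value": "7.0"},
    {"molecule_chembl_id": "CHEMBL_B", "pchembl_value": None},
    {"molecule_chembl_id": "CHEMBL_B", "pchembl_value": "8.5"},
]
summary = median_pchembl_by_compound(sample)
print(summary)
```

The per-compound medians can then be set against molecular properties such as molecular weight or LogP to look for trends.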

Filter Operators

ChEMBL supports Django-style query filters:

- `__exact` - Exact match
- `__iexact` - Case-insensitive exact match
- `__contains` / `__icontains` - Substring matching
- `__startswith` / `__endswith` - Prefix/suffix matching
- `__gt`, `__gte`, `__lt`, `__lte` - Numeric comparisons
- `__range` - Value in range
- `__in` - Value in list
- `__isnull` - Null/not null check
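
Each operator is appended to a field name with a double underscore, and nested fields are joined the same way. The sketch below builds such keyword arguments as plain strings to make the naming pattern visible. The `lookup` helper is hypothetical and exists only for this illustration.

```python
from typing import Any, Dict


def lookup(field_path: str, operator: str, value: Any) -> Dict[str, Any]:
    """Build one filter keyword of the form '<field path>__<operator>'."""
    return {f"{field_path}__{operator}": value}


filters: Dict[str, Any] = {}
filters.update(lookup("pref_name", "icontains", "aspirin"))
filters.update(lookup("molecule_properties__mw_freebase", "range", [300, 500]))
filters.update(lookup("molecule_properties__alogp", "lte", 5))
print(filters)

# The resulting dict can be unpacked into a query, e.g.
# new_client.molecule.filter(**filters)
```

Building the keywords as a dict can be convenient when the set of conditions is decided at run time.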

Data Export and Analysis

Convert results to a pandas DataFrame for analysis:

```python
import pandas as pd

activities = new_client.activity.filter(target_chembl_id='CHEMBL203')
df = pd.DataFrame(list(activities))

# Analyze results (values may arrive as strings, so coerce to numbers first)
values = pd.to_numeric(df['standard_value'], errors='coerce')
print(values.describe())
print(df.groupby('standard_type').size())
```

Performance Optimization

Caching

The client automatically caches results for 24 hours. Configure caching:

```python
from chembl_webresource_client.settings import Settings

# Disable caching
Settings.Instance().CACHING = False

# Adjust cache expiration (seconds)
Settings.Instance().CACHE_EXPIRE = 86400
```

Lazy Evaluation

Queries execute only when data is accessed. Convert to list to force execution:

```python
# Query is not executed yet
results = molecule.filter(pref_name__icontains='aspirin')

# Force execution
results_list = list(results)
```
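
To make the idea of deferred execution concrete, here is a toy model of a lazily evaluated query. `LazyQuery` and `fake_fetch` are invented for this sketch and are not the client's real classes; the point is only that building the object does no work and the first iteration triggers the fetch.

```python
from typing import Callable, Iterator, List, Optional


class LazyQuery:
    """Toy model of a lazily evaluated query (not the client's real class)."""

    def __init__(self, fetch: Callable[[], List[dict]]) -> None:
        self._fetch = fetch
        self._rows: Optional[List[dict]] = None
        self.fetch_count = 0

    def __iter__(self) -> Iterator[dict]:
        if self._rows is None:  # first access triggers the fetch
            self._rows = self._fetch()
            self.fetch_count += 1
        return iter(self._rows)


def fake_fetch() -> List[dict]:
    # Stand-in for a network request.
    return [{"pref_name": "ASPIRIN"}]


query = LazyQuery(fake_fetch)
before = query.fetch_count  # nothing has been fetched yet
rows = list(query)          # iteration forces execution
list(query)                 # a second pass reuses the stored rows
after = query.fetch_count
print(before, after, rows)
```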

Pagination

Results are paginated automatically. Iterate through all results:

```python
for activity in new_client.activity.filter(target_chembl_id='CHEMBL203'):
    # Process each activity
    print(activity['molecule_chembl_id'])
```

Common Use Cases

Find Kinase Inhibitors

```python
# Identify kinase targets
kinases = new_client.target.filter(
    target_type='SINGLE PROTEIN',
    pref_name__icontains='kinase'
)

# Get potent inhibitors
for kinase in kinases[:5]:  # First 5 kinases
    activities = new_client.activity.filter(
        target_chembl_id=kinase['target_chembl_id'],
        standard_type='IC50',
        standard_value__lte=50
    )
```

Explore Drug Repurposing

```python
# Get approved drugs
drugs = new_client.drug.filter()

# For each drug, find all targets
for drug in drugs[:10]:
    mechanisms = new_client.mechanism.filter(
        molecule_chembl_id=drug['molecule_chembl_id']
    )
```

Virtual Screening

```python
# Find compounds with desired properties
candidates = new_client.molecule.filter(
    molecule_properties__mw_freebase__range=[300, 500],
    molecule_properties__alogp__lte=5,
    molecule_properties__hba__lte=10,
    molecule_properties__hbd__lte=5
)
```
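
The thresholds in this query correspond to Lipinski's rule of five. When records have already been downloaded, the same check can be applied locally. The sketch below counts violations from a properties mapping; the function name is hypothetical, the sample values are illustrative rather than retrieved data, and it assumes properties may be strings, numbers, or missing.

```python
from typing import Mapping, Optional


def ro5_violations(props: Mapping[str, object]) -> Optional[int]:
    """Count rule-of-five violations, or return None if a property is unusable."""
    try:
        mw = float(props["mw_freebase"])
        logp = float(props["alogp"])
        hba = float(props["hba"])
        hbd = float(props["hbd"])
    except (KeyError, TypeError, ValueError):
        return None  # a property is absent, null, or not numeric
    checks = [mw > 500, logp > 5, hba > 10, hbd > 5]
    return sum(checks)


# Illustrative property sets, shaped like the molecule_properties field.
small_polar = {"mw_freebase": "180.16", "alogp": "1.31", "hba": 3, "hbd": 1}
bulky = {"mw_freebase": "720.90", "alogp": "6.20", "hba": 12, "hbd": 2}
incomplete = {"mw_freebase": None, "alogp": "2.00", "hba": 4, "hbd": 1}
for props in (small_polar, bulky, incomplete):
    print(ro5_violations(props))
```

Returning `None` for incomplete records keeps "unknown" separate from "zero violations", which matters when ranking candidates.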

Resources

scripts/example_queries.py

Ready-to-use Python functions demonstrating common ChEMBL query patterns:

- `get_molecule_info()` - Retrieve molecule details by ID
- `search_molecules_by_name()` - Name-based molecule search
- `find_molecules_by_properties()` - Property-based filtering
- `get_bioactivity_data()` - Query bioactivities for targets
- `find_similar_compounds()` - Similarity searching
- `substructure_search()` - Substructure matching
- `get_drug_info()` - Retrieve drug information
- `find_kinase_inhibitors()` - Specialized kinase inhibitor search
- `export_to_dataframe()` - Convert results to pandas DataFrame

Consult this script for implementation details and usage examples.

references/api_reference.md

Comprehensive API documentation including:

- Complete endpoint listing (molecule, target, activity, assay, drug, etc.)
- All filter operators and query patterns
- Molecular properties and bioactivity fields
- Advanced query examples
- Configuration and performance tuning
- Error handling and rate limiting

Refer to this document when detailed API information is needed or when troubleshooting queries.

Important Notes

Data Reliability

- ChEMBL data is manually curated but may contain inconsistencies
- Always check the `data_validity_comment` field in activity records
- Be aware of `potential_duplicate` flags
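
These two checks can be applied as a simple post-filter on downloaded records. The sketch below keeps only records with no validity comment and no duplicate flag. The function name and sample records are illustrative, and it assumes the duplicate flag is a boolean or a 0/1 integer.

```python
from typing import Iterable, List, Mapping


def keep_reliable(records: Iterable[Mapping]) -> List[Mapping]:
    """Keep records with no validity comment and no duplicate flag."""
    return [
        rec for rec in records
        if not rec.get("data_validity_comment")
        and not rec.get("potential_duplicate")
    ]


# Made-up records for illustration; real ones would come from activity.filter(...).
sample = [
    {"activity_id": 1, "data_validity_comment": None, "potential_duplicate": 0},
    {"activity_id": 2, "data_validity_comment": "Outside typical range",
     "potential_duplicate": 0},
    {"activity_id": 3, "data_validity_comment": None, "potential_duplicate": 1},
]
kept_ids = [rec["activity_id"] for rec in keep_reliable(sample)]
print(kept_ids)
```

Dropping flagged records is the strictest option; depending on the analysis, it may be preferable to review them instead.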

Units and Standards

- Bioactivity values use standard units (nM, uM, etc.)
- `pchembl_value` provides activity on a normalized negative log10 (molar) scale
- Check `standard_type` to understand measurement type (IC50, Ki, EC50, etc.)
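
As a worked example of the negative log scale, a potency of 100 nM is 1e-7 M, which gives a value of 7. The sketch below shows only this arithmetic for values reported in nM; the function name is hypothetical, and ChEMBL itself assigns `pchembl_value` only to records that meet its own criteria.

```python
import math


def pchembl_from_nm(value_nm: float) -> float:
    """Convert a potency in nM to the -log10(molar) scale, rounded to 2 places."""
    if value_nm <= 0:
        raise ValueError("potency must be positive")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    return round(9 - math.log10(value_nm), 2)


for nm in (1, 50, 100):
    print(nm, pchembl_from_nm(nm))
```

Because the scale is logarithmic, a tenfold improvement in potency raises the value by exactly one unit.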

Rate Limiting

- Respect ChEMBL's fair usage policies
- Use caching to minimize repeated requests
- Consider bulk downloads for large datasets
- Avoid hammering the API with rapid consecutive requests
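
One simple way to space out requests is to enforce a minimum delay between calls. The sketch below is a generic helper, not a feature of the ChEMBL client; the class name is hypothetical, the interval is an arbitrary illustrative value rather than an official limit, and `str.upper` stands in for a real request.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class Throttle:
    """Enforce a minimum delay between consecutive calls."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_call: Optional[float] = None  # monotonic time of previous call

    def call(self, func: Callable[..., T], *args, **kwargs) -> T:
        if self._last_call is not None:
            wait = self.min_interval - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
        self._last_call = time.monotonic()
        return func(*args, **kwargs)


throttle = Throttle(min_interval=0.05)  # illustrative value, not an official limit
start = time.monotonic()
results = [throttle.call(str.upper, name) for name in ["egfr", "braf", "kras"]]
elapsed = time.monotonic() - start
print(results, round(elapsed, 2))
```

In practice the wrapped callable would be whatever function issues a query, and the interval would be chosen to suit the service's usage guidance.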

Chemical Structure Formats

- SMILES strings are the primary structure format
- InChI keys are available for compounds
- SVG images can be generated via the image endpoint

Additional Resources

- ChEMBL website: https://www.ebi.ac.uk/chembl/
- API documentation: https://www.ebi.ac.uk/chembl/api/data/docs
- Python client GitHub: https://github.com/chembl/chembl_webresource_client
- Interface documentation: https://chembl.gitbook.io/chembl-interface-documentation/
- Example notebooks: https://github.com/chembl/notebooks
