Cheminformatics

Description

Cheminformatics provides computational chemistry workflows using RDKit for molecular property prediction, virtual screening, ADMET analysis, molecular docking preparation, and chemical space exploration. The agent generates reproducible cheminformatics pipelines that transform molecular structures (SMILES, SDF) into actionable predictions about drug-likeness, toxicity, and binding affinity.
Drug discovery generates vast chemical libraries that cannot all be synthesized and tested. Cheminformatics narrows the search space computationally: filtering by Lipinski's Rule of Five, predicting ADMET properties (Absorption, Distribution, Metabolism, Excretion, Toxicity), scoring docking poses, and clustering chemical space to identify diverse lead candidates. Each step eliminates compounds that would fail in later, more expensive stages.
This skill covers the molecular informatics workflow from SMILES parsing through descriptor calculation, fingerprint generation, similarity searching, property prediction, and visualization. It integrates with databases like PubChem and ChEMBL for compound retrieval and benchmarking against known actives and inactives.

Use When

  • Calculating molecular properties and descriptors
  • Screening compound libraries for drug-likeness
  • Predicting ADMET properties for lead compounds
  • Performing molecular similarity searches
  • Preparing structures for molecular docking
  • Visualizing chemical space and structure-activity relationships

How It Works

```mermaid
graph TD
    A[Molecular Input: SMILES/SDF] --> B[Parse + Validate Structures]
    B --> C[Calculate Descriptors]
    C --> D[Drug-likeness Filters]
    D --> E{Passes Lipinski?}
    E -->|No| F[Flag as Non-Drug-like]
    E -->|Yes| G[ADMET Prediction]
    G --> H[Virtual Screening Score]
    H --> I[Docking Preparation]
    I --> J[Ranked Candidate List]
    F --> K[Report with Flags]
    J --> K
```

Compounds flow through increasingly selective filters. Drug-likeness removes obviously non-viable candidates, ADMET prediction flags absorption and toxicity risks, and virtual screening ranks the survivors by predicted activity.
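
The branching in the diagram can be sketched in plain Python. This is a minimal illustration, not the skill's actual pipeline: it assumes the descriptors have already been computed, and the `Compound` class, the `score` field, the `triage` helper, and all numeric values below are hypothetical placeholders. The Lipinski cutoffs and the TPSA threshold of 140 are the ones used in the implementation section.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    name: str
    mw: float      # molecular weight
    logp: float    # lipophilicity
    hbd: int       # hydrogen-bond donors
    hba: int       # hydrogen-bond acceptors
    tpsa: float    # topological polar surface area
    score: float   # hypothetical virtual-screening score, higher is better
    flags: list[str] = field(default_factory=list)


def lipinski_violations(c: Compound) -> int:
    # Count how many of the four Rule of Five limits are exceeded.
    return sum([c.mw > 500, c.logp > 5, c.hbd > 5, c.hba > 10])


def triage(compounds, max_violations=1):
    """Split compounds into a ranked candidate list and a flagged list."""
    ranked, flagged = [], []
    for c in compounds:
        if lipinski_violations(c) > max_violations:
            c.flags.append("Non-drug-like")
            flagged.append(c)
            continue
        # Survivors still carry ADMET warnings into the final report.
        if c.tpsa > 140:
            c.flags.append("High TPSA: poor membrane permeability risk")
        ranked.append(c)
    ranked.sort(key=lambda c: c.score, reverse=True)
    return ranked, flagged


# Made-up descriptor values, for illustration only.
library = [
    Compound("cmpd-1", 180.2, 1.2, 1, 4, 63.6, 0.41),
    Compound("cmpd-2", 812.0, 6.3, 7, 14, 210.0, 0.90),
    Compound("cmpd-3", 349.4, 3.1, 2, 5, 150.0, 0.77),
]
ranked, flagged = triage(library)
```

Note that `cmpd-2` has the highest score but never reaches the ranking step, which is the point of putting the cheap filters first.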

Implementation

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, Descriptors
import pandas as pd


def molecular_properties(smiles: str) -> dict:
    """Compute basic descriptors and the Lipinski violation count for one SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    # Compute each Rule of Five descriptor once and reuse it below.
    mw = Descriptors.MolWt(mol)
    logp = Descriptors.MolLogP(mol)
    hbd = Descriptors.NumHDonors(mol)
    hba = Descriptors.NumHAcceptors(mol)
    return {
        "smiles": smiles,
        "mw": mw,
        "logp": logp,
        "hbd": hbd,
        "hba": hba,
        "tpsa": Descriptors.TPSA(mol),
        "rotatable_bonds": Descriptors.NumRotatableBonds(mol),
        "rings": Descriptors.RingCount(mol),
        "lipinski_violations": sum([mw > 500, logp > 5, hbd > 5, hba > 10]),
    }


def lipinski_filter(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows with at most one Rule of Five violation."""
    return df[df["lipinski_violations"] <= 1].copy()


def similarity_search(query_smiles: str, library: list[str], threshold: float = 0.7) -> list[dict]:
    """Return library entries whose Tanimoto similarity to the query meets the threshold."""
    query_mol = Chem.MolFromSmiles(query_smiles)
    if query_mol is None:
        raise ValueError(f"Invalid query SMILES: {query_smiles}")
    query_fp = AllChem.GetMorganFingerprintAsBitVect(query_mol, radius=2, nBits=2048)

    results = []
    for smi in library:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparseable library entries
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
        tanimoto = DataStructs.TanimotoSimilarity(query_fp, fp)
        if tanimoto >= threshold:
            results.append({"smiles": smi, "tanimoto": tanimoto})

    # Most similar compounds first.
    return sorted(results, key=lambda x: x["tanimoto"], reverse=True)


def admet_flags(props: dict) -> list[str]:
    """Translate descriptor values into rule-of-thumb ADMET risk warnings."""
    flags = []
    if props["logp"] > 5:
        flags.append("High lipophilicity: poor aqueous solubility risk")
    if props["tpsa"] > 140:
        flags.append("High TPSA: poor membrane permeability risk")
    if props["mw"] > 500:
        flags.append("High MW: poor oral absorption risk")
    if props["rotatable_bonds"] > 10:
        flags.append("High flexibility: poor oral bioavailability risk")
    return flags
```
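
To make the similarity threshold concrete, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either one. The sketch below computes it by hand on sets of on-bit indices. The `tanimoto` helper and the bit indices are illustrative assumptions, and returning 0.0 for two empty fingerprints is a convention chosen here, not necessarily what RDKit does.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient between two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch
    return len(a & b) / union


# Made-up on-bit indices standing in for two Morgan fingerprints.
query_bits = {1, 5, 9, 12}
candidate_bits = {1, 5, 9, 40, 77}

# 3 shared bits out of 6 distinct bits overall.
sim = tanimoto(query_bits, candidate_bits)
```

With the default threshold of 0.7 used by `similarity_search`, a pair like this one would be excluded, even though three of the query's four bits are matched.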

Best Practices

  • Always confirm that SMILES parsing succeeded before computing descriptors: an invalid structure parses to None instead of raising an error
  • Use Morgan fingerprints (radius=2, 2048 bits) as the default for similarity calculations
  • Apply Lipinski's Rule of Five as a first-pass filter, not an absolute cutoff
  • Report Tanimoto similarity thresholds used in all similarity searches
  • Standardize molecules (desalt, neutralize, canonicalize) before comparison
  • Visualize chemical space with t-SNE or UMAP on fingerprint representations
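
The standardization step (desalt, neutralize, canonicalize) can be sketched with RDKit's `rdMolStandardize` module. This is one possible ordering of the steps under default settings, not a prescribed protocol, and the `standardize_smiles` helper name is hypothetical.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def standardize_smiles(smiles: str) -> str:
    """Return a canonical SMILES for the desalted, neutralized parent structure."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    # General cleanup: sanitization and functional-group normalization.
    mol = rdMolStandardize.Cleanup(mol)
    # Desalt by keeping only the largest fragment.
    mol = rdMolStandardize.LargestFragmentChooser().choose(mol)
    # Neutralize charges where possible.
    mol = rdMolStandardize.Uncharger().uncharge(mol)
    # MolToSmiles writes canonical SMILES by default.
    return Chem.MolToSmiles(mol)


# Sodium acetate and acetic acid should collapse to the same parent.
salt = standardize_smiles("CC(=O)[O-].[Na+]")
acid = standardize_smiles("CC(=O)O")
```

Running both the query and the library through the same function before fingerprinting keeps salt forms and charge states from lowering similarity scores between otherwise identical compounds.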

Platform Compatibility

| Platform | Support | Notes |
| --- | --- | --- |
| Cursor | Full | Python + RDKit environment |
| VS Code | Full | Jupyter + molecular viz |
| Windsurf | Full | Scientific Python |
| Claude Code | Full | Pipeline generation |
| Cline | Full | Cheminformatics workflows |
| aider | Partial | Code generation only |

Related Skills

  • Bioinformatics
  • Database Lookup
  • Machine Learning
  • Batch Processing

Keywords

cheminformatics
rdkit
molecular-properties
virtual-screening
admet
lipinski
drug-discovery
molecular-similarity

© 2026 googleadsagent.ai™ | Agent Skills™ | MIT License