foldseek

Foldseek Structure Search

Prerequisites

| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.8+ | 3.10 |
| RAM | 8GB | 16GB |
| Disk | 10GB | 50GB (for local databases) |

How to run

Note: Foldseek can run locally or via web server. No GPU required.

Option 1: Web Server (Quick; rate-limited, use sparingly)

bash
# Upload structure to web server
curl -X POST "https://search.foldseek.com/api/ticket" \
  -F "q=@query.pdb" \
  -F "database[]=afdb50" \
  -F "database[]=pdb100"
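
Because the web server is rate-limited, a client that waits for a submitted job should poll slowly rather than in a tight loop. The sketch below shows one way to do that with exponential backoff. It makes no network calls itself: `fetch_status` is a hypothetical caller-supplied function that returns the job's current status string, and the terminal values `"COMPLETE"` and `"ERROR"` are assumptions to check against what the server actually returns.

```python
import time
from typing import Callable


def wait_for_ticket(
    fetch_status: Callable[[], str],
    initial_delay: float = 2.0,
    max_delay: float = 60.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job status with exponential backoff.

    fetch_status is a hypothetical caller-supplied function returning the
    current status string. "COMPLETE" and "ERROR" are assumed terminal
    states, not confirmed server values.
    """
    delay = initial_delay
    for _ in range(max_attempts):
        status = fetch_status()
        if status in ("COMPLETE", "ERROR"):
            return status
        sleep(delay)  # wait before asking again
        delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay
    raise TimeoutError("job did not finish within max_attempts polls")
```

With the defaults, the waits between polls are 2 s, 4 s, 8 s and so on up to 60 s, which keeps the request count low for long-running searches.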

Option 2: Local installation

bash
# Install Foldseek
conda install -c conda-forge -c bioconda foldseek

# Search PDB
foldseek easy-search query.pdb /path/to/pdb100 results.m8 tmp/

# Search AlphaFold DB
foldseek easy-search query.pdb /path/to/afdb50 results.m8 tmp/

Option 3: Python API

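python
import subprocess
import pandas as pd

# Columns requested from Foldseek, in output order
COLUMNS = ["query", "target", "pident", "alnlen", "evalue", "bits"]

def foldseek_search(query_pdb, database, output="results.m8", tmp_dir="tmp/"):
    """Run a Foldseek easy-search and return the hits as a DataFrame."""
    subprocess.run(
        [
            "foldseek", "easy-search",
            query_pdb, database, output, tmp_dir,
            "--format-output", ",".join(COLUMNS),
        ],
        check=True,  # raise CalledProcessError if Foldseek exits non-zero
    )
    # The output file has no header row, so column names are supplied here
    return pd.read_csv(output, sep="\t", names=COLUMNS)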

Key parameters

| Parameter | Default | Description |
|---|---|---|
| `--min-seq-id` | 0.0 | Minimum sequence identity |
| `-e` | 0.001 | E-value threshold |
| `--alignment-type` | 2 | 0=3Di, 1=TM, 2=3Di+AA |
| `--max-seqs` | 300 | Max hits to pass through prefilter; reducing this affects sensitivity |
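
To make it clearer how these parameters fit into an `easy-search` call, here is a small sketch that assembles the argument list. The function name `build_easy_search_cmd` is hypothetical, and its defaults are simply copied from the table above; they may not match the defaults of the Foldseek version you have installed.

```python
from typing import List


def build_easy_search_cmd(
    query: str,
    database: str,
    output: str = "results.m8",
    tmp_dir: str = "tmp/",
    min_seq_id: float = 0.0,
    evalue: float = 0.001,
    alignment_type: int = 2,
    max_seqs: int = 300,
) -> List[str]:
    """Assemble an easy-search argument list (hypothetical helper).

    Defaults mirror the parameter table in this document.
    """
    if not 0.0 <= min_seq_id <= 1.0:
        raise ValueError("min_seq_id must be between 0.0 and 1.0")
    if alignment_type not in (0, 1, 2):
        raise ValueError("alignment_type must be 0, 1, or 2")
    return [
        "foldseek", "easy-search", query, database, output, tmp_dir,
        "--min-seq-id", str(min_seq_id),
        "-e", str(evalue),
        "--alignment-type", str(alignment_type),
        "--max-seqs", str(max_seqs),
    ]
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems with paths that contain spaces.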

Databases

| Database | Description | Size |
|---|---|---|
| `pdb100` | PDB clustered at 100% sequence identity | ~200K structures |
| `afdb50` | AlphaFold DB clustered at 50% sequence identity | ~67M structures |
| `swissprot` | SwissProt structures | ~500K structures |
| `cath50` | CATH domains | ~50K domains |

Output format

# results.m8 (tabular)
query   target   pident   alnlen   evalue   bits
query   1abc_A   85.2     120      1e-45    180.5
query   2def_B   72.1     115      1e-32    145.2
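
For readers who prefer not to depend on pandas, the six-column layout above can also be read with the standard library. This is a minimal sketch: the `Hit` class and `parse_m8` function are hypothetical names, and the parser assumes tab-separated rows with no header line and exactly the six columns listed above.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    query: str
    target: str
    pident: float   # percent identity
    alnlen: int     # alignment length
    evalue: float
    bits: float     # bit score


def parse_m8(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated rows in the six-column layout (no header row)."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, target, pident, alnlen, evalue, bits = row
        hits.append(
            Hit(query, target, float(pident), int(alnlen), float(evalue), float(bits))
        )
    return hits
```

An open file handle can be passed straight in, for example `parse_m8(open("results.m8"))`, since `csv.reader` accepts any iterable of lines.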

Sample output

Successful run

$ foldseek easy-search query.pdb pdb100 results.m8 tmp/
[INFO] Loading database: pdb100 (194,527 entries)
[INFO] Searching...
[INFO] Found 127 hits

Top 5 hits:
1. 1abc_A - 85.2% identity, E=1e-45
2. 2def_B - 72.1% identity, E=1e-32
3. 3ghi_C - 68.5% identity, E=1e-28
4. 4jkl_A - 55.3% identity, E=1e-18
5. 5mno_B - 42.1% identity, E=1e-10

Decision tree

Should I use Foldseek?
├─ What are you searching?
│  ├─ By 3D structure → Foldseek ✓
│  ├─ By sequence → Use BLAST (uniprot skill)
│  └─ Both → Run both, compare results
└─ What do you need?
   ├─ Find structural homologs → Foldseek ✓
   ├─ Remote homolog detection → Foldseek ✓
   ├─ Structural clustering → Foldseek ✓
   └─ Functional annotation → Cross-reference with UniProt

Common use cases

Find similar designs

bash
# Compare your design to PDB
foldseek easy-search design.pdb pdb100 similar_natural.m8 tmp/

Novelty check

bash
# Ensure design is novel (low similarity to known structures)
foldseek easy-search design.pdb afdb50 novelty.m8 tmp/

# Novel if: top hit identity < 30%
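
The 30% rule can be applied programmatically once the hits are loaded. The sketch below is one possible reading of it: `is_novel` is a hypothetical helper that takes `(target, pident, evalue)` tuples, treats the hit with the lowest E-value as the "top hit", and counts an empty result set as novel. Both of those choices are assumptions rather than anything Foldseek defines.

```python
from typing import Iterable, Tuple


def is_novel(
    hits: Iterable[Tuple[str, float, float]],
    max_identity: float = 30.0,
) -> bool:
    """Return True if the top hit's percent identity is below max_identity.

    hits holds (target, pident, evalue) tuples. The top hit is taken to be
    the one with the lowest E-value (an assumption); no hits counts as novel.
    """
    hits = list(hits)
    if not hits:
        return True
    top = min(hits, key=lambda h: h[2])  # lowest E-value first
    return top[1] < max_identity
```

For example, a design whose best match is `1abc_A` at 85.2% identity would be reported as not novel, while one whose best match sits at 25% would pass.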

Scaffold search

bash
# Find scaffolds for motif grafting
foldseek easy-search motif.pdb pdb100 scaffolds.m8 tmp/ \
  --min-seq-id 0.0 -e 10

---

Verify

bash
wc -l results.m8  # Number of hits

Troubleshooting

No hits: Raise the E-value threshold (for example, -e 10) or try a larger database.
Too many hits: Increase the --min-seq-id threshold.
Slow search: Use a smaller database.
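
When a search returns no hits, one option is to retry with progressively more permissive E-value cutoffs (larger `-e` values admit more hits). The sketch below illustrates that loop. `run_search` is a hypothetical caller-supplied function that runs a search at a given E-value and returns a list of hits, and the schedule of cutoffs is an arbitrary illustration, not a recommendation from Foldseek.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def search_with_relaxing_evalue(
    run_search: Callable[[float], List[str]],
    evalues: Sequence[float] = (0.001, 0.01, 0.1, 1.0, 10.0),
) -> Tuple[Optional[float], List[str]]:
    """Try each E-value cutoff in order and stop at the first that yields hits.

    run_search is a hypothetical function: it takes an E-value and returns
    the hits found at that cutoff. Returns (cutoff, hits), or (None, [])
    if every cutoff comes back empty.
    """
    for evalue in evalues:
        hits = run_search(evalue)
        if hits:
            return evalue, hits
    return None, []
```

Stopping at the strictest cutoff that produces hits keeps the result set small. Hits that only appear at very permissive cutoffs are weak matches and deserve manual inspection.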

Error interpretation

| Error | Cause | Fix |
|---|---|---|
| `Database not found` | Wrong path | Check database location |
| `Invalid PDB` | Malformed structure | Validate PDB format |
| `Out of memory` | Large database | Use more RAM or web server |

Next: Download hits with the pdb skill → use for scaffold design.