foldseek
Foldseek Structure Search
Prerequisites
| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.8+ | 3.10 |
| RAM | 8GB | 16GB |
| Disk | 10GB | 50GB (for local databases) |
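A quick preflight check can confirm the minimums in the table before a local run. The sketch below is only illustrative: the function name `check_prerequisites` and its parameters are hypothetical, it assumes the `foldseek` binary is expected on `PATH`, and it skips the RAM check because the standard library has no portable call for it.

```python
import shutil
import sys


def check_prerequisites(db_dir=".", min_python=(3, 8), min_free_gb=10):
    """Return pass/fail flags for the minimum requirements (RAM is not checked)."""
    free_gb = shutil.disk_usage(db_dir).free / 1e9  # free space where databases will live
    return {
        "python": sys.version_info >= min_python,
        "disk": free_gb >= min_free_gb,
        "foldseek_on_path": shutil.which("foldseek") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Pass `min_free_gb=50` to test against the recommended disk size for local databases instead of the minimum.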
How to run
Note: Foldseek can run locally or via web server. No GPU required.
Option 1: Web Server (Quick; rate-limited, use sparingly)
```bash
# Upload structure to web server
curl -X POST "https://search.foldseek.com/api/ticket" \
  -F "q=@query.pdb" \
  -F "database[]=afdb50" \
  -F "database[]=pdb100"
```
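Because the web server is rate-limited, any follow-up status checks on a submitted job should back off rather than poll in a tight loop. The sketch below shows one way to do that. Everything about it is an assumption: `fetch_status` is a hypothetical callable you would supply to query the job, and the status strings `"COMPLETE"` and `"ERROR"` are placeholders to be checked against the server's API documentation.

```python
import time


def wait_for_ticket(fetch_status, first_delay=2.0, max_delay=60.0,
                    max_polls=20, sleep=time.sleep):
    """Poll until the job finishes, doubling the wait after each poll.

    fetch_status: hypothetical callable returning the job's status string.
    Returns True on success, False on a reported failure.
    """
    delay = first_delay
    for _ in range(max_polls):
        status = fetch_status()
        if status == "COMPLETE":  # assumed success marker
            return True
        if status == "ERROR":     # assumed failure marker
            return False
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff with a cap
    raise TimeoutError("search did not finish within max_polls polls")
```

Doubling the delay keeps the request count low for long-running searches while still returning quickly for short ones.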
Option 2: Local installation
```bash
# Install Foldseek
conda install -c conda-forge -c bioconda foldseek

# Search PDB
foldseek easy-search query.pdb /path/to/pdb100 results.m8 tmp/

# Search AlphaFold DB
foldseek easy-search query.pdb /path/to/afdb50 results.m8 tmp/
```
Option 3: Python API
```python
import subprocess

import pandas as pd

# Output columns requested from Foldseek, in order
COLUMNS = ["query", "target", "pident", "alnlen", "evalue", "bits"]


def foldseek_search(query_pdb, database, output="results.m8"):
    """Run a Foldseek search and return the hits as a DataFrame."""
    subprocess.run([
        "foldseek", "easy-search",
        query_pdb, database, output, "tmp/",
        "--format-output", ",".join(COLUMNS),
    ], check=True)  # raise CalledProcessError if Foldseek fails
    return pd.read_csv(output, sep="\t", names=COLUMNS)
```
Key parameters
| Parameter | Default | Description |
|---|---|---|
| `--min-seq-id` | 0.0 | Minimum sequence identity |
| `-e` | 0.001 | E-value threshold |
| `--alignment-type` | 2 | 0=3Di, 1=TM, 2=3Di+AA |
| `--max-seqs` | 300 | Max hits to pass through prefilter; reducing this lowers sensitivity |
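To make the parameters above concrete, the following sketch assembles an `easy-search` argument list and only adds a flag when a value is supplied, so Foldseek's own defaults apply otherwise. The helper name `build_search_command` is hypothetical, and the flag spellings (`--min-seq-id`, `-e`, `--alignment-type`, `--max-seqs`) are assumed to match the installed Foldseek version; check `foldseek easy-search -h` to confirm.

```python
def build_search_command(query, database, output="results.m8", tmp_dir="tmp/",
                         min_seq_id=None, evalue=None,
                         alignment_type=None, max_seqs=None):
    """Return the argv list for `foldseek easy-search`; None keeps the default."""
    cmd = ["foldseek", "easy-search", query, database, output, tmp_dir]
    options = [
        ("--min-seq-id", min_seq_id),
        ("-e", evalue),
        ("--alignment-type", alignment_type),
        ("--max-seqs", max_seqs),
    ]
    for flag, value in options:
        if value is not None:  # only override what the caller set
            cmd += [flag, str(value)]
    return cmd


# Permissive search, as in a scaffold hunt
print(build_search_command("motif.pdb", "pdb100", "scaffolds.m8",
                           min_seq_id=0.0, evalue=10))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`.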
Databases
| Database | Description | Size |
|---|---|---|
| `pdb100` | PDB clustered at 100% | ~200K structures |
| `afdb50` | AlphaFold DB clustered at 50% | ~67M structures |
| | SwissProt structures | ~500K structures |
| | CATH domains | ~50K domains |
Output format
```
# results.m8 (tabular)
query  target  pident  alnlen  evalue  bits
query  1abc_A  85.2    120     1e-45   180.5
query  2def_B  72.1    115     1e-32   145.2
```
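The tabular output is straightforward to load without extra dependencies. The sketch below parses tab-separated hit lines into dictionaries and picks the hit with the lowest E-value. It assumes the six-column layout shown above and that the real file is tab-separated; the `parse_m8` helper is hypothetical, and the sample rows are the illustrative ones from this page, not real search results.

```python
import csv
import io

COLUMNS = ["query", "target", "pident", "alnlen", "evalue", "bits"]


def parse_m8(text):
    """Parse tab-separated hit lines into dicts with numeric fields converted."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row == COLUMNS:  # skip blank lines and an optional header row
            continue
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["alnlen"] = int(hit["alnlen"])
        hit["evalue"] = float(hit["evalue"])
        hit["bits"] = float(hit["bits"])
        hits.append(hit)
    return hits


sample = (
    "query\t1abc_A\t85.2\t120\t1e-45\t180.5\n"
    "query\t2def_B\t72.1\t115\t1e-32\t145.2\n"
)
hits = parse_m8(sample)
best = min(hits, key=lambda h: h["evalue"])
print(best["target"], best["pident"])  # prints: 1abc_A 85.2
```

For a file on disk, read it with `open("results.m8").read()` and pass the text to `parse_m8`.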
Sample output
Successful run
```
$ foldseek easy-search query.pdb pdb100 results.m8 tmp/
[INFO] Loading database: pdb100 (194,527 entries)
[INFO] Searching...
[INFO] Found 127 hits
Top 5 hits:
1. 1abc_A - 85.2% identity, E=1e-45
2. 2def_B - 72.1% identity, E=1e-32
3. 3ghi_C - 68.5% identity, E=1e-28
4. 4jkl_A - 55.3% identity, E=1e-18
5. 5mno_B - 42.1% identity, E=1e-10
```
Decision tree
```
Should I use Foldseek?
│
├─ What are you searching?
│   ├─ By 3D structure → Foldseek ✓
│   ├─ By sequence → Use BLAST (uniprot skill)
│   └─ Both → Run both, compare results
│
└─ What do you need?
    ├─ Find structural homologs → Foldseek ✓
    ├─ Remote homolog detection → Foldseek ✓
    ├─ Structural clustering → Foldseek ✓
    └─ Functional annotation → Cross-reference with UniProt
```
Common use cases
Find similar designs
```bash
# Compare your design to PDB
foldseek easy-search design.pdb pdb100 similar_natural.m8 tmp/
```
Novelty check
```bash
# Ensure design is novel (low similarity to known structures)
foldseek easy-search design.pdb afdb50 novelty.m8 tmp/
# Novel if: top hit identity < 30%
```
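The 30% rule can be applied programmatically once the identities are loaded. This is a minimal sketch under stated assumptions: `is_novel` is a hypothetical helper, the identities are percentages in Foldseek's output order (so the first entry is the top hit), and a search with no hits at all is treated as novel.

```python
def is_novel(hit_identities, threshold=30.0):
    """Return True if the top hit's identity (percent) is below the threshold.

    hit_identities: identities in output order; the first entry is the top hit.
    An empty list (no hits) counts as novel.
    """
    if not hit_identities:
        return True
    return hit_identities[0] < threshold


print(is_novel([85.2, 72.1]))  # prints: False
print(is_novel([22.4, 18.0]))  # prints: True
```

If the identity column is reported as a fraction rather than a percentage, pass `threshold=0.30` instead.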
Scaffold search
```bash
# Find scaffolds for motif grafting
foldseek easy-search motif.pdb pdb100 scaffolds.m8 tmp/ \
  --min-seq-id 0.0 -e 10
```

---

Verify
```bash
wc -l results.m8  # Number of hits
```
Troubleshooting
No hits: Raise the E-value threshold (for example `-e 10`) so weaker matches are reported, or try a larger database
Too many hits: Increase the `--min-seq-id` threshold
Slow search: Use a smaller database
Error interpretation
| Error | Cause | Fix |
|---|---|---|
| | Wrong path | Check database location |
| | Malformed structure | Validate PDB format |
| | Large database | Use more RAM or web server |
Next: Download hits with the pdb skill → use for scaffold design.