bio-blast
BLAST Search
A skill that runs NCBI BLAST (Basic Local Alignment Search Tool) through BioPython and returns the results as JSON.
Quick Start
Install
```bash
uv pip install biopython typer
```
Run with FASTA
```bash
python scripts/run_blast_biopython.py --fasta path/to/query.fasta
```
Run with raw sequence
```bash
python scripts/run_blast_biopython.py --sequence ATGCGATCG...
```
Restrict to organism (e.g., human)
```bash
python scripts/run_blast_biopython.py --fasta query.fasta --organism "Homo sapiens"
```
Protein BLAST
```bash
python scripts/run_blast_biopython.py --program blastp --database swissprot --sequence MTEYKLVVVG...
```
Save output
```bash
python scripts/run_blast_biopython.py --fasta query.fasta --output blast_results.json
```
Output Format
Results are returned in JSON format with the following structure:
```json
{
  "query": "No definition line",
  "query_length": 99,
  "database": "core_nt",
  "num_hits": 10,
  "hits": [
    {
      "rank": 1,
      "accession": "NM_007294",
      "title": "Homo sapiens BRCA1 DNA repair associated (BRCA1), mRNA",
      "e_value": 4.35e-43,
      "bit_score": 179.82,
      "percent_identity": 100.0,
      "identities": 99,
      "align_length": 99,
      "gaps": 0,
      "query_start": 1,
      "query_end": 99,
      "subject_start": 1,
      "subject_end": 99
    }
  ]
}
```
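A result in this shape can be post-processed with the standard library alone. The sketch below assumes the field names shown in the documented structure. The `filter_hits` helper and its thresholds are hypothetical and not part of the skill.

```python
import json


def filter_hits(result, max_e_value=1e-10, min_identity=95.0):
    """Keep hits at or below max_e_value and at or above min_identity, in rank order."""
    kept = [
        hit for hit in result.get("hits", [])
        if hit["e_value"] <= max_e_value and hit["percent_identity"] >= min_identity
    ]
    return sorted(kept, key=lambda hit: hit["rank"])


# Example input mirroring the documented structure (trimmed to the fields used here).
example = json.loads("""
{
  "query": "No definition line",
  "num_hits": 1,
  "hits": [
    {"rank": 1, "accession": "NM_007294", "e_value": 4.35e-43,
     "percent_identity": 100.0, "identities": 99, "align_length": 99}
  ]
}
""")

for hit in filter_hits(example):
    print(hit["rank"], hit["accession"], hit["e_value"])
```

For a file written with `--output`, the same helper could be applied to the dictionary returned by `json.load` on that file.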
Command-line Options
- `--program`: BLAST program (blastn, blastp, blastx, tblastn, tblastx). Default: blastn
- `--database`: BLAST database (nt, nr, refseq_rna, swissprot, etc.). Default: nt
- `--fasta`: Path to FASTA file (single sequence only)
- `--sequence`: Raw query sequence string
- `--organism`: Restrict search to organism (e.g., "Homo sapiens")
- `--expect`: E-value threshold. Default: 0.001
- `--hitlist-size`: Maximum number of hits. Default: 10
- `--output`: Output path for JSON results
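These options correspond closely to keyword arguments of Biopython's `NCBIWWW.qblast`. The sketch below is an assumption about how such a mapping could look, not the actual contents of `scripts/run_blast_biopython.py`. `build_qblast_kwargs` and `run_remote_blast` are hypothetical names. The organism restriction is passed as an Entrez query, which is the mechanism `qblast` offers for limiting a search.

```python
def build_qblast_kwargs(sequence, program="blastn", database="nt",
                        organism=None, expect=0.001, hitlist_size=10):
    """Translate CLI-style options into keyword arguments for NCBIWWW.qblast."""
    allowed = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}
    if program not in allowed:
        raise ValueError(f"unsupported program: {program}")
    kwargs = {
        "program": program,
        "database": database,
        "sequence": sequence,
        "expect": expect,
        "hitlist_size": hitlist_size,
    }
    if organism:
        # Entrez syntax for limiting the search to one organism.
        kwargs["entrez_query"] = f'"{organism}"[Organism]'
    return kwargs


def run_remote_blast(**options):
    """Submit the search to NCBI and return a handle to the XML result (needs network)."""
    # Imported lazily so build_qblast_kwargs works without Biopython installed.
    from Bio.Blast import NCBIWWW
    return NCBIWWW.qblast(**build_qblast_kwargs(**options))


# Inspect the arguments without contacting NCBI.
print(build_qblast_kwargs("ATGCGATCG", organism="Homo sapiens"))
```

Keeping the argument construction separate from the network call makes the option handling easy to check offline.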
Best Practices
- Save results - Don't re-run searches unnecessarily
- Set E-value threshold - The BLAST default of 10 is too permissive; use 0.001-0.01 (this script's `--expect` already defaults to 0.001)
- Use gget for quick searches - Simpler API for single sequences
- Cache parsed data - Avoid re-parsing large XML files
- Handle rate limits - NCBI limits request frequency
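The points on saving results and rate limits can be combined in a small wrapper that caches each result on disk and enforces a minimum interval between remote submissions. This is a sketch under stated assumptions: `cached_search`, `cache_key`, and the cache directory layout are illustrative, and `search_fn` stands in for whatever function actually performs the BLAST request. The 10-second default reflects NCBI's usage guideline for its BLAST URL API as I understand it; check the current guidelines before relying on it.

```python
import hashlib
import json
import time
from pathlib import Path

_last_request = None  # monotonic timestamp of the most recent remote call


def cache_key(program, database, sequence):
    """Stable key for one search: identical parameters give the same key."""
    raw = f"{program}|{database}|{sequence.strip().upper()}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


def cached_search(search_fn, program, database, sequence,
                  cache_dir="blast_cache", min_interval=10.0):
    """Return a cached result if present; otherwise throttle, search, and store."""
    global _last_request
    path = Path(cache_dir) / f"{cache_key(program, database, sequence)}.json"
    if path.exists():
        return json.loads(path.read_text())

    # Sleep only for the time still remaining since the last remote call.
    if _last_request is not None:
        wait = min_interval - (time.monotonic() - _last_request)
        if wait > 0:
            time.sleep(wait)
    result = search_fn(program, database, sequence)
    _last_request = time.monotonic()

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result
```

The key here ignores options such as the E-value threshold and organism filter; a fuller version would include every parameter that changes the result.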
BLAST vs BLAT
| Aspect | BLAST | BLAT |
|---|---|---|
| Purpose | Similarity search | Genome mapping |
| Sensitivity | High | Medium |
| Speed | Medium | Very fast |
| Best for | Homolog search | Position finding |