bio-blast

BLAST Search

This skill runs NCBI BLAST (Basic Local Alignment Search Tool) through Biopython and returns the results as JSON.

Quick Start

Install

bash
uv pip install biopython typer

Run with FASTA

bash
python scripts/run_blast_biopython.py --fasta path/to/query.fasta

Run with raw sequence

bash
python scripts/run_blast_biopython.py --sequence ATGCGATCG...

Restrict to organism (e.g., human)

bash
python scripts/run_blast_biopython.py --fasta query.fasta --organism "Homo sapiens"

Protein BLAST

bash
python scripts/run_blast_biopython.py --program blastp --database swissprot --sequence MTEYKLVVVG...

Save output

bash
python scripts/run_blast_biopython.py --fasta query.fasta --output blast_results.json

Output Format

Results are returned in JSON format with the following structure:
json
{
  "query": "No definition line",
  "query_length": 99,
  "database": "core_nt",
  "num_hits": 10,
  "hits": [
    {
      "rank": 1,
      "accession": "NM_007294",
      "title": "Homo sapiens BRCA1 DNA repair associated (BRCA1), mRNA",
      "e_value": 4.35e-43,
      "bit_score": 179.82,
      "percent_identity": 100.0,
      "identities": 99,
      "align_length": 99,
      "gaps": 0,
      "query_start": 1,
      "query_end": 99,
      "subject_start": 1,
      "subject_end": 99
    }
  ]
}
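
As a rough illustration of consuming this structure, the sketch below filters hits by E-value and percent identity and derives query coverage from the coordinate fields. The function name `summarize_hits` and the cutoff values are hypothetical, chosen only for the example; the field names are those shown in the output above.

```python
import json


def summarize_hits(result, max_e_value=1e-10, min_identity=95.0):
    """Return (accession, percent identity, query coverage) for hits passing both cutoffs."""
    rows = []
    for hit in result["hits"]:
        if hit["e_value"] > max_e_value or hit["percent_identity"] < min_identity:
            continue
        # Query coordinates are 1-based and inclusive, hence the +1
        aligned = abs(hit["query_end"] - hit["query_start"]) + 1
        coverage = 100.0 * aligned / result["query_length"]
        rows.append((hit["accession"], hit["percent_identity"], round(coverage, 1)))
    return rows


# Trimmed copy of the example output, keeping only the fields used here
example = json.loads("""
{
  "query_length": 99,
  "hits": [
    {"accession": "NM_007294", "e_value": 4.35e-43, "percent_identity": 100.0,
     "query_start": 1, "query_end": 99}
  ]
}
""")

for accession, identity, coverage in summarize_hits(example):
    print(f"{accession}\t{identity:.1f}% identity\t{coverage:.1f}% query coverage")
```

For a file written with `--output`, the same function can be applied to the dictionary returned by `json.load` on that file.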

Command-line Options

  • --program: BLAST program (blastn, blastp, blastx, tblastn, tblastx). Default: blastn
  • --database: BLAST database (nt, nr, refseq_rna, swissprot, etc.). Default: nt
  • --fasta: Path to FASTA file (single sequence only)
  • --sequence: Raw query sequence string
  • --organism: Restrict search to organism (e.g., "Homo sapiens")
  • --expect: E-value threshold. Default: 0.001
  • --hitlist-size: Maximum number of hits. Default: 10
  • --output: Output path for JSON results
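
The script's internals are not shown here, so the following is only a sketch of how these options could map onto Biopython's `NCBIWWW.qblast`, whose `entrez_query`, `expect`, and `hitlist_size` keyword arguments correspond to `--organism`, `--expect`, and `--hitlist-size`. The helper names `build_qblast_args` and `run_remote_blast` are hypothetical, and the `[Organism]` Entrez filter is an assumption about how the organism restriction is expressed.

```python
VALID_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def build_qblast_args(sequence, program="blastn", database="nt",
                      organism=None, expect=0.001, hitlist_size=10):
    """Translate the command-line options into keyword arguments for NCBIWWW.qblast."""
    if program not in VALID_PROGRAMS:
        raise ValueError(f"unsupported BLAST program: {program}")
    args = {
        "program": program,
        "database": database,
        "sequence": sequence,
        "expect": expect,
        "hitlist_size": hitlist_size,
    }
    if organism:
        # Assumed mapping: an Entrez query limits the search to one organism
        args["entrez_query"] = f'"{organism}"[Organism]'
    return args


def run_remote_blast(**args):
    """Submit the search to NCBI; requires Biopython and network access."""
    from Bio.Blast import NCBIWWW  # imported lazily so the helper above has no dependencies
    return NCBIWWW.qblast(**args)


args = build_qblast_args("ATGCGATCG", organism="Homo sapiens")
print(args["entrez_query"])
```

Keeping argument construction separate from the network call makes the option handling checkable offline; `run_remote_blast(**args)` would perform the actual search.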

Best Practices

  1. Save results - Don't re-run searches unnecessarily
  2. Set E-value threshold - BLAST's own default of 10 is too permissive; use 0.001-0.01 (this script defaults to 0.001)
  3. Use gget for quick searches - Simpler API for single sequences
  4. Cache parsed data - Avoid re-parsing large XML files
  5. Handle rate limits - NCBI limits request frequency
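
Items 1 and 5 can be combined in a small wrapper, sketched below under stated assumptions: results are cached on disk under a hash of the search parameters, and uncached requests are spaced by a minimum interval. `CachedSearcher` and `fake_search` are hypothetical names, and the 10-second default is an assumption that should be checked against NCBI's current usage guidelines.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


class CachedSearcher:
    """Wrap a search function with an on-disk cache and a minimum request interval."""

    def __init__(self, search_fn, cache_dir, min_interval=10.0):
        self.search_fn = search_fn
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.min_interval = min_interval  # seconds between uncached requests (assumed value)
        self._last_request = None

    def _cache_path(self, params):
        # Identical parameters always hash to the same file name
        key = json.dumps(params, sort_keys=True).encode()
        return self.cache_dir / (hashlib.sha256(key).hexdigest() + ".json")

    def search(self, **params):
        path = self._cache_path(params)
        if path.exists():
            return json.loads(path.read_text())  # cache hit: no request is sent
        if self._last_request is not None:
            wait = self.min_interval - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        self._last_request = time.monotonic()
        result = self.search_fn(**params)
        path.write_text(json.dumps(result))
        return result


# Stand-in for a real BLAST call so the example runs offline
calls = []

def fake_search(**params):
    calls.append(params)
    return {"num_hits": 0, "hits": []}

with tempfile.TemporaryDirectory() as tmp:
    searcher = CachedSearcher(fake_search, tmp, min_interval=0.0)
    first = searcher.search(program="blastn", database="nt", sequence="ATGCGATCG")
    second = searcher.search(program="blastn", database="nt", sequence="ATGCGATCG")

print(len(calls))  # the second call is served from the cache
```

In real use, `search_fn` would be whatever function submits the query and returns the parsed JSON-compatible result.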

BLAST vs BLAT

Aspect       BLAST              BLAT
Purpose      Similarity search  Genome mapping
Sensitivity  High               Medium
Speed        Medium             Very fast
Best for     Homolog search     Position finding