string-database
STRING Database
Overview
STRING is a comprehensive database of known and predicted protein-protein interactions, covering 59M proteins and 20B+ interactions across 5000+ organisms. Use its REST API to query interaction networks, perform functional enrichment, and discover interaction partners for systems biology and pathway analysis.
When to Use This Skill
This skill should be used when:
- Retrieving protein-protein interaction networks for single or multiple proteins
- Performing functional enrichment analysis (GO, KEGG, Pfam) on protein lists
- Discovering interaction partners and expanding protein networks
- Testing if proteins form significantly enriched functional modules
- Generating network visualizations with evidence-based coloring
- Analyzing homology and protein family relationships
- Conducting cross-species protein interaction comparisons
- Identifying hub proteins and network connectivity patterns
Quick Start
The skill provides:
- Python helper functions (`scripts/string_api.py`) for all STRING REST API operations
- Comprehensive reference documentation (`references/string_reference.md`) with detailed API specifications

When users request STRING data, determine which operation is needed and use the appropriate function from `scripts/string_api.py`.

Core Operations
1. Identifier Mapping (`string_map_ids`)

Convert gene names, protein names, and external IDs to STRING identifiers.

When to use: Starting any STRING analysis, validating protein names, finding canonical identifiers.

Usage:

```python
from scripts.string_api import string_map_ids

# Map a single protein
result = string_map_ids('TP53', species=9606)

# Map multiple proteins
result = string_map_ids(['TP53', 'BRCA1', 'EGFR', 'MDM2'], species=9606)

# Return multiple matches per query
result = string_map_ids('p53', species=9606, limit=5)
```

**Parameters**:
- `species`: NCBI taxon ID (9606 = human, 10090 = mouse, 7227 = fly)
- `limit`: Number of matches per identifier (default: 1)
- `echo_query`: Include the query term in the output (default: 1)

**Best practice**: Always map identifiers first; subsequent queries that use STRING IDs are faster.
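The mapping result can be turned into a lookup table and checked for unmapped names before any later calls. The sketch below is a minimal example that makes two assumptions. First, it assumes `string_map_ids` returns tab-separated text with a header row that includes `queryItem` and `stringId` columns (the query echo appears when `echo_query=1`). Second, the MDM2 identifier in the toy input is a made-up placeholder.

```python
import csv
import io


def build_id_lookup(mapping_tsv: str) -> dict[str, str]:
    """Map each query term to its first STRING ID.

    Assumes tab-separated text with a header containing 'queryItem' and
    'stringId' (an assumption about the helper's return format).
    """
    lookup: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(mapping_tsv), delimiter="\t"):
        # Keep only the first (best-ranked) match for each query term.
        lookup.setdefault(row["queryItem"], row["stringId"])
    return lookup


def unmapped(queries: list[str], lookup: dict[str, str]) -> list[str]:
    """Return the query terms that did not map to any STRING ID."""
    return [q for q in queries if q not in lookup]


# Toy input in the assumed format (the MDM2 ID is a placeholder).
toy_tsv = (
    "queryIndex\tqueryItem\tstringId\tpreferredName\n"
    "0\tTP53\t9606.ENSP00000269305\tTP53\n"
    "1\tMDM2\t9606.ENSP_PLACEHOLDER\tMDM2\n"
)
lookup = build_id_lookup(toy_tsv)
print(lookup["TP53"])                                # 9606.ENSP00000269305
print(unmapped(["TP53", "MDM2", "FAKE1"], lookup))   # ['FAKE1']
```

Unmapped names are worth reporting to the user before running enrichment. Otherwise they silently drop out of the analysis.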
2. Network Retrieval (`string_network`)

Get protein-protein interaction network data in tabular format.

When to use: Building interaction networks, analyzing connectivity, retrieving interaction evidence.

Usage:

```python
from scripts.string_api import string_network

# Get the network for a single protein
network = string_network('9606.ENSP00000269305', species=9606)

# Get the network for multiple proteins
proteins = ['9606.ENSP00000269305', '9606.ENSP00000275493']
network = string_network(proteins, species=9606, required_score=700)

# Expand the network with additional interactors
network = string_network('TP53', species=9606, add_nodes=10, required_score=400)

# Physical interactions only
network = string_network('TP53', species=9606, network_type='physical')
```

**Parameters**:
- `required_score`: Confidence threshold (0-1000)
  - 150: low confidence (exploratory)
  - 400: medium confidence (default, standard analysis)
  - 700: high confidence (conservative)
  - 900: highest confidence (very stringent)
- `network_type`: `'functional'` (all evidence, default) or `'physical'` (physical association only, e.g., members of the same complex)
- `add_nodes`: Add the N most connected proteins (0-10)

**Output columns**: Interaction pairs, combined confidence scores, and individual evidence scores (neighborhood, fusion, phylogenetic profile, coexpression, experimental, database, text-mining).
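To find hubs in a retrieved network, count how many distinct edges touch each protein. The sketch below is a minimal example that assumes `string_network` returns tab-separated text with `preferredName_A` and `preferredName_B` columns, the same columns Workflow 5 reads. The toy edges are made up.

```python
import csv
import io
from collections import Counter


def node_degrees(network_tsv: str) -> Counter:
    """Count distinct undirected edges per protein in a STRING-style network TSV.

    Assumes 'preferredName_A' / 'preferredName_B' columns. A-B and B-A rows
    are collapsed so each interaction is counted once.
    """
    edges = set()
    for row in csv.DictReader(io.StringIO(network_tsv), delimiter="\t"):
        a, b = row["preferredName_A"], row["preferredName_B"]
        if a != b:
            edges.add(frozenset((a, b)))
    degrees: Counter = Counter()
    for edge in edges:
        for name in edge:
            degrees[name] += 1
    return degrees


# Toy network: the duplicated MDM2/TP53 row is counted once.
toy_tsv = (
    "preferredName_A\tpreferredName_B\n"
    "TP53\tMDM2\n"
    "TP53\tATM\n"
    "MDM2\tTP53\n"
    "TP53\tCHEK2\n"
    "ATM\tCHEK2\n"
)
print(node_degrees(toy_tsv).most_common(1))  # [('TP53', 3)]
```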
3. Network Visualization (`string_network_image`)

Generate a network visualization as a PNG image.

When to use: Creating figures, visual exploration, presentations.

Usage:

```python
from scripts.string_api import string_network_image

# Get a network image
proteins = ['TP53', 'MDM2', 'ATM', 'CHEK2', 'BRCA1']
img_data = string_network_image(proteins, species=9606, required_score=700)

# Save the image (binary PNG data)
with open('network.png', 'wb') as f:
    f.write(img_data)

# Evidence-colored network
img = string_network_image(proteins, species=9606, network_flavor='evidence')

# Confidence-based visualization
img = string_network_image(proteins, species=9606, network_flavor='confidence')

# Actions network (activation/inhibition)
img = string_network_image(proteins, species=9606, network_flavor='actions')
```

**Network flavors**:
- `'evidence'`: Line colors indicate evidence types (default)
- `'confidence'`: Line thickness indicates confidence
- `'actions'`: Shows activating/inhibiting relationships
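A failed request may return error text instead of image bytes; the API best practices in this document suggest checking for an "Error:" prefix. For that reason, it can help to check the PNG signature before writing the file. The `save_png` helper below is hypothetical and not part of `scripts/string_api.py`. The demo payload is a fake that carries only the 8-byte PNG signature.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def save_png(img_data, path: str) -> None:
    """Write img_data to path only if it is bytes starting with the PNG signature."""
    if not isinstance(img_data, bytes) or not img_data.startswith(PNG_SIGNATURE):
        raise ValueError(f"Expected PNG bytes, got: {str(img_data)[:80]!r}")
    with open(path, "wb") as f:
        f.write(img_data)


# Demo with a fake payload (signature plus a few placeholder bytes).
out_path = os.path.join(tempfile.gettempdir(), "network_demo.png")
save_png(PNG_SIGNATURE + b"demo", out_path)
```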
4. Interaction Partners (`string_interaction_partners`)

Find all proteins that interact with the given protein(s).

When to use: Discovering novel interactions, finding hub proteins, expanding networks.

Usage:

```python
from scripts.string_api import string_interaction_partners

# Get the top 10 interactors of TP53
partners = string_interaction_partners('TP53', species=9606, limit=10)

# Get high-confidence interactors
partners = string_interaction_partners('TP53', species=9606,
                                       limit=20, required_score=700)

# Find interactors for multiple proteins
partners = string_interaction_partners(['TP53', 'MDM2'],
                                       species=9606, limit=15)
```

**Parameters**:
- `limit`: Maximum number of partners to return (default: 10)
- `required_score`: Confidence threshold (0-1000)

**Use cases**:
- Hub protein identification
- Network expansion from seed proteins
- Discovering indirect connections
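When several proteins are queried at once, partners shared by every query are natural candidates for hubs or bridging proteins. This sketch makes two assumptions: the partners TSV lists the query protein in `preferredName_A`, and it lists the partner in `preferredName_B`. That column orientation and the toy rows are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict


def partners_by_query(partners_tsv: str) -> dict[str, set[str]]:
    """Group partner names by query protein (assumes the query is in column A)."""
    groups: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(partners_tsv), delimiter="\t"):
        groups[row["preferredName_A"]].add(row["preferredName_B"])
    return dict(groups)


def shared_partners(groups: dict[str, set[str]]) -> set[str]:
    """Return partners that appear for every query protein."""
    sets = list(groups.values())
    return set.intersection(*sets) if sets else set()


# Toy partner table for two queries (values are illustrative only).
toy_tsv = (
    "preferredName_A\tpreferredName_B\n"
    "TP53\tMDM2\n"
    "TP53\tATM\n"
    "TP53\tCHEK2\n"
    "MDM2\tTP53\n"
    "MDM2\tATM\n"
    "MDM2\tCHEK2\n"
)
groups = partners_by_query(toy_tsv)
print(sorted(shared_partners(groups)))  # ['ATM', 'CHEK2']
```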
5. Functional Enrichment (`string_enrichment`)

Perform enrichment analysis across Gene Ontology, KEGG pathways, Pfam domains, and more.

When to use: Interpreting protein lists, pathway analysis, functional characterization, understanding biological processes.

Usage:

```python
import io

import pandas as pd
from scripts.string_api import string_enrichment

# Enrichment for a protein list
proteins = ['TP53', 'MDM2', 'ATM', 'CHEK2', 'BRCA1', 'ATR', 'TP73']
enrichment = string_enrichment(proteins, species=9606)

# Parse the TSV results and keep significant terms
df = pd.read_csv(io.StringIO(enrichment), sep='\t')
significant = df[df['fdr'] < 0.05]
```

**Enrichment categories**:
- **Gene Ontology**: Biological Process, Molecular Function, Cellular Component
- **KEGG Pathways**: Metabolic and signaling pathways
- **Pfam**: Protein domains
- **InterPro**: Protein families and domains
- **SMART**: Domain architecture
- **UniProt Keywords**: Curated functional keywords

**Output columns**:
- `category`: Annotation database (e.g., "KEGG Pathways", "GO Biological Process")
- `term`: Term identifier
- `description`: Human-readable term description
- `number_of_genes`: Number of input proteins annotated with this term
- `p_value`: Uncorrected enrichment p-value
- `fdr`: False discovery rate (corrected p-value)

**Statistical method**: Fisher's exact test with Benjamini-Hochberg FDR correction.

**Interpretation**: FDR < 0.05 indicates statistically significant enrichment.
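The `fdr` column is a multiple-testing-adjusted p-value. The following worked example shows the Benjamini-Hochberg adjustment named above, starting from raw p-values. It is a generic implementation of the procedure, not STRING's own code, and the input p-values are made up.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order.

    With sorted p-values p_(1) <= ... <= p_(m), the adjusted value at rank i is
    min over j >= i of p_(j) * m / j, capped at 1.0.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping a cumulative minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 3) for q in benjamini_hochberg(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

Note that two terms with different raw p-values (0.01 and 0.005) end up with the same adjusted value. This happens because of the cumulative-minimum step.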
6. PPI Enrichment (`string_ppi_enrichment`)

Test whether a protein network has significantly more interactions than expected by chance.

When to use: Validating whether proteins form a functional module, testing network connectivity.

Usage:

```python
import json

from scripts.string_api import string_ppi_enrichment

# Test network connectivity
proteins = ['TP53', 'MDM2', 'ATM', 'CHEK2', 'BRCA1']
result = string_ppi_enrichment(proteins, species=9606, required_score=400)

# Parse the JSON result; the raw STRING API response is typically a
# one-element list, so unwrap it if needed
data = json.loads(result)
if isinstance(data, list):
    data = data[0]
print(f"Observed edges: {data['number_of_edges']}")
print(f"Expected edges: {data['expected_number_of_edges']}")
print(f"P-value: {data['p_value']}")
```

**Output fields**:
- `number_of_nodes`: Proteins in the network
- `number_of_edges`: Observed interactions
- `expected_number_of_edges`: Expected number of interactions in a random network
- `p_value`: Statistical significance

**Interpretation**:
- p-value < 0.05: The network is significantly enriched (the proteins likely form a functional module)
- p-value ≥ 0.05: No significant enrichment (the proteins may be unrelated)
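A small helper can normalize the result and report the observed-to-expected edge ratio alongside the p-value. It accepts either a bare JSON object or a one-element list, because the exact shape returned by `string_ppi_enrichment` is an assumption here. The toy JSON values are made up.

```python
import json


def summarize_ppi(result_json: str, alpha: float = 0.05) -> dict:
    """Summarize a PPI-enrichment JSON payload (a bare object or a one-element list)."""
    data = json.loads(result_json)
    if isinstance(data, list):  # unwrap a list-wrapped record
        data = data[0]
    observed = data["number_of_edges"]
    expected = data["expected_number_of_edges"]
    return {
        "observed": observed,
        "expected": expected,
        # A ratio above 1 means more edges than the random expectation.
        "ratio": observed / expected if expected else float("inf"),
        "p_value": data["p_value"],
        "enriched": data["p_value"] < alpha,
    }


toy = ('[{"number_of_nodes": 5, "number_of_edges": 8, '
       '"expected_number_of_edges": 2.0, "p_value": 0.0001}]')
print(summarize_ppi(toy))
# {'observed': 8, 'expected': 2.0, 'ratio': 4.0, 'p_value': 0.0001, 'enriched': True}
```

Reporting the ratio next to the p-value helps in the common case where a large input list gives a tiny p-value even though the enrichment is only modest.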
7. Homology Scores (`string_homology`)

Retrieve protein similarity and homology information.

When to use: Identifying protein families, paralog analysis, cross-species comparisons.

Usage:

```python
from scripts.string_api import string_homology

# Get homology scores between proteins
proteins = ['TP53', 'TP63', 'TP73']  # p53 family
homology = string_homology(proteins, species=9606)
```

**Use cases**:
- Protein family identification
- Paralog discovery
- Evolutionary analysis
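For protein family identification, one simple approach (a generic technique, not a STRING feature) works like this: treat homology hits above a bit-score cutoff as edges, then take the connected components with union-find. The `(protein_a, protein_b, bitscore)` tuples, their values, and the cutoff of 100 are all hypothetical. Adapt them to whatever columns `string_homology` actually returns.

```python
def homology_families(pairs, min_bitscore: float = 100.0) -> list[list[str]]:
    """Group proteins into families: connected components of pairs scoring >= cutoff."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in pairs:
        find(a)
        find(b)  # register both proteins even if the pair is below the cutoff
        if score >= min_bitscore:
            parent[find(a)] = find(b)  # union the two components

    families: dict[str, set[str]] = {}
    for protein in list(parent):
        families.setdefault(find(protein), set()).add(protein)
    return sorted((sorted(f) for f in families.values()), key=lambda f: f[0])


# Hypothetical homology hits; the bit scores are illustrative only.
toy_pairs = [("TP53", "TP63", 350.0), ("TP63", "TP73", 600.0), ("TP53", "MDM2", 20.0)]
print(homology_families(toy_pairs))  # [['MDM2'], ['TP53', 'TP63', 'TP73']]
```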
8. Version Information (`string_version`)

Get the current STRING database version.

When to use: Ensuring reproducibility, documenting methods.

Usage:

```python
from scripts.string_api import string_version

version = string_version()
print(f"STRING version: {version}")
```
Common Analysis Workflows

Workflow 1: Protein List Analysis (Standard Workflow)

Use case: Analyze a list of proteins from an experiment (e.g., differential expression, proteomics).

```python
from scripts.string_api import (string_map_ids, string_network,
                                string_enrichment, string_ppi_enrichment,
                                string_network_image)

# Step 1: Map gene names to STRING IDs
gene_list = ['TP53', 'BRCA1', 'ATM', 'CHEK2', 'MDM2', 'ATR', 'BRCA2']
mapping = string_map_ids(gene_list, species=9606)

# Step 2: Get the interaction network
network = string_network(gene_list, species=9606, required_score=400)

# Step 3: Test whether the network is enriched in interactions
ppi_result = string_ppi_enrichment(gene_list, species=9606)

# Step 4: Perform functional enrichment
enrichment = string_enrichment(gene_list, species=9606)

# Step 5: Generate a network visualization
img = string_network_image(gene_list, species=9606,
                           network_flavor='evidence', required_score=400)
with open('protein_network.png', 'wb') as f:
    f.write(img)

# Step 6: Parse and interpret results
```
Workflow 2: Single Protein Investigation

Use case: Deep dive into one protein's interactions and partners.

```python
from scripts.string_api import (string_map_ids, string_interaction_partners,
                                string_network_image)

# Step 1: Map the protein name
protein = 'TP53'
mapping = string_map_ids(protein, species=9606)

# Step 2: Get the top 20 high-confidence interaction partners
partners = string_interaction_partners(protein, species=9606,
                                       limit=20, required_score=700)

# Step 3: Visualize the expanded network
img = string_network_image(protein, species=9606, add_nodes=15,
                           network_flavor='confidence', required_score=700)
with open('tp53_network.png', 'wb') as f:
    f.write(img)
```
Workflow 3: Pathway-Centric Analysis

Use case: Identify and visualize proteins in a specific biological pathway.

```python
import io

import pandas as pd
from scripts.string_api import string_enrichment, string_network

# Step 1: Start with known pathway proteins
dna_repair_proteins = ['TP53', 'ATM', 'ATR', 'CHEK1', 'CHEK2',
                       'BRCA1', 'BRCA2', 'RAD51', 'XRCC1']

# Step 2: Get the network
network = string_network(dna_repair_proteins, species=9606,
                         required_score=700, add_nodes=5)

# Step 3: Run enrichment to confirm the pathway annotation
enrichment = string_enrichment(dna_repair_proteins, species=9606)

# Step 4: Keep enrichment terms that mention DNA repair
df = pd.read_csv(io.StringIO(enrichment), sep='\t')
dna_repair = df[df['description'].str.contains('DNA repair', case=False)]
```
Workflow 4: Cross-Species Analysis

Use case: Compare protein interactions across different organisms.

```python
from scripts.string_api import string_network

# Human network
human_network = string_network('TP53', species=9606, required_score=700)

# Mouse network (mouse gene symbols use different capitalization)
mouse_network = string_network('Trp53', species=10090, required_score=700)

# Yeast network (if an ortholog exists); 'gene_name' is a placeholder
# for the yeast gene symbol
yeast_network = string_network('gene_name', species=4932, required_score=700)
```
Workflow 5: Network Expansion and Discovery

Use case: Start with seed proteins and discover connected functional modules.

```python
import io

import pandas as pd
from scripts.string_api import (string_interaction_partners, string_network,
                                string_enrichment)

# Step 1: Start with seed protein(s)
seed_proteins = ['TP53']

# Step 2: Get first-degree interactors
partners = string_interaction_partners(seed_proteins, species=9606,
                                       limit=30, required_score=700)

# Step 3: Parse partners into a protein list (sorted so the cap below is deterministic)
df = pd.read_csv(io.StringIO(partners), sep='\t')
all_proteins = sorted(set(df['preferredName_A'].tolist() +
                          df['preferredName_B'].tolist()))

# Step 4: Perform enrichment on the expanded network (capped at 50 proteins)
enrichment = string_enrichment(all_proteins[:50], species=9606)

# Step 5: Filter for strongly enriched functional modules
enrichment_df = pd.read_csv(io.StringIO(enrichment), sep='\t')
modules = enrichment_df[enrichment_df['fdr'] < 0.001]
```
Common Species
When specifying species, use NCBI taxon IDs:
| Organism | Common Name | Taxon ID |
|---|---|---|
| Homo sapiens | Human | 9606 |
| Mus musculus | Mouse | 10090 |
| Rattus norvegicus | Rat | 10116 |
| Drosophila melanogaster | Fruit fly | 7227 |
| Caenorhabditis elegans | C. elegans | 6239 |
| Saccharomyces cerevisiae | Yeast | 4932 |
| Arabidopsis thaliana | Thale cress | 3702 |
| Escherichia coli K-12 MG1655 | E. coli | 511145 |
| Danio rerio | Zebrafish | 7955 |
Full list available at: https://string-db.org/cgi/input?input_page_active_form=organisms
Understanding Confidence Scores
STRING provides combined confidence scores (0-1000) integrating multiple evidence types:
Evidence Channels
- Neighborhood (nscore): Conserved genomic neighborhood across species
- Fusion (fscore): Gene fusion events
- Phylogenetic Profile (pscore): Co-occurrence patterns across species
- Coexpression (ascore): Correlated RNA expression
- Experimental (escore): Biochemical and genetic experiments
- Database (dscore): Curated pathway and complex databases
- Text-mining (tscore): Literature co-occurrence and NLP extraction
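STRING combines the channel scores probabilistically: it treats the channels as independent and corrects for the prior probability of a random association. The sketch below approximately follows that scheme, with an assumed prior of 0.041. Treat it as an approximation, since STRING applies further channel-specific corrections (e.g., for homology) that are not modeled here. Scores here use a 0-1 scale, so divide the 0-1000 values by 1000.

```python
PRIOR = 0.041  # assumed prior probability of a random association


def combine_scores(channel_scores: list[float], prior: float = PRIOR) -> float:
    """Approximate a combined score from per-channel scores in [0, 1]."""
    remaining = 1.0  # probability that no channel supports the link
    for s in channel_scores:
        # Remove the prior from each channel, clipping at zero.
        s_no_prior = max(0.0, (s - prior) / (1.0 - prior))
        remaining *= 1.0 - s_no_prior
    combined_no_prior = 1.0 - remaining
    # Add the prior back once for the combined score.
    return combined_no_prior + prior * (1.0 - combined_no_prior)


print(round(combine_scores([0.5]), 3))       # 0.5 (a single channel is unchanged)
print(round(combine_scores([0.5, 0.5]), 3))  # 0.739
```

Two moderate channels (0.5 each) yield a combined score of about 0.739. This is why independent lines of evidence can push an interaction above the 700 threshold even when no single channel does.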
Recommended Thresholds
Choose threshold based on analysis goals:
- 150 (low confidence): Exploratory analysis, hypothesis generation
- 400 (medium confidence): Standard analysis, balanced sensitivity/specificity
- 700 (high confidence): Conservative analysis, high-confidence interactions
- 900 (highest confidence): Very stringent, experimental evidence preferred
Trade-offs:
- Lower thresholds: More interactions (higher recall, more false positives)
- Higher thresholds: Fewer interactions (higher precision, more false negatives)
Network Types
Functional Networks (Default)
Includes all evidence types (experimental, computational, text-mining). Represents proteins that are functionally associated, even without direct physical binding.
When to use:
- Pathway analysis
- Functional enrichment studies
- Systems biology
- Most general analyses
Physical Networks
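Only includes evidence that the proteins physically associate, for example as members of the same complex (experimental data and database annotations for physical interactions). An edge does not necessarily mean that the two proteins bind directly.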
When to use:
- Structural biology studies
- Protein complex analysis
- Direct binding validation
- When physical contact is required
API Best Practices
- Always map identifiers first: Use `string_map_ids()` before other operations for faster queries
- Use STRING IDs when possible: Use the `9606.ENSP00000269305` format instead of gene names
- Specify species for networks with more than 10 proteins: Required for accurate results
- Respect rate limits: Wait 1 second between API calls
- Use versioned URLs for reproducibility: Available in the reference documentation
- Handle errors gracefully: Check for an "Error:" prefix in returned strings
- Choose appropriate confidence thresholds: Match the threshold to the analysis goals
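The rate-limit and error-handling advice above can be combined in a small wrapper. This is a hypothetical sketch: `Throttle` and `call_with_throttle` are not part of `scripts/string_api.py`. It also assumes that the helpers return text starting with "Error:" on failure, as described in the list.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls (default: 1 second)."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last = float("-inf")  # no previous call yet

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()


def call_with_throttle(throttle: Throttle, func, *args, **kwargs):
    """Call func after throttling; raise if it returns an 'Error:' string."""
    throttle.wait()
    result = func(*args, **kwargs)
    if isinstance(result, str) and result.startswith("Error:"):
        raise RuntimeError(result)
    return result


# Usage with a real helper might look like:
# throttle = Throttle(1.0)
# network = call_with_throttle(throttle, string_network, 'TP53', species=9606)
```

Sharing one `Throttle` instance across all helpers keeps the whole session under the recommended rate, not just the repeated calls to a single function.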
Detailed Reference
For comprehensive API documentation, complete parameter lists, output formats, and advanced usage, refer to `references/string_reference.md`. This includes:
- Complete API endpoint specifications
- All supported output formats (TSV, JSON, XML, PSI-MI)
- Advanced features (bulk upload, values/ranks enrichment)
- Error handling and troubleshooting
- Integration with other tools (Cytoscape, R, Python libraries)
- Data license and citation information
Troubleshooting
No proteins found:
- Verify that the species parameter matches the identifiers
- Try mapping identifiers first with `string_map_ids()`
- Check for typos in protein names

Empty network results:
- Lower the confidence threshold (`required_score`)
- Check whether the proteins actually interact
- Verify that the species is correct

Timeout or slow queries:
- Reduce the number of input proteins
- Use STRING IDs instead of gene names
- Split large queries into batches

"Species required" error:
- Add the `species` parameter for networks with more than 10 proteins
- Always include species for consistency

Results look unexpected:
- Check the STRING version with `string_version()`
- Verify that `network_type` is appropriate (functional vs. physical)
- Review the confidence threshold selection
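A small generator can split large queries into batches. The batch size of 100 below is an arbitrary illustrative choice, not a documented STRING limit, and the gene names are placeholders. Pair it with the 1-second pause recommended in the API best practices.

```python
from typing import Iterator, Sequence


def in_batches(items: Sequence[str], size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


genes = [f"GENE{i}" for i in range(250)]  # placeholder identifiers
print([len(chunk) for chunk in in_batches(genes)])  # [100, 100, 50]
```

Batching suits per-protein lookups such as identifier mapping. Enrichment and PPI-enrichment results, by contrast, depend on the whole input set, so splitting those queries changes the statistics.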
Additional Resources
For proteome-scale analysis, or to upload a complete proteome:
- Visit https://string-db.org
- Use the "Upload proteome" feature
- STRING will generate a complete interaction network and predict functions
For bulk downloads of complete datasets:
- Download page: https://string-db.org/cgi/download
- Includes complete interaction files, protein annotations, and pathway mappings
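Downloaded interaction files can be large, so streaming them is usually better than loading them whole. This sketch assumes a gzip-compressed, space-separated file with a `protein1 protein2 combined_score` header. That matches our understanding of STRING's `protein.links` downloads, but check the header of the file you download. The sketch keeps only the links at or above a threshold, and the toy file is generated on the fly.

```python
import gzip
import os
import tempfile
from typing import Iterator, Tuple


def high_confidence_links(path: str, min_score: int = 700) -> Iterator[Tuple[str, str, int]]:
    """Stream (protein1, protein2, combined_score) rows with score >= min_score."""
    with gzip.open(path, "rt") as handle:
        next(handle, None)  # skip the header line (no error on an empty file)
        for line in handle:
            p1, p2, score = line.split()
            if int(score) >= min_score:
                yield p1, p2, int(score)


# Build a tiny toy file in the assumed format (IDs are placeholders).
toy_path = os.path.join(tempfile.gettempdir(), "toy.protein.links.txt.gz")
with gzip.open(toy_path, "wt") as handle:
    handle.write("protein1 protein2 combined_score\n")
    handle.write("9606.A 9606.B 999\n")
    handle.write("9606.A 9606.C 400\n")

print(list(high_confidence_links(toy_path)))  # [('9606.A', '9606.B', 999)]
```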
Data License
STRING data is freely available under the Creative Commons BY 4.0 license:
- Free for academic and commercial use
- Attribution required when publishing
- Cite the latest STRING publication
Citation
When using STRING in publications, cite the most recent publication from: https://string-db.org/cgi/about