string-database

STRING Database

Overview

STRING is a comprehensive database of known and predicted protein-protein interactions covering 59M proteins and 20B+ interactions across 5000+ organisms. Use its REST API to query interaction networks, perform functional enrichment, and discover interaction partners for systems biology and pathway analysis.

When to Use This Skill

This skill should be used when:
  • Retrieving protein-protein interaction networks for single or multiple proteins
  • Performing functional enrichment analysis (GO, KEGG, Pfam) on protein lists
  • Discovering interaction partners and expanding protein networks
  • Testing if proteins form significantly enriched functional modules
  • Generating network visualizations with evidence-based coloring
  • Analyzing homology and protein family relationships
  • Conducting cross-species protein interaction comparisons
  • Identifying hub proteins and network connectivity patterns

Quick Start

The skill provides:
  1. Python helper functions (`scripts/string_api.py`) for all STRING REST API operations
  2. Comprehensive reference documentation (`references/string_reference.md`) with detailed API specifications
When users request STRING data, determine which operation is needed and use the appropriate function from `scripts/string_api.py`.

Core Operations

1. Identifier Mapping (`string_map_ids`)

Convert gene names, protein names, and external IDs to STRING identifiers.
When to use: Starting any STRING analysis, validating protein names, finding canonical identifiers.
Usage:
python
from scripts.string_api import string_map_ids

# Map single protein
result = string_map_ids('TP53', species=9606)

# Map multiple proteins
result = string_map_ids(['TP53', 'BRCA1', 'EGFR', 'MDM2'], species=9606)

# Map with multiple matches per query
result = string_map_ids('p53', species=9606, limit=5)

**Parameters**:
- `species`: NCBI taxon ID (9606 = human, 10090 = mouse, 7227 = fly)
- `limit`: Number of matches per identifier (default: 1)
- `echo_query`: Include query term in output (default: 1)

**Best practice**: Always map identifiers first for faster subsequent queries.
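
Because later calls accept STRING identifiers directly, it can help to reduce the mapping output to a name-to-ID lookup. The following is a minimal sketch, assuming `string_map_ids` returns tab-separated text with a header row containing `queryItem` and `stringId` columns. Those column names, the `parse_mapping` helper, and the sample rows are assumptions for illustration, not confirmed output of the helper script.

```python
import csv
import io


def parse_mapping(tsv_text):
    """Return {query term: STRING ID} from tab-separated mapping output."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    lookup = {}
    for row in reader:
        # Keep only the first match listed for each query term.
        lookup.setdefault(row['queryItem'], row['stringId'])
    return lookup


# Illustrative sample in the assumed format (not real API output).
sample = (
    "queryItem\tstringId\tpreferredName\n"
    "TP53\t9606.ENSP00000269305\tTP53\n"
    "EGFR\t9606.ENSP00000275493\tEGFR\n"
)
ids = parse_mapping(sample)
print(ids['TP53'])
```

The resulting values can then be passed to the other helper functions in place of gene names.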

2. Network Retrieval (`string_network`)

Get protein-protein interaction network data in tabular format.
When to use: Building interaction networks, analyzing connectivity, retrieving interaction evidence.
Usage:
python
from scripts.string_api import string_network

# Get network for single protein
network = string_network('9606.ENSP00000269305', species=9606)

# Get network with multiple proteins
proteins = ['9606.ENSP00000269305', '9606.ENSP00000275493']
network = string_network(proteins, required_score=700)

# Expand network with additional interactors
network = string_network('TP53', species=9606, add_nodes=10, required_score=400)

# Physical interactions only
network = string_network('TP53', species=9606, network_type='physical')

**Parameters**:
- `required_score`: Confidence threshold (0-1000)
  - 150: low confidence (exploratory)
  - 400: medium confidence (default, standard analysis)
  - 700: high confidence (conservative)
  - 900: highest confidence (very stringent)
- `network_type`: `'functional'` (all evidence, default) or `'physical'` (direct binding only)
- `add_nodes`: Add N most connected proteins (0-10)

**Output columns**: Interaction pairs, confidence scores, and individual evidence scores (neighborhood, fusion, phylogenetic profile, coexpression, experimental, database, text-mining).
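
Connectivity analysis on the tabular output can start with a simple degree count per protein. This is a sketch under stated assumptions: the network text is tab-separated with `preferredName_A`, `preferredName_B`, and a `score` column on a 0-1 scale. The `score` column name and scale, the `node_degrees` helper, and the sample rows are illustrative assumptions.

```python
import csv
import io
from collections import Counter


def node_degrees(tsv_text, min_score=0.0):
    """Count distinct interaction partners per protein above a score cutoff."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    edges = set()
    for row in reader:
        if float(row['score']) < min_score:
            continue
        # Treat A-B and B-A as the same undirected edge.
        edges.add(frozenset((row['preferredName_A'], row['preferredName_B'])))
    degrees = Counter()
    for edge in edges:
        for name in edge:
            degrees[name] += 1
    return degrees


# Illustrative sample rows (not real API output).
sample = (
    "preferredName_A\tpreferredName_B\tscore\n"
    "TP53\tMDM2\t0.999\n"
    "MDM2\tTP53\t0.999\n"
    "TP53\tATM\t0.995\n"
    "ATM\tCHEK2\t0.5\n"
)
degrees = node_degrees(sample, min_score=0.7)
print(degrees.most_common(1))
```

Proteins with the highest counts are candidate hubs, although the ranking depends on the chosen cutoff.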

3. Network Visualization (`string_network_image`)

Generate network visualization as PNG image.
When to use: Creating figures, visual exploration, presentations.
Usage:
python
from scripts.string_api import string_network_image

# Get network image
proteins = ['TP53', 'MDM2', 'ATM', 'CHEK2', 'BRCA1']
img_data = string_network_image(proteins, species=9606, required_score=700)

# Save image
with open('network.png', 'wb') as f:
    f.write(img_data)

# Evidence-colored network
img = string_network_image(proteins, species=9606, network_flavor='evidence')

# Confidence-based visualization
img = string_network_image(proteins, species=9606, network_flavor='confidence')

# Actions network (activation/inhibition)
img = string_network_image(proteins, species=9606, network_flavor='actions')

**Network flavors**:
- `'evidence'`: Colored lines show evidence types (default)
- `'confidence'`: Line thickness represents confidence
- `'actions'`: Shows activating/inhibiting relationships

4. Interaction Partners (`string_interaction_partners`)

Find all proteins that interact with given protein(s).
When to use: Discovering novel interactions, finding hub proteins, expanding networks.
Usage:
python
from scripts.string_api import string_interaction_partners

# Get top 10 interactors of TP53
partners = string_interaction_partners('TP53', species=9606, limit=10)

# Get high-confidence interactors
partners = string_interaction_partners('TP53', species=9606,
                                       limit=20, required_score=700)

# Find interactors for multiple proteins
partners = string_interaction_partners(['TP53', 'MDM2'], species=9606, limit=15)

**Parameters**:
- `limit`: Maximum number of partners to return (default: 10)
- `required_score`: Confidence threshold (0-1000)

**Use cases**:
- Hub protein identification
- Network expansion from seed proteins
- Discovering indirect connections

5. Functional Enrichment (`string_enrichment`)

Perform enrichment analysis across Gene Ontology, KEGG pathways, Pfam domains, and more.
When to use: Interpreting protein lists, pathway analysis, functional characterization, understanding biological processes.
Usage:
python
from scripts.string_api import string_enrichment

# Enrichment for a protein list
proteins = ['TP53', 'MDM2', 'ATM', 'CHEK2', 'BRCA1', 'ATR', 'TP73']
enrichment = string_enrichment(proteins, species=9606)

# Parse results to find significant terms
import io
import pandas as pd
df = pd.read_csv(io.StringIO(enrichment), sep='\t')
significant = df[df['fdr'] < 0.05]

**Enrichment categories**:
- **Gene Ontology**: Biological Process, Molecular Function, Cellular Component
- **KEGG Pathways**: Metabolic and signaling pathways
- **Pfam**: Protein domains
- **InterPro**: Protein families and domains
- **SMART**: Domain architecture
- **UniProt Keywords**: Curated functional keywords

**Output columns**:
- `category`: Annotation database (e.g., "KEGG Pathways", "GO Biological Process")
- `term`: Term identifier
- `description`: Human-readable term description
- `number_of_genes`: Input proteins with this annotation
- `p_value`: Uncorrected enrichment p-value
- `fdr`: False discovery rate (corrected p-value)

**Statistical method**: Fisher's exact test with Benjamini-Hochberg FDR correction.

**Interpretation**: FDR < 0.05 indicates statistically significant enrichment.
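
Enrichment tables are often long, so one way to read them is to keep only the most significant terms per annotation category. The sketch below uses the standard library only and assumes the enrichment text is tab-separated with the `category`, `description`, and `fdr` columns listed above. The `top_terms_by_category` helper and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def top_terms_by_category(tsv_text, fdr_cutoff=0.05, top_n=3):
    """Group significant terms by category, most significant first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    grouped = defaultdict(list)
    for row in reader:
        fdr = float(row['fdr'])
        if fdr < fdr_cutoff:
            grouped[row['category']].append((fdr, row['description']))
    # Sort each category by FDR and keep the descriptions of the top hits.
    return {
        category: [description for _, description in sorted(terms)[:top_n]]
        for category, terms in grouped.items()
    }


# Illustrative rows in the assumed format (values are made up).
sample = (
    "category\tterm\tdescription\tfdr\n"
    "Process\tGO:0006281\tDNA repair\t1e-8\n"
    "Process\tGO:0006974\tDNA damage response\t1e-10\n"
    "Process\tGO:0008150\tbiological_process\t0.4\n"
    "KEGG\thsa04115\tp53 signaling pathway\t1e-6\n"
)
summary = top_terms_by_category(sample)
print(summary)
```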

6. PPI Enrichment (`string_ppi_enrichment`)

Test if a protein network has significantly more interactions than expected by chance.
When to use: Validating whether proteins form a functional module, testing network connectivity.
Usage:
python
from scripts.string_api import string_ppi_enrichment
import json

# Test network connectivity
proteins = ['TP53', 'MDM2', 'ATM', 'CHEK2', 'BRCA1']
result = string_ppi_enrichment(proteins, species=9606, required_score=400)

# Parse JSON result
data = json.loads(result)
print(f"Observed edges: {data['number_of_edges']}")
print(f"Expected edges: {data['expected_number_of_edges']}")
print(f"P-value: {data['p_value']}")

**Output fields**:
- `number_of_nodes`: Proteins in network
- `number_of_edges`: Observed interactions
- `expected_number_of_edges`: Expected in random network
- `p_value`: Statistical significance

**Interpretation**:
- p-value < 0.05: Network is significantly enriched (proteins likely form a functional module)
- p-value ≥ 0.05: No significant enrichment (proteins may be unrelated)
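
The interpretation rule above can be wrapped in a small function. This is a sketch, not part of `scripts/string_api.py`: the `interpret_ppi` name is hypothetical, and the handling of a one-element list is a defensive assumption in case the JSON payload is wrapped that way.

```python
import json


def interpret_ppi(result_text, alpha=0.05):
    """Summarize a PPI enrichment JSON result using the fields listed above."""
    data = json.loads(result_text)
    # Assumption: accept either a single record or a one-element list of records.
    if isinstance(data, list):
        data = data[0]
    observed = data['number_of_edges']
    expected = data['expected_number_of_edges']
    p_value = float(data['p_value'])
    return {
        'observed': observed,
        'expected': expected,
        # Fold excess of observed over expected edges; undefined when expected is 0.
        'ratio': observed / expected if expected else None,
        'significant': p_value < alpha,
    }


# Illustrative payload (values are made up).
sample = ('{"number_of_nodes": 5, "number_of_edges": 10, '
          '"expected_number_of_edges": 2, "p_value": 1e-06}')
print(interpret_ppi(sample))
```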

7. Homology Scores (`string_homology`)

Retrieve protein similarity and homology information.
When to use: Identifying protein families, paralog analysis, cross-species comparisons.
Usage:
python
from scripts.string_api import string_homology

# Get homology between proteins
proteins = ['TP53', 'TP63', 'TP73']  # p53 family
homology = string_homology(proteins, species=9606)

**Use cases**:
- Protein family identification
- Paralog discovery
- Evolutionary analysis

8. Version Information (`string_version`)

Get current STRING database version.
When to use: Ensuring reproducibility, documenting methods.
Usage:
python
from scripts.string_api import string_version

version = string_version()
print(f"STRING version: {version}")

Common Analysis Workflows

Workflow 1: Protein List Analysis (Standard Workflow)

Use case: Analyze a list of proteins from an experiment (e.g., differential expression, proteomics).
python
from scripts.string_api import (string_map_ids, string_network,
                                string_enrichment, string_ppi_enrichment,
                                string_network_image)

# Step 1: Map gene names to STRING IDs
gene_list = ['TP53', 'BRCA1', 'ATM', 'CHEK2', 'MDM2', 'ATR', 'BRCA2']
mapping = string_map_ids(gene_list, species=9606)

# Step 2: Get interaction network
network = string_network(gene_list, species=9606, required_score=400)

# Step 3: Test if network is enriched
ppi_result = string_ppi_enrichment(gene_list, species=9606)

# Step 4: Perform functional enrichment
enrichment = string_enrichment(gene_list, species=9606)

# Step 5: Generate network visualization
img = string_network_image(gene_list, species=9606,
                           network_flavor='evidence', required_score=400)
with open('protein_network.png', 'wb') as f:
    f.write(img)

# Step 6: Parse and interpret results

Workflow 2: Single Protein Investigation

Use case: Deep dive into one protein's interactions and partners.
python
from scripts.string_api import (string_map_ids, string_interaction_partners,
                                string_network_image)

# Step 1: Map protein name
protein = 'TP53'
mapping = string_map_ids(protein, species=9606)

# Step 2: Get all interaction partners
partners = string_interaction_partners(protein, species=9606,
                                       limit=20, required_score=700)

# Step 3: Visualize expanded network
img = string_network_image(protein, species=9606, add_nodes=15,
                           network_flavor='confidence', required_score=700)
with open('tp53_network.png', 'wb') as f:
    f.write(img)

Workflow 3: Pathway-Centric Analysis

Use case: Identify and visualize proteins in a specific biological pathway.
python
from scripts.string_api import string_enrichment, string_network

# Step 1: Start with known pathway proteins
dna_repair_proteins = ['TP53', 'ATM', 'ATR', 'CHEK1', 'CHEK2',
                       'BRCA1', 'BRCA2', 'RAD51', 'XRCC1']

# Step 2: Get network
network = string_network(dna_repair_proteins, species=9606,
                         required_score=700, add_nodes=5)

# Step 3: Enrichment to confirm pathway annotation
enrichment = string_enrichment(dna_repair_proteins, species=9606)

# Step 4: Parse enrichment for DNA repair pathways
import io
import pandas as pd
df = pd.read_csv(io.StringIO(enrichment), sep='\t')
dna_repair = df[df['description'].str.contains('DNA repair', case=False)]

Workflow 4: Cross-Species Analysis

Use case: Compare protein interactions across different organisms.
python
from scripts.string_api import string_network

# Human network
human_network = string_network('TP53', species=9606, required_score=700)

# Mouse network
mouse_network = string_network('Trp53', species=10090, required_score=700)

# Yeast network (if ortholog exists)
yeast_network = string_network('gene_name', species=4932, required_score=700)
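
Once two species' networks are retrieved, a first comparison is to intersect the partner sets of the query protein. The sketch below assumes tab-separated output with `preferredName_A` and `preferredName_B` columns, and matches partners across species by case-insensitive gene symbol. That matching is a crude heuristic chosen for illustration: it misses orthologs whose symbols differ (as TP53 and Trp53 do) and is no substitute for a real orthology mapping. The helper names and sample rows are hypothetical.

```python
import csv
import io


def partner_names(tsv_text, query):
    """Return upper-cased names of proteins listed as interacting with `query`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    partners = set()
    for row in reader:
        a, b = row['preferredName_A'], row['preferredName_B']
        if a == query:
            partners.add(b.upper())
        elif b == query:
            partners.add(a.upper())
    return partners


def shared_partners(human_tsv, mouse_tsv, human_query='TP53', mouse_query='Trp53'):
    """Partners present in both networks, compared by upper-cased symbol."""
    return partner_names(human_tsv, human_query) & partner_names(mouse_tsv, mouse_query)


# Illustrative rows (not real API output).
human_sample = (
    "preferredName_A\tpreferredName_B\n"
    "TP53\tMDM2\n"
    "TP53\tATM\n"
    "EP300\tTP53\n"
)
mouse_sample = (
    "preferredName_A\tpreferredName_B\n"
    "Trp53\tMdm2\n"
    "Trp53\tEp300\n"
    "Trp53\tSirt1\n"
)
print(sorted(shared_partners(human_sample, mouse_sample)))
```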

Workflow 5: Network Expansion and Discovery

Use case: Start with seed proteins and discover connected functional modules.
python
from scripts.string_api import (string_interaction_partners, string_network,
                                string_enrichment)

# Step 1: Start with seed protein(s)
seed_proteins = ['TP53']

# Step 2: Get first-degree interactors
partners = string_interaction_partners(seed_proteins, species=9606,
                                       limit=30, required_score=700)

# Step 3: Parse partners to get protein list
import io
import pandas as pd
df = pd.read_csv(io.StringIO(partners), sep='\t')
all_proteins = list(set(df['preferredName_A'].tolist() +
                        df['preferredName_B'].tolist()))

# Step 4: Perform enrichment on expanded network
enrichment = string_enrichment(all_proteins[:50], species=9606)

# Step 5: Filter for interesting functional modules
enrichment_df = pd.read_csv(io.StringIO(enrichment), sep='\t')
modules = enrichment_df[enrichment_df['fdr'] < 0.001]

Common Species

When specifying species, use NCBI taxon IDs:

| Organism | Common Name | Taxon ID |
|---|---|---|
| Homo sapiens | Human | 9606 |
| Mus musculus | Mouse | 10090 |
| Rattus norvegicus | Rat | 10116 |
| Drosophila melanogaster | Fruit fly | 7227 |
| Caenorhabditis elegans | C. elegans | 6239 |
| Saccharomyces cerevisiae | Yeast | 4932 |
| Arabidopsis thaliana | Thale cress | 3702 |
| Escherichia coli | E. coli | 511145 |
| Danio rerio | Zebrafish | 7955 |

Understanding Confidence Scores

STRING provides combined confidence scores (0-1000) integrating multiple evidence types:

Evidence Channels

  1. Neighborhood (nscore): Conserved genomic neighborhood across species
  2. Fusion (fscore): Gene fusion events
  3. Phylogenetic Profile (pscore): Co-occurrence patterns across species
  4. Coexpression (ascore): Correlated RNA expression
  5. Experimental (escore): Biochemical and genetic experiments
  6. Database (dscore): Curated pathway and complex databases
  7. Text-mining (tscore): Literature co-occurrence and NLP extraction
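
To build intuition for how several channel scores can yield one combined score, the sketch below treats each channel as independent evidence and combines them as one minus the product of the complements. This is a deliberate simplification: STRING's own scoring also corrects for a prior probability of interaction, which is omitted here, so the numbers will not reproduce STRING's combined scores exactly. The `combine_scores` function is hypothetical.

```python
from math import prod


def combine_scores(channel_scores):
    """Combine per-channel scores (0-1000 scale) assuming independent evidence.

    Simplified: no prior correction is applied.
    """
    probabilities = [score / 1000 for score in channel_scores]
    # Probability that at least one channel reflects a true association.
    combined = 1 - prod(1 - p for p in probabilities)
    return round(combined * 1000)


# Two moderate channels reinforce each other.
print(combine_scores([500, 500]))  # prints 750
```

The takeaway is that several weak or moderate channels can add up to a high combined score, which is why inspecting the individual evidence columns remains useful.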

Recommended Thresholds

Choose threshold based on analysis goals:
  • 150 (low confidence): Exploratory analysis, hypothesis generation
  • 400 (medium confidence): Standard analysis, balanced sensitivity/specificity
  • 700 (high confidence): Conservative analysis, high-confidence interactions
  • 900 (highest confidence): Very stringent, experimental evidence preferred
Trade-offs:
  • Lower thresholds: More interactions (higher recall, more false positives)
  • Higher thresholds: Fewer interactions (higher precision, more false negatives)
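
The trade-off can be made concrete by counting how many edges survive each recommended threshold. A minimal sketch, assuming a list of combined scores on the 0-1000 scale; the scores are made-up values and `edges_retained` is a hypothetical helper.

```python
# Recommended thresholds from the list above.
THRESHOLDS = {'low': 150, 'medium': 400, 'high': 700, 'highest': 900}


def edges_retained(scores, thresholds=THRESHOLDS):
    """Count how many interaction scores meet each confidence threshold."""
    return {
        label: sum(1 for score in scores if score >= cutoff)
        for label, cutoff in thresholds.items()
    }


# Made-up combined scores for six interactions.
scores = [160, 420, 450, 710, 905, 990]
print(edges_retained(scores))
# prints {'low': 6, 'medium': 5, 'high': 3, 'highest': 2}
```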

Network Types

Functional Networks (Default)

Includes all evidence types (experimental, computational, text-mining). Represents proteins that are functionally associated, even without direct physical binding.
When to use:
  • Pathway analysis
  • Functional enrichment studies
  • Systems biology
  • Most general analyses

Physical Networks

Only includes evidence for direct physical binding (experimental data and database annotations for physical interactions).
When to use:
  • Structural biology studies
  • Protein complex analysis
  • Direct binding validation
  • When physical contact is required

API Best Practices

  1. Always map identifiers first: Use `string_map_ids()` before other operations for faster queries
  2. Use STRING IDs when possible: Use format `9606.ENSP00000269305` instead of gene names
  3. Specify species for networks >10 proteins: Required for accurate results
  4. Respect rate limits: Wait 1 second between API calls
  5. Use versioned URLs for reproducibility: Available in reference documentation
  6. Handle errors gracefully: Check for "Error:" prefix in returned strings
  7. Choose appropriate confidence thresholds: Match threshold to analysis goals
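
Practices 4 and 6 can be applied in one place by wrapping the helper functions. The sketch below is an assumption-laden illustration: `make_polite` and `StringAPIError` are hypothetical names, not part of `scripts/string_api.py`, and the wrapper only enforces the delay between calls made through the same wrapped function.

```python
import time


class StringAPIError(RuntimeError):
    """Raised when a helper returns a string starting with 'Error:'."""


def make_polite(func, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap `func` so calls are spaced by `min_interval` seconds and errors raise."""
    last_call = [None]  # time of the previous call, None before the first one

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_interval - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        result = func(*args, **kwargs)
        last_call[0] = clock()
        # Text results signal failure with an "Error:" prefix.
        if isinstance(result, str) and result.startswith('Error:'):
            raise StringAPIError(result)
        return result

    return wrapper
```

Usage would look like `string_network = make_polite(string_network)` after importing the helper. The `clock` and `sleep` parameters exist so the delay logic can be tested without real waiting.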

Detailed Reference

For comprehensive API documentation, complete parameter lists, output formats, and advanced usage, refer to `references/string_reference.md`. This includes:
  • Complete API endpoint specifications
  • All supported output formats (TSV, JSON, XML, PSI-MI)
  • Advanced features (bulk upload, values/ranks enrichment)
  • Error handling and troubleshooting
  • Integration with other tools (Cytoscape, R, Python libraries)
  • Data license and citation information

Troubleshooting

No proteins found:
  • Verify species parameter matches identifiers
  • Try mapping identifiers first with `string_map_ids()`
  • Check for typos in protein names
Empty network results:
  • Lower confidence threshold (`required_score`)
  • Check if proteins actually interact
  • Verify species is correct
Timeout or slow queries:
  • Reduce number of input proteins
  • Use STRING IDs instead of gene names
  • Split large queries into batches
"Species required" error:
  • Add `species` parameter for networks with >10 proteins
  • Always include species for consistency
Results look unexpected:
  • Check STRING version with `string_version()`
  • Verify `network_type` is appropriate (functional vs physical)
  • Review confidence threshold selection
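
For the "split large queries into batches" advice, a small generator is enough. This is a sketch with a hypothetical `batches` helper. Batching suits per-protein operations such as identifier mapping. For network retrieval it would drop interactions between proteins that land in different batches, so it is not a drop-in fix there.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


genes = ['TP53', 'BRCA1', 'ATM', 'CHEK2', 'MDM2']
for chunk in batches(genes, 2):
    print(chunk)
```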

Additional Resources

For proteome-scale analysis or complete species network upload:
  • Visit https://string-db.org
  • Use "Upload proteome" feature
  • STRING will generate complete interaction network and predict functions
For bulk downloads of complete datasets:

Data License

STRING data is freely available under Creative Commons BY 4.0 license:
  • Free for academic and commercial use
  • Attribution required when publishing
  • Cite latest STRING publication

Citation

When using STRING in publications, cite the most recent publication from: https://string-db.org/cgi/about