neo4j-snowflake-graph-analytics-skill

Snowflake Native App that runs graph algorithms inside Snowflake. Data stays in Snowflake: project tables into a graph, run algorithms with a SQL `CALL`, and the results are written back to Snowflake tables.

When to Use

  • Running graph algorithms / GDS in Snowflake
  • Data in Snowflake tables
  • On-demand / pipeline workloads — ephemeral sessions, pay per session-minute
  • Full isolation from the live database during analytics

When NOT to Use

  • Aura Pro with embedded GDS plugin → use neo4j-gds-skill
  • Aura Graph Analytics → use neo4j-aura-graph-analytics-skill
  • Self-managed Neo4j with embedded GDS plugin → use neo4j-gds-skill
  • Writing Cypher queries → use neo4j-cypher-skill

Key Concepts

Project → Compute → Write

Every algorithm run follows three steps:
  1. Project: specify the node and relationship tables; the app builds an in-memory graph
  2. Compute: run the algorithm with its configuration parameters
  3. Write: results are written back to a Snowflake table

Required Table Columns

| Table type | Required columns | Optional columns |
|---|---|---|
| Node table | `nodeId` (Number) | Any additional columns become node properties |
| Relationship table | `sourceNodeId` (Number), `targetNodeId` (Number) | Any additional columns become relationship properties |

If your tables use different column names, create a view that aliases them to `nodeId`, `sourceNodeId`, and `targetNodeId`.
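The sketch below creates a node table and a relationship table in the required shape. The names `MY_DATABASE.MY_SCHEMA`, `PEOPLE`, and `KNOWS` are hypothetical, and so are the sample rows. Snowflake stores unquoted identifiers in upper case, so `nodeId` is stored as `NODEID`. The sketch assumes the app matches the required columns case-insensitively, which the unquoted view examples later in this document suggest.

```sql
-- Hypothetical sample data; CREATE OR REPLACE drops any existing tables with these names.
CREATE OR REPLACE TABLE MY_DATABASE.MY_SCHEMA.PEOPLE (
    nodeId NUMBER,   -- required: numeric node identifier
    age    NUMBER    -- extra column: becomes a node property
);

CREATE OR REPLACE TABLE MY_DATABASE.MY_SCHEMA.KNOWS (
    sourceNodeId NUMBER,  -- required: nodeId of the start node
    targetNodeId NUMBER,  -- required: nodeId of the end node
    weight       FLOAT    -- extra column: becomes a relationship property
);

INSERT INTO MY_DATABASE.MY_SCHEMA.PEOPLE (nodeId, age)
VALUES (1, 34), (2, 29), (3, 41), (4, 25);

-- Node 4 gets no relationships, so it stays an isolated node in the projected graph
INSERT INTO MY_DATABASE.MY_SCHEMA.KNOWS (sourceNodeId, targetNodeId, weight)
VALUES (1, 2, 1.0), (2, 3, 0.5);
```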

Graph Orientation

When projecting relationships, you can set `orientation`:
  • `NATURAL` (default): directed, source → target
  • `UNDIRECTED`: treated as bidirectional
  • `REVERSE`: direction flipped (target → source)
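For example, you can count incoming connections per node by projecting the relationship table with `REVERSE`. Counting outgoing relationships over the flipped edges then gives the in-degree of the original edges. The sketch reuses the sample `P2P.PUBLIC` tables from the full example below. Two things are assumptions borrowed from Neo4j GDS: the `mutateProperty` compute key, and the default of counting outgoing relationships. Check both in the app's reference for `graph.degree`.

```sql
-- Sketch: in-degree per user via a REVERSE projection.
-- Assumption: 'mutateProperty' names the computed property (Neo4j GDS convention).
CALL Neo4j_Graph_Analytics.graph.degree('CPU_X64_XS', {
    'defaultTablePrefix': 'P2P.PUBLIC',
    'project': {
        'nodeTables': ['USER_VW'],
        'relationshipTables': {
            'AGG_TRANSACTIONS_VW': {
                'sourceTable': 'P2P.PUBLIC.USER_VW',
                'targetTable': 'P2P.PUBLIC.USER_VW',
                'orientation': 'REVERSE'
            }
        }
    },
    'compute': { 'mutateProperty': 'in_degree' },
    'write': [{
        'nodeLabel': 'USER_VW',
        'outputTable': 'USER_IN_DEGREE'
    }]
});
```

Under the same assumption, `NATURAL` would count outgoing relationships and `UNDIRECTED` would count both directions.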

Installation

  1. Go to the Snowflake Marketplace.
  2. Install Neo4j Graph Analytics (default app name: `Neo4j_Graph_Analytics`).
  3. During installation, enable Event sharing when prompted.
  4. After installation, go to Data Products → Apps → Neo4j Graph Analytics → Privileges → Grant.
  5. Grant the `CREATE COMPUTE POOL` and `CREATE WAREHOUSE` privileges, then click Activate.
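If you prefer SQL to the UI, you can sketch the account-level grants from step 5 as follows, run as `ACCOUNTADMIN`. The sketch assumes the default app name. It is also an assumption that these statements fully replace the UI Grant dialog, so still complete the Activate step in the app UI.

```sql
USE ROLE ACCOUNTADMIN;

-- Confirm the app is installed (LIKE matching is case-insensitive)
SHOW APPLICATIONS LIKE 'Neo4j_Graph_Analytics';

-- Account-level privileges requested in step 5
GRANT CREATE COMPUTE POOL ON ACCOUNT TO APPLICATION Neo4j_Graph_Analytics;
GRANT CREATE WAREHOUSE ON ACCOUNT TO APPLICATION Neo4j_Graph_Analytics;

-- Review what the app now holds
SHOW GRANTS TO APPLICATION Neo4j_Graph_Analytics;
```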

Privilege Setup (run once per database/schema)

```sql
-- Step 1: Use ACCOUNTADMIN to set up roles and grants
USE ROLE ACCOUNTADMIN;

-- Create a consumer role for users of the application
CREATE ROLE IF NOT EXISTS MY_CONSUMER_ROLE;
GRANT APPLICATION ROLE Neo4j_Graph_Analytics.app_user TO ROLE MY_CONSUMER_ROLE;
SET MY_USER = (SELECT CURRENT_USER());
GRANT ROLE MY_CONSUMER_ROLE TO USER IDENTIFIER($MY_USER);

-- Step 2: Create a database role and grant it to the app
USE DATABASE MY_DATABASE;
CREATE DATABASE ROLE IF NOT EXISTS MY_DB_ROLE;
GRANT USAGE ON DATABASE MY_DATABASE TO DATABASE ROLE MY_DB_ROLE;
GRANT USAGE ON SCHEMA MY_DATABASE.MY_SCHEMA TO DATABASE ROLE MY_DB_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA MY_DATABASE.MY_SCHEMA TO DATABASE ROLE MY_DB_ROLE;
GRANT SELECT ON ALL VIEWS IN SCHEMA MY_DATABASE.MY_SCHEMA TO DATABASE ROLE MY_DB_ROLE;
GRANT SELECT ON FUTURE TABLES IN SCHEMA MY_DATABASE.MY_SCHEMA TO DATABASE ROLE MY_DB_ROLE;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA MY_DATABASE.MY_SCHEMA TO DATABASE ROLE MY_DB_ROLE;
GRANT CREATE TABLE ON SCHEMA MY_DATABASE.MY_SCHEMA TO DATABASE ROLE MY_DB_ROLE;
GRANT DATABASE ROLE MY_DB_ROLE TO APPLICATION Neo4j_Graph_Analytics;

-- Step 3: Grant the consumer role access to output tables
GRANT USAGE ON DATABASE MY_DATABASE TO ROLE MY_CONSUMER_ROLE;
GRANT USAGE ON SCHEMA MY_DATABASE.MY_SCHEMA TO ROLE MY_CONSUMER_ROLE;
GRANT SELECT ON FUTURE TABLES IN SCHEMA MY_DATABASE.MY_SCHEMA TO ROLE MY_CONSUMER_ROLE;

-- Step 4: Switch to the consumer role to run algorithms
USE ROLE MY_CONSUMER_ROLE;
```
Replace `MY_DATABASE`, `MY_SCHEMA`, `MY_CONSUMER_ROLE`, and `MY_DB_ROLE` with your actual names throughout. The full example below uses the sample database `P2P` and schema `PUBLIC` in place of `MY_DATABASE` and `MY_SCHEMA`.
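After running the setup script, a few `SHOW` commands can confirm that each grant landed. The sketch below uses the same placeholder names and runs as `ACCOUNTADMIN`.

```sql
USE ROLE ACCOUNTADMIN;

-- Privileges held by the database role (and therefore by the app)
SHOW GRANTS TO DATABASE ROLE MY_DATABASE.MY_DB_ROLE;

-- Grantees of the database role; the application should appear here
SHOW GRANTS OF DATABASE ROLE MY_DATABASE.MY_DB_ROLE;

-- Privileges held by the consumer role
SHOW GRANTS TO ROLE MY_CONSUMER_ROLE;

-- Future grants that will apply to tables the app creates later
SHOW FUTURE GRANTS IN SCHEMA MY_DATABASE.MY_SCHEMA;
```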

Running an Algorithm — Full Example

```sql
-- Switch to the consumer role created during privilege setup
USE ROLE MY_CONSUMER_ROLE;
-- Optional: make the app the current database so later statements can omit the app name
USE DATABASE Neo4j_Graph_Analytics;

-- Call WCC (Weakly Connected Components)
CALL Neo4j_Graph_Analytics.graph.wcc('CPU_X64_XS', {
    'defaultTablePrefix': 'P2P.PUBLIC',
    'project': {
        'nodeTables': ['USER_VW'],
        'relationshipTables': {
            'AGG_TRANSACTIONS_VW': {
                'sourceTable': 'P2P.PUBLIC.USER_VW',
                'targetTable': 'P2P.PUBLIC.USER_VW',
                'orientation': 'NATURAL'
            }
        }
    },
    'compute': { 'consecutiveIds': true },
    'write': [{
        'nodeLabel': 'USER_VW',
        'outputTable': 'USER_COMPONENTS'
    }]
});

-- Inspect results
SELECT * FROM P2P.PUBLIC.USER_COMPONENTS;
```
The first argument is the compute pool size:

| Pool | Use |
|---|---|
| `CPU_X64_XS` | Dev / small graphs |
| `CPU_X64_S/M/L` | Progressively larger graphs |
| `HIGHMEM_X64_S/M/L` | Large graphs with lower CPU needs |
| `GPU_NV_S/XS`, `GPU_GCP_NV_L4_1_24G` | Compute-intensive work (GraphSAGE); GPUs are not available in all regions |

See Estimating Jobs to choose size.
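To summarise the WCC output, a sketch like the one below lists the largest components. The column name `COMPONENT` is an assumption about the output table's schema. Run the `DESCRIBE TABLE` first and adjust the query to match.

```sql
-- Inspect the real column names of the output table
DESCRIBE TABLE P2P.PUBLIC.USER_COMPONENTS;

-- Ten largest components (assumes a COMPONENT column holding the component id)
SELECT COMPONENT, COUNT(*) AS component_size
FROM P2P.PUBLIC.USER_COMPONENTS
GROUP BY COMPONENT
ORDER BY component_size DESC
LIMIT 10;
```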

Available Algorithms

Community Detection

| Algorithm | Procedure | Use case |
|---|---|---|
| Weakly Connected Components | `graph.wcc` | Find disconnected subgraphs |
| Louvain | `graph.louvain` | Community detection by modularity optimisation |
| Leiden | `graph.leiden` | Refinement of Louvain that guarantees well-connected communities |
| K-Means Clustering | `graph.kmeans` | Cluster nodes by node properties |
| Triangle Count | `graph.triangle_count` | Measure local clustering / detect dense subgraphs |
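Community algorithms such as Louvain are commonly run on an undirected projection. The sketch below reuses the sample `P2P.PUBLIC` tables from the full example above. The `mutateProperty` compute key is an assumption borrowed from Neo4j GDS conventions, so check it against the app's reference for `graph.louvain`.

```sql
-- Sketch: Louvain communities, treating each transfer relationship as a symmetric tie.
-- Assumption: 'mutateProperty' names the computed community property.
CALL Neo4j_Graph_Analytics.graph.louvain('CPU_X64_XS', {
    'defaultTablePrefix': 'P2P.PUBLIC',
    'project': {
        'nodeTables': ['USER_VW'],
        'relationshipTables': {
            'AGG_TRANSACTIONS_VW': {
                'sourceTable': 'P2P.PUBLIC.USER_VW',
                'targetTable': 'P2P.PUBLIC.USER_VW',
                'orientation': 'UNDIRECTED'
            }
        }
    },
    'compute': { 'mutateProperty': 'community' },
    'write': [{
        'nodeLabel': 'USER_VW',
        'outputTable': 'USER_COMMUNITIES'
    }]
});
```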

Centrality

| Algorithm | Procedure | Use case |
|---|---|---|
| PageRank | `graph.pagerank` | Rank nodes by influence |
| Article Rank | `graph.article_rank` | PageRank variant that lowers the influence of low-degree nodes |
| Betweenness Centrality | `graph.betweenness` | Find bridge nodes in a network |
| Degree Centrality | `graph.degree` | Count direct connections per node |
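The sketch below runs a weighted PageRank and then reads the top-ranked nodes. Several names are assumptions:

- `TOTAL_AMOUNT` is a hypothetical numeric column on `AGG_TRANSACTIONS_VW`. Extra columns become relationship properties, so it would be available as a weight.
- The compute keys `mutateProperty`, `relationshipWeightProperty`, `dampingFactor`, and `maxIterations` are taken from Neo4j GDS PageRank. Confirm them in the app's reference for `graph.pagerank`.
- The output column name `SCORE` is assumed.

```sql
-- Sketch: PageRank weighted by a hypothetical TOTAL_AMOUNT relationship column.
-- 0.85 and 20 are the Neo4j GDS defaults for dampingFactor and maxIterations.
CALL Neo4j_Graph_Analytics.graph.pagerank('CPU_X64_XS', {
    'defaultTablePrefix': 'P2P.PUBLIC',
    'project': {
        'nodeTables': ['USER_VW'],
        'relationshipTables': {
            'AGG_TRANSACTIONS_VW': {
                'sourceTable': 'P2P.PUBLIC.USER_VW',
                'targetTable': 'P2P.PUBLIC.USER_VW',
                'orientation': 'NATURAL'
            }
        }
    },
    'compute': {
        'mutateProperty': 'score',
        'relationshipWeightProperty': 'TOTAL_AMOUNT',
        'dampingFactor': 0.85,
        'maxIterations': 20
    },
    'write': [{
        'nodeLabel': 'USER_VW',
        'outputTable': 'USER_PAGERANK'
    }]
});

-- Highest-ranked users (assumes the written column is named SCORE)
SELECT * FROM P2P.PUBLIC.USER_PAGERANK ORDER BY SCORE DESC LIMIT 10;
```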

Pathfinding

| Algorithm | Procedure | Use case |
|---|---|---|
| Dijkstra Source-Target | `graph.dijkstra_source_target` | Shortest path between two nodes |
| Dijkstra Single-Source | `graph.dijkstra_single_source` | Shortest paths from one node to all others |
| Delta-Stepping SSSP | `graph.delta_stepping` | Parallel single-source shortest paths |
| Breadth First Search | `graph.bfs` | BFS traversal from a source node |
| Yen's K-Shortest Paths | `graph.yens` | K shortest paths between two nodes |
| Max Flow | `graph.max_flow` | Maximum flow through a network |
| FastPath | `graph.fastpath` | Fast approximate shortest paths |

Similarity

| Algorithm | Procedure | Use case |
|---|---|---|
| Node Similarity | `graph.node_similarity` | Find similar nodes based on shared neighbours |
| Filtered Node Similarity | `graph.filtered_node_similarity` | Node similarity with source/target filters |
| K-Nearest Neighbors | `graph.knn` | Find the K most similar nodes |
| Filtered KNN | `graph.filtered_knn` | KNN with source/target filters |

Node Embeddings / ML

| Algorithm | Procedure | Use case |
|---|---|---|
| Fast Random Projection (FastRP) | `graph.fastrp` | Fast node embeddings |
| Node2Vec | `graph.node2vec` | Random-walk-based node embeddings |
| HashGNN | `graph.hashgnn` | GNN-inspired embeddings without training |
| GraphSAGE (train) | `graph.graphsage_train` | Train inductive node embeddings |
| GraphSAGE (predict) | `graph.graphsage_predict` | Predict with a trained GraphSAGE model |
| Node Classification (train) | `graph.node_classification_train` | Supervised node label prediction |
| Node Classification (predict) | `graph.node_classification_predict` | Apply a trained node classifier |

Projection Configuration Reference

```json
{
  "project": {
    "nodeTables": [
      "DB.SCHEMA.TABLE_A",
      "DB.SCHEMA.TABLE_B"
    ],
    "relationshipTables": {
      "DB.SCHEMA.REL_TABLE": {
        "sourceTable": "DB.SCHEMA.TABLE_A",
        "targetTable": "DB.SCHEMA.TABLE_B",
        "orientation": "NATURAL"
      }
    }
  }
}
```
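  • `defaultTablePrefix`: a top-level key (as in the full example) to use when all tables are in the same database and schema, so table names can be written without the prefix
  • Multiple node and relationship tables are supported; each table maps to its own node label or relationship type
  • Extra columns become node or relationship properties (e.g. a `weight` column for weighted paths)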
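The sketch below shows a projection with two node labels: customers connected to the products they bought. It feeds that graph to Node Similarity, so that customers with overlapping purchases are linked. The table names `CUSTOMERS`, `PRODUCTS`, and `PURCHASES` are hypothetical. The compute keys `topK`, `mutateRelationshipType`, and `mutateProperty` are assumptions taken from Neo4j GDS Node Similarity. The relationship write follows the write configuration reference below; the app may require additional keys.

```sql
-- Sketch: bipartite projection (customers -> products) for Node Similarity.
-- In Neo4j GDS, Node Similarity with NATURAL orientation compares the nodes that have
-- outgoing relationships (here the customers) by the sets of products they point to.
CALL Neo4j_Graph_Analytics.graph.node_similarity('CPU_X64_XS', {
    'project': {
        'nodeTables': [
            'MY_DATABASE.MY_SCHEMA.CUSTOMERS',
            'MY_DATABASE.MY_SCHEMA.PRODUCTS'
        ],
        'relationshipTables': {
            'MY_DATABASE.MY_SCHEMA.PURCHASES': {
                'sourceTable': 'MY_DATABASE.MY_SCHEMA.CUSTOMERS',
                'targetTable': 'MY_DATABASE.MY_SCHEMA.PRODUCTS',
                'orientation': 'NATURAL'
            }
        }
    },
    'compute': {
        'topK': 10,
        'mutateRelationshipType': 'SIMILAR',
        'mutateProperty': 'similarity'
    },
    'write': [{
        'relationshipType': 'SIMILAR',
        'outputTable': 'MY_DATABASE.MY_SCHEMA.CUSTOMER_SIMILARITY'
    }]
});
```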

Write Configuration Reference

```json
{
  "write": [
    {
      "nodeLabel": "TABLE_A",
      "outputTable": "DB.SCHEMA.OUTPUT_TABLE",
      "nodeProperty": "score"
    }
  ]
}
```
  • `nodeLabel`: the node table name, without the schema prefix
  • `outputTable`: created, or overwritten if it already exists
  • `nodeProperty` (optional): which computed property to write when the algorithm produces more than one
For relationship results (KNN, Node Similarity):
```json
{
  "write": [
    {
      "relationshipType": "SIMILAR",
      "outputTable": "DB.SCHEMA.SIMILARITY_OUTPUT"
    }
  ]
}
```
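Output tables hold numeric ids, so results are usually joined back to a source table for readable names. The sketch below makes three assumptions:

- The node output table from the example above has `NODEID` and `SCORE` columns.
- The relationship output table has `SOURCENODEID` and `TARGETNODEID` columns.
- A hypothetical `DB.SCHEMA.USERS` table maps `user_id` to `name`.

Check the real column names with `DESCRIBE TABLE` first.

```sql
-- Node results: top scores with readable names
SELECT u.name, o.SCORE
FROM DB.SCHEMA.OUTPUT_TABLE AS o
JOIN DB.SCHEMA.USERS AS u
  ON u.user_id = o.NODEID
ORDER BY o.SCORE DESC
LIMIT 20;

-- Relationship results: resolve both endpoints of each similarity pair
SELECT a.name AS user_a, b.name AS user_b
FROM DB.SCHEMA.SIMILARITY_OUTPUT AS s
JOIN DB.SCHEMA.USERS AS a ON a.user_id = s.SOURCENODEID
JOIN DB.SCHEMA.USERS AS b ON b.user_id = s.TARGETNODEID;
```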

Common Patterns

Chaining Algorithms

Results are written to tables, so one algorithm's output can feed the next. The `FUTURE TABLES` grant from the setup step lets the app read tables it has just created.
```sql
-- Step 1: Run FastRP to generate embeddings
CALL Neo4j_Graph_Analytics.graph.fastrp('CPU_X64_XS', { ... });

-- Step 2: Run KNN on the embedding output
CALL Neo4j_Graph_Analytics.graph.knn('CPU_X64_XS', { ... });
```
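The sketch below fills in the chain, using hypothetical `USERS_VW` and `CONNECTIONS_VW` views in `MY_DATABASE.MY_SCHEMA`. Several parts are assumptions to check against the app documentation:

- The compute keys (`mutateProperty`, `embeddingDimension`, `nodeProperties`, `topK`, `mutateRelationshipType`) are taken from Neo4j GDS.
- FastRP is assumed to write an `EMBEDDING` column next to `NODEID`.
- KNN is assumed to accept a projection with no relationship tables.

```sql
-- Step 1: FastRP writes one embedding per node to USER_EMBEDDINGS
CALL Neo4j_Graph_Analytics.graph.fastrp('CPU_X64_XS', {
    'defaultTablePrefix': 'MY_DATABASE.MY_SCHEMA',
    'project': {
        'nodeTables': ['USERS_VW'],
        'relationshipTables': {
            'CONNECTIONS_VW': {
                'sourceTable': 'MY_DATABASE.MY_SCHEMA.USERS_VW',
                'targetTable': 'MY_DATABASE.MY_SCHEMA.USERS_VW',
                'orientation': 'UNDIRECTED'
            }
        }
    },
    'compute': { 'mutateProperty': 'embedding', 'embeddingDimension': 64 },
    'write': [{
        'nodeLabel': 'USERS_VW',
        'outputTable': 'USER_EMBEDDINGS',
        'nodeProperty': 'embedding'
    }]
});

-- Step 2: KNN projects the table written by step 1 as its node table
-- and compares nodes by their embedding vectors
CALL Neo4j_Graph_Analytics.graph.knn('CPU_X64_XS', {
    'defaultTablePrefix': 'MY_DATABASE.MY_SCHEMA',
    'project': {
        'nodeTables': ['USER_EMBEDDINGS'],
        'relationshipTables': {}
    },
    'compute': {
        'nodeProperties': ['EMBEDDING'],
        'topK': 5,
        'mutateRelationshipType': 'SIMILAR',
        'mutateProperty': 'similarity'
    },
    'write': [{
        'relationshipType': 'SIMILAR',
        'outputTable': 'USER_SIMILARITY'
    }]
});
```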

Using Views Instead of Renaming Columns

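Create views with the required column names and supported data types, converting categorical data to numerical scores. Text columns such as a user's name are not numeric, so leave them out of the view and join them back to the results afterwards.
```sql
-- Node view: alias the key to nodeId and keep only numeric property columns
CREATE VIEW MY_SCHEMA.NODES_VIEW AS
  SELECT user_id AS nodeId, age
  FROM MY_SCHEMA.USERS;

-- Relationship view: alias both endpoints; weight becomes a relationship property
CREATE VIEW MY_SCHEMA.RELS_VIEW AS
  SELECT from_user AS sourceNodeId, to_user AS targetNodeId, weight
  FROM MY_SCHEMA.CONNECTIONS;
```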
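When source keys are strings (emails, UUIDs), you have to generate a numeric `nodeId` and translate both endpoints of every relationship through the same mapping. The sketch below uses hypothetical tables `ACCOUNTS(email, tier)` and `PAYMENTS(payer_email, payee_email, amount)`, and assumes `email` is unique in `ACCOUNTS`.

- The mapping is stored as a table rather than a view, so ids stay stable between runs and output tables can be joined back later.
- The `CASE` expression shows one way to turn a categorical column into a number.

```sql
-- 1) Snapshot mapping: email -> consecutive numeric nodeId, plus a numeric tier code.
--    Re-create it when accounts change; ids may shift after re-creation.
CREATE OR REPLACE TABLE MY_SCHEMA.ACCOUNT_IDS AS
  SELECT
    email,
    DENSE_RANK() OVER (ORDER BY email) AS nodeId,
    CASE tier WHEN 'free' THEN 0 WHEN 'pro' THEN 1 WHEN 'enterprise' THEN 2 ELSE -1 END AS tier_code
  FROM MY_SCHEMA.ACCOUNTS;

-- 2) Node view: numeric id and numeric property only
CREATE OR REPLACE VIEW MY_SCHEMA.ACCOUNT_NODES_VW AS
  SELECT nodeId, tier_code
  FROM MY_SCHEMA.ACCOUNT_IDS;

-- 3) Relationship view: translate both endpoints through the same mapping table
CREATE OR REPLACE VIEW MY_SCHEMA.PAYMENTS_VW AS
  SELECT s.nodeId AS sourceNodeId, t.nodeId AS targetNodeId, p.amount AS weight
  FROM MY_SCHEMA.PAYMENTS AS p
  JOIN MY_SCHEMA.ACCOUNT_IDS AS s ON s.email = p.payer_email
  JOIN MY_SCHEMA.ACCOUNT_IDS AS t ON t.email = p.payee_email;
```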

Troubleshooting

| Problem | Solution |
|---|---|
| Insufficient privileges | Check that the app has `SELECT` on your tables and `CREATE TABLE` on the schema |
| Column nodeId not found | Your table is missing the required column; create a view that aliases it |
| Compute pool not available | The pool may still be starting up; wait a minute and retry |
| Algorithm returns no results | Check that your node/relationship tables are not empty and that the projection is correct |
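You can run several of these checks directly in SQL. The sketch below queries the `NODES_VIEW` and `RELS_VIEW` views from the previous section; substitute your own tables. Relationships whose endpoints are missing from the node table, and duplicate `nodeId` values, are common sources of empty or surprising results. How the app treats them is not covered here.

```sql
-- 1. Are the inputs empty?
SELECT COUNT(*) AS node_count FROM MY_SCHEMA.NODES_VIEW;
SELECT COUNT(*) AS rel_count  FROM MY_SCHEMA.RELS_VIEW;

-- 2. Relationships pointing at nodeIds that are not in the node view
SELECT r.sourceNodeId, r.targetNodeId
FROM MY_SCHEMA.RELS_VIEW AS r
LEFT JOIN MY_SCHEMA.NODES_VIEW AS s ON s.nodeId = r.sourceNodeId
LEFT JOIN MY_SCHEMA.NODES_VIEW AS t ON t.nodeId = r.targetNodeId
WHERE s.nodeId IS NULL OR t.nodeId IS NULL
LIMIT 20;

-- 3. Duplicate nodeIds
SELECT nodeId, COUNT(*) AS copies
FROM MY_SCHEMA.NODES_VIEW
GROUP BY nodeId
HAVING COUNT(*) > 1;

-- 4. Privileges actually held by the app (run as ACCOUNTADMIN)
SHOW GRANTS TO APPLICATION Neo4j_Graph_Analytics;

-- 5. Compute pools visible to the current role, with their state
--    (pools created by the app may not be visible to every role)
SHOW COMPUTE POOLS;
```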

Further Reading
