exploring-llm-clusters

Exploring LLM clusters

Use this skill when investigating LLM analytics clusters — understanding what patterns exist in your AI/LLM traffic, comparing cluster behavior, and drilling into individual clusters.

Tools

  • posthog:llma-clustering-job-list — List clustering job configurations for the team
  • posthog:llma-clustering-job-get — Get a specific clustering job by ID
  • posthog:execute-sql — Query cluster run events and compute metrics
  • posthog:query-llm-traces-list — Find traces belonging to a cluster
  • posthog:query-llm-trace — Inspect a specific trace in detail

How clustering works

PostHog clusters LLM traces (or individual generations) by embedding similarity. A Temporal workflow runs periodically or on demand, producing cluster events stored as $ai_trace_clusters (trace-level) or $ai_generation_clusters (generation-level).
Each cluster event contains:
  • $ai_clustering_run_id — unique run identifier (format: <team_id>_<level>_<YYYYMMDD>_<HHMMSS>[_<job_id>])
  • $ai_clustering_level — "trace" or "generation"
  • $ai_window_start / $ai_window_end — time window analyzed
  • $ai_total_items_analyzed — number of traces/generations processed
  • $ai_clusters — JSON array of cluster objects
  • $ai_clustering_params — algorithm parameters used
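The run-identifier format above can be unpacked with a small helper. A minimal sketch in Python (the parse_run_id helper and its field names are mine, not part of any PostHog API):

```python
import re

# Matches <team_id>_<level>_<YYYYMMDD>_<HHMMSS>[_<job_id>]
RUN_ID_RE = re.compile(
    r"^(?P<team_id>\d+)_(?P<level>trace|generation)"
    r"_(?P<date>\d{8})_(?P<time>\d{6})(?:_(?P<job_id>.+))?$"
)

def parse_run_id(run_id: str) -> dict:
    """Split a $ai_clustering_run_id into its components."""
    m = RUN_ID_RE.match(run_id)
    if m is None:
        raise ValueError(f"unrecognized run_id: {run_id!r}")
    return m.groupdict()
```

Useful when grouping run IDs by job or level before querying further.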

Cluster object shape (inside $ai_clusters)

```json
{
  "cluster_id": 0,
  "size": 42,
  "title": "User authentication flows",
  "description": "Traces involving login, signup, and token refresh operations",
  "traces": {
    "<trace_or_generation_id>": {
      "distance_to_centroid": 0.123,
      "rank": 0,
      "x": -2.34,
      "y": 1.56,
      "timestamp": "2026-03-28T10:00:00Z",
      "trace_id": "abc-123",
      "generation_id": "gen-456"
    }
  },
  "centroid_x": -2.1,
  "centroid_y": 1.4
}
```
  • cluster_id: -1 is the noise/outlier cluster (items that didn't fit any cluster)
  • Items in traces are keyed by trace ID (trace-level) or generation event UUID (generation-level)
  • rank orders items by proximity to centroid (0 = closest)
  • x, y are 2D coordinates for visualization (UMAP/PCA/t-SNE reduced)
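Given the object shape above, a $ai_clusters payload can be reduced to a readable overview. A sketch under that shape (the summarize_clusters helper is illustrative, not a PostHog API):

```python
import json

def summarize_clusters(ai_clusters_json: str, top_n: int = 3) -> list[dict]:
    """Summarize each cluster: size, title, and the member IDs
    closest to the centroid (lowest rank first), largest cluster first."""
    clusters = json.loads(ai_clusters_json)
    out = []
    for c in sorted(clusters, key=lambda c: c["size"], reverse=True):
        # Sort members by rank: rank 0 is the most representative item.
        items = sorted(c["traces"].items(), key=lambda kv: kv[1]["rank"])
        out.append({
            "cluster_id": c["cluster_id"],
            "size": c["size"],
            "title": c.get("title"),
            "is_noise": c["cluster_id"] == -1,
            "top_items": [item_id for item_id, _ in items[:top_n]],
        })
    return out
```

The top_items of each summary are good candidates for deep inspection with query-llm-trace.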

Clustering jobs

Each team can have up to 5 clustering jobs. A job defines:
  • name — human-readable label
  • analysis_level — "trace" or "generation"
  • event_filters — property filters scoping which traces are included
  • enabled — whether the job runs on schedule
Default jobs named "Default - trace" and "Default - generation" are created automatically; a default job is disabled when a custom job is created for the same level.

Workflow: explore clusters

Step 1 — List recent clustering runs

posthog:execute-sql

```sql
SELECT
    JSONExtractString(properties, '$ai_clustering_run_id') as run_id,
    JSONExtractString(properties, '$ai_clustering_level') as level,
    JSONExtractString(properties, '$ai_window_start') as window_start,
    JSONExtractString(properties, '$ai_window_end') as window_end,
    JSONExtractInt(properties, '$ai_total_items_analyzed') as total_items,
    timestamp
FROM events
WHERE event IN ('$ai_trace_clusters', '$ai_generation_clusters')
    AND timestamp >= now() - INTERVAL 7 DAY
ORDER BY timestamp DESC
LIMIT 10
```

Step 2 — Get clusters from a specific run

posthog:execute-sql

```sql
SELECT
    JSONExtractString(properties, '$ai_clustering_run_id') as run_id,
    JSONExtractString(properties, '$ai_clustering_level') as level,
    JSONExtractString(properties, '$ai_clustering_job_id') as job_id,
    JSONExtractString(properties, '$ai_clustering_job_name') as job_name,
    JSONExtractString(properties, '$ai_window_start') as window_start,
    JSONExtractString(properties, '$ai_window_end') as window_end,
    JSONExtractInt(properties, '$ai_total_items_analyzed') as total_items,
    JSONExtractRaw(properties, '$ai_clusters') as clusters,
    JSONExtractRaw(properties, '$ai_clustering_params') as params
FROM events
WHERE event IN ('$ai_trace_clusters', '$ai_generation_clusters')
    AND JSONExtractString(properties, '$ai_clustering_run_id') = '<run_id>'
LIMIT 1
```

The clusters field is a JSON array. Parse it to see cluster titles, sizes, and descriptions.
Important: the clusters JSON can be very large (thousands of trace IDs with coordinates). When the result is too large for inline display, it auto-persists to a file. Use print_clusters.py from scripts/ to get a readable summary.
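Step 3 needs the member IDs of each cluster for its IN (...) lists. One way to pull them out of the parsed clusters array (a sketch; the helper name is mine):

```python
import json

def trace_ids_by_cluster(ai_clusters_json: str) -> dict[int, list[str]]:
    """Map each cluster_id to its member trace/generation IDs,
    skipping the noise cluster (-1)."""
    clusters = json.loads(ai_clusters_json)
    return {
        c["cluster_id"]: list(c["traces"].keys())
        for c in clusters
        if c["cluster_id"] != -1
    }
```

Each list of IDs can then be interpolated into the IN clause of the Step 3 queries, one query per cluster or one query covering all clusters.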

Step 3 — Compute metrics for clusters

For trace-level clusters, compute cost/latency/token metrics:
posthog:execute-sql

```sql
SELECT
    JSONExtractString(properties, '$ai_trace_id') as trace_id,
    sum(toFloat(properties.$ai_total_cost_usd)) as total_cost,
    max(toFloat(properties.$ai_latency)) as latency,
    sum(toInt(properties.$ai_input_tokens)) as input_tokens,
    sum(toInt(properties.$ai_output_tokens)) as output_tokens,
    countIf(properties.$ai_is_error = 'true') as error_count
FROM events
WHERE event IN ('$ai_generation', '$ai_embedding', '$ai_span')
    AND timestamp >= parseDateTimeBestEffort('<window_start>')
    AND timestamp <= parseDateTimeBestEffort('<window_end>')
    AND JSONExtractString(properties, '$ai_trace_id') IN ('<trace_id_1>', '<trace_id_2>', ...)
GROUP BY trace_id
```

For generation-level clusters, match by event UUID:
posthog:execute-sql

```sql
SELECT
    toString(uuid) as generation_id,
    toFloat(properties.$ai_total_cost_usd) as cost,
    toFloat(properties.$ai_latency) as latency,
    toInt(properties.$ai_input_tokens) as input_tokens,
    toInt(properties.$ai_output_tokens) as output_tokens,
    if(properties.$ai_is_error = 'true', 1, 0) as is_error
FROM events
WHERE event = '$ai_generation'
    AND timestamp >= parseDateTimeBestEffort('<window_start>')
    AND timestamp <= parseDateTimeBestEffort('<window_end>')
    AND toString(uuid) IN ('<gen_uuid_1>', '<gen_uuid_2>', ...)
```
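The per-trace rows returned by the trace-level query can then be rolled up per cluster. A minimal aggregation sketch, assuming rows shaped like the query's output and a cluster-to-IDs mapping (the cluster_metrics helper is illustrative):

```python
from statistics import mean

def cluster_metrics(rows: list[dict], ids_by_cluster: dict[int, list[str]]) -> dict[int, dict]:
    """Roll per-trace metric rows (trace_id, total_cost, latency,
    error_count) up into per-cluster aggregates."""
    by_trace = {r["trace_id"]: r for r in rows}
    out = {}
    for cid, ids in ids_by_cluster.items():
        members = [by_trace[t] for t in ids if t in by_trace]
        if not members:
            continue
        out[cid] = {
            "n": len(members),
            "avg_cost": mean(r["total_cost"] for r in members),
            "sum_cost": sum(r["total_cost"] for r in members),
            "avg_latency": mean(r["latency"] for r in members),
            # Fraction of traces that saw at least one error.
            "error_rate": sum(r["error_count"] > 0 for r in members) / len(members),
        }
    return out
```

Sorting the result by sum_cost or avg_latency answers "which cluster is most expensive / slowest" directly.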

Step 4 — Drill into specific traces

Once you've identified interesting clusters, use the trace tools to inspect individual traces:
posthog:query-llm-trace

```json
{
  "traceId": "<trace_id_from_cluster>",
  "dateRange": {"date_from": "<window_start>", "date_to": "<window_end>"}
}
```

Investigation patterns

"What kinds of LLM usage do we have?"

  1. List recent clustering runs (Step 1)
  2. Load the latest run's clusters (Step 2)
  3. Review cluster titles and descriptions — each represents a distinct usage pattern
  4. Compare cluster sizes to understand traffic distribution

"Which cluster is most expensive / slowest?"

  1. Load clusters from a run (Step 2)
  2. Extract trace IDs from each cluster
  3. Compute metrics per cluster (Step 3)
  4. Aggregate avg(cost), avg(latency), and sum(cost) per cluster
  5. Compare across clusters

"What's in this cluster?"

  1. Load the cluster's traces (from the traces field)
  2. Sort by rank (closest to centroid = most representative)
  3. Inspect the top 3-5 traces via query-llm-trace to understand the pattern
  4. Check the cluster title and description for the AI-generated summary

"Are there error-heavy clusters?"

  1. Compute metrics (Step 3) with error_count
  2. Calculate error rate per cluster: items_with_errors / total_items
  3. Focus on clusters with high error rates
  4. Drill into errored traces to find root causes

"How do clusters compare across runs?"

  1. List multiple runs (Step 1)
  2. Load clusters from each run
  3. Compare cluster titles — similar titles across runs indicate stable patterns
  4. Track cluster size changes to detect shifts in traffic patterns
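The cross-run comparison above can be sketched by matching clusters on their AI-generated titles. Exact-title matching is a simplification, since titles can vary between runs; the compare_runs helper is illustrative:

```python
def compare_runs(run_a: list[dict], run_b: list[dict]) -> dict:
    """Diff two runs' cluster lists by title: which patterns persist
    (with their sizes in each run), which are new, which disappeared."""
    a = {c["title"]: c["size"] for c in run_a if c["cluster_id"] != -1}
    b = {c["title"]: c["size"] for c in run_b if c["cluster_id"] != -1}
    return {
        "stable": {t: (a[t], b[t]) for t in a.keys() & b.keys()},
        "new": sorted(b.keys() - a.keys()),
        "gone": sorted(a.keys() - b.keys()),
    }
```

Size deltas in the "stable" entries are the traffic-pattern shifts the fourth step above is after.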

Constructing UI links

  • Clusters overview: https://app.posthog.com/llm-analytics/clusters
  • Specific run: https://app.posthog.com/llm-analytics/clusters/<url_encoded_run_id>
  • Cluster detail: https://app.posthog.com/llm-analytics/clusters/<url_encoded_run_id>/<cluster_id>
Always surface these links so the user can verify visually in the PostHog UI.

Tips

  • Always set a time range in SQL queries — scans of cluster events without time bounds are slow
  • Start with the run listing to orient, then drill into specific clusters
  • Cluster titles and descriptions are AI-generated summaries — verify by inspecting traces
  • The noise cluster (cluster_id: -1) contains outliers that didn't fit any pattern
  • Use llma-clustering-job-list to understand which clustering configs are active
  • Trace IDs in clusters can be used directly with query-llm-trace for deep inspection
  • For large clusters, inspect the top-ranked traces (closest to centroid) for representative examples