feature-usage-feed


Building a feature usage feed via LLM evals


Some PostHog features (group session summaries, single session summaries, replay AI search, error tracking AI debug, etc.) generate hundreds or thousands of LLM traces per week. Reading them by hand is not feasible. This skill covers the end-to-end pattern for turning that trace volume into a live Slack feed of canonical use cases — what users are actually doing with the feature.
The workflow is mixed, and leans UI. Trace inspection and filter discovery (steps 1-2) are MCP-driven. Eval creation, dry-running, and enabling (steps 4-5) are MCP-driven when posthog:llma-evaluation-* tools are exposed to your agent — but they often aren't, in which case fall back to the UI (Data pipeline → Workflows for the alert is always UI). Each step flags its UI fallback. Expect to finish in the UI even when you start from chat.

When to use


  • "How are people actually using [feature X] in production?"
  • "Can we identify the canonical use cases for [feature X] so we can write better docs / prioritize improvements?"
  • "I want a Slack feed of representative usage examples without manually skimming traces."
  • "Set up a feed of use cases for [feature X] in #team-[area]-usage."
If the user just wants to debug a single trace or tune an existing eval, redirect to exploring-llm-traces or exploring-llm-evaluations instead.

Two filter patterns


This skill supports two different ways to scope an eval to "the feature you care about":
Pattern A — Feature-native trace_id prefix. For standalone features that emit their own $ai_trace_id pattern (e.g. session-summary:group:, replay-search:, error-tracking-specific flows). Filter on the prefix.
Pattern B — PostHog AI agent mode. For features the user interacts with via PostHog AI in a specific agent mode (error tracking, product analytics, session replay, SQL, flags, surveys, LLM analytics). Filter on ai_product = 'posthog_ai' AND agent_mode = '<mode>'. This requires PR #55160 (merged April 2026) to be deployed, which threads agent_mode and supermode onto every $ai_generation emitted by the chat agent loop. A useful ergonomic side-effect: agent_mode IS NOT NULL is a reliable "user-facing chat turn" filter — batch jobs and tool-internal LLM calls go through different code paths and have agent_mode=null, so they're excluded for free.
If the user asks "what are users trying to DO in [ET / replay / SQL / flags / surveys] mode of PostHog AI", that's Pattern B. If they ask "what use cases does [standalone feature] cover", that's Pattern A. Pick the pattern first — the prompt, filter, and Slack channel naming all follow from it.

Prerequisites


  • (Pattern A) The feature emits $ai_generation events with a stable $ai_trace_id pattern. Verify: posthog:execute-sql for distinct $ai_trace_id prefixes.
  • (Pattern B) The agent_mode property is present on recent $ai_generation events. Verify: posthog:execute-sql, grouping recent ai_product='posthog_ai' events by properties.agent_mode. A null bucket is normal (batch jobs + tool-internal calls); you want non-null coverage across the modes you care about.
  • $session_id is attached to the $ai_generation events (links the trace to the trigger session). Verify: posthog:execute-sql for countIf($session_id IS NOT NULL) / count().
  • $session_id is also attached to the $ai_evaluation events (lets the Slack alert link to the session). Verify: the same query, but on $ai_evaluation events after the eval has run once.
  • The user has organisation-level AI data processing approval. Required for llm_judge evaluations and the eval summary tool.
If $session_id is missing on either event type, file a backend fix before continuing — there is no UI workaround. The session-summary feature has a worked example of the threading pattern in PR #54952. For Pattern B, the agent-mode threading pattern is in PR #55160.

Tools


  • posthog:query-llm-traces-list: find sample traces matching the feature's $ai_trace_id pattern.
  • posthog:query-llm-trace: inspect a specific trace's contents end-to-end.
  • posthog:execute-sql: verify trace volume, session_id coverage, and eval result distributions.
  • posthog:llma-evaluation-create (often unexposed — UI fallback: LLM analytics → Evaluations → New): create the LLM-judge eval, disabled at first.
  • posthog:llma-evaluation-run (often unexposed — UI fallback: the eval's detail page has a "Run on event" button): dry-run the eval against specific generations during prompt iteration.
  • posthog:llma-evaluation-update (often unexposed — UI fallback: edit the eval in LLM analytics → Evaluations): tweak the prompt / enable when ready.
  • posthog:llma-evaluation-summary-create (often unexposed — UI fallback: the eval detail page has a "Summarize results" button): after the feed is running, get an AI summary of pass/N/A patterns to validate signal quality.
  • posthog:workflows-list / posthog:workflows-get (often unexposed — UI: Data pipeline → Workflows): browse existing workflow configs, useful for cloning an existing feed's structure when setting up a new one. Read-only; no create/update tool is exposed yet, so step 6's Slack workflow setup is UI-only.
Before starting, check which of the posthog:llma-evaluation-* tools are actually exposed in your agent's MCP tool set. If they aren't loaded, treat steps 4-5 as UI walkthroughs rather than tool calls.

Workflow


Step 1 — Identify the filter


Pattern A (feature-native trace_id prefix): find the prefix that maps to your feature.
sql
SELECT
    splitByChar(':', coalesce(properties.$ai_trace_id, ''))[1] AS root,
    splitByChar(':', coalesce(properties.$ai_trace_id, ''))[2] AS subtype,
    count() AS events
FROM events
WHERE timestamp > now() - INTERVAL 3 DAY
    AND event = '$ai_generation'
    AND properties.$ai_trace_id IS NOT NULL
GROUP BY root, subtype
ORDER BY events DESC
LIMIT 25
Note: coalesce(..., '') is load-bearing — splitByChar on a nullable column errors out in HogQL otherwise.
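For intuition, the prefix extraction the query performs can be mirrored in Python. This is illustrative only: HogQL's splitByChar is 1-indexed, and the coalesce guard plays the role of the empty-string fallback below.

```python
def trace_prefix(trace_id):
    """Mirror splitByChar(':', coalesce(trace_id, '')) and return (root, subtype)."""
    parts = (trace_id or "").split(":")            # coalesce(..., '') equivalent
    root = parts[0]                                # splitByChar(...)[1] in HogQL
    subtype = parts[1] if len(parts) > 1 else ""   # [2]; empty when absent
    return root, subtype
```

Running it over a few real $ai_trace_id values is a quick way to confirm the prefix you plan to filter on.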
Pattern B (PostHog AI agent mode): verify coverage and volume for the mode you're targeting.
sql
SELECT
    properties.agent_mode AS agent_mode,
    properties.supermode AS supermode,
    count() AS events,
    count(DISTINCT properties.$ai_trace_id) AS traces
FROM events
WHERE timestamp > now() - INTERVAL 3 DAY
    AND event = '$ai_generation'
    AND properties.ai_product = 'posthog_ai'
GROUP BY agent_mode, supermode
ORDER BY events DESC
LIMIT 20
Expected values for agent_mode: error_tracking, product_analytics, sql, session_replay, flags, survey, llm_analytics, null. Null ≈ batch jobs + tool-internal calls (not user chat). supermode='plan' splits planning turns from execution turns — worth calling out separately if your feed is about plan-mode specifically.
Record the mode + rough volume. Low-volume modes (<100 events/day) will produce a trickle-feed that's hard to validate early; high-volume modes (>1k/day) may need sampling to avoid Slack flooding. See the "Tips" section on sampling.
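For high-volume modes, one hedged option is a deterministic hash-based sample, so the same trace always gets the same keep/drop decision. keep_trace below is a hypothetical helper, not a PostHog API; in HogQL the analogous filter would be something like modulo(cityHash64(properties.$ai_trace_id), 10) = 0.

```python
import hashlib

def keep_trace(trace_id: str, one_in: int = 10) -> bool:
    """Deterministically keep roughly 1 in `one_in` traces.

    Hash-based rather than random: the same trace id always gets the
    same decision, so the sample stays stable across re-runs.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % one_in == 0
```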

Step 2 — Pull a handful of sample traces


Use these for prompt iteration in step 4.
Pattern A:
json
posthog:query-llm-traces-list
{
  "properties": [
    { "type": "event", "key": "$ai_trace_id", "operator": "icontains", "value": "<your-prefix-here>" }
  ],
  "limit": 10,
  "dateRange": { "date_from": "-2d" },
  "randomOrder": true
}
Pattern B:
json
posthog:query-llm-traces-list
{
  "properties": [
    { "type": "event", "key": "ai_product", "operator": "exact", "value": "posthog_ai" },
    { "type": "event", "key": "agent_mode", "operator": "exact", "value": "<mode-here>" }
  ],
  "limit": 10,
  "dateRange": { "date_from": "-2d" },
  "randomOrder": true
}
randomOrder: true matters — recency bias produces a non-representative sample. Pick 5-10 traces to test against.
Output size warning: query-llm-traces-list with limit: 10 routinely returns 3-6MB of JSON (full input/output per generation). This will blow your context window. Immediately delegate the summarization to a subagent the moment you see the "result exceeds maximum allowed tokens" error — ask the subagent to extract, per trace: the trace id, the first user message (truncated to ~300 chars), the sampled $current_url, and a one-sentence description of what the conversation was about. Don't try to read the raw file in-line.
Watch for topic drift in Pattern B samples. The agent_mode tag reflects the user's mode selection at the time of the turn — but chat state retains the mode even if the user drifts off-topic within the same conversation (e.g. the user selected "error tracking" mode, then asked an unrelated pricing question three turns later). Your eval prompt's classification step needs to handle topic drift explicitly: PASS should mean "user is doing something recognizably in-scope for this mode", and FAIL should catch the off-topic drift. If you don't, your feed will include irrelevant PASS entries that happen to carry the mode tag.

Step 3 — Draft the LLM-judge prompt


The prompt has two responsibilities: (a) classify the trace as relevant or not, (b) produce reasoning text that is directly postable to Slack (no preamble, no meta-description). The reasoning field becomes the Slack message body.
Template:
text
You are analyzing a PostHog [FEATURE NAME] trace to extract its real use case.
Your reasoning text will be posted directly to a Slack channel as a notification.
Write it as a short, ready-to-post message — no preamble, no meta-description.

Step 1 — Classification:
- PASS = this trace is the [feature kind] you care about
- FAIL = a different LLM call or a false match
- N/A = ambiguous from the trace alone

Step 2 — Reasoning (only matters if PASS). Write 2-3 sentences in this exact format:

"[OPENER] [what they targeted/filtered for]. They were
trying to [understand X / debug Y / find Z]. The result surfaced [key pattern
or finding]."

Your output MUST start with the exact phrase "[OPENER]". No other opening is allowed.

Rules:
- No "This is a [feature]..." or "The input contains..." preamble
- No JSON, field names, system-prompt references, or meta-description
- Concrete > generic. "users hitting error tracking for the first time" beats "user behavior"
- If you cannot infer one of the three pieces from the trace, write "(unclear from trace)" in that slot — do not guess
Pick an [OPENER] that matches how users actually interact with the feature. The forced opener is load-bearing (it prevents the model from drifting into "this trace is a..." meta-description), but the exact verb has to fit the interaction:
  • Session summary (group / single): "A user ran a summary on"
  • Replay AI search: "A user searched replays for"
  • PostHog AI in error tracking mode: "A user asked PostHog AI about"
  • PostHog AI in session replay mode: "A user asked PostHog AI about"
  • PostHog AI in SQL mode: "A user asked PostHog AI to write SQL for"
Note: supermode='plan' is a sub-filter that layers on top of an agent_mode row — it's not its own row. If you want plan-mode-only, filter agent_mode='<mode>' AND supermode='plan' and pick an opener like "A user asked PostHog AI to plan".
If you force "A user ran" on a chat-based feature, the model will produce awkward contortions ("A user ran a question about...") that read wrong in Slack. The forced-opener pattern is the mechanism — the specific phrase is per-feature.
The negative example list ("No 'This is a...' preamble", etc.) is load-bearing regardless of opener. Don't remove it.

Step 4 — Create the eval (disabled), test, iterate


Create with enabled: false so it doesn't immediately fan out to all traces.
If posthog:llma-evaluation-create is exposed, use this payload:
json
posthog:llma-evaluation-create
{
  "name": "[feature] use case feed",
  "description": "Extracts canonical use cases for [feature] for the #team-[area]-usage Slack feed",
  "evaluation_type": "llm_judge",
  "evaluation_config": {
    "prompt": "<full prompt from step 3>"
  },
  "output_type": "boolean",
  "output_config": { "allows_na": true },
  "model_configuration": {
    "provider": "<provider>",
    "model": "<model>"
  },
  "enabled": false,
  "conditions": {
    "filters": [
      // Pattern A — feature-native trace_id prefix:
      { "key": "$ai_trace_id", "operator": "icontains", "value": "<your-prefix>" }

      // Pattern B — PostHog AI agent mode (use these INSTEAD of the trace_id filter):
      // { "key": "ai_product", "operator": "exact", "value": "posthog_ai" },
      // { "key": "agent_mode", "operator": "exact", "value": "<mode>" }
    ]
  }
}
Leave model choice to the user — LLM-judge cost scales linearly with event volume, and cheap-vs-capable is a real tradeoff they should make based on their own spend tolerance and signal-quality requirements. Don't pick for them.
UI fallback (when llma-evaluation-create isn't exposed): LLM analytics → Evaluations → New evaluation. Type = LLM judge, output = boolean + allow N/A, filters as above, enabled = off. Paste the prompt from step 3.
Then dry-run against your sample traces.
If posthog:llma-evaluation-run is exposed:
json
posthog:llma-evaluation-run
{
  "evaluationId": "<uuid from create>",
  "target_event_id": "<a $ai_generation event id from step 2>",
  "timestamp": "<ISO timestamp of that event>"
}
UI fallback: on the eval detail page, use the "Run on event" button with the trace sample's event id.
Look at the returned $ai_evaluation_reasoning. If it preambles, drifts, or describes the input, fix the prompt (via llma-evaluation-update or by editing in the UI) and re-run. Iterate on 3-5 traces before enabling.
Common failure modes during iteration:
  • Reasoning starts with "This is a...": strengthen the forced-opener instruction; add a counter-example.
  • Reasoning is generic ("user behavior", "various patterns"): add positive examples of concrete phrasing in the prompt.
  • Model classifies everything as PASS: tighten the FAIL definition; add an example of what a non-match looks like.
  • Reasoning is too long for Slack: add a hard sentence cap ("MAX 3 sentences, hard limit").
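The opener and length failure modes are mechanically checkable. Here is a small validator you could run over dry-run reasoning outputs before enabling; check_reasoning is a hypothetical helper, and the opener argument is whatever phrase you chose in step 3:

```python
import re

def check_reasoning(reasoning: str, opener: str, max_sentences: int = 3):
    """Return a list of problems; an empty list means the output looks Slack-ready."""
    problems = []
    if not reasoning.startswith(opener):
        problems.append("does not start with the forced opener")
    if reasoning.lower().startswith("this is a"):
        problems.append("meta-description preamble")
    # crude sentence split on terminal punctuation followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", reasoning.strip()) if s]
    if len(sentences) > max_sentences:
        problems.append("too long for Slack (%d sentences)" % len(sentences))
    return problems
```

Generic phrasing and over-eager PASS classification still need human review; this only catches the format-level regressions.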

Step 5 — Enable the eval


Enable once 3-5 sample runs produce clean, Slack-ready output.
If posthog:llma-evaluation-update is exposed:
json
posthog:llma-evaluation-update
{
  "evaluationId": "<uuid>",
  "enabled": true
}
UI fallback: LLM analytics → Evaluations → open the eval → toggle enabled.
The eval will now run on every new matching $ai_generation event.

Step 6 — Build the workflow (UI only)


Workflow setup is not MCP-accessible for writes (posthog:workflows-list / posthog:workflows-get are read-only). The steps below are a UI walkthrough.
Prereq: before you start, invite the PostHog Slack bot to your target channel (/invite @PostHog in the Slack channel). Without this, the Slack dispatch step will fail with an opaque permission error at send time, not at save time — easy to miss.

6.1 Create the workflow


Data pipeline → Workflows → New workflow. Name it <feature> use case feed to match the eval name from step 4.

6.2 Trigger step


  • Event: AI evaluation (LLM), i.e. $ai_evaluation. This is the event emitted when an eval runs, and it's the only event that carries $ai_evaluation_* properties. The original $ai_generation event is not enriched with eval results, so filtering on $ai_generation here matches nothing.
  • Property filters (both required):
    • AI Evaluation Name (LLM) equals <your eval name from step 4>
    • AI Evaluation Result (LLM) equals true
⚠️ LOAD-BEARING: the stored values for $ai_evaluation_result are the strings 'True' / 'False' / 'None' — NOT 'PASS' / 'FAIL' / 'N/A' (despite what the prompt template calls them internally). The Workflows UI property filter normalizes true to 'True', so selecting equals true from the dropdown works. But if you were wiring this in raw SQL somewhere else (say a hog function), you'd need the string literal. Verify the stored distribution before saving:
sql
SELECT DISTINCT toString(properties.$ai_evaluation_result) AS result, count() AS n
FROM events
WHERE event = '$ai_evaluation'
  AND properties.$ai_evaluation_name = '<your eval name>'
  AND timestamp > now() - INTERVAL 1 HOUR
GROUP BY result
If the only values are True / False / None and True dominates, the UI equals true filter will match. If you see anything else, adjust accordingly.
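If you do end up encoding the match outside the Workflows UI, the verdict-to-stored-string mapping above is the piece to get right. A minimal sketch:

```python
# Judge verdict (prompt-internal label) to stored $ai_evaluation_result string.
VERDICT_TO_STORED = {"PASS": "True", "FAIL": "False", "N/A": "None"}

def matches_feed(stored_result: str) -> bool:
    """Only PASS rows (stored as the string 'True') should reach the Slack feed."""
    return stored_result == VERDICT_TO_STORED["PASS"]
```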

6.3 Slack dispatch step


  • Add step → Slack dispatch
  • Channel: #<your-team>-usage-feed
  • Sender / bot display name: something that reads well in the channel (e.g. PostHog Usage Feed)
  • Blocks (Slack block-kit JSON) — paste this and replace <project_id> with your actual numeric project ID (e.g. 2):
json
[
  {
    "text": {
      "text": "<emoji> *{event.properties.$ai_evaluation_name}* triggered by *{person.name}*",
      "type": "mrkdwn"
    },
    "type": "section"
  },
  {
    "text": {
      "text": "{event.properties.$ai_evaluation_reasoning}",
      "type": "mrkdwn"
    },
    "type": "section"
  },
  {
    "type": "actions",
    "elements": [
      {
        "url": "https://us.posthog.com/project/<project_id>/llm-analytics/traces/{event.properties.$ai_trace_id}?event={event.properties.$ai_target_event_id}",
        "text": { "text": "View Trace", "type": "plain_text" },
        "type": "button"
      },
      {
        "url": "https://us.posthog.com/project/<project_id>/replay/{event.properties.$session_id}",
        "text": { "text": "View Trigger Session", "type": "plain_text" },
        "type": "button"
      },
      {
        "url": "{person.url}",
        "text": { "text": "View Person", "type": "plain_text" },
        "type": "button"
      }
    ]
  }
]
Pick an <emoji> that matches the feature's shape: 📊 product analytics, 🐛 error tracking, 🎬 session replay, 🔎 search/AI search, 🧪 experiments, 🚩 flags, 📋 surveys, 🧠 generic AI.
The {event.properties.X} and {person.X} placeholders are valid PostHog template syntax and resolve at send time.
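Before pasting, you can sanity-check the template locally by resolving placeholders against a sample event. This sketch assumes plain string substitution approximates PostHog's send-time resolution; it is not PostHog's actual template engine, and it will break if a substituted value contains a double quote:

```python
import json
import re

# Only match {event.properties.X} / {person.X} placeholders, so the
# JSON's own braces are left alone.
_PLACEHOLDER = re.compile(r"\{((?:event\.properties|person)\.[^{}]+)\}")

def preview_blocks(blocks_json: str, context: dict) -> list:
    """Substitute placeholders from `context`, then parse the result to
    confirm the rendered payload is still valid JSON."""
    rendered = _PLACEHOLDER.sub(lambda m: str(context.get(m.group(1), "")), blocks_json)
    return json.loads(rendered)
```

A quick local render catches missing properties before Slack's block validator does.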

6.4 Test before enabling


The Workflows Test panel has two modes — this matters because naively hitting "Test" can look like a broken integration when it isn't:
  • Synthetic event (default): the Test panel fabricates an $ai_evaluation payload and runs the flow without hitting Slack's real API. Useful as a dry-run of the block template, but {event.properties.$ai_*} placeholders may resolve to null, and Slack's block validator will reject the payload with invalid_blocks. That's a test-harness artifact, not a real bug — don't chase it.
  • "Make real HTTPS requests": flip this toggle on. Workflows then pulls a recent real $ai_evaluation event matching your filters and runs the flow end-to-end, including the actual Slack post. This is the test that tells you "it works" for real. If no matching real event exists yet (common if the eval was just enabled), trigger the feature yourself, wait ~1 minute, and retry.
Recommended flow: synthetic → sanity-check that the block template renders → flip real-requests on → confirm an actual post lands in the channel → save + enable the workflow.

Step 7 — End-to-end verify in production


Once the workflow is enabled, trigger the feature yourself. Within a minute or two:
  1. The $ai_generation event should appear in LLM analytics
  2. The eval should auto-run and emit an $ai_evaluation event
  3. The workflow should fire and the Slack post should land in the configured channel
  4. Click "View Trigger Session" — should land on the recording of you using the feature, not the replay homepage
If "View Trigger Session" lands on the replay homepage, $session_id is missing on the $ai_evaluation event (which is separate from the $ai_generation event — threading is independent for the two). Backend fix needed — see prerequisites.

Worked example A (Pattern A): group session summary use cases


Pattern: a group_summary_use_case_feed eval streaming to a #<team>-usage-feed channel. Trace prefix: session-summary:group:. Opener: "A user ran a group summary on". The Slack channel showed e.g.:
📊 group_summary_use_case_feed triggered by some user "A user ran a group summary on a company's onboarding sessions from the last 7 days. They were trying to understand why account activation rates are low. The summary surfaced that most users abandon at the company onboarding wizard after creating accounts." [View Trace] [View Trigger Session] [View Person]
The PRs that made this work (linked here as worked examples of the session_id threading pattern, not as steps in the skill itself):
  • PostHog/posthog#54952 — threads trigger_session_id through to $ai_generation events on the session summary backend
  • (Followup PR — threads $session_id onto $ai_evaluation events specifically)
模式:
group_summary_use_case_feed
评估,推送到
#<team>-usage-feed
频道。跟踪记录前缀:
session-summary:group:
。开场白:
"A user ran a group summary on"
。Slack频道示例消息:
📊 group_summary_use_case_feed triggered by some user "A user ran a group summary on a company's onboarding sessions from the last 7 days. They were trying to understand why account activation rates are low. The summary surfaced that most users abandon at the company onboarding wizard after creating accounts." [View Trace] [View Trigger Session] [View Person]
实现此功能的PR(此处作为会话ID传递模式的示例,而非本方案的步骤):
  • PostHog/posthog#54952 — 在会话摘要后端将
    trigger_session_id
    传递到
    $ai_generation
    事件
  • (后续PR — 专门将
    $session_id
    传递到
    $ai_evaluation
    事件)

Worked example B (Pattern B): PostHog AI in error tracking mode

示例B(模式B):PostHog AI错误跟踪模式

Pattern: an
agent_mode = 'error_tracking'
scoped feed streaming to a
#<team>-usage-feed
channel, answering "what are users actually trying to DO when they chat with PostHog AI in error tracking mode?" Mode sizing varies by an order of magnitude or more across agent modes — spot-check volume per §Step 1 before wiring, because a high-volume mode can flood a channel. Opener:
"A user asked PostHog AI about"
.
Enabling PR: PostHog/posthog#55160 — threads
agent_mode
and
supermode
onto every
$ai_generation
emitted by the chat agent loop. Wiring lives in
ee/hogai/core/agent_modes/executables.py
(
AgentExecutable._get_model
) and passes the dict through the existing
posthog_properties
field on
MaxChatMixin
in
ee/hogai/llm.py
. Before this PR, scoping a PostHog AI eval to a specific mode wasn't possible — you'd end up evaluating every PostHog AI generation, producing noisy feeds with low single-digit PASS rates.
Key observation from setup: the
agent_mode
tag reflects the mode at turn-time, but chat state retains mode selection even when users drift off-topic mid-conversation. Spot-check: a random
agent_mode=error_tracking
sample included a conversation that ended up being about session replay pricing. The eval prompt's classification must be permissive about topic drift — PASS only when the turn is recognizably in-scope for the mode, FAIL when the conversation has drifted to something else entirely.
模式:范围限定为
agent_mode = 'error_tracking'
的信息流,推送到
#<team>-usage-feed
频道,回答“用户在PostHog AI错误跟踪模式下聊天时实际试图做什么?”不同Agent模式的流量差异可达一个数量级以上——设置前请按步骤1抽查流量,因为高流量模式可能导致频道消息泛滥。开场白:
"A user asked PostHog AI about"。
启用PR:PostHog/posthog#55160 — 将
agent_mode
supermode
传递到聊天Agent循环发送的每条
$ai_generation
事件中。代码位于
ee/hogai/core/agent_modes/executables.py
AgentExecutable._get_model
),并通过
ee/hogai/llm.py
MaxChatMixin
的现有
posthog_properties
字段传递字典。在此PR之前,无法将PostHog AI评估范围限定为特定模式——你会评估所有PostHog AI生成结果,导致信息流噪音大,PASS率仅为个位数。
设置中的关键发现:
agent_mode
标签反映了轮次时的模式,但聊天状态会保留模式选择,即使用户在对话中途偏离主题。抽样检查:随机
agent_mode=error_tracking
样本包含一个最终转向会话回放定价的对话。评估提示词的分类必须允许一定程度的主题偏移——仅当轮次明显符合模式范围时标记为PASS,当对话完全偏离主题时标记为FAIL。
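The per-mode volume spot-check mentioned above can be run as a HogQL sketch. It assumes the mode lands as a plain `agent_mode` event property, which is how PR #55160 is described here:

```sql
-- Sketch: weekly $ai_generation volume per agent mode,
-- to gauge whether a mode-scoped feed would flood its channel
SELECT
    properties.agent_mode AS mode,
    count() AS generations
FROM events
WHERE event = '$ai_generation'
    AND isNotNull(properties.agent_mode)
    AND timestamp > now() - INTERVAL 7 DAY
GROUP BY mode
ORDER BY generations DESC
```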

Validating signal quality after launch

上线后验证信号质量

Once the feed has been running for a day or two, sanity-check the eval output at scale.
If
posthog:llma-evaluation-summary-create
is exposed:
json
posthog:llma-evaluation-summary-create
{
  "evaluation_id": "<uuid>",
  "filter": "fail"
}
UI fallback: open the eval in LLM analytics → Evaluations → "Summarize results" button, filter = fail.
If the FAIL bucket is large, the classification step is too strict — relax it. If the PASS bucket has lots of generic reasonings, iterate on the prompt to enforce concreteness. The summary tool gives a quick read on this without you having to scroll through individual events.
Spot-check raw events when needed (note: the stored result value is
'True'
, not
'PASS'
— see step 6):
sql
SELECT
    properties.$ai_evaluation_reasoning AS reasoning,
    properties.$ai_trace_id AS trace_id,
    timestamp
FROM events
WHERE event = '$ai_evaluation'
    AND properties.$ai_evaluation_name = '<your eval name>'
    AND properties.$ai_evaluation_result = 'True'
    AND timestamp > now() - INTERVAL 1 DAY
ORDER BY timestamp DESC
LIMIT 25
信息流运行1-2天后,大规模验证评估输出。
如果
posthog:llma-evaluation-summary-create
已开放:
json
posthog:llma-evaluation-summary-create
{
  "evaluation_id": "<uuid>",
  "filter": "fail"
}
UI回退方案:在LLM分析→评估中打开评估→点击“总结结果”按钮,筛选条件=fail。
如果FAIL分组数量大,说明分类步骤过于严格——放宽条件。如果PASS分组有大量通用推理,迭代提示词以强制具体化。摘要工具可快速查看此情况,无需逐条浏览事件。
必要时抽样检查原始事件(注意:存储的结果值为
'True'
,而非
'PASS'
——请查看步骤6):
sql
SELECT
    properties.$ai_evaluation_reasoning AS reasoning,
    properties.$ai_trace_id AS trace_id,
    timestamp
FROM events
WHERE event = '$ai_evaluation'
    AND properties.$ai_evaluation_name = '<your eval name>'
    AND properties.$ai_evaluation_result = 'True'
    AND timestamp > now() - INTERVAL 1 DAY
ORDER BY timestamp DESC
LIMIT 25
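Before reading individual reasonings, a coarse result split over the same window gives a quick strictness read. A sketch; it assumes FAIL is stored as `'False'`, symmetric with the `'True'` caveat above:

```sql
-- Sketch: PASS ('True') vs FAIL ('False') counts for the eval, last 7 days
SELECT
    properties.$ai_evaluation_result AS result,
    count() AS n
FROM events
WHERE event = '$ai_evaluation'
    AND properties.$ai_evaluation_name = '<your eval name>'
    AND timestamp > now() - INTERVAL 7 DAY
GROUP BY result
```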

Tips

提示

  • The reasoning field IS the Slack message — design the prompt for that, not for "chain of thought before classification." Models can produce structured Slack-ready text in one pass.
  • LLM judges are non-deterministic across reruns. Expect 1-5% noise even with a fixed prompt and model. If you need reproducibility, pin a deterministic provider/seed in
    model_configuration
    .
  • Keep the eval scoped tightly via
    conditions.filters
    on
    $ai_trace_id
    prefix. Otherwise it fans out to every
    $ai_generation
    event in the project and burns LLM cost.
  • For high-volume features (>10k traces/week), consider sampling — set the eval to run on a percentage of matching events rather than all of them. Slack flooding is a real failure mode.
  • The "View Trigger Session" button is the highest-value link in the alert. Without it, the feed is just text — you can't watch what the user was actually doing. Verify it works in step 7 before considering the feed shipped.
  • Once the feed is live, periodically re-run the eval summary tool with
    filter: "pass"
    to surface the dominant use case clusters. That's how you turn the feed into actual product insights instead of just a notification stream.
  • 推理字段就是Slack消息——请为此设计提示词,而非“分类前的思维链”。模型可一次性生成结构化的Slack就绪文本。
  • 重新运行时,LLM judge的结果是非确定性的。即使提示词和模型固定,也会有1-5%的噪音。如果需要可重复性,请在
    model_configuration
    中固定确定性的提供商/种子。
  • 通过
    conditions.filters
    $ai_trace_id
    前缀进行严格范围限定。否则评估会应用到项目中的所有
    $ai_generation
    事件,消耗大量LLM成本。
  • 对于高量功能(>10k跟踪记录/周),考虑抽样——设置评估仅对一定比例的匹配事件运行,而非全部。Slack消息泛滥是真实的故障模式。
  • “View Trigger Session”按钮是告警中价值最高的链接。没有它,信息流只是文本——你无法查看用户实际操作。在步骤7中验证其正常工作后,再认为信息流已上线。
  • 信息流上线后,定期使用
    filter: "pass"
    重新运行评估摘要工具,以发现主要用例集群。这是将信息流转化为实际产品洞察而非单纯通知流的方式。
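For the sampling tip above, the number you need first is current daily matching volume. A HogQL sketch, using worked example A's trace prefix purely as an illustration:

```sql
-- Sketch: distinct matching traces per day; divide your target posts/day
-- by this number to pick a sampling percentage for the eval
SELECT count(DISTINCT properties.$ai_trace_id) AS daily_traces
FROM events
WHERE event = '$ai_generation'
    AND properties.$ai_trace_id LIKE 'session-summary:group:%'
    AND timestamp > now() - INTERVAL 1 DAY
```

For example, if `daily_traces` comes back around 500 and you want roughly 25 posts a day, sample at 5%.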