feature-usage-feed
Building a feature usage feed via LLM evals
Some PostHog features (group session summaries, single session summaries, replay AI search, error tracking AI debug, etc.) generate hundreds or thousands of LLM traces per week. Reading them by hand is not feasible. This skill covers the end-to-end pattern for turning that trace volume into a live Slack feed of canonical use cases — what users are actually doing with the feature.
The workflow is mixed, and leans UI. Trace inspection and filter discovery (steps 1-2) are MCP-driven. Eval creation, dry-running, and enabling (steps 4-5) are MCP-driven when `posthog:llma-evaluation-*` tools are exposed to your agent — but they often aren't, in which case fall back to the UI (Data pipeline → destinations for the alert is always UI). Each step flags its UI fallback. Expect to finish in the UI even when you start from chat.
When to use
- "How are people actually using [feature X] in production?"
- "Can we identify the canonical use cases for [feature X] so we can write better docs / prioritize improvements?"
- "I want a Slack feed of representative usage examples without manually skimming traces."
- "Set up a feed of use cases for [feature X] in #team-[area]-usage."
If the user just wants to debug a single trace or tune an existing eval, redirect to `exploring-llm-traces` or `exploring-llm-evaluations` instead.
Two filter patterns
This skill supports two different ways to scope an eval to "the feature you care about":
Pattern A — Feature-native trace_id prefix. For standalone features that emit their own `$ai_trace_id` pattern (e.g. `session-summary:group:`, `replay-search:`, error-tracking-specific flows). Filter on the prefix.

Pattern B — PostHog AI agent mode. For features the user interacts with via PostHog AI in a specific agent mode (error tracking, product analytics, session replay, SQL, flags, surveys, LLM analytics). Filter on `ai_product = 'posthog_ai' AND agent_mode = '<mode>'`. This requires PR #55160 (merged April 2026) to be deployed, which threads `agent_mode` and `supermode` onto every `$ai_generation` emitted by the chat agent loop. A useful ergonomic side-effect: `agent_mode IS NOT NULL` is a reliable "user-facing chat turn" filter — batch jobs and tool-internal LLM calls go through different code paths and have `agent_mode=null`, so they're excluded for free.
If the user asks "what are users trying to DO in [ET / replay / SQL / flags / surveys] mode of PostHog AI", that's Pattern B. If they ask "what use cases does [standalone feature] cover", that's Pattern A. Pick the pattern first — the prompt, filter, and Slack channel naming all follow from it.
Prerequisites
| Requirement | How to verify |
|---|---|
| (Pattern A) Feature emits `$ai_generation` events with a stable `$ai_trace_id` prefix | Run the step 1 prefix query and confirm your feature's prefix appears |
| (Pattern B) Recent `$ai_generation` events carry `agent_mode` | Run the step 1 agent-mode query and confirm non-null rows for your mode |
| `$session_id` is present on `$ai_generation` events | Query `$session_id` coverage on recent `$ai_generation` events |
| `$session_id` is present on `$ai_evaluation` events | Same query but on `$ai_evaluation` (after the eval has run once) |
| User has organisation-level AI data processing approval | Required for LLM-judge evals |

If `$session_id` is missing on either event type, file a backend fix before continuing — there is no UI workaround. The session-summary feature has a worked example of the threading pattern in PR #54952. For Pattern B, the agent-mode threading pattern is in PR #55160.
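The two `$session_id` rows can be checked with one coverage query. A sketch in HogQL — the `events` table and property access style follow the queries later in this skill; swap the event name to `$ai_evaluation` for the second row:

```sql
SELECT
  count() AS total_events,
  countIf(properties.$session_id IS NOT NULL) AS with_session_id,
  round(100.0 * countIf(properties.$session_id IS NOT NULL) / count(), 1) AS pct_covered
FROM events
WHERE timestamp > now() - INTERVAL 3 DAY
  AND event = '$ai_generation'
```

Anything meaningfully below 100% coverage means some code path isn't threading the session id and the "View Trigger Session" button will dead-end for those traces.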
Tools
| Tool | Purpose |
|---|---|
| `posthog:query-llm-traces-list` | Find sample traces matching the feature's filter |
| | Inspect a specific trace's contents end-to-end |
| | Verify trace volume, session_id coverage, eval result distributions |
| `posthog:llma-evaluation-create` | (often unexposed — UI fallback: LLM analytics → Evaluations → New) Create the LLM-judge eval (disabled at first) |
| `posthog:llma-evaluation-run` | (often unexposed — UI fallback: the eval's detail page has a "Run on event" button) Dry-run the eval against specific generations during prompt iteration |
| `posthog:llma-evaluation-update` | (often unexposed — UI fallback: edit the eval in LLM analytics → Evaluations) Tweak the prompt / enable when ready |
| `posthog:llma-evaluation-summary-create` | (often unexposed — UI fallback: the eval detail page has a "Summarize results" button) After the feed is running, get an AI summary of pass/N/A patterns to validate signal quality |
| `posthog:workflows-list` / `posthog:workflows-get` | (often unexposed — UI: Data pipeline → Workflows) Browse existing workflow configs — useful for cloning an existing feed's structure when setting up a new one. Read-only; no create/update tool is exposed yet, so step 6's Slack workflow setup is UI-only. |
Before starting, check which of the `posthog:llma-evaluation-*` tools are actually exposed in your agent's MCP tool set. If they aren't loaded, treat steps 4-5 as UI walkthroughs rather than tool calls.
|---|---|
| 查找匹配功能 |
| 端到端查看特定跟踪记录的内容 |
| 验证跟踪记录数量、session_id覆盖率、评估结果分布 |
| (通常未开放 — UI回退方案:LLM分析→评估→新建)创建LLM-judge评估(初始状态为禁用) |
| (通常未开放 — UI回退方案:评估详情页有“在事件上运行”按钮)在提示词迭代期间,针对特定生成结果试运行评估 |
| (通常未开放 — UI回退方案:在LLM分析→评估中编辑评估)调整提示词/准备就绪后启用评估 |
| (通常未开放 — UI回退方案:评估详情页有“总结结果”按钮)信息流运行后,获取AI生成的通过/不适用模式摘要,验证信号质量 |
| (通常未开放 — UI方案:数据管道→工作流)浏览现有工作流配置——在设置新信息流时,克隆现有信息流的结构非常有用。仅支持只读;目前未开放创建/更新工具,因此步骤6的Slack工作流设置只能通过UI完成。 |
开始前,请检查Agent的MCP工具集中实际开放了哪些工具。如果未加载这些工具,请将步骤4-5视为UI操作指南,而非工具调用。
Workflow
Step 1 — Identify the filter
Pattern A (feature-native trace_id prefix): find the prefix that maps to your feature.
```sql
SELECT
  splitByChar(':', coalesce(properties.$ai_trace_id, ''))[1] AS root,
  splitByChar(':', coalesce(properties.$ai_trace_id, ''))[2] AS subtype,
  count() AS events
FROM events
WHERE timestamp > now() - INTERVAL 3 DAY
  AND event = '$ai_generation'
  AND properties.$ai_trace_id IS NOT NULL
GROUP BY root, subtype
ORDER BY events DESC
LIMIT 25
```

Note: `coalesce(..., '')` is load-bearing — `splitByChar` on a nullable column errors out in HogQL otherwise.
Pattern B (PostHog AI agent mode): verify coverage and volume for the mode you're targeting.
```sql
SELECT
  properties.agent_mode AS agent_mode,
  properties.supermode AS supermode,
  count() AS events,
  count(DISTINCT properties.$ai_trace_id) AS traces
FROM events
WHERE timestamp > now() - INTERVAL 3 DAY
  AND event = '$ai_generation'
  AND properties.ai_product = 'posthog_ai'
GROUP BY agent_mode, supermode
ORDER BY events DESC
LIMIT 20
```

Expected values for `agent_mode`: `error_tracking`, `product_analytics`, `sql`, `session_replay`, `flags`, `survey`, `llm_analytics`, `null`. Null ≈ batch jobs + tool-internal calls (not user chat). `supermode='plan'` splits planning turns from execution turns — worth calling out separately if your feed is about plan-mode specifically.
Record the mode + rough volume. Low-volume modes (<100 events/day) will produce a trickle-feed that's hard to validate early; high-volume modes (>1k/day) may need sampling to avoid Slack flooding. See the "Tips" section on sampling.
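To put a number on "low-volume vs high-volume", a per-day count for the mode you picked is enough. A sketch reusing the Pattern B filters from the query above (`<mode>` is your placeholder to fill in):

```sql
SELECT
  toDate(timestamp) AS day,
  count() AS events
FROM events
WHERE timestamp > now() - INTERVAL 7 DAY
  AND event = '$ai_generation'
  AND properties.ai_product = 'posthog_ai'
  AND properties.agent_mode = '<mode>'
GROUP BY day
ORDER BY day
```

Compare the daily numbers against the <100/day and >1k/day thresholds above before deciding whether to sample.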
Step 2 — Pull a handful of sample traces
Use these for prompt iteration in step 4.
Pattern A:
```json
posthog:query-llm-traces-list
{
  "properties": [
    { "type": "event", "key": "$ai_trace_id", "operator": "icontains", "value": "<your-prefix-here>" }
  ],
  "limit": 10,
  "dateRange": { "date_from": "-2d" },
  "randomOrder": true
}
```

Pattern B:
```json
posthog:query-llm-traces-list
{
  "properties": [
    { "type": "event", "key": "ai_product", "operator": "exact", "value": "posthog_ai" },
    { "type": "event", "key": "agent_mode", "operator": "exact", "value": "<mode-here>" }
  ],
  "limit": 10,
  "dateRange": { "date_from": "-2d" },
  "randomOrder": true
}
```

Output size warning: `query-llm-traces-list` with `limit: 10` routinely returns 3-6MB of JSON (full input/output per generation). This will blow your context window. Immediately delegate the summarization to a subagent the moment you see the "result exceeds maximum allowed tokens" error — ask the subagent to extract, per trace: the trace id, the first user message (truncated to ~300 chars), the sampled `$current_url`, and a one-sentence description of what the conversation was about. Don't try to read the raw file in-line.
Watch for topic drift in Pattern B samples. The `agent_mode` tag reflects the user's mode selection at the time of the turn — but chat state retains the mode even if the user drifts off-topic within the same conversation (e.g. user selected "error tracking" mode, then asked an unrelated pricing question three turns later). Your eval prompt's classification step needs to account for topic drift: PASS should mean "user is doing something recognizably in-scope for this mode", FAIL should catch the off-topic drift. If you don't, your feed will include irrelevant PASS entries that happen to carry the mode tag.
agent_modeStep 3 — Draft the LLM-judge prompt
The prompt has two responsibilities: (a) classify the trace as relevant or not, (b) produce reasoning text that is directly postable to Slack (no preamble, no meta-description). The reasoning field becomes the Slack message body.
Template:
```text
You are analyzing a PostHog [FEATURE NAME] trace to extract its real use case.
Your reasoning text will be posted directly to a Slack channel as a notification.
Write it as a short, ready-to-post message — no preamble, no meta-description.

Step 1 — Classification:
- PASS = this trace is the [feature kind] you care about
- FAIL = a different LLM call or a false match
- N/A = ambiguous from the trace alone

Step 2 — Reasoning (only matters if PASS). Write 2-3 sentences in this exact format:
"[OPENER] [what they targeted/filtered for]. They were
trying to [understand X / debug Y / find Z]. The result surfaced [key pattern
or finding]."

Your output MUST start with the exact phrase "[OPENER]". No other opening is allowed.

Rules:
- No "This is a [feature]..." or "The input contains..." preamble
- No JSON, field names, system-prompt references, or meta-description
- Concrete > generic. "users hitting error tracking for the first time" beats "user behavior"
- If you cannot infer one of the three pieces from the trace, write "(unclear from trace)" in that slot — do not guess
```

Pick an `[OPENER]` that matches how users actually interact with the feature. The forced opener is load-bearing (it prevents the model from drifting into "this trace is a..." meta-description), but the exact verb has to fit the interaction:
| Feature / mode | OPENER |
|---|---|
| Session summary (group / single) | `"A user ran a group summary on"` |
| Replay AI search | |
| PostHog AI in error tracking mode | `"A user asked PostHog AI about"` |
| PostHog AI in session replay mode | |
| PostHog AI in SQL mode | |
Note: `supermode='plan'` is a sub-filter that layers on top of an `agent_mode` row — it's not its own row. If you want plan-mode-only, filter `agent_mode='<mode>' AND supermode='plan'` and pick an opener like `"A user asked PostHog AI to plan"`.

If you force `"A user ran"` on a chat-based feature, the model will produce awkward contortions ("A user ran a question about...") that read wrong in Slack. The forced-opener pattern is the mechanism — the specific phrase is per-feature.

The negative example list ("No 'This is a...' preamble", etc.) is load-bearing regardless of opener. Don't remove it.
Step 4 — Create the eval (disabled), test, iterate
Create with `enabled: false` so it doesn't immediately fan out to all traces.

If `posthog:llma-evaluation-create` is exposed, use this payload:

```json
posthog:llma-evaluation-create
{
  "name": "[feature] use case feed",
  "description": "Extracts canonical use cases for [feature] for the #team-[area]-usage Slack feed",
  "evaluation_type": "llm_judge",
  "evaluation_config": {
    "prompt": "<full prompt from step 3>"
  },
  "output_type": "boolean",
  "output_config": { "allows_na": true },
  "model_configuration": {
    "provider": "<provider>",
    "model": "<model>"
  },
  "enabled": false,
  "conditions": {
    "filters": [
      // Pattern A — feature-native trace_id prefix:
      { "key": "$ai_trace_id", "operator": "icontains", "value": "<your-prefix>" }
      // Pattern B — PostHog AI agent mode (use these INSTEAD of the trace_id filter):
      // { "key": "ai_product", "operator": "exact", "value": "posthog_ai" },
      // { "key": "agent_mode", "operator": "exact", "value": "<mode>" }
    ]
  }
}
```

Leave model choice to the user — LLM-judge cost scales linearly with event volume, and cheap-vs-capable is a real tradeoff they should make based on their own spend tolerance and signal-quality requirements. Don't pick for them.
UI fallback (when `llma-evaluation-create` isn't exposed): LLM analytics → Evaluations → New evaluation. Type = `LLM judge`, output = boolean + allow N/A, filters as above, enabled = off. Paste the prompt from step 3.

Then dry-run against your sample traces.
If `posthog:llma-evaluation-run` is exposed:

```json
posthog:llma-evaluation-run
{
  "evaluationId": "<uuid from create>",
  "target_event_id": "<a $ai_generation event id from step 2>",
  "timestamp": "<ISO timestamp of that event>"
}
```

UI fallback: on the eval detail page, use the "Run on event" button with the trace sample's event id.
Look at the returned `$ai_evaluation_reasoning`. If it preambles, drifts, or describes the input, fix the prompt (via `llma-evaluation-update` or by editing in the UI) and re-run. Iterate on 3-5 traces before enabling.

Common failure modes during iteration:
| Symptom | Fix |
|---|---|
| Reasoning starts with "This is a..." | Strengthen the forced opener instruction; add a counter-example |
| Reasoning is generic ("user behavior", "various patterns") | Add positive examples of concrete phrasing in the prompt |
| Model classifies everything as PASS | Tighten the FAIL definition; add an example of what a non-match looks like |
| Reasoning is too long for Slack | Add a hard sentence cap ("MAX 3 sentences, hard limit") |
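Once a few runs exist, the "too long for Slack" symptom can also be checked numerically rather than by eye. A sketch, assuming the eval has already emitted some `$ai_evaluation` events:

```sql
SELECT
  round(avg(length(properties.$ai_evaluation_reasoning))) AS avg_chars,
  max(length(properties.$ai_evaluation_reasoning)) AS max_chars
FROM events
WHERE event = '$ai_evaluation'
  AND properties.$ai_evaluation_name = '<your eval name>'
  AND timestamp > now() - INTERVAL 1 DAY
```

If `max_chars` keeps creeping past what reads comfortably in a Slack message, add the hard sentence cap from the table above and re-check.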
Step 5 — Enable the eval
Once 3-5 sample runs produce clean Slack-ready output, enable the eval.

If `posthog:llma-evaluation-update` is exposed:

```json
posthog:llma-evaluation-update
{
  "evaluationId": "<uuid>",
  "enabled": true
}
```

UI fallback: LLM analytics → Evaluations → open the eval → toggle enabled.
The eval will now run on every new matching `$ai_generation` event.
$ai_generationStep 6 — Build the workflow (UI only)
Workflow setup is not MCP-accessible for writes (`posthog:workflows-list` / `posthog:workflows-get` are read-only). The steps below are a UI walkthrough.

Prereq: before you start, invite the PostHog Slack bot to your target channel (`/invite @PostHog` in the Slack channel). Without this, the Slack dispatch step will fail with an opaque permission error at send time, not at save time — easy to miss.
/invite @PostHog工作流设置不支持MCP写入( / 仅支持只读)。以下是UI操作指南。
posthog:workflows-listposthog:workflows-get前置条件:开始前,邀请PostHog Slack机器人到目标频道(在Slack频道中输入)。否则,Slack发送步骤会在发送时因权限错误失败,且错误不明显——容易被忽略。
6.1 Create the workflow
Data pipeline → Workflows → New workflow. Name it `<feature> use case feed` to match the eval name from step 4.
6.2 Trigger step
- Event: `AI evaluation (LLM)` — i.e. `$ai_evaluation`. This is the event emitted when an eval runs, and it's the only event that carries `$ai_evaluation_*` properties. The original `$ai_generation` event is not enriched with eval results, so filtering on `$ai_generation` here matches nothing.
- Property filters (both required):
  - `AI Evaluation Name (LLM)` equals `<your eval name from step 4>`
  - `AI Evaluation Result (LLM)` equals `true`
⚠️ LOAD-BEARING: the stored values for `$ai_evaluation_result` are the strings `'True'` / `'False'` / `'None'` — NOT `'PASS'` / `'FAIL'` / `'N/A'` (despite what the prompt template calls them internally). The Workflows UI property filter normalizes `true` → `'True'`, so selecting `equals true` from the dropdown works. But if you were wiring this in raw SQL somewhere else (say a hog function), you'd need the string literal. Verify the stored distribution before saving:

```sql
SELECT DISTINCT toString(properties.$ai_evaluation_result) AS result, count() AS n
FROM events
WHERE event = '$ai_evaluation'
  AND properties.$ai_evaluation_name = '<your eval name>'
  AND timestamp > now() - INTERVAL 1 HOUR
GROUP BY result
```

If the only values are `True` / `False` / `None` and `True` dominates, the `equals true` UI filter will match. If you see anything else, adjust accordingly.
6.3 Slack dispatch step
- Add step → Slack dispatch
- Channel: `#<your-team>-usage-feed`
- Sender / bot display name: something that reads well in the channel (e.g. `PostHog Usage Feed`)
- Blocks (Slack block-kit JSON) — paste this and replace `<project_id>` with your actual numeric project ID (e.g. `2`):

```json
[
  {
    "text": {
      "text": "<emoji> *{event.properties.$ai_evaluation_name}* triggered by *{person.name}*",
      "type": "mrkdwn"
    },
    "type": "section"
  },
  {
    "text": {
      "text": "{event.properties.$ai_evaluation_reasoning}",
      "type": "mrkdwn"
    },
    "type": "section"
  },
  {
    "type": "actions",
    "elements": [
      {
        "url": "https://us.posthog.com/project/<project_id>/llm-analytics/traces/{event.properties.$ai_trace_id}?event={event.properties.$ai_target_event_id}",
        "text": { "text": "View Trace", "type": "plain_text" },
        "type": "button"
      },
      {
        "url": "https://us.posthog.com/project/<project_id>/replay/{event.properties.$session_id}",
        "text": { "text": "View Trigger Session", "type": "plain_text" },
        "type": "button"
      },
      {
        "url": "{person.url}",
        "text": { "text": "View Person", "type": "plain_text" },
        "type": "button"
      }
    ]
  }
]
```

Pick an `<emoji>` that matches the feature's shape: 📊 product analytics, 🐛 error tracking, 🎬 session replay, 🔎 search/AI search, 🧪 experiments, 🚩 flags, 📋 surveys, 🧠 generic AI.
The `{event.properties.X}` and `{person.X}` placeholders are valid PostHog template syntax and resolve at send time.
6.4 Test before enabling
The Workflows Test panel has two modes — this matters because naively hitting "Test" can look like a broken integration when it isn't:
- Synthetic event (default) — the Test panel fabricates an `$ai_evaluation` payload and runs the flow without hitting Slack's real API. Useful as a dry-run of the block template, but `{event.properties.$ai_*}` placeholders may resolve to `null` and Slack's block validator will reject the payload with `invalid_blocks`. That's a test-harness artifact, not a real bug — don't chase it.
$ai_evaluation
Recommended flow: synthetic → sanity-check the block template renders → flip real-requests on → confirm an actual post lands in the channel → save + enable the workflow.
Step 7 — End-to-end verify in production
Once the workflow is enabled, trigger the feature yourself. Within a minute or two:
- The `$ai_generation` event should appear in LLM Analytics
- The eval should auto-run and emit an `$ai_evaluation` event
- The workflow should fire and the Slack post should land in the configured channel
- Click "View Trigger Session" — should land on the recording of you using the feature, not the replay homepage
If "View Trigger Session" lands on the replay homepage, is missing on the event (which is separate from the event — threading is independent for the two). Backend fix needed — see prerequisites.
Worked example A (Pattern A): group session summary use cases
Pattern: a `group_summary_use_case_feed` eval streaming to a `#<team>-usage-feed` channel. Trace prefix: `session-summary:group:`. Opener: `"A user ran a group summary on"`. Slack channel showed e.g.:

📊 group_summary_use_case_feed triggered by some user "A user ran a group summary on a company's onboarding sessions from the last 7 days. They were trying to understand why account activation rates are low. The summary surfaced that most users abandon at the company onboarding wizard after creating accounts." [View Trace] [View Trigger Session] [View Person]
The PRs that made this work (linked here as worked examples of the session_id threading pattern, not as steps in the skill itself):
- PostHog/posthog#54952 — threads `trigger_session_id` through to `$ai_generation` events on the session summary backend
- (Followup PR — threads `$session_id` onto `$ai_evaluation` events specifically)
Worked example B (Pattern B): PostHog AI in error tracking mode
Pattern: an `agent_mode = 'error_tracking'` scoped feed streaming to a `#<team>-usage-feed` channel, answering "what are users actually trying to DO when they chat with PostHog AI in error tracking mode?" Mode sizing varies by an order of magnitude or more across agent modes — spot-check volume per §Step 1 before wiring, because a high-volume mode can flood a channel. Opener: `"A user asked PostHog AI about"`.

Enabling PR: PostHog/posthog#55160 — threads `agent_mode` and `supermode` onto every `$ai_generation` emitted by the chat agent loop. Wiring lives in `ee/hogai/core/agent_modes/executables.py` (`AgentExecutable._get_model`) and passes the dict through the existing `posthog_properties` field on `MaxChatMixin` in `ee/hogai/llm.py`. Before this PR, scoping a PostHog AI eval to a specific mode wasn't possible — you'd end up evaluating every PostHog AI generation, which produced noisy feeds with low single-digit PASS rates.

Key observation from setup: the `agent_mode` tag reflects the mode at turn-time, but chat state retains mode selection even when users drift off-topic mid-conversation. Spot-check: a random `agent_mode=error_tracking` sample included a conversation that ended up being about session replay pricing. The eval prompt's classification must account for topic drift — PASS only when the turn is recognizably in-scope for the mode, FAIL when the conversation has drifted to something else entirely.
Validating signal quality after launch
Once the feed has been running for a day or two, sanity-check the eval output at scale.
If `posthog:llma-evaluation-summary-create` is exposed:

```json
posthog:llma-evaluation-summary-create
{
  "evaluation_id": "<uuid>",
  "filter": "fail"
}
```

UI fallback: open the eval in LLM analytics → Evaluations → "Summarize results" button, filter = fail.
If the FAIL bucket is large, the classification step is too strict — relax it. If the PASS bucket has lots of generic reasonings, iterate on the prompt to enforce concreteness. The summary tool gives a quick read on this without you having to scroll through individual events.
Spot-check raw events when needed (note: the stored result value is `'True'`, not `'PASS'` — see step 6):

```sql
SELECT
  properties.$ai_evaluation_reasoning AS reasoning,
  properties.$ai_trace_id AS trace_id,
  timestamp
FROM events
WHERE event = '$ai_evaluation'
  AND properties.$ai_evaluation_name = '<your eval name>'
  AND properties.$ai_evaluation_result = 'True'
  AND timestamp > now() - INTERVAL 1 DAY
ORDER BY timestamp DESC
LIMIT 25
```
Tips
- The reasoning field IS the Slack message — design the prompt for that, not for "chain of thought before classification." Models can produce structured Slack-ready text in one pass.
- LLM judges are non-deterministic across reruns. Expect 1-5% noise even with a fixed prompt and model. If you need reproducibility, pin a deterministic provider/seed in `model_configuration`.
- Keep the eval scoped tightly via `conditions.filters` on the `$ai_trace_id` prefix. Otherwise it fans out to every `$ai_generation` event in the project and burns LLM cost.
- The "View Trigger Session" button is the highest-value link in the alert. Without it, the feed is just text — you can't watch what the user was actually doing. Verify it works in step 7 before considering the feed shipped.
- Once the feed is live, periodically re-run the eval summary tool with `filter: "pass"` to surface the dominant use case clusters. That's how you turn the feed into actual product insights instead of just a notification stream.