configuring-experiment-analytics

Configuring experiment analytics

This skill answers two questions: who is included in the analysis, and how is impact measured?

Exposure criteria

Exposure criteria determine which users are counted in the experiment analysis.

Include people when

Two options:

Feature flag called (default): users are included when the `$feature_flag_called` event fires for the experiment's flag. This is the standard approach; it means a user is included only when they actually encounter the feature flag in your code.
Custom exposure event: users are included when a specific custom event fires. Use this when you want tighter control over who enters the analysis (e.g., only users who actually visit the page where the experiment runs).
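
The difference between the two options can be sketched in a few lines of Python. This is only an illustration of the inclusion rule, not the product's implementation: the event dictionaries, the `flag` key, the flag name `checkout-test`, and the event `checkout_page_viewed` are all hypothetical.

```python
def included_users(events, flag_key, custom_exposure_event=None):
    """Return the set of user IDs that count as exposed."""
    included = set()
    for event in events:
        if custom_exposure_event is not None:
            # Custom exposure: only the chosen event admits a user.
            if event["event"] == custom_exposure_event:
                included.add(event["user"])
        elif event["event"] == "$feature_flag_called" and event.get("flag") == flag_key:
            # Default: the experiment's flag was actually evaluated for this user.
            included.add(event["user"])
    return included


# Hypothetical event stream: users "a" and "d" hit the flag, only "a" views the page.
events = [
    {"user": "a", "event": "$feature_flag_called", "flag": "checkout-test"},
    {"user": "d", "event": "$feature_flag_called", "flag": "checkout-test"},
    {"user": "b", "event": "$feature_flag_called", "flag": "other-flag"},
    {"user": "a", "event": "checkout_page_viewed"},
]
default_set = included_users(events, "checkout-test")
custom_set = included_users(events, "checkout-test", "checkout_page_viewed")
```

With the default rule both flagged users are counted; with the custom event only the user who reached the page is, which is the tighter control described above.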

Multiple variant handling

When a user is exposed to multiple variants (e.g., due to flag changes or race conditions):

Exclude multivariate users: removes these users from the analysis entirely. Cleaner data, smaller sample.
First seen variant: assigns users to the first variant they were exposed to and keeps all users in the analysis. This option can introduce its own bias, because the behavior of a user who saw several variants cannot be clearly attributed to any one of them. It is not recommended unless necessary.
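
A minimal sketch of how the two handling modes differ, assuming exposures arrive as chronologically ordered `(user, variant)` pairs. The function name and the mode labels `"exclude"` and `"first_seen"` are hypothetical and are not the setting's real values.

```python
def assign_variants(exposures, handling="exclude"):
    """Map each user to a variant under the chosen multi-variant handling."""
    first_seen = {}
    variants_seen = {}
    for user, variant in exposures:
        first_seen.setdefault(user, variant)  # keeps the earliest variant only
        variants_seen.setdefault(user, set()).add(variant)
    if handling == "first_seen":
        return dict(first_seen)
    # "exclude": drop every user who was exposed to more than one variant
    return {u: v for u, v in first_seen.items() if len(variants_seen[u]) == 1}


# u1 saw control first and test later; u2 only ever saw test.
exposures = [("u1", "control"), ("u2", "test"), ("u1", "test")]
excluded_mode = assign_variants(exposures, "exclude")
first_seen_mode = assign_variants(exposures, "first_seen")
```

In this toy data the first mode analyses one user and the second analyses two, with `u1` attributed to control even though they also experienced the test variant.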

Bias risk on uneven splits. "Exclude multivariate users" combined with an uneven variant split can introduce bias — multi-variant users are dropped asymmetrically and the smaller variant loses a larger fraction of its assignments. If those users behave differently from the rest, the smaller variant's metrics will be skewed.
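
A small worked example makes the asymmetry concrete. The numbers are invented for illustration: a 90/10 split over 10,000 users, with 200 users who were exposed to both variants and are therefore removed from each variant's count.

```python
def loss_fraction(assigned, multi_variant):
    """Share of a variant's assigned users removed by the exclusion."""
    return multi_variant / assigned


# Illustrative numbers only, not taken from any real experiment.
control_assigned, test_assigned = 9_000, 1_000
multi_variant_users = 200  # seen in both variants, dropped from both

control_loss = loss_fraction(control_assigned, multi_variant_users)
test_loss = loss_fraction(test_assigned, multi_variant_users)
print(f"control loses {control_loss:.1%}, test loses {test_loss:.1%}")
```

The same 200 users are a small share of the larger variant and a much bigger share of the smaller one, so any behavioral difference in that group distorts the smaller variant's metrics far more.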

The right mitigation depends on experiment state:

Not yet launched, or only exposed to a few users so far: switch to an even variant split and use the overall rollout percentage to limit test-variant exposure. This removes the bias and preserves statistical power. See `configuring-experiment-rollout`.
Live experiment with significant exposures: changing the split mid-run reassigns users across variants, which is bad for user experience and data quality. Switch this setting to "First seen variant" instead, which keeps already-assigned users in their original variant (no reassignment) and removes the asymmetric exclusion.

Filter test accounts

`exposure_criteria.filterTestAccounts` (default: `true`): excludes internal/test users from the analysis.

Resolving experiments

Metric changes require an experiment ID. If the user refers to an experiment by name or description (e.g. "add metrics to the checkout test"), load the `finding-experiments` skill to resolve it to a concrete ID before proceeding.

Metrics

Metrics are added via `experiment-update` after creation. The `metrics` array replaces the entire list, so always get the current experiment first via `experiment-get` to preserve existing metrics.
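
Because the array is replaced wholesale, adding a metric is a read-modify-write operation. The sketch below shows that pattern under stated assumptions: `get_experiment` and `update_experiment` are hypothetical callables standing in for the `experiment-get` and `experiment-update` tools, and the payload shape is a guess.

```python
def add_metric(experiment_id, new_metric, get_experiment, update_experiment):
    """Append a metric without discarding the ones already configured."""
    current = get_experiment(experiment_id)        # read the existing state first
    metrics = list(current.get("metrics", []))     # copy so the original is untouched
    metrics.append(new_metric)
    # Send the full list back: the update replaces the array, it does not merge.
    update_experiment(experiment_id, {"metrics": metrics})
    return metrics
```

Sending only `[new_metric]` would silently drop every previously configured metric, which is the failure this pattern avoids.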

Step 1: Discover available events (REQUIRED — always do this first)

Before suggesting or configuring ANY metric, you MUST call `read-data-schema` to discover what events actually exist in the project. Do NOT skip this step. Do NOT suggest event names based on what you think the project might track; only use events you have confirmed exist.

This applies even when:

The user provides event names — look them up to confirm they exist and are spelled correctly
The user asks "what metrics do you suggest?" — look up events first, then suggest from real data
The context makes certain events seem obvious — they may not exist or may be named differently

Workflow:

1. Call `read-data-schema` to get the project's events
2. Present relevant events to the user based on the experiment's hypothesis
3. User picks which events to use for metrics
4. Configure metrics with those confirmed event names
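
The confirmation step in this workflow can be sketched as a simple check of requested names against the schema result. This is an illustrative helper, not part of any tool: `confirm_events` is a hypothetical name, and `known_events` is assumed to be the list of event names returned by `read-data-schema`. The close-match suggestions use the standard library's `difflib.get_close_matches`.

```python
import difflib


def confirm_events(requested, known_events):
    """Split requested event names into confirmed and unknown, with spelling hints."""
    known = set(known_events)
    confirmed = [name for name in requested if name in known]
    unknown = [name for name in requested if name not in known]
    # Offer near-matches so a typo can be corrected with the user, not guessed at.
    suggestions = {
        name: difflib.get_close_matches(name, known_events, n=3)
        for name in unknown
    }
    return confirmed, unknown, suggestions


# Event names taken from the example in this section; the misspelling is deliberate.
known_events = ["checkout_step_completed", "payment_processed", "order_confirmed"]
confirmed, unknown, suggestions = confirm_events(
    ["payment_processed", "order_confirmd"], known_events
)
```

Unknown names should go back to the user for a decision; the suggestions are a prompt for that conversation, not a licence to substitute a name automatically.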

Legitimate exception, `allow_unknown_events: true`: pass this on `experiment-create` or `experiment-update` only when the user is intentionally instrumenting an event that hasn't been ingested yet (e.g. setting up the experiment before the code change ships). Confirm this with the user, and never use it as a workaround for "the event lookup didn't return what I expected".

Example:

User: "Let's add some metrics for the checkout experiment"

WRONG: "I'd suggest using purchase_completed as the primary metric..."
  (hallucinated event name — never seen the project's actual events)

RIGHT: *calls read-data-schema* → "Here are the events in your project
  related to checkout: `checkout_step_completed`, `payment_processed`,
  `order_confirmed`. Which of these represents a successful checkout?"

Step 2: Choose metric type

There are four metric types. Each has `kind: "ExperimentMetric"`.

metric_type	When to use	Key fields
`"mean"`	Average of a numeric property per user (revenue, session duration, pageviews per user)	`source` EventsNode
`"funnel"`	Conversion rate from exposure through one or more ordered actions	`series` array of EventsNode steps (1 or more)
`"ratio"`	Rate of one event relative to another	`numerator`, `denominator` EventsNode
`"retention"`	Do users come back after exposure?	`start_event`, `completion_event`, window config
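
The key fields in the table can be turned into a quick pre-flight check before a metric is sent. This is a hedged sketch: `missing_fields` is a hypothetical helper, it only checks the field names listed above, and it ignores anything else the real schema may require (for example the retention window config).

```python
# Key fields per metric type, copied from the table above.
REQUIRED_FIELDS = {
    "mean": ["source"],
    "funnel": ["series"],
    "ratio": ["numerator", "denominator"],
    "retention": ["start_event", "completion_event"],
}


def missing_fields(metric):
    """Return the names of required keys that are absent or wrong."""
    problems = []
    if metric.get("kind") != "ExperimentMetric":
        problems.append("kind")
    required = REQUIRED_FIELDS.get(metric.get("metric_type"))
    if required is None:
        problems.append("metric_type")  # unknown or missing type
        return problems
    problems.extend(field for field in required if field not in metric)
    return problems


# A ratio metric that forgot its denominator (event name is a placeholder).
incomplete_ratio = {
    "kind": "ExperimentMetric",
    "metric_type": "ratio",
    "numerator": {"kind": "EventsNode", "event": "order_confirmed"},
}
problems = missing_fields(incomplete_ratio)
```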

Funnel metrics and the implicit exposure step

Funnel metrics automatically prepend the experiment's exposure event as `step_0`. So a funnel with 1 step in `series` is a valid 2-step funnel: exposure → action. This is the correct choice for measuring "what percentage of exposed users did X?"
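
To illustrate the implicit step, the sketch below lists the steps a funnel is effectively evaluated on. It assumes the default exposure event `$feature_flag_called` and an EventsNode shaped as `{"kind": "EventsNode", "event": ...}`; that shape and the event name `checkout_completed` are placeholders, not confirmed schema.

```python
def effective_funnel_steps(series, exposure_event="$feature_flag_called"):
    """Return the step names the funnel is evaluated on, exposure first."""
    return [exposure_event] + [step["event"] for step in series]


# Hypothetical funnel metric with a single configured step.
funnel_metric = {
    "kind": "ExperimentMetric",
    "metric_type": "funnel",
    "series": [{"kind": "EventsNode", "event": "checkout_completed"}],
}
steps = effective_funnel_steps(funnel_metric["series"])
```

One entry in `series` therefore yields two evaluated steps, which is why a single-step funnel is enough to express "what share of exposed users did X".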

Examples:

"What % of exposed users reached /login?" → funnel with 1 step (`$pageview` filtered to /login)
"What % of exposed users completed checkout?" → funnel with 1 step (`checkout_completed`)
"What % of exposed users went cart → checkout → purchase?" → funnel with 3 steps

Mean vs funnel for the same event

Mean measures average count/value per user (e.g. "pageviews per user", "revenue per user").
Funnel measures conversion rate (e.g. "% of exposed users who purchased").

Both can reference the same event — the difference is whether you care about count/magnitude (mean) or yes/no conversion (funnel).
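
The contrast can be shown by computing both numbers from the same toy data. Everything here is illustrative: the user IDs, the `(user, event)` tuples, and the event name `purchase` are hypothetical.

```python
def mean_per_user(exposed, events, event_name):
    """Average number of matching events per exposed user (magnitude)."""
    count = sum(1 for user, name in events if name == event_name and user in exposed)
    return count / len(exposed)


def conversion_rate(exposed, events, event_name):
    """Share of exposed users with at least one matching event (yes/no)."""
    converted = {user for user, name in events if name == event_name and user in exposed}
    return len(converted) / len(exposed)


exposed = {"a", "b", "c", "d"}
events = [
    ("a", "purchase"), ("a", "purchase"),  # one user buying twice
    ("b", "purchase"),
    ("c", "pageview"),
    ("z", "purchase"),                     # not exposed, so ignored
]
mean_value = mean_per_user(exposed, events, "purchase")
rate = conversion_rate(exposed, events, "purchase")
```

User `a` buying twice raises the mean but not the conversion rate, which is the practical difference between the two metric types.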

See `references/metric-configuration.md` for detailed JSON examples of each type.

Step 3: Primary vs secondary

Primary metrics — the main success criteria for the experiment. These drive the ship/end decision.
Secondary metrics — additional measurements for context. Useful for guardrail metrics (e.g., ensuring a conversion improvement doesn't increase error rates).

Interpreting results

See `references/interpreting-results.md` for guidance on reading experiment results, statistical significance, and when to ship vs end.
