triaging-visual-review-runs

Triaging visual review runs


Visual Review is PostHog's screenshot-regression product: CI captures storybook + playwright screenshots, diffs them against committed baseline hashes, and gates the PR until a human approves the visible changes. A PR with visual changes carries a `visual-review` GitHub status check that stays red until each diffed snapshot is approved or tolerated in the VR UI.

This skill teaches an agent how to answer the questions a human reviewer would actually ask by chaining the read-only VR MCP tools, instead of reaching for `gh pr view` and tab-hopping to the VR web UI.


When this skill applies


Trigger this skill on any of:

- A PR number, branch name, or commit SHA paired with words like visual review, VR, snapshot, screenshot, storybook diff, playwright snapshot, baseline, approve, tolerated, quarantine.
- Questions about why a PR is blocked, what visually changed, or whether a diff is real.
- "Is my run done?" / "What's left to review?" / "Has this story flaked recently?"
- A failing `visual-review` GitHub check or a PR comment from the `posthog-bot` mentioning visual review.

When the user asks for the rendered diff image itself, the VR web UI is faster — direct them there. This skill is for everything around the diff: status, scope, history, triage.


Tools


All read-only. None of these require write scopes; approval/toleration still happens in the web UI.

| Tool | Purpose |
| --- | --- |
| `posthog:visual-review-runs-list` | List runs, filter by `pr_number` / `commit_sha` / `branch` / `review_state`. Start here. |
| `posthog:visual-review-runs-retrieve` | Full detail for a single run (status, summary counts, supersession). |
| `posthog:visual-review-runs-snapshots-list` | Per-snapshot results inside a run: identifier, `result`, diff %, classification, baseline + current artifact URLs. |
| `posthog:visual-review-runs-snapshot-history-list` | A single story's last N runs across master/PRs (the flake check). |
| `posthog:visual-review-runs-counts-retrieve` | Aggregate counts for queue triage (how many runs in `needs_review`, etc.). |
| `posthog:visual-review-runs-tolerated-hashes-list` | Hashes the team has explicitly accepted as "known flake / acceptable variation". |
| `posthog:visual-review-repos-list` | Repos (one per GitHub repo). Usually only one matters; useful for filtering. |
| `posthog:visual-review-repos-retrieve` | Repo metadata: baseline file paths, PR-comment configuration. |


Vocabulary cheat sheet


These appear in tool output and matter for interpretation:

- Run `review_state`: `needs_review` (open, awaiting human), `clean` (zero diffs), `processing` (CI still uploading), `stale` (a newer run on the same PR has superseded this one; check `superseded_by_id`).
- Run `run_type`: `storybook` (component snapshots) or `playwright` (full-page e2e snapshots).
- Snapshot `result`: `unchanged`, `changed` (real diff), `new` (no baseline yet), `removed`.
- Snapshot `classification_reason`: `tolerated_hash` (matches a known-tolerated hash, no action needed), `below_threshold` (under the noise floor), `exact` (byte-identical), `""` (real diff requiring review).
- Snapshot `review_state`: `pending` or `approved`.
- Run `summary`: `total / changed / new / removed / unchanged / unresolved / tolerated_matched`. Of these, `unresolved` is what's actually blocking review.
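As a rough illustration of how these fields combine, the sketch below filters a snapshot list down to the entries that still need a human decision. It assumes each snapshot arrives as a flat dict with `identifier`, `result`, `classification_reason`, and snapshot-level `review_state` keys. The function names and the exact rule for combining the fields are illustrative assumptions, not part of the VR API.

```python
from typing import Iterable

# Classification reasons the cheat sheet describes as needing no reviewer action.
AUTO_RESOLVED_REASONS = {"tolerated_hash", "below_threshold", "exact"}


def needs_human_review(snapshot: dict) -> bool:
    """Return True when a snapshot still awaits a reviewer's decision.

    Assumed rule: unchanged snapshots, auto-resolved classifications, and
    already-approved snapshots are skipped; everything else is blocking.
    """
    if snapshot.get("result") == "unchanged":
        return False
    if snapshot.get("classification_reason", "") in AUTO_RESOLVED_REASONS:
        return False
    return snapshot.get("review_state") != "approved"


def blocking_snapshots(snapshots: Iterable[dict]) -> list[str]:
    """Collect the identifiers of snapshots that still need review."""
    return [s["identifier"] for s in snapshots if needs_human_review(s)]
```

If the real payload nests these fields differently, only the `.get(...)` lookups should need adjusting.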


Workflows


"What's the VR status of this PR?"


The single most common job. Map a PR number to its run state in two calls.

- `posthog:visual-review-runs-list { pr_number: <n>, limit: 5 }`: sort by `created_at` desc and take the latest non-stale run.
- If the run has `summary.changed > 0` or `summary.unresolved > 0`, drill in with `posthog:visual-review-runs-snapshots-list { id: <run_id> }` and report the `changed` snapshots.
- Report back: PR number, run UUID, `review_state`, summary counts, and the `_posthogUrl` deep link so the user can click straight to the diff viewer.
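The selection logic in this workflow can be sketched in a few lines. This is a minimal illustration, assuming the list tool returns run dicts carrying `created_at` as an ISO 8601 string alongside `review_state` and `summary`. The helper names `latest_active_run` and `should_drill_in` are hypothetical.

```python
from datetime import datetime
from typing import Optional


def _parse_ts(value: str) -> datetime:
    # Assumption: timestamps are ISO 8601; a trailing "Z" is normalized so
    # that older Python versions can parse it too.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_active_run(runs: list[dict]) -> Optional[dict]:
    """Pick the newest run that has not been superseded (not `stale`)."""
    candidates = [r for r in runs if r.get("review_state") != "stale"]
    if not candidates:
        return None
    return max(candidates, key=lambda r: _parse_ts(r["created_at"]))


def should_drill_in(run: dict) -> bool:
    """True when the snapshot list is worth fetching for this run."""
    summary = run.get("summary", {})
    return summary.get("changed", 0) > 0 or summary.get("unresolved", 0) > 0
```

When `latest_active_run` returns `None`, every run on the PR is stale, which is itself worth reporting back.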


"Is the diff real or unrelated?"


The most useful judgment a code-aware agent can add. Combine three signals: scope match, flake history, and the actual rendered images. The agent should look at the screenshots — not just describe metadata.

- Scope check: `git diff master...HEAD --stat` (or against the PR's base branch) → list of touched paths. Cross-reference with `posthog:visual-review-runs-snapshots-list { id }` filtered to `result: changed` → story identifiers. Stories are namespaced like `<area>-<scene>--<story>--<theme>`; e.g. `scenes-app-settings-user--settings-user-profile--dark` maps to `frontend/src/scenes/settings/user/...`. Use this to translate story id → likely source path.
- Visual inspection: for each `changed` snapshot, the tool result contains `current_artifact.download_url` and `baseline_artifact.download_url`. These are pre-signed S3 URLs to PNG files; pull them and look:

  ```bash
  curl -s -o /tmp/vr-baseline.png "<baseline_artifact.download_url>"
  curl -s -o /tmp/vr-current.png "<current_artifact.download_url>"
  ```

  Then `Read` both files (the Read tool renders images visually) and compare. Things to call out:
  - The actual visible delta (text changed, button moved, layout shift, color drift, missing element).
  - Whether the change is consistent with the `diff_pixel_count` and `diff_percentage` in the metadata (e.g. 54% diff but the images look near-identical → screenshot framing changed, not the UI).
  - Whether the baseline and current have different dimensions (`width` / `height` fields). Mismatched dimensions usually mean the story rendered to a different viewport or didn't fully render before the screenshot, which is a flake signal, not a regression.
- Flake history: run the flake check below for any story that looks suspect.
- Verdict: combine all three:
  - Scope plausible + visible regression matches the code change → real diff, recommend approval.
  - Scope mismatch + dimensions mismatch + frequent prior changes → flake, recommend tolerating the hash.
  - Scope plausible + visible regression looks unintended → push a fix; do not approve.

Always include a one-line description of what you saw in the images — the user uses this to decide whether to trust your verdict without opening the VR UI themselves.
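The three verdict rules can be written down as a small decision function. This is only a sketch: the boolean inputs are judgments the agent has already made from the scope check, the images, and the history, and the function name and the `inconclusive` fallback are assumptions added for illustration.

```python
def diff_verdict(
    scope_matches: bool,
    looks_intended: bool,
    dimensions_match: bool,
    flaky_history: bool,
) -> str:
    """Map the three signals onto the verdicts described above.

    scope_matches    -- the changed story plausibly maps to a touched path
    looks_intended   -- the visible delta matches what the code change does
    dimensions_match -- baseline and current share the same width/height
    flaky_history    -- the story changed frequently in prior, unrelated runs
    """
    if scope_matches and looks_intended:
        return "real diff: recommend approval"
    if scope_matches and not looks_intended:
        return "unintended regression: push a fix, do not approve"
    if not dimensions_match and flaky_history:
        return "flake: recommend tolerating the hash"
    # Assumed fallback for signal combinations the rules do not cover.
    return "inconclusive: describe what was seen and defer to the reviewer"
```

Whatever string comes back, it should still be paired with the one-line description of what the images showed.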


Flake check: "Has this story been changing?"


Once you have a suspect snapshot identifier:

`posthog:visual-review-runs-snapshot-history-list { id: <snapshot_id> }` → returns prior outcomes for the same story.

Verdicts:

- Mostly `unchanged` and this run's diff is the outlier → likely a real regression caused by this PR.
- Frequent `changed` across unrelated branches/master → flaky story; recommend tolerating the hash via the UI.
- Recent `removed` or large-jump dimension change → baseline likely stale; recommend re-baselining on master.
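One way to make these verdicts mechanical is to count outcomes in the history. The sketch below assumes the history has been reduced to a list of `result` strings, most recent first, excluding the run under review. The 30% threshold for "frequent" and the three-run window for "recent" are arbitrary illustrative choices, and dimension jumps are not modelled here.

```python
from collections import Counter


def flake_verdict(history: list[str], flaky_ratio: float = 0.3, recent: int = 3) -> str:
    """Classify a story from its prior results (most recent first)."""
    if not history:
        return "no history: cannot judge from past runs"
    # A recent removal suggests the baseline no longer reflects master.
    if "removed" in history[:recent]:
        return "baseline likely stale: recommend re-baselining on master"
    counts = Counter(history)
    if counts["changed"] / len(history) >= flaky_ratio:
        return "flaky story: recommend tolerating the hash via the UI"
    return "likely real regression: this run's diff is the outlier"
```

Treat the threshold as a starting point; a story that changes legitimately every week would need a different cutoff than a static component.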


Triaging the queue


When the user is doing housekeeping rather than asking about a specific PR:

- `posthog:visual-review-runs-counts-retrieve` → total queue size.
- `posthog:visual-review-runs-list { review_state: needs_review, limit: 50 }` (paginate if needed).
- Group by `branch` author or `run_type` to surface clusters (e.g., "12 PRs blocked on the same shared component change" usually means a single underlying root cause to address).
- Prefer surfacing runs whose `summary.changed > 0` over runs that are only `new`: `new` means no baseline yet, which is usually trivial to approve; `changed` is the real review work.
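The grouping and ordering steps might look like the following once the `needs_review` runs have been fetched. This sketch assumes each run dict exposes `branch`, `run_type`, and a `summary` with a `changed` count. The helper names are hypothetical, and grouping by author is left out because the source does not name an author field.

```python
from collections import defaultdict


def triage_order(runs: list[dict]) -> list[dict]:
    """Order runs so those with the most changed snapshots come first."""
    return sorted(
        runs,
        key=lambda r: r.get("summary", {}).get("changed", 0),
        reverse=True,
    )


def group_runs(runs: list[dict], key: str = "branch") -> list[tuple[str, list[dict]]]:
    """Group runs by a field and return the largest clusters first."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for run in runs:
        groups[run.get(key, "unknown")].append(run)
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
```

Runs that only contain `new` snapshots sink to the bottom of `triage_order`, which matches the priority described above.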


Output expectations


For PR-status questions, lead with the verdict in one line, then 2-4 bullets of supporting context. Always include the `_posthogUrl` deep link to the run: humans need to see the rendered images to make the call, and the agent can only describe the metadata.

For triage / aggregate questions, a short table beats prose. Group by what the user is going to act on.
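For the PR-status shape, a tiny formatter shows the intended layout. This is a sketch only: the function name, the `Run:` label, and the strict 2-4 bullet check are illustrative choices rather than a required format.

```python
def format_pr_status(verdict: str, bullets: list[str], posthog_url: str) -> str:
    """Render a one-line verdict, supporting bullets, and the run deep link."""
    if not 2 <= len(bullets) <= 4:
        raise ValueError("expected 2-4 supporting bullets")
    lines = [verdict]
    lines.extend(f"- {bullet}" for bullet in bullets)
    # posthog_url is the run's _posthogUrl value from the tool output.
    lines.append(f"Run: {posthog_url}")
    return "\n".join(lines)
```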


What NOT to do


- Do not approve or tolerate snapshots from this skill; those endpoints are intentionally not exposed as MCP tools yet. Direct the user to the run's `_posthogUrl`.
- Do not assume the failing GitHub check on a PR is unrelated to VR. If a `visual-review` check is red on a PR you're working on, that's the trigger to run this skill.
- Do not declare a verdict from metadata alone when `result: changed`. Pull the baseline and current PNGs and look at them; metadata can only say "something changed", not whether the change is intended.
