triaging-visual-review-runs

Triaging visual review runs

Visual Review is PostHog's screenshot-regression product: CI captures Storybook and Playwright screenshots, diffs them against committed baseline hashes, and gates the PR until a human approves the visible changes. A PR with visual changes carries a `visual-review` GitHub status check that stays red until each diffed snapshot is approved or tolerated in the VR UI.
This skill teaches an agent how to answer the questions a human reviewer would actually ask by chaining the read-only VR MCP tools, instead of reaching for `gh pr view` and tab-hopping to the VR web UI.

When this skill applies

Trigger this skill on any of:
  • A PR number, branch name, or commit SHA paired with words like visual review, VR, snapshot, screenshot, storybook diff, playwright snapshot, baseline, approve, tolerated, quarantine.
  • Questions about why a PR is blocked, what visually changed, or whether a diff is real.
  • "Is my run done?" / "What's left to review?" / "Has this story flaked recently?"
  • A failing `visual-review` GitHub check or a PR comment from the `posthog-bot` mentioning visual review.
When the user asks for the rendered diff image itself, the VR web UI is faster — direct them there. This skill is for everything around the diff: status, scope, history, triage.

Tools

All read-only. None of these require write scopes; approval/toleration still happens in the web UI.
| Tool | Purpose |
| --- | --- |
| `posthog:visual-review-runs-list` | List runs, filter by `pr_number` / `commit_sha` / `branch` / `review_state`. Start here. |
| `posthog:visual-review-runs-retrieve` | Full detail for a single run (status, summary counts, supersession). |
| `posthog:visual-review-runs-snapshots-list` | Per-snapshot results inside a run: identifier, `result`, diff %, classification, baseline + current artifact URLs. |
| `posthog:visual-review-runs-snapshot-history-list` | A single story's last N runs across master/PRs; this is the flake check. |
| `posthog:visual-review-runs-counts-retrieve` | Aggregate counts for queue triage (how many runs in `needs_review`, etc.). |
| `posthog:visual-review-runs-tolerated-hashes-list` | Hashes the team has explicitly accepted as "known flake / acceptable variation". |
| `posthog:visual-review-repos-list` | Repos (one per GitHub repo). Usually only one matters; useful for filtering. |
| `posthog:visual-review-repos-retrieve` | Repo metadata: baseline file paths, PR-comment configuration. |

Vocabulary cheat sheet

These appear in tool output and matter for interpretation:
  • Run `review_state`: `needs_review` (open, awaiting human), `clean` (zero diffs), `processing` (CI still uploading), `stale` (a newer run on the same PR has superseded this one; check `superseded_by_id`).
  • Run `run_type`: `storybook` (component snapshots) or `playwright` (full-page e2e snapshots).
  • Snapshot `result`: `unchanged`, `changed` (real diff), `new` (no baseline yet), `removed`.
  • Snapshot `classification_reason`: `tolerated_hash` (matches a known-tolerated hash, no action needed), `below_threshold` (under the noise floor), `exact` (byte-identical), `""` (real diff requiring review).
  • Snapshot `review_state`: `pending` or `approved`.
  • Run `summary`: `total / changed / new / removed / unchanged / unresolved / tolerated_matched`. `unresolved` is what's actually blocking review.

Workflows

"What's the VR status of this PR?"

The single most common job. Map a PR number to its run state in two calls.
  1. `posthog:visual-review-runs-list { pr_number: <n>, limit: 5 }`: sort by `created_at` desc, take the latest non-stale one.
  2. If the run has `summary.changed > 0` or `summary.unresolved > 0`, drill in with `posthog:visual-review-runs-snapshots-list { id: <run_id> }` and report the `changed` snapshots.
Report back: PR number, run UUID, `review_state`, summary counts, and the `_posthogUrl` deep link so the user can click straight to the diff viewer.
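The selection logic in the two steps above can be sketched in Python. This is a minimal illustration, not the tool's own behavior: it assumes the list result has already been parsed into dicts carrying `id`, `created_at` (an ISO-8601 string), `review_state`, and `summary`, and the helper names `latest_active_run` and `needs_drill_in` are hypothetical.

```python
from typing import Optional


def latest_active_run(runs: list[dict]) -> Optional[dict]:
    """Return the newest run whose review_state is not 'stale', or None."""
    active = [r for r in runs if r.get("review_state") != "stale"]
    # Assumption: created_at values share one ISO-8601 format and timezone,
    # so comparing them as strings orders them chronologically.
    return max(active, key=lambda r: r["created_at"], default=None)


def needs_drill_in(run: dict) -> bool:
    """True when the run has changed or unresolved snapshots worth listing."""
    summary = run.get("summary", {})
    return summary.get("changed", 0) > 0 or summary.get("unresolved", 0) > 0
```

If `latest_active_run` returns `None`, every run in the page was stale or the PR has no runs yet; widening `limit` or reporting "no active run" are both reasonable next moves.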

"Is the diff real or unrelated?"

The most useful judgment a code-aware agent can add. Combine three signals: scope match, flake history, and the actual rendered images. The agent should look at the screenshots — not just describe metadata.
  1. Scope check: run `git diff master...HEAD --stat` (or diff against the PR's base branch) to list the touched paths. Cross-reference with `posthog:visual-review-runs-snapshots-list { id }` filtered to `result: changed` to get the story identifiers. Stories are namespaced like `<area>-<scene>--<story>--<theme>`; e.g. `scenes-app-settings-user--settings-user-profile--dark` maps to `frontend/src/scenes/settings/user/...`. Use this to translate story id → likely source path.
  2. Visual inspection: for each `changed` snapshot, the tool result contains `current_artifact.download_url` and `baseline_artifact.download_url`. These are pre-signed S3 URLs to PNG files; pull them and look:
    ```bash
    # -f makes curl exit non-zero on an HTTP error (e.g. an expired pre-signed URL)
    # instead of saving the error body as a .png; -S still prints the error.
    curl -fsS -o /tmp/vr-baseline.png "<baseline_artifact.download_url>"
    curl -fsS -o /tmp/vr-current.png "<current_artifact.download_url>"
    ```
    Then `Read` both files (the Read tool renders images visually) and compare. Things to call out:
    • The actual visible delta (text changed, button moved, layout shift, color drift, missing element).
    • Whether the change is consistent with the `diff_pixel_count` and `diff_percentage` in the metadata (e.g. 54% diff but the images look near-identical → screenshot framing changed, not the UI).
    • Whether the baseline and current have different dimensions (`width` / `height` fields). Mismatched dimensions usually mean the story rendered to a different viewport or didn't fully render before the screenshot, which is a flake signal, not a regression.
  3. Flake history: run the flake check below for any story that looks suspect.
  4. Verdict: combine all three:
    • Scope plausible + visible change matches the code change → real, intended diff; recommend approval.
    • Scope mismatch + dimensions mismatch + frequent prior changes → flake; recommend tolerating the hash.
    • Scope plausible + visible change looks unintended → regression; push a fix and do not approve.
Always include a one-line description of what you saw in the images — the user uses this to decide whether to trust your verdict without opening the VR UI themselves.
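As a local cross-check on the dimension signal, the width and height of the two downloaded PNGs can be read straight from their headers. This is a sketch using only the standard library; the function names are hypothetical, and it assumes the files were saved to the `/tmp/vr-baseline.png` and `/tmp/vr-current.png` paths used in the download step.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the IHDR chunk that opens every PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def dimensions_differ(baseline_path: str, current_path: str) -> bool:
    """True when the two PNG files report different pixel dimensions."""
    with open(baseline_path, "rb") as b, open(current_path, "rb") as c:
        return png_dimensions(b.read(24)) != png_dimensions(c.read(24))
```

A `ValueError` here usually means the download saved something other than an image, for example an error response from an expired URL, so it is worth re-fetching before drawing conclusions.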

Flake check: "Has this story been changing?"

Once you have a suspect snapshot identifier:
`posthog:visual-review-runs-snapshot-history-list { id: <snapshot_id> }` → returns prior outcomes for the same story.
Verdicts:
  • Mostly `unchanged` and this run's diff is the outlier → likely a real regression caused by this PR.
  • Frequent `changed` across unrelated branches/master → flaky story; recommend tolerating the hash via the UI.
  • Recent `removed` or large-jump dimension change → baseline likely stale; recommend re-baselining on master.
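The verdict rules above can be approximated with a small classifier. This is only a sketch: it assumes the history has been reduced to a list of `result` strings ordered newest first and excluding the current run, and both the `flaky_ratio` threshold and the "last three runs" window are illustrative guesses, not values defined by Visual Review. The dimension-jump signal is left out because it needs the image metadata.

```python
from collections import Counter


def classify_history(results: list[str], flaky_ratio: float = 0.3) -> str:
    """Map a story's prior results (newest first) to a rough verdict label."""
    if not results:
        return "no-history"
    # A recent removal suggests the baseline itself is out of date.
    if "removed" in results[:3]:
        return "baseline-likely-stale"
    counts = Counter(results)
    # Frequent prior changes point to a flaky story rather than this PR.
    if counts["changed"] / len(results) >= flaky_ratio:
        return "flaky"
    # Mostly unchanged history makes the current diff the outlier.
    return "likely-real"
```

Treat the label as a prompt for the written verdict, not a replacement for looking at the images.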

Triaging the queue

When the user is doing housekeeping rather than asking about a specific PR:
  1. `posthog:visual-review-runs-counts-retrieve` → total queue size.
  2. `posthog:visual-review-runs-list { review_state: needs_review, limit: 50 }` (paginate if needed).
  3. Group by `branch` author or `run_type` to surface clusters (e.g., "12 PRs blocked on the same shared component change" usually means a single underlying root cause to address).
  4. Prefer surfacing runs whose `summary.changed > 0` over runs that are only `new`: `new` means no baseline yet, which is usually trivial to approve; `changed` is the real review work.
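Steps 3 and 4 amount to a group-then-prioritize pass over the listed runs. A minimal sketch, assuming each run is a dict with a `summary` and the grouping field present (the `group_runs` helper and the `"unknown"` fallback bucket are hypothetical):

```python
from collections import defaultdict


def group_runs(runs: list[dict], key: str = "run_type") -> dict[str, list[dict]]:
    """Group runs by one field, listing runs with changed snapshots first."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for run in runs:
        groups[str(run.get(key, "unknown"))].append(run)
    for members in groups.values():
        # Runs with changed snapshots sort ahead of runs that only add new ones;
        # the sort is stable, so the original order is kept within each tier.
        members.sort(
            key=lambda r: r.get("summary", {}).get("changed", 0) > 0,
            reverse=True,
        )
    return dict(groups)
```

Calling it with `key="branch"` gives the per-branch view; large groups are the clusters worth mentioning first.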

Output expectations

For PR-status questions, lead with the verdict in one line, then 2-4 bullets of supporting context. Always include the `_posthogUrl` deep link to the run. Humans need to see the rendered images to make the call; the agent can only describe the metadata.
For triage / aggregate questions, a short table beats prose. Group by what the user is going to act on.

What NOT to do

  • Do not approve or tolerate snapshots from this skill: those endpoints are intentionally not exposed as MCP tools yet. Direct the user to the run's `_posthogUrl`.
  • Do not assume the failing GitHub check on a PR is unrelated to VR: if a `visual-review` check is red on a PR you're working on, that's the trigger to run this skill.
  • Do not declare a verdict from metadata alone when `result: changed`. Pull the baseline and current PNGs and look at them; metadata can only say "something changed", not whether the change is intended.