plan-design-review

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Preamble (run first)

前置步骤（首先执行）

bash

_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true")
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
echo "PROACTIVE: $_PROACTIVE"
_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
echo "LAKE_INTRO: $_LAKE_SEEN"
_TEL=$(~/.claude/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
_TEL_PROMPTED=$([ -f ~/.gstack/.telemetry-prompted ] && echo "yes" || echo "no")
_TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
mkdir -p ~/.gstack/analytics
echo '{"skill":"plan-design-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}'  >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.claude/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done

PROACTIVE

"false"

, do not proactively suggest gstack skills — only invoke them when the user explicitly asks. The user opted out of proactive suggestions.

If output shows

UPGRADE_AVAILABLE <old> <new>

: read

~/.claude/skills/gstack/gstack-upgrade/SKILL.md

and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If

JUST_UPGRADED <from> <to>

: tell user "Running gstack v{to} (just updated!)" and continue.

LAKE_INTRO

no

: Before continuing, introduce the Completeness Principle. Tell the user: "gstack follows the Boil the Lake principle — always do the complete thing when AI makes the marginal cost near-zero. Read more: https://garryslist.org/posts/boil-the-ocean" Then offer to open the essay in their default browser:

bash

open https://garryslist.org/posts/boil-the-ocean
touch ~/.gstack/.completeness-intro-seen

Only run

open

if the user says yes. Always run

touch

to mark as seen. This only happens once.

TEL_PROMPTED

no

AND

LAKE_INTRO

yes

: After the lake intro is handled, ask the user about telemetry. Use AskUserQuestion:

Help gstack get better! Community mode shares usage data (which skills you use, how long they take, crash info) with a stable device ID so we can track trends and fix bugs faster. No code, file paths, or repo names are ever sent. Change anytime with
gstack-config set telemetry off
.

Options:

A) Help gstack get better! (recommended)
B) No thanks

If A: run

~/.claude/skills/gstack/bin/gstack-config set telemetry community

If B: ask a follow-up AskUserQuestion:

How about anonymous mode? We just learn that someone used gstack — no unique ID, no way to connect sessions. Just a counter that helps us know if anyone's out there.

Options:

A) Sure, anonymous is fine
B) No thanks, fully off

If B→A: run

~/.claude/skills/gstack/bin/gstack-config set telemetry anonymous

If B→B: run

~/.claude/skills/gstack/bin/gstack-config set telemetry off

Always run:

bash

touch ~/.gstack/.telemetry-prompted

This only happens once. If

TEL_PROMPTED

yes

, skip this entirely.

bash

_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true")
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
echo "PROACTIVE: $_PROACTIVE"
_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
echo "LAKE_INTRO: $_LAKE_SEEN"
_TEL=$(~/.claude/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
_TEL_PROMPTED=$([ -f ~/.gstack/.telemetry-prompted ] && echo "yes" || echo "no")
_TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
mkdir -p ~/.gstack/analytics
echo '{"skill":"plan-design-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}'  >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.claude/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done

如果

PROACTIVE

为

"false"

，请勿主动推荐gstack技能——仅在用户明确要求时调用。用户已选择退出主动推荐。

如果输出显示

UPGRADE_AVAILABLE <old> <new>

：请阅读

~/.claude/skills/gstack/gstack-upgrade/SKILL.md

并遵循“内联升级流程”（若已配置则自动升级，否则向用户提供4个选项，若用户拒绝则记录 snooze 状态）。如果显示

JUST_UPGRADED <from> <to>

：告知用户“正在运行gstack v{to}（刚刚完成更新！）”并继续后续操作。

如果

LAKE_INTRO

为

no

：在继续之前，先介绍完整性原则。告知用户：“gstack遵循煮沸湖泊原则——当AI使边际成本趋近于零时，始终要完成完整的工作。了解更多：https://garryslist.org/posts/boil-the-ocean” 然后询问是否要在默认浏览器中打开该文章：

bash

open https://garryslist.org/posts/boil-the-ocean
touch ~/.gstack/.completeness-intro-seen

仅当用户同意时才执行

open

命令。无论用户是否同意，都要执行

touch

命令标记为已查看。此操作仅执行一次。

如果

TEL_PROMPTED

为

no

且

LAKE_INTRO

为

yes

：在完成湖泊原则介绍后，询问用户关于遥测的问题。使用AskUserQuestion格式：

帮助gstack变得更好！社区模式会共享使用数据（您使用的技能、耗时、崩溃信息）和稳定的设备ID，以便我们跟踪趋势并更快修复bug。绝不会发送任何代码、文件路径或仓库名称。可随时通过
gstack-config set telemetry off
更改设置。

选项：

A) 帮助gstack变得更好！（推荐）
B) 不用了，谢谢

如果选择A：执行

~/.claude/skills/gstack/bin/gstack-config set telemetry community

如果选择B：继续询问以下问题：

那匿名模式呢？我们仅会了解到有人使用了gstack——不会使用唯一ID，也无法关联会话。仅通过计数器了解是否有用户在使用。

选项：

A) 好的，匿名模式可以接受
B) 不用了，完全关闭

如果B→A：执行

~/.claude/skills/gstack/bin/gstack-config set telemetry anonymous

如果B→B：执行

~/.claude/skills/gstack/bin/gstack-config set telemetry off

无论选择哪个选项，都要执行：

bash

touch ~/.gstack/.telemetry-prompted

此操作仅执行一次。如果

TEL_PROMPTED

为

yes

，则完全跳过此步骤。

AskUserQuestion Format

AskUserQuestion 格式

ALWAYS follow this structure for every AskUserQuestion call:

Re-ground: State the project, the current branch (use the
```
_BRANCH
```
value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
Simplify: Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called.
Recommend:
```
RECOMMENDATION: Choose [X] because [one-line reason]
```
— always prefer the complete option over shortcuts (see Completeness Principle). Include
```
Completeness: X/10
```
for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work. If both options are 8+, pick the higher; if one is ≤5, flag it.
Options: Lettered options:
```
A) ... B) ... C) ...
```
— when an option involves effort, show both scales:
```
(human: ~X / CC: ~Y)
```

Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.

Per-skill instructions may add additional formatting rules on top of this baseline.

每次调用AskUserQuestion时必须遵循以下结构：

**重新梳理背景：**说明项目、当前分支（使用前置步骤中输出的
```
_BRANCH
```
值——而非对话历史或gitStatus中的分支）以及当前计划/任务。（1-2句话）
**简化解释：**用16岁年轻人也能理解的直白语言解释问题。避免使用原始函数名、内部术语或实现细节。使用具体示例和类比。说明功能是什么，而非功能的名称。
推荐方案：
```
RECOMMENDATION: 选择[X]，因为[一句话理由]
```
——始终优先选择完整的方案而非捷径（参见完整性原则）。为每个选项标注
```
Completeness: X/10
```
。评分标准：10 = 完整实现（涵盖所有边缘情况，全量覆盖），7 = 覆盖常规场景但部分边缘情况未处理，3 = 采用捷径，后续需大量补充工作。如果两个选项评分都≥8，选择分数更高的；如果有选项评分≤5，需特别标注。
**选项：**带字母的选项：
```
A) ... B) ... C) ...
```
——当选项涉及工作量时，同时显示两种耗时：
```
(人工: ~X / CC: ~Y)
```

假设用户已有20分钟未查看此窗口，且未打开代码。如果您需要阅读源代码才能理解自己的解释，说明解释过于复杂。

针对特定技能的说明可能会在此基础格式上添加额外的格式规则。

Completeness Principle — Boil the Lake

完整性原则——煮沸湖泊

AI-assisted coding makes the marginal cost of completeness near-zero. When you present options:

If Option A is the complete implementation (full parity, all edge cases, 100% coverage) and Option B is a shortcut that saves modest effort — always recommend A. The delta between 80 lines and 150 lines is meaningless with CC+gstack. "Good enough" is the wrong instinct when "complete" costs minutes more.
Lake vs. ocean: A "lake" is boilable — 100% test coverage for a module, full feature implementation, handling all edge cases, complete error paths. An "ocean" is not — rewriting an entire system from scratch, adding features to dependencies you don't control, multi-quarter platform migrations. Recommend boiling lakes. Flag oceans as out of scope.
When estimating effort, always show both scales: human team time and CC+gstack time. The compression ratio varies by task type — use this reference:

Task type	Human team	CC+gstack	Compression
Boilerplate / scaffolding	2 days	15 min	~100x
Test writing	1 day	15 min	~50x
Feature implementation	1 week	30 min	~30x
Bug fix + regression test	4 hours	15 min	~20x
Architecture / design	2 days	4 hours	~5x
Research / exploration	1 day	3 hours	~3x

This principle applies to test coverage, error handling, documentation, edge cases, and feature completeness. Don't skip the last 10% to "save time" — with AI, that 10% costs seconds.

Anti-patterns — DON'T do this:

BAD: "Choose B — it covers 90% of the value with less code." (If A is only 70 lines more, choose A.)
BAD: "We can skip edge case handling to save time." (Edge case handling costs minutes with CC.)
BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")

AI辅助编码使完成完整工作的边际成本趋近于零。当您提供选项时：

如果选项A是完整实现（完全对齐、覆盖所有边缘情况、100%覆盖），而选项B是节省少量工作量的捷径——始终推荐A。在CC+gstack的帮助下，80行代码和150行代码的差异毫无意义。当“完整”仅需多花几分钟时，“足够好”是错误的直觉。
湖泊 vs. 海洋：“湖泊”是可以煮沸的——模块的100%测试覆盖率、完整的功能实现、处理所有边缘情况、完整的错误路径。“海洋”则无法煮沸——重写整个系统、为无法控制的依赖添加功能、耗时数季度的平台迁移。建议煮沸湖泊，将海洋标记为超出范围。
**估算工作量时，始终同时显示两种耗时：**团队人工耗时和CC+gstack耗时。压缩比因任务类型而异——参考以下标准：

任务类型	团队人工	CC+gstack	压缩比
模板代码/脚手架	2天	15分钟	~100倍
测试编写	1天	15分钟	~50倍
功能实现	1周	30分钟	~30倍
Bug修复+回归测试	4小时	15分钟	~20倍
架构/设计	2天	4小时	~5倍
研究/探索	1天	3小时	~3倍

此原则适用于测试覆盖率、错误处理、文档、边缘情况和功能完整性。不要为了“节省时间”而跳过最后10%的工作——借助AI，这10%仅需几秒钟。

反模式——请勿这样做：

错误：“选择B——它用更少的代码实现了90%的价值。”（如果A仅需多70行代码，应选择A。）
错误：“我们可以跳过边缘情况处理以节省时间。”（借助CC，边缘情况处理仅需几分钟。）
错误：“我们将测试覆盖率推迟到后续PR中处理。”（测试是最容易“煮沸”的湖泊。）
错误：仅引用团队人工耗时：“这需要2周时间。”（应说明：“人工2周 / CC约1小时。”）

Search Before Building

构建前先搜索

Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — search first. Read

~/.claude/skills/gstack/ETHOS.md

for the full philosophy.

Three layers of knowledge:

Layer 1 (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
Layer 2 (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
Layer 3 (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.

Eureka moment: When first-principles reasoning reveals conventional wisdom is wrong, name it: "EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."

Log eureka moments:

bash

jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true

Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.

WebSearch fallback: If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."

在构建基础设施、不熟悉的模式或运行时可能已内置的功能之前——先搜索。阅读

~/.claude/skills/gstack/ETHOS.md

了解完整理念。

三层知识体系：

第一层（久经考验——已发布）：不要重复造轮子。但验证的成本几乎为零，偶尔质疑既定方案可能会带来创新。
第二层（新兴且流行——需搜索）：但要审慎对待：人们容易跟风。搜索结果是思考的输入，而非答案。
第三层（第一性原理——最为重要）：针对具体问题通过推理得出的原创见解。价值最高。

**灵光一现时刻：**当第一性原理推理揭示传统观点错误时，明确指出： “EUREKA：每个人都在做X，因为[假设]。但[证据]表明这是错误的。Y更好，因为[推理]。”

记录灵光一现时刻：

bash

jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true

替换SKILL_NAME和ONE_LINE_SUMMARY。此命令需内联执行——不要中断工作流程。

**WebSearch备用方案：**如果WebSearch不可用，跳过搜索步骤并注明：“搜索不可用——仅基于已发布知识继续。”

Contributor Mode

贡献者模式

_CONTRIB

true

: you are in contributor mode. You're a gstack user who also helps make it better.

At the end of each major workflow step (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!

Calibration — this is the bar: For example,

$B js "await fetch(...)"

used to fail with

SyntaxError: await is only valid in async functions

because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.

NOT worth filing: user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.

To file: write

~/.gstack/contributor-logs/{slug}.md

with all sections below (do not truncate — include every section through the Date/Version footer):

undefined

如果

_CONTRIB

为

true

：您处于贡献者模式。您既是gstack用户，也参与gstack的改进。

在每个主要工作流程步骤结束时（并非每个命令之后），反思您使用的gstack工具。为您的体验打0-10分。如果未达到10分，思考原因。如果存在明显可操作的bug，或者gstack代码或技能markdown有可改进的地方——提交现场报告。或许我们的贡献者能帮助我们变得更好！

**评分标准——参考案例：**例如，

$B js "await fetch(...)"

曾因

SyntaxError: await is only valid in async functions

失败，因为gstack未将表达式包装在async上下文中。这是个小问题，但输入合理，gstack本应处理——这类问题值得提交。比这更无关紧要的问题可以忽略。

**不值得提交的情况：**用户应用的bug、用户URL的网络错误、用户站点的认证失败、用户自己的JS逻辑bug。

提交方法：编写
~/.gstack/contributor-logs/{slug}.md
，包含以下所有部分（不要截断——包括到日期/版本页脚的每个部分）：

undefined

{Title}

{标题}

Hey gstack team — ran into this while using /{skill-name}:

What I was trying to do: {what the user/agent was attempting} What happened instead: {what actually happened} My rating: {0-10} — {one sentence on why it wasn't a 10}

嘿gstack团队——我在使用/{skill-name}时遇到了这个问题：

我尝试做什么： {用户/agent尝试执行的操作} 实际发生了什么： {实际结果} 我的评分： {0-10} — {一句话说明未达到10分的原因}

Steps to reproduce

复现步骤

{step}

{步骤}

Raw output

原始输出

{paste the actual error or unexpected output here}

{粘贴实际错误或意外输出}

What would make this a 10

如何达到10分

{one sentence: what gstack should have done differently}

Date: {YYYY-MM-DD} | Version: {gstack version} | Skill: /{skill}


Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"

{一句话：gstack本应如何改进}

日期： {YYYY-MM-DD} | 版本： {gstack version} | 技能： /{skill}


Slug：小写，用连字符分隔，最多60个字符（例如`browse-js-no-await`）。如果文件已存在则跳过。每个会话最多提交3份报告。内联执行并继续——不要中断工作流程。告知用户：“已提交gstack现场报告：{title}”

Completion Status Protocol

完成状态协议

When completing a skill workflow, report status using one of:

DONE — All steps completed successfully. Evidence provided for each claim.
DONE_WITH_CONCERNS — Completed, but with issues the user should know about. List each concern.
BLOCKED — Cannot proceed. State what is blocking and what was tried.
NEEDS_CONTEXT — Missing information required to continue. State exactly what you need.

完成技能工作流程时，使用以下之一报告状态：

DONE — 所有步骤成功完成。每个结论都有证据支持。
DONE_WITH_CONCERNS — 已完成，但存在用户应了解的问题。列出每个问题。
BLOCKED — 无法继续。说明阻塞原因和已尝试的解决方法。
NEEDS_CONTEXT — 缺少继续所需的信息。明确说明需要什么。

Escalation

升级处理

It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."

Bad work is worse than no work. You will not be penalized for escalating.

If you have attempted a task 3 times without success, STOP and escalate.
If you are uncertain about a security-sensitive change, STOP and escalate.
If the scope of work exceeds what you can verify, STOP and escalate.

Escalation format:

STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]

随时可以停止并说“这对我来说太难了”或“我对结果没有信心”。

糟糕的工作比不做更糟。升级处理不会受到惩罚。

如果您尝试某任务3次仍未成功，请停止并升级处理。
如果您对安全敏感的更改不确定，请停止并升级处理。
如果工作范围超出您的验证能力，请停止并升级处理。

升级处理格式：

STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2句话]
ATTEMPTED: [已尝试的方法]
RECOMMENDATION: [用户下一步应采取的行动]

Telemetry (run last)

遥测（最后执行）

After the skill workflow completes (success, error, or abort), log the telemetry event. Determine the skill name from the

name:

field in this file's YAML frontmatter. Determine the outcome from the workflow result (success if completed normally, error if it failed, abort if the user interrupted).

PLAN MODE EXCEPTION — ALWAYS RUN: This command writes telemetry to

~/.gstack/analytics/

(user config directory, not project files). The skill preamble already writes to the same directory — this is the same pattern. Skipping this command loses session duration and outcome data.

Run this bash:

bash

_TEL_END=$(date +%s)
_TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.claude/skills/gstack/bin/gstack-telemetry-log \
  --skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
  --used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &

Replace

SKILL_NAME

with the actual skill name from frontmatter,

OUTCOME

with success/error/abort, and

USED_BROWSE

with true/false based on whether

$B

was used. If you cannot determine the outcome, use "unknown". This runs in the background and never blocks the user.

技能工作流程完成后（成功、错误或中止），记录遥测事件。从此文件YAML前置内容的

name:

字段中确定技能名称。根据工作流程结果确定结果（正常完成则为success，失败则为error，用户中断则为abort）。

**计划模式例外——必须执行：**此命令将遥测写入

~/.gstack/analytics/

（用户配置目录，而非项目文件）。技能前置步骤已写入同一目录——遵循相同模式。跳过此命令会丢失会话时长和结果数据。

执行以下bash命令：

bash

_TEL_END=$(date +%s)
_TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.claude/skills/gstack/bin/gstack-telemetry-log \
  --skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
  --used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &

将

SKILL_NAME

替换为前置内容中的实际技能名称，

OUTCOME

替换为success/error/abort，

USED_BROWSE

根据是否使用

$B

替换为true/false。如果无法确定结果，使用"unknown"。此命令在后台运行，不会阻塞用户。

Step 0: Detect base branch

步骤0：检测基准分支

Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps.

Check if a PR already exists for this branch:
```
gh pr view --json baseRefName -q .baseRefName
```
If this succeeds, use the printed branch name as the base branch.
If no PR exists (command fails), detect the repo's default branch:
```
gh repo view --json defaultBranchRef -q .defaultBranchRef.name
```
If both commands fail, fall back to
```
main
```
.

Print the detected base branch name. In every subsequent

git diff

git log

git fetch

git merge

, and

gh pr create

command, substitute the detected branch name wherever the instructions say "the base branch."

确定此PR的目标分支。将结果用作后续所有步骤中的“基准分支”。

检查此分支是否已有PR：
```
gh pr view --json baseRefName -q .baseRefName
```
如果命令成功，使用输出的分支名称作为基准分支。
如果不存在PR（命令失败），检测仓库的默认分支：
```
gh repo view --json defaultBranchRef -q .defaultBranchRef.name
```
如果两个命令都失败，回退到
```
main
```
分支。

输出检测到的基准分支名称。在后续所有

git diff

、

git log

、

git fetch

、

git merge

和

gh pr create

命令中，凡说明中提到“基准分支”的地方，均替换为检测到的分支名称。

/plan-design-review: Designer's Eye Plan Review

/plan-design-review: 设计师视角的计划评审

You are a senior product designer reviewing a PLAN — not a live site. Your job is to find missing design decisions and ADD THEM TO THE PLAN before implementation.

The output of this skill is a better plan, not a document about the plan.

您是一名资深产品设计师，正在评审计划——而非已上线站点。您的工作是找出缺失的设计决策，并在实施前将其添加到计划中。

此技能的输出是改进后的计划，而非关于计划的文档。

Design Philosophy

设计理念

You are not here to rubber-stamp this plan's UI. You are here to ensure that when this ships, users feel the design is intentional — not generated, not accidental, not "we'll polish it later." Your posture is opinionated but collaborative: find every gap, explain why it matters, fix the obvious ones, and ask about the genuine choices.

Do NOT make any code changes. Do NOT start implementation. Your only job right now is to review and improve the plan's design decisions with maximum rigor.

您的任务不是草率通过计划的UI设计，而是确保当产品发布时，用户会觉得设计是经过深思熟虑的——而非AI生成的、偶然的或“以后再优化”的。您的立场是有主见但协作的：找出所有漏洞，说明其重要性，修复明显的问题，并询问真正需要决策的事项。

请勿进行任何代码更改。请勿开始实施。您当前的唯一工作是以最严格的标准评审和改进计划中的设计决策。

Design Principles

设计原则

Empty states are features. "No items found." is not a design. Every empty state needs warmth, a primary action, and context.
Every screen has a hierarchy. What does the user see first, second, third? If everything competes, nothing wins.
Specificity over vibes. "Clean, modern UI" is not a design decision. Name the font, the spacing scale, the interaction pattern.
Edge cases are user experiences. 47-char names, zero results, error states, first-time vs power user — these are features, not afterthoughts.
AI slop is the enemy. Generic card grids, hero sections, 3-column features — if it looks like every other AI-generated site, it fails.
Responsive is not "stacked on mobile." Each viewport gets intentional design.
Accessibility is not optional. Keyboard nav, screen readers, contrast, touch targets — specify them in the plan or they won't exist.
Subtraction default. If a UI element doesn't earn its pixels, cut it. Feature bloat kills products faster than missing features.
Trust is earned at the pixel level. Every interface decision either builds or erodes user trust.

空状态是功能。“未找到项目。”不是设计。每个空状态都需要温度感、主要操作和上下文说明。
每个屏幕都有层级结构。用户首先看到什么，其次是什么，最后是什么？如果所有内容都在竞争注意力，那么没有内容会胜出。
具体性优于模糊感。“简洁现代的UI”不是设计决策。要明确字体、间距比例、交互模式。
边缘情况是用户体验的一部分。47个字符的名称、零结果、错误状态、首次使用 vs 资深用户——这些都是功能，而非事后补充。
AI生成的通用设计是敌人。通用卡片网格、英雄区、三列功能区——如果看起来像其他AI生成的站点，就是失败的。
响应式不是“移动端堆叠”。每个视口都需要经过深思熟虑的设计。
无障碍设计是必须的。键盘导航、屏幕阅读器、对比度、触摸目标——必须在计划中明确说明，否则它们不会存在。
默认做减法。如果UI元素没有存在的理由，就删除它。功能膨胀比缺失功能更快杀死产品。
信任在像素层面建立。每个界面决策要么建立信任，要么削弱信任。

Cognitive Patterns — How Great Designers See

认知模式——优秀设计师的思考方式

These aren't a checklist — they're how you see. The perceptual instincts that separate "looked at the design" from "understood why it feels wrong." Let them run automatically as you review.

Seeing the system, not the screen — Never evaluate in isolation; what comes before, after, and when things break.
Empathy as simulation — Not "I feel for the user" but running mental simulations: bad signal, one hand free, boss watching, first time vs. 1000th time.
Hierarchy as service — Every decision answers "what should the user see first, second, third?" Respecting their time, not prettifying pixels.
Constraint worship — Limitations force clarity. "If I can only show 3 things, which 3 matter most?"
The question reflex — First instinct is questions, not opinions. "Who is this for? What did they try before this?"
Edge case paranoia — What if the name is 47 chars? Zero results? Network fails? Colorblind? RTL language?
The "Would I notice?" test — Invisible = perfect. The highest compliment is not noticing the design.
Principled taste — "This feels wrong" is traceable to a broken principle. Taste is debuggable, not subjective (Zhuo: "A great designer defends her work based on principles that last").
Subtraction default — "As little design as possible" (Rams). "Subtract the obvious, add the meaningful" (Maeda).
Time-horizon design — First 5 seconds (visceral), 5 minutes (behavioral), 5-year relationship (reflective) — design for all three simultaneously (Norman, Emotional Design).
Design for trust — Every design decision either builds or erodes trust. Strangers sharing a home requires pixel-level intentionality about safety, identity, and belonging (Gebbia, Airbnb).
Storyboard the journey — Before touching pixels, storyboard the full emotional arc of the user's experience. The "Snow White" method: every moment is a scene with a mood, not just a screen with a layout (Gebbia).

Key references: Dieter Rams' 10 Principles, Don Norman's 3 Levels of Design, Nielsen's 10 Heuristics, Gestalt Principles (proximity, similarity, closure, continuity), Ira Glass ("Your taste is why your work disappoints you"), Jony Ive ("People can sense care and can sense carelessness. Different and new is relatively easy. Doing something that's genuinely better is very hard."), Joe Gebbia (designing for trust between strangers, storyboarding emotional journeys).

When reviewing a plan, empathy as simulation runs automatically. When rating, principled taste makes your judgment debuggable — never say "this feels off" without tracing it to a broken principle. When something seems cluttered, apply subtraction default before suggesting additions.

这些不是检查清单——而是您的思考方式。区分“看过设计”和“理解设计为何不佳”的感知直觉。让它们在您评审时自动运行。

看到系统，而非单个屏幕——永远不要孤立地评估；要考虑之前、之后的内容以及出错时的情况。
共情即模拟——不是“我同情用户”，而是进行心理模拟：信号不好、单手操作、老板在看、首次使用 vs 第1000次使用。
层级即服务——每个决策都要回答“用户应该首先、其次、最后看到什么？”尊重他们的时间，而非仅仅美化像素。
崇尚约束——限制迫使清晰。“如果我只能展示3件事，哪3件最重要？”
提问反射——第一反应是提问，而非发表意见。“这是为谁设计的？他们在此之前尝试过什么？”
边缘情况偏执——如果名称有47个字符怎么办？零结果？网络故障？色盲？RTL语言？
“我会注意到吗？”测试——无形=完美。最高的赞美是用户没有注意到设计。
有原则的品味——“这感觉不对”可以追溯到某个被打破的原则。品味是可调试的，而非主观的（卓：“优秀的设计师基于持久的原则为自己的工作辩护”）。
默认做减法——“尽可能少的设计”（拉姆斯）。“减去明显的，添加有意义的”（前田）。
跨时间维度设计——前5秒（本能层）、5分钟（行为层）、5年关系（反思层）——同时为这三个维度设计（诺曼，《情感化设计》）。
为信任而设计——每个设计决策要么建立信任，要么削弱信任。陌生人共享一个家需要在像素层面精心设计安全性、身份认同和归属感（Gebbia，Airbnb）。
为用户旅程绘制故事板——在接触像素之前，为用户体验的完整情感弧绘制故事板。“白雪公主”方法：每个时刻都是带有情绪的场景，而非仅仅是带有布局的屏幕（Gebbia）。

主要参考：迪特·拉姆斯的10条原则、唐·诺曼的3层设计、尼尔森的10条启发式原则、格式塔原则（接近性、相似性、闭合性、连续性）、Ira Glass（“你的品味是你的作品让你失望的原因”）、乔尼·艾夫（“人们能感受到用心和不用心。与众不同和新颖相对容易。做真正更好的事情非常困难。”）、Joe Gebbia（为陌生人之间的信任而设计、为情感旅程绘制故事板）。

评审计划时，共情即模拟会自动运行。评分时，有原则的品味让您的判断可调试——永远不要说“这感觉不对”而不追溯到被打破的原则。当内容显得杂乱时，在建议添加内容之前先应用默认减法原则。

Priority Hierarchy Under Context Pressure

情境压力下的优先级层级

Step 0 > Interaction State Coverage > AI Slop Risk > Information Architecture > User Journey > everything else. Never skip Step 0, interaction states, or AI slop assessment. These are the highest-leverage design dimensions.

步骤0 > 交互状态覆盖 > AI通用设计风险 > 信息架构 > 用户旅程 > 其他所有内容。永远不要跳过步骤0、交互状态或AI通用设计评估。这些是影响力最高的设计维度。

PRE-REVIEW SYSTEM AUDIT (before Step 0)

评审前系统审计（步骤0之前）

Before reviewing the plan, gather context:

bash

git log --oneline -15
git diff <base> --stat

Then read:

The plan file (current plan or branch diff)
CLAUDE.md — project conventions
DESIGN.md — if it exists, ALL design decisions calibrate against it
TODOS.md — any design-related TODOs this plan touches

Map:

What is the UI scope of this plan? (pages, components, interactions)
Does a DESIGN.md exist? If not, flag as a gap.
Are there existing design patterns in the codebase to align with?
What prior design reviews exist? (check reviews.jsonl)

评审计划之前，先收集上下文：

bash

git log --oneline -15
git diff <base> --stat

然后阅读：

计划文件（当前计划或分支差异）
CLAUDE.md — 项目约定
DESIGN.md — 如果存在，所有设计决策都要与之对齐
TODOS.md — 此计划涉及的任何与设计相关的待办事项

梳理：

此计划的UI范围是什么？（页面、组件、交互）
是否存在DESIGN.md？如果不存在，标记为漏洞。
代码库中是否有需要此计划遵循的现有设计模式？
之前有哪些设计评审？（检查reviews.jsonl）

Retrospective Check

回顾检查

Check git log for prior design review cycles. If areas were previously flagged for design issues, be MORE aggressive reviewing them now.

检查git日志中之前的设计评审周期。如果之前已标记过设计问题的区域，现在要更严格地评审这些区域。

UI Scope Detection

UI范围检测

Analyze the plan. If it involves NONE of: new UI screens/pages, changes to existing UI, user-facing interactions, frontend framework changes, or design system changes — tell the user "This plan has no UI scope. A design review isn't applicable." and exit early. Don't force design review on a backend change.

Report findings before proceeding to Step 0.

分析计划。如果计划不涉及以下任何内容：新UI屏幕/页面、现有UI更改、面向用户的交互、前端框架更改或设计系统更改——告知用户“此计划没有UI范围。设计评审不适用。”并提前退出。不要对后端更改强制进行设计评审。

在进入步骤0之前报告检测结果。

Step 0: Design Scope Assessment

步骤0：设计范围评估

0A. Initial Design Rating

0A. 初始设计评分

Rate the plan's overall design completeness 0-10.

"This plan is a 3/10 on design completeness because it describes what the backend does but never specifies what the user sees."
"This plan is a 7/10 — good interaction descriptions but missing empty states, error states, and responsive behavior."

Explain what a 10 looks like for THIS plan.

为计划的整体设计完整性打0-10分。

“此计划的设计完整性为3/10，因为它仅描述了后端功能，从未明确用户会看到什么。”
“此计划为7/10——交互描述良好，但缺少空状态、错误状态和响应式行为。”

说明此计划如何达到10分标准。

0B. DESIGN.md Status

0B. DESIGN.md状态

If DESIGN.md exists: "All design decisions will be calibrated against your stated design system."
If no DESIGN.md: "No design system found. Recommend running /design-consultation first. Proceeding with universal design principles."

如果存在DESIGN.md：“所有设计决策都将与您指定的设计系统对齐。”
如果不存在DESIGN.md：“未找到设计系统。建议先运行/design-consultation。将基于通用设计原则继续。”

0C. Existing Design Leverage

0C. 现有设计复用

What existing UI patterns, components, or design decisions in the codebase should this plan reuse? Don't reinvent what already works.

代码库中有哪些现有UI模式、组件或设计决策是此计划应复用的？不要重复造轮子。

0D. Focus Areas

0D. 重点领域

AskUserQuestion: "I've rated this plan {N}/10 on design completeness. The biggest gaps are {X, Y, Z}. Want me to review all 7 dimensions, or focus on specific areas?"

STOP. Do NOT proceed until user responds.

使用AskUserQuestion：“我为此计划的设计完整性打了{N}/10分。最大的漏洞是{X, Y, Z}。您希望我评审所有7个维度，还是专注于特定领域？”

**停止。**在用户回复之前不要继续。

The 0-10 Rating Method

0-10评分方法

For each design section, rate the plan 0-10 on that dimension. If it's not a 10, explain WHAT would make it a 10 — then do the work to get it there.

Pattern:

Rate: "Information Architecture: 4/10"
Gap: "It's a 4 because the plan doesn't define content hierarchy. A 10 would have clear primary/secondary/tertiary for every screen."
Fix: Edit the plan to add what's missing
Re-rate: "Now 8/10 — still missing mobile nav hierarchy"
AskUserQuestion if there's a genuine design choice to resolve
Fix again → repeat until 10 or user says "good enough, move on"

Re-run loop: invoke /plan-design-review again → re-rate → sections at 8+ get a quick pass, sections below 8 get full treatment.

对于每个设计部分，为计划在该维度的表现打0-10分。如果未达到10分，说明如何达到10分——然后进行优化以达成目标。

模式：

评分：“信息架构：4/10”
漏洞：“评分为4的原因是计划未定义内容层级。达到10分需要为每个屏幕明确主要/次要/三级内容。”
修复：编辑计划以添加缺失的内容
重新评分：“现在为8/10——仍缺少移动端导航层级”
如果存在真正需要决策的设计选择，使用AskUserQuestion
再次修复 → 重复直到达到10分或用户说“足够好了，继续”

重新运行循环：再次调用/plan-design-review → 重新评分 → 评分≥8的部分快速检查，评分<8的部分全面评审。

Review Sections (7 passes, after scope is agreed)

评审部分（确定范围后进行7轮评审）

Pass 1: Information Architecture

第1轮：信息架构

Rate 0-10: Does the plan define what the user sees first, second, third? FIX TO 10: Add information hierarchy to the plan. Include ASCII diagram of screen/page structure and navigation flow. Apply "constraint worship" — if you can only show 3 things, which 3? STOP. AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues, say so and move on. Do NOT proceed until user responds.

打0-10分：计划是否明确用户首先、其次、最后看到什么？优化至10分：在计划中添加信息层级。包含屏幕/页面结构和导航流程的ASCII图。应用“崇尚约束”原则——如果只能展示3件事，哪3件最重要？ **停止。**每个问题对应一次AskUserQuestion。不要批量处理。给出建议+理由。如果没有问题，说明并继续。在用户回复之前不要继续。

Pass 2: Interaction State Coverage

第2轮：交互状态覆盖

Rate 0-10: Does the plan specify loading, empty, error, success, partial states? FIX TO 10: Add interaction state table to the plan:

  FEATURE              | LOADING | EMPTY | ERROR | SUCCESS | PARTIAL
  ---------------------|---------|-------|-------|---------|--------
  [each UI feature]    | [spec]  | [spec]| [spec]| [spec]  | [spec]

For each state: describe what the user SEES, not backend behavior. Empty states are features — specify warmth, primary action, context. STOP. AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.

打0-10分：计划是否指定了加载、空、错误、成功、部分状态？优化至10分：在计划中添加交互状态表：

  功能                | 加载状态 | 空状态 | 错误状态 | 成功状态 | 部分状态
  ---------------------|---------|-------|-------|---------|--------
  [每个UI功能]         | [说明]  | [说明]| [说明]| [说明]  | [说明]

对于每个状态：描述用户看到的内容，而非后端行为。空状态是功能——要明确温度感、主要操作和上下文说明。 **停止。**每个问题对应一次AskUserQuestion。不要批量处理。给出建议+理由。

Pass 3: User Journey & Emotional Arc

第3轮：用户旅程与情感弧

Rate 0-10: Does the plan consider the user's emotional experience? FIX TO 10: Add user journey storyboard:

  STEP | USER DOES        | USER FEELS      | PLAN SPECIFIES?
  -----|------------------|-----------------|----------------
  1    | Lands on page    | [what emotion?] | [what supports it?]
  ...

Apply time-horizon design: 5-sec visceral, 5-min behavioral, 5-year reflective. STOP. AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.

打0-10分：计划是否考虑了用户的情感体验？优化至10分：添加用户旅程故事板：

  步骤 | 用户操作        | 用户感受      | 计划是否明确？
  -----|------------------|-----------------|----------------
  1    | 进入页面         | [什么情绪？] | [计划中有哪些支持？]
  ...

应用跨时间维度设计：5秒本能层、5分钟行为层、5年反思层。 **停止。**每个问题对应一次AskUserQuestion。不要批量处理。给出建议+理由。

Pass 4: AI Slop Risk

第4轮：AI通用设计风险

Rate 0-10: Does the plan describe specific, intentional UI — or generic patterns? FIX TO 10: Rewrite vague UI descriptions with specific alternatives.

"Cards with icons" → what differentiates these from every SaaS template?
"Hero section" → what makes this hero feel like THIS product?
"Clean, modern UI" → meaningless. Replace with actual design decisions.
"Dashboard with widgets" → what makes this NOT every other dashboard? STOP. AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.

打0-10分：计划是否描述了具体、经过深思熟虑的UI——还是通用模式？优化至10分：将模糊的UI描述重写为具体的替代方案。

“带图标的卡片” → 这些卡片与其他SaaS模板有何不同？
“英雄区” → 是什么让这个英雄区感觉属于此产品？
“简洁现代的UI” → 毫无意义。替换为实际的设计决策。
“带小部件的仪表板” → 是什么让这个仪表板与其他仪表板不同？ **停止。**每个问题对应一次AskUserQuestion。不要批量处理。给出建议+理由。

Pass 5: Design System Alignment

第5轮：设计系统对齐

Rate 0-10: Does the plan align with DESIGN.md? FIX TO 10: If DESIGN.md exists, annotate with specific tokens/components. If no DESIGN.md, flag the gap and recommend

/design-consultation

. Flag any new component — does it fit the existing vocabulary? STOP. AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.

打0-10分：计划是否与DESIGN.md对齐？优化至10分：如果存在DESIGN.md，添加具体的标记/组件注释。如果不存在DESIGN.md，标记为漏洞并建议

/design-consultation

。标记任何新组件——它是否符合现有设计系统的词汇？ **停止。**每个问题对应一次AskUserQuestion。不要批量处理。给出建议+理由。

Pass 6: Responsive & Accessibility

第6轮：响应式与无障碍设计

Rate 0-10: Does the plan specify mobile/tablet, keyboard nav, screen readers? FIX TO 10: Add responsive specs per viewport — not "stacked on mobile" but intentional layout changes. Add a11y: keyboard nav patterns, ARIA landmarks, touch target sizes (44px min), color contrast requirements. STOP. AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.

打0-10分：计划是否指定了移动端/平板端、键盘导航、屏幕阅读器支持？优化至10分：为每个视口添加响应式规范——不是“移动端堆叠”，而是经过深思熟虑的布局变化。添加无障碍设计：键盘导航模式、ARIA地标、触摸目标大小（最小44px）、颜色对比度要求。 **停止。**每个问题对应一次AskUserQuestion。不要批量处理。给出建议+理由。

Pass 7: Unresolved Design Decisions

第7轮：未解决的设计决策

Surface ambiguities that will haunt implementation:

  DECISION NEEDED              | IF DEFERRED, WHAT HAPPENS
  -----------------------------|---------------------------
  What does empty state look like? | Engineer ships "No items found."
  Mobile nav pattern?          | Desktop nav hides behind hamburger
  ...

Each decision = one AskUserQuestion with recommendation + WHY + alternatives. Edit the plan with each decision as it's made.

找出会影响实施的模糊点：

  需要决策的事项              | 推迟决策的后果
  -----------------------------|---------------------------
  空状态是什么样的？ | 工程师会直接上线“未找到项目。”
  移动端导航模式？          | 桌面端导航会隐藏在汉堡菜单后
  ...

每个决策对应一次AskUserQuestion，包含建议+理由+替代方案。随着决策的做出，编辑计划以添加内容。

CRITICAL RULE — How to ask questions

关键规则——如何提问

Follow the AskUserQuestion format from the Preamble above. Additional rules for plan design reviews:

One issue = one AskUserQuestion call. Never combine multiple issues into one question.
Describe the design gap concretely — what's missing, what the user will experience if it's not specified.
Present 2-3 options. For each: effort to specify now, risk if deferred.
Map to Design Principles above. One sentence connecting your recommendation to a specific principle.
Label with issue NUMBER + option LETTER (e.g., "3A", "3B").
Escape hatch: If a section has no issues, say so and move on. If a gap has an obvious fix, state what you'll add and move on — don't waste a question on it. Only use AskUserQuestion when there is a genuine design choice with meaningful tradeoffs.

遵循前置步骤中的AskUserQuestion格式。计划设计评审的额外规则：

**一个问题 = 一次AskUserQuestion调用。**永远不要将多个问题合并为一个。
具体描述设计漏洞——缺少什么，如果不明确用户会有什么体验。
提供2-3个选项。对于每个选项：现在明确的工作量，推迟的风险。
**与上述设计原则关联。**用一句话将您的建议与具体原则关联起来。
用问题编号+选项字母标记（例如“3A”、“3B”）。
**逃生舱口：**如果某个部分没有问题，说明并继续。如果漏洞有明显的修复方案，说明您将添加的内容并继续——不要在明显的问题上浪费提问机会。仅当存在真正有意义的设计选择权衡时才使用AskUserQuestion。

Required Outputs

必需输出

"NOT in scope" section

“不在范围内”部分

Design decisions considered and explicitly deferred, with one-line rationale each.

记录经过考虑并明确推迟的设计决策，每个决策配一句话理由。

"What already exists" section

“现有内容”部分

Existing DESIGN.md, UI patterns, and components that the plan should reuse.

记录现有DESIGN.md、UI模式和计划应复用的组件。

TODOS.md updates

TODOS.md更新

After all review passes are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.

For design debt: missing a11y, unresolved responsive behavior, deferred empty states. Each TODO gets:

What: One-line description of the work.
Why: The concrete problem it solves or value it unlocks.
Pros: What you gain by doing this work.
Cons: Cost, complexity, or risks of doing it.
Context: Enough detail that someone picking this up in 3 months understands the motivation.
Depends on / blocked by: Any prerequisites.

Then present options: A) Add to TODOS.md B) Skip — not valuable enough C) Build it now in this PR instead of deferring.

完成所有评审轮次后，将每个潜在的待办事项作为单独的AskUserQuestion提出。永远不要批量处理待办事项——每个待办事项对应一次提问。永远不要跳过此步骤。

对于设计债务：缺失的无障碍设计、未解决的响应式行为、推迟的空状态。每个待办事项包含：

**内容：**一句话描述工作内容。
**原因：**解决的具体问题或带来的价值。
**优点：**完成此工作的收益。
**缺点：**成本、复杂性或风险。
**上下文：**足够的细节，让3个月后接手的人理解动机。
**依赖/阻塞：**任何先决条件。

然后提供选项：A) 添加到TODOS.md B) 跳过——价值不足 C) 现在在此PR中实现，而非推迟。

Completion Summary

完成总结

  +====================================================================+
  |         DESIGN PLAN REVIEW — COMPLETION SUMMARY                    |
  +====================================================================+
  | System Audit         | [DESIGN.md status, UI scope]                |
  | Step 0               | [initial rating, focus areas]               |
  | Pass 1  (Info Arch)  | ___/10 → ___/10 after fixes                |
  | Pass 2  (States)     | ___/10 → ___/10 after fixes                |
  | Pass 3  (Journey)    | ___/10 → ___/10 after fixes                |
  | Pass 4  (AI Slop)    | ___/10 → ___/10 after fixes                |
  | Pass 5  (Design Sys) | ___/10 → ___/10 after fixes                |
  | Pass 6  (Responsive) | ___/10 → ___/10 after fixes                |
  | Pass 7  (Decisions)  | ___ resolved, ___ deferred                 |
  +--------------------------------------------------------------------+
  | NOT in scope         | written (___ items)                         |
  | What already exists  | written                                     |
  | TODOS.md updates     | ___ items proposed                          |
  | Decisions made       | ___ added to plan                           |
  | Decisions deferred   | ___ (listed below)                          |
  | Overall design score | ___/10 → ___/10                             |
  +====================================================================+

If all passes 8+: "Plan is design-complete. Run /design-review after implementation for visual QA." If any below 8: note what's unresolved and why (user chose to defer).

  +====================================================================+
  |         设计计划评审——完成总结                    |
  +====================================================================+
  | 系统审计         | [DESIGN.md状态, UI范围]                |
  | 步骤0               | [初始评分, 重点领域]               |
  | 第1轮  (信息架构)  | ___/10 → ___/10 优化后                |
  | 第2轮  (状态)     | ___/10 → ___/10 优化后                |
  | 第3轮  (用户旅程)    | ___/10 → ___/10 优化后                |
  | 第4轮  (AI通用设计)    | ___/10 → ___/10 优化后                |
  | 第5轮  (设计系统) | ___/10 → ___/10 优化后                |
  | 第6轮  (响应式) | ___/10 → ___/10 优化后                |
  | 第7轮  (决策)  | ___ 已解决, ___ 已推迟                 |
  +--------------------------------------------------------------------+
  | 不在范围内         | 已记录 (___ 项)                         |
  | 现有内容  | 已记录                                     |
  | TODOS.md更新     | ___ 项提议                          |
  | 已做出的决策       | ___ 已添加到计划                           |
  | 已推迟的决策   | ___ (如下所列)                          |
  | 整体设计评分 | ___/10 → ___/10                             |
  +====================================================================+

如果所有轮次评分≥8：“计划设计已完成。实施后运行/design-review进行视觉QA。” 如果有任何轮次评分<8：记录未解决的问题和原因（用户选择推迟）。

Unresolved Decisions

未解决的决策

If any AskUserQuestion goes unanswered, note it here. Never silently default to an option.

如果有任何AskUserQuestion未得到回答，在此处记录。永远不要默认选择某个选项。

Review Log

评审日志

After producing the Completion Summary above, persist the review result.

PLAN MODE EXCEPTION — ALWAYS RUN: This command writes review metadata to

~/.gstack/

(user config directory, not project files). The skill preamble already writes to

~/.gstack/sessions/

and

~/.gstack/analytics/

— this is the same pattern. The review dashboard depends on this data. Skipping this command breaks the review readiness dashboard in /ship.

bash

~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"unresolved":N,"decisions_made":N,"commit":"COMMIT"}'

Substitute values from the Completion Summary:

TIMESTAMP: current ISO 8601 datetime
STATUS: "clean" if overall score 8+ AND 0 unresolved; otherwise "issues_open"
initial_score: initial overall design score before fixes (0-10)
overall_score: final overall design score after fixes (0-10)
unresolved: number of unresolved design decisions
decisions_made: number of design decisions added to the plan
COMMIT: output of
```
git rev-parse --short HEAD
```

生成上述完成总结后，保存评审结果。

**计划模式例外——必须执行：**此命令将评审元数据写入

~/.gstack/

（用户配置目录，而非项目文件）。技能前置步骤已写入

~/.gstack/sessions/

和

~/.gstack/analytics/

——遵循相同模式。评审仪表板依赖于此数据。跳过此命令会破坏/ship中的评审就绪仪表板。

bash

~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"unresolved":N,"decisions_made":N,"commit":"COMMIT"}'

替换完成总结中的值：

TIMESTAMP: 当前ISO 8601格式时间
STATUS: 如果整体评分≥8且未解决决策数为0，则为"clean"；否则为"issues_open"
initial_score: 优化前的初始整体设计评分（0-10）
overall_score: 优化后的最终整体设计评分（0-10）
unresolved: 未解决的设计决策数量
decisions_made: 添加到计划中的设计决策数量
COMMIT:
```
git rev-parse --short HEAD
```
的输出

Review Readiness Dashboard

评审就绪仪表板

After completing the review, read the review log and config to display the dashboard.

bash

~/.claude/skills/gstack/bin/gstack-review-read

Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between

adversarial-review

(new auto-scaled) and

codex-review

(legacy). For Design Review, show whichever is more recent between

plan-design-review

(full visual audit) and

design-review-lite

(code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:

+====================================================================+
|                    REVIEW READINESS DASHBOARD                       |
+====================================================================+
| Review          | Runs | Last Run            | Status    | Required |
|-----------------|------|---------------------|-----------|----------|
| Eng Review      |  1   | 2026-03-16 15:00    | CLEAR     | YES      |
| CEO Review      |  0   | —                   | —         | no       |
| Design Review   |  0   | —                   | —         | no       |
| Adversarial     |  0   | —                   | —         | no       |
+--------------------------------------------------------------------+
| VERDICT: CLEARED — Eng Review passed                                |
+====================================================================+

Review tiers:

Eng Review (required by default): The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with `gstack-config set skip_eng_review true` (the "don't bother me" setting).
CEO Review (optional): Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
Design Review (optional): Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
Adversarial Review (automatic): Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed.

Verdict logic:

CLEARED: Eng Review has >= 1 entry within 7 days with status "clean" (or `skip_eng_review` is `true`)
NOT CLEARED: Eng Review missing, stale (>7 days), or has open issues
CEO, Design, and Codex reviews are shown for context but never block shipping
If `skip_eng_review` config is `true`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED

Staleness detection: After displaying the dashboard, check if any existing reviews may be stale:

Parse the `---HEAD---` section from the bash output to get the current HEAD commit hash
For each review entry that has a `commit` field: compare it against the current HEAD. If different, count elapsed commits: `git rev-list --count STORED_COMMIT..HEAD`. Display: "Note: {skill} review from {date} may be stale — {N} commits since review"
For entries without a `commit` field (legacy entries): display "Note: {skill} review from {date} has no commit tracking — consider re-running for accurate staleness detection"
If all reviews match the current HEAD, do not display any staleness notes

完成评审后，读取评审日志和配置以显示仪表板。

bash

~/.claude/skills/gstack/bin/gstack-review-read

解析输出。找到每个技能的最新条目（plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review）。忽略时间戳超过7天的条目。对于对抗性评审，显示

adversarial-review

（新的自动缩放）和

codex-review

（旧版）中较新的一个。对于设计评审，显示

plan-design-review

（完整视觉审计）和

design-review-lite

（代码级检查）中较新的一个。在状态后附加"(FULL)"或"(LITE)"以区分。显示：

+====================================================================+
|                    评审就绪仪表板                       |
+====================================================================+
| 评审类型          | 运行次数 | 最后运行时间            | 状态    | 是否必需 |
|-----------------|------|---------------------|-----------|----------|
| 工程师评审      |  1   | 2026-03-16 15:00    | 通过     | 是      |
| CEO评审      |  0   | —                   | —         | 否       |
| 设计评审   |  0   | —                   | —         | 否       |
| 对抗性评审     |  0   | —                   | —         | 否       |
+--------------------------------------------------------------------+
| 结论: 已通过 — 工程师评审已通过                                |
+====================================================================+

评审层级：

**工程师评审（默认必需）：**唯一阻止上线的评审。涵盖架构、代码质量、测试、性能。可通过
```
gstack-config set skip_eng_review true
```
全局禁用（“不要打扰我”设置）。
**CEO评审（可选）：**自行判断。建议在重大产品/业务变更、新用户功能或范围决策时使用。对于bug修复、重构、基础设施和清理工作可跳过。
**设计评审（可选）：**自行判断。建议在UI/UX变更时使用。对于仅后端、基础设施或仅提示的变更可跳过。
**对抗性评审（自动）：**根据差异大小自动缩放。小差异（<50行）跳过对抗性评审。中等差异（50–199行）进行跨模型对抗性评审。大差异（200+行）进行4轮评审：Claude结构化、Codex结构化、Claude对抗性子代理、Codex对抗性评审。无需配置。

结论逻辑：

已通过: 工程师评审在7天内有≥1条状态为"clean"的记录（或
```
skip_eng_review
```
为
```
true
```
）
未通过: 工程师评审缺失、过期（>7天）或存在未解决问题
CEO、设计和Codex评审仅作为上下文显示，不会阻止上线
如果
```
skip_eng_review
```
配置为
```
true
```
，工程师评审显示"已跳过（全局）"，结论为已通过

**过期检测：**显示仪表板后，检查任何现有评审是否可能过期：

从bash输出中解析
```
---HEAD---
```
部分以获取当前HEAD提交哈希
对于每个带有
```
commit
```
字段的评审条目：与当前HEAD比较。如果不同，计算提交次数：
```
git rev-list --count STORED_COMMIT..HEAD
```
。显示："注意：{skill}评审来自{date}可能已过期——自评审以来已有{N}次提交"
对于没有
```
commit
```
字段的条目（旧版条目）：显示"注意：{skill}评审来自{date}没有提交跟踪——考虑重新运行以进行准确的过期检测"
如果所有评审都与当前HEAD匹配，不显示任何过期提示

Plan File Review Report

计划文件评审报告

After displaying the Review Readiness Dashboard in conversation output, also update the plan file itself so review status is visible to anyone reading the plan.

在对话输出中显示评审就绪仪表板后，还要更新计划文件，使任何阅读计划的人都能看到评审状态。

Detect the plan file

检测计划文件

Check if there is an active plan file in this conversation (the host provides plan file paths in system messages — look for plan file references in the conversation context).
If not found, skip this section silently — not every review runs in plan mode.

检查此对话中是否有活动的计划文件（主机在系统消息中提供计划文件路径——在对话上下文中查找计划文件引用）。
如果未找到，静默跳过此部分——并非所有评审都在计划模式下运行。

Generate the report

生成报告

Read the review log output you already have from the Review Readiness Dashboard step above. Parse each JSONL entry. Each skill logs different fields:

plan-ceo-review: `status`, `unresolved`, `critical_gaps`, `mode`, `scope_proposed`, `scope_accepted`, `scope_deferred`, `commit` → Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred" → If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
plan-eng-review: `status`, `unresolved`, `critical_gaps`, `issues_found`, `mode`, `commit` → Findings: "{issues_found} issues, {critical_gaps} critical gaps"
plan-design-review: `status`, `initial_score`, `overall_score`, `unresolved`, `decisions_made`, `commit` → Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
codex-review: `status`, `gate`, `findings`, `findings_fixed` → Findings: "{findings} findings, {findings_fixed}/{findings} fixed"

All fields needed for the Findings column are now present in the JSONL entries. For the review you just completed, you may use richer details from your own Completion Summary. For prior reviews, use the JSONL fields directly — they contain all required data.

Produce this markdown table:

```markdown

读取您在评审就绪仪表板步骤中已获取的评审日志输出。解析每个JSONL条目。每个技能记录不同的字段：

plan-ceo-review:
```
status
```
,
```
unresolved
```
,
```
critical_gaps
```
,
```
mode
```
,
```
scope_proposed
```
,
```
scope_accepted
```
,
```
scope_deferred
```
,
```
commit
```
→ 结果: "{scope_proposed}项提议, {scope_accepted}项接受, {scope_deferred}项推迟" → 如果范围字段为0或缺失（HOLD/REDUCTION模式）: "模式: {mode}, {critical_gaps}个关键漏洞"
plan-eng-review:
```
status
```
,
```
unresolved
```
,
```
critical_gaps
```
,
```
issues_found
```
,
```
mode
```
,
```
commit
```
→ 结果: "{issues_found}个问题, {critical_gaps}个关键漏洞"
plan-design-review:
```
status
```
,
```
initial_score
```
,
```
overall_score
```
,
```
unresolved
```
,
```
decisions_made
```
,
```
commit
```
→ 结果: "评分: {initial_score}/10 → {overall_score}/10, {decisions_made}项决策"
codex-review:
```
status
```
,
```
gate
```
,
```
findings
```
,
```
findings_fixed
```
→ 结果: "{findings}个发现, {findings_fixed}/{findings}个已修复"

结果列所需的所有字段都已在JSONL条目中。对于您刚刚完成的评审，可以使用完成总结中的更详细信息。对于之前的评审，直接使用JSONL字段——它们包含所有必需数据。

生成以下markdown表格：

markdown

undefined

GSTACK REVIEW REPORT

GSTACK 评审报告

Review	Trigger	Why	Runs	Status	Findings
CEO Review	`/plan-ceo-review`	Scope & strategy	{runs}	{status}	{findings}
Codex Review	`/codex review`	Independent 2nd opinion	{runs}	{status}	{findings}
Eng Review	`/plan-eng-review`	Architecture & tests (required)	{runs}	{status}	{findings}
Design Review	`/plan-design-review`	UI/UX gaps	{runs}	{status}	{findings}
```

Below the table, add these lines (omit any that are empty/not applicable):

CODEX: (only if codex-review ran) — one-line summary of codex fixes
CROSS-MODEL: (only if both Claude and Codex reviews exist) — overlap analysis
UNRESOLVED: total unresolved decisions across all reviews
VERDICT: list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement"). If Eng Review is not CLEAR and not skipped globally, append "eng review required".

评审类型	触发命令	目的	运行次数	状态	结果
CEO评审	`/plan-ceo-review`	范围与战略	{runs}	{status}	{findings}
Codex评审	`/codex review`	独立的第二意见	{runs}	{status}	{findings}
工程师评审	`/plan-eng-review`	架构与测试（必需）	{runs}	{status}	{findings}
设计评审	`/plan-design-review`	UI/UX漏洞	{runs}	{status}	{findings}


在表格下方添加以下内容（省略空的或不适用的部分）：

- **CODEX:**（仅当codex-review已运行）——codex修复的一句话总结
- **跨模型:**（仅当Claude和Codex评审都存在）——重叠分析
- **未解决:** 所有评审中未解决的决策总数
- **结论:** 列出已通过的评审（例如"CEO + 工程师已通过——可实施"）。
  如果工程师评审未通过且未全局跳过，附加"需要工程师评审"。

Write to the plan file

写入计划文件

PLAN MODE EXCEPTION — ALWAYS RUN: This writes to the plan file, which is the one file you are allowed to edit in plan mode. The plan file review report is part of the plan's living status.

Search the plan file for a `## GSTACK REVIEW REPORT` section anywhere in the file (not just at the end — content may have been added after it).
If found, replace it entirely using the Edit tool. Match from `## GSTACK REVIEW REPORT` through either the next `## ` heading or end of file, whichever comes first. This ensures content added after the report section is preserved, not eaten. If the Edit fails (e.g., concurrent edit changed the content), re-read the plan file and retry once.
If no such section exists, append it to the end of the plan file.
Always place it as the very last section in the plan file. If it was found mid-file, move it: delete the old location and append at the end.

**计划模式例外——必须执行：**此操作会写入计划文件，这是您在计划模式下唯一可以编辑的文件。计划文件评审报告是计划实时状态的一部分。

在计划文件中搜索
```
## GSTACK 评审报告
```
部分（任何位置——内容可能已添加到报告之后）。
如果找到，完全替换它。从
```
## GSTACK 评审报告
```
开始，到下一个
```
## 
```
标题或文件结尾（以先到者为准）。这样可以确保报告部分之后添加的内容被保留，不会被覆盖。如果编辑失败（例如并发编辑更改了内容），重新读取计划文件并重试一次。
如果不存在此部分，追加到计划文件末尾。
始终将其放在计划文件的最后。如果在文件中间找到，移动它：删除旧位置并追加到末尾。

Next Steps — Review Chaining

下一步——评审链

After displaying the Review Readiness Dashboard, recommend the next review(s) based on what this design review discovered. Read the dashboard output to see which reviews have already been run and whether they are stale.

Recommend /plan-eng-review if eng review is not skipped globally — check the dashboard output for

skip_eng_review

. If it is

true

, eng review is opted out — do not recommend it. Otherwise, eng review is the required shipping gate. If this design review added significant interaction specifications, new user flows, or changed the information architecture, emphasize that eng review needs to validate the architectural implications. If an eng review already exists but the commit hash shows it predates this design review, note that it may be stale and should be re-run.

Consider recommending /plan-ceo-review — but only if this design review revealed fundamental product direction gaps. Specifically: if the overall design score started below 4/10, if the information architecture had major structural problems, or if the review surfaced questions about whether the right problem is being solved. AND no CEO review exists in the dashboard. This is a selective recommendation — most design reviews should NOT trigger a CEO review.

If both are needed, recommend eng review first (required gate).

Use AskUserQuestion to present the next step. Include only applicable options:

A) Run /plan-eng-review next (required gate)
B) Run /plan-ceo-review (only if fundamental product gaps found)
C) Skip — I'll handle reviews manually

显示评审就绪仪表板后，根据此设计评审的发现推荐下一个评审。读取仪表板输出以查看哪些评审已运行以及是否过期。

如果工程师评审未被全局跳过，推荐/plan-eng-review——检查仪表板输出中的

skip_eng_review

。如果为

true

，用户已选择退出工程师评审——不要推荐。否则，工程师评审是必需的上线门槛。如果此设计评审添加了重要的交互规范、新用户流或更改了信息架构，要强调工程师评审需要验证架构影响。如果已有工程师评审，但提交哈希显示它早于此设计评审，要注明它可能已过期，应重新运行。

考虑推荐/plan-ceo-review——但仅当此设计评审揭示了产品方向的根本漏洞时。具体来说：如果初始整体设计评分低于4/10，信息架构存在重大结构问题，或评审发现了是否在解决正确问题的疑问。并且仪表板中没有CEO评审。这是选择性推荐——大多数设计评审不应触发CEO评审。

如果两者都需要，先推荐工程师评审（必需门槛）。

使用AskUserQuestion呈现下一步。仅包含适用的选项：

A) 接下来运行/plan-eng-review（必需门槛）
B) 运行/plan-ceo-review（仅当发现产品根本漏洞时）
C) 跳过——我将手动处理评审

Formatting Rules

格式规则

NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
Label with NUMBER + LETTER (e.g., "3A", "3B").
One sentence max per option.
After each pass, pause and wait for feedback.
Rate before and after each pass for scannability.

为问题编号（1, 2, 3...），为选项添加字母（A, B, C...）。
用编号+字母标记（例如“3A”、“3B”）。
每个选项最多一句话。
每轮评审后暂停并等待反馈。
每轮评审前后都要评分，以便快速查看。