benchmark-models

Preamble (run first)

bash

_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -exec rm {} + 2>/dev/null || true
_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true")
_PROACTIVE_PROMPTED=$([ -f ~/.gstack/.proactive-prompted ] && echo "yes" || echo "no")
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
_SKILL_PREFIX=$(~/.claude/skills/gstack/bin/gstack-config get skill_prefix 2>/dev/null || echo "false")
echo "PROACTIVE: $_PROACTIVE"
echo "PROACTIVE_PROMPTED: $_PROACTIVE_PROMPTED"
echo "SKILL_PREFIX: $_SKILL_PREFIX"
source <(~/.claude/skills/gstack/bin/gstack-repo-mode 2>/dev/null) || true
REPO_MODE=${REPO_MODE:-unknown}
echo "REPO_MODE: $REPO_MODE"
_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
echo "LAKE_INTRO: $_LAKE_SEEN"
_TEL=$(~/.claude/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
_TEL_PROMPTED=$([ -f ~/.gstack/.telemetry-prompted ] && echo "yes" || echo "no")
_TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
_EXPLAIN_LEVEL=$(~/.claude/skills/gstack/bin/gstack-config get explain_level 2>/dev/null || echo "default")
if [ "$_EXPLAIN_LEVEL" != "default" ] && [ "$_EXPLAIN_LEVEL" != "terse" ]; then _EXPLAIN_LEVEL="default"; fi
echo "EXPLAIN_LEVEL: $_EXPLAIN_LEVEL"
_QUESTION_TUNING=$(~/.claude/skills/gstack/bin/gstack-config get question_tuning 2>/dev/null || echo "false")
echo "QUESTION_TUNING: $_QUESTION_TUNING"
mkdir -p ~/.gstack/analytics
if [ "$_TEL" != "off" ]; then
echo '{"skill":"benchmark-models","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}'  >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
fi
for _PF in $(find ~/.gstack/analytics -maxdepth 1 -name '.pending-*' 2>/dev/null); do
  if [ -f "$_PF" ]; then
    if [ "$_TEL" != "off" ] && [ -x "$HOME/.claude/skills/gstack/bin/gstack-telemetry-log" ]; then
      ~/.claude/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true
    fi
    rm -f "$_PF" 2>/dev/null || true
  fi
  break
done
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" 2>/dev/null || true
_LEARN_FILE="${GSTACK_HOME:-$HOME/.gstack}/projects/${SLUG:-unknown}/learnings.jsonl"
if [ -f "$_LEARN_FILE" ]; then
  _LEARN_COUNT=$(wc -l < "$_LEARN_FILE" 2>/dev/null | tr -d ' ')
  echo "LEARNINGS: $_LEARN_COUNT entries loaded"
  if [ "$_LEARN_COUNT" -gt 5 ] 2>/dev/null; then
    ~/.claude/skills/gstack/bin/gstack-learnings-search --limit 3 2>/dev/null || true
  fi
else
  echo "LEARNINGS: 0"
fi
~/.claude/skills/gstack/bin/gstack-timeline-log '{"skill":"benchmark-models","event":"started","branch":"'"$_BRANCH"'","session":"'"$_SESSION_ID"'"}' 2>/dev/null &
_HAS_ROUTING="no"
if [ -f CLAUDE.md ] && grep -q "## Skill routing" CLAUDE.md 2>/dev/null; then
  _HAS_ROUTING="yes"
fi
_ROUTING_DECLINED=$(~/.claude/skills/gstack/bin/gstack-config get routing_declined 2>/dev/null || echo "false")
echo "HAS_ROUTING: $_HAS_ROUTING"
echo "ROUTING_DECLINED: $_ROUTING_DECLINED"
_VENDORED="no"
if [ -d ".claude/skills/gstack" ] && [ ! -L ".claude/skills/gstack" ]; then
  if [ -f ".claude/skills/gstack/VERSION" ] || [ -d ".claude/skills/gstack/.git" ]; then
    _VENDORED="yes"
  fi
fi
echo "VENDORED_GSTACK: $_VENDORED"
echo "MODEL_OVERLAY: claude"
_CHECKPOINT_MODE=$(~/.claude/skills/gstack/bin/gstack-config get checkpoint_mode 2>/dev/null || echo "explicit")
_CHECKPOINT_PUSH=$(~/.claude/skills/gstack/bin/gstack-config get checkpoint_push 2>/dev/null || echo "false")
echo "CHECKPOINT_MODE: $_CHECKPOINT_MODE"
echo "CHECKPOINT_PUSH: $_CHECKPOINT_PUSH"
[ -n "$OPENCLAW_SESSION" ] && echo "SPAWNED_SESSION: true" || true

Plan Mode Safe Operations

In plan mode, allowed because they inform the plan: `$B`, `$D`, `codex exec`, `codex review`, writes to `~/.gstack/`, writes to the plan file, and `open` for generated artifacts.

Skill Invocation During Plan Mode

If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. Treat the skill file as executable instructions, not reference. Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant: `mcp__*__AskUserQuestion` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If no variant is callable, the skill is BLOCKED: stop and report `BLOCKED — AskUserQuestion unavailable` per the AskUserQuestion Format rule. At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.

If `PROACTIVE` is `"false"`, do not auto-invoke or proactively suggest skills. If a skill seems useful, ask: "I think /skillname might help here. Want me to run it?"

If `SKILL_PREFIX` is `"true"`, suggest/invoke `/gstack-*` names. Disk paths stay `~/.claude/skills/gstack/[skill-name]/SKILL.md`.
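The naming rule can be sketched as a small shell helper. This is only an illustration, not part of gstack: `skill_command` is a hypothetical function, and it assumes the prefixed form is simply `/gstack-` followed by the skill name.

```bash
# Hypothetical helper (not part of gstack): build the slash command to suggest.
# Assumption: the prefixed form is "/gstack-<name>" when SKILL_PREFIX is "true".
skill_command() {
  local name="$1" prefix="${2:-false}"
  if [ "$prefix" = "true" ]; then
    printf '/gstack-%s\n' "$name"
  else
    printf '/%s\n' "$name"
  fi
}

skill_command qa true    # prints /gstack-qa
skill_command qa false   # prints /qa
```

Only the suggested command name changes; the on-disk path of the skill file is the same in both cases.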

If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined).

If output shows `JUST_UPGRADED <from> <to>`: print "Running gstack v{to} (just updated!)". If `SPAWNED_SESSION` is true, skip feature discovery.

Feature discovery, max one prompt per session:

- Missing `~/.claude/skills/gstack/.feature-prompted-continuous-checkpoint`: AskUserQuestion for Continuous checkpoint auto-commits. If accepted, run `~/.claude/skills/gstack/bin/gstack-config set checkpoint_mode continuous`. Always touch marker.
- Missing `~/.claude/skills/gstack/.feature-prompted-model-overlay`: inform "Model overlays are active. MODEL_OVERLAY shows the patch." Always touch marker.

After upgrade prompts, continue workflow.

If `WRITING_STYLE_PENDING` is `yes`: ask once about writing style:

> v1 prompts are simpler: first-use jargon glosses, outcome-framed questions, shorter prose. Keep default or restore terse?

Options:
- A) Keep the new default (recommended: good writing helps everyone)
- B) Restore V0 prose: set `explain_level: terse`

If A: leave `explain_level` unset (defaults to `default`). If B: run `~/.claude/skills/gstack/bin/gstack-config set explain_level terse`.

Always run (regardless of choice):

bash

rm -f ~/.gstack/.writing-style-prompt-pending
touch ~/.gstack/.writing-style-prompted

Skip if `WRITING_STYLE_PENDING` is `no`.

If `LAKE_INTRO` is `no`: say "gstack follows the Boil the Lake principle: do the complete thing when AI makes marginal cost near-zero. Read more: https://garryslist.org/posts/boil-the-ocean" Offer to open:

bash

open https://garryslist.org/posts/boil-the-ocean
touch ~/.gstack/.completeness-intro-seen

Only run `open` if yes. Always run `touch`.

If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: ask telemetry once via AskUserQuestion:

> Help gstack get better. Share usage data only: skill, duration, crashes, stable device ID. No code, file paths, or repo names.

Options:
- A) Help gstack get better! (recommended)
- B) No thanks

If A: run `~/.claude/skills/gstack/bin/gstack-config set telemetry community`

If B: ask follow-up:

> Anonymous mode sends only aggregate usage, no unique ID.

Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off

If B→A: run `~/.claude/skills/gstack/bin/gstack-config set telemetry anonymous`

If B→B: run `~/.claude/skills/gstack/bin/gstack-config set telemetry off`

Always run:

bash

touch ~/.gstack/.telemetry-prompted

Skip if `TEL_PROMPTED` is `yes`.
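The two-step telemetry question reduces to a small decision table. The sketch below shows that mapping only; `telemetry_mode_for_answers` is a hypothetical helper, not something gstack ships, and it does not call `gstack-config` itself.

```bash
# Hypothetical helper: map the answers to a telemetry mode.
# $1 is the first answer (A or B); $2 is the follow-up answer, used only when $1 is B.
telemetry_mode_for_answers() {
  local first="$1" followup="${2:-}"
  case "$first" in
    A) echo "community" ;;
    B)
      case "$followup" in
        A) echo "anonymous" ;;
        B) echo "off" ;;
        *) return 1 ;;   # follow-up not answered yet
      esac
      ;;
    *) return 1 ;;       # unrecognized answer
  esac
}

telemetry_mode_for_answers A      # prints community
telemetry_mode_for_answers B A    # prints anonymous
telemetry_mode_for_answers B B    # prints off
```

The printed value is what would be passed to `gstack-config set telemetry`; the `.telemetry-prompted` marker is touched regardless of which branch is taken.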

If `PROACTIVE_PROMPTED` is `no` AND `TEL_PROMPTED` is `yes`: ask once:

> Let gstack proactively suggest skills, like /qa for "does this work?" or /investigate for bugs?

Options:
- A) Keep it on (recommended)
- B) Turn it off, I'll type /commands myself

If A: run `~/.claude/skills/gstack/bin/gstack-config set proactive true`

If B: run `~/.claude/skills/gstack/bin/gstack-config set proactive false`

Always run:

bash

touch ~/.gstack/.proactive-prompted

Skip if `PROACTIVE_PROMPTED` is `yes`.

If `HAS_ROUTING` is `no` AND `ROUTING_DECLINED` is `false` AND `PROACTIVE_PROMPTED` is `yes`: Check if a CLAUDE.md file exists in the project root. If it does not exist, create it.

Use AskUserQuestion:

> gstack works best when your project's CLAUDE.md includes skill routing rules.

Options:
- A) Add routing rules to CLAUDE.md (recommended)
- B) No thanks, I'll invoke skills manually

If A: Append this section to the end of CLAUDE.md:

markdown

## Skill routing

When the user's request matches an available skill, invoke it via the Skill tool. When in doubt, invoke the skill.

Key routing rules:
- Product ideas/brainstorming → invoke /office-hours
- Strategy/scope → invoke /plan-ceo-review
- Architecture → invoke /plan-eng-review
- Design system/plan review → invoke /design-consultation or /plan-design-review
- Full review pipeline → invoke /autoplan
- Bugs/errors → invoke /investigate
- QA/testing site behavior → invoke /qa or /qa-only
- Code review/diff check → invoke /review
- Visual polish → invoke /design-review
- Ship/deploy/PR → invoke /ship or /land-and-deploy
- Save progress → invoke /context-save
- Resume context → invoke /context-restore


Then commit the change: `git add CLAUDE.md && git commit -m "chore: add gstack skill routing rules to CLAUDE.md"`

If B: run `~/.claude/skills/gstack/bin/gstack-config set routing_declined true` and say they can re-enable with `gstack-config set routing_declined false`.

This only happens once per project. Skip if `HAS_ROUTING` is `yes` or `ROUTING_DECLINED` is `true`.
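The four flag-gated onboarding prompts (lake intro, telemetry, proactive, routing) form a chain in which each one waits for the previous marker. The sketch below evaluates those gates from the values the preamble echoes. `pending_prompts` is a hypothetical helper, and it deliberately leaves out the writing-style prompt, feature discovery, and the `SPAWNED_SESSION` override.

```bash
# Hypothetical helper: list the onboarding prompts whose conditions hold.
# Args: LAKE_INTRO TEL_PROMPTED PROACTIVE_PROMPTED HAS_ROUTING ROUTING_DECLINED
pending_prompts() {
  local lake="$1" tel="$2" proactive="$3" routing="$4" declined="$5"
  [ "$lake" = "no" ] && echo "lake_intro"
  [ "$tel" = "no" ] && [ "$lake" = "yes" ] && echo "telemetry"
  [ "$proactive" = "no" ] && [ "$tel" = "yes" ] && echo "proactive"
  [ "$routing" = "no" ] && [ "$declined" = "false" ] && [ "$proactive" = "yes" ] && echo "routing"
  return 0
}

pending_prompts no no no no false      # prints lake_intro
pending_prompts yes yes yes no false   # prints routing
```

On a fresh install only the lake intro qualifies; each later prompt becomes eligible once the marker written by the earlier one is present.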

If `VENDORED_GSTACK` is `yes`, warn once via AskUserQuestion unless `~/.gstack/.vendoring-warned-$SLUG` exists:

> This project has gstack vendored in `.claude/skills/gstack/`. Vendoring is deprecated.
> Migrate to team mode?

Options:
- A) Yes, migrate to team mode now
- B) No, I'll handle it myself

If A:
1. Run `git rm -r .claude/skills/gstack/`
2. Run `echo '.claude/skills/gstack/' >> .gitignore`
3. Run `~/.claude/skills/gstack/bin/gstack-team-init required` (or `optional`)
4. Run `git add .claude/ .gitignore CLAUDE.md && git commit -m "chore: migrate gstack from vendored to team mode"`
5. Tell the user: "Done. Each developer now runs: `cd ~/.claude/skills/gstack && ./setup --team`"

If B: say "OK, you're on your own to keep the vendored copy up to date."

Always run (regardless of choice):
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" 2>/dev/null || true
touch ~/.gstack/.vendoring-warned-${SLUG:-unknown}
```

If marker exists, skip.

If `SPAWNED_SESSION` is `"true"`, you are running inside a session spawned by an AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Do NOT use AskUserQuestion for interactive prompts. Auto-choose the recommended option.
- Do NOT run upgrade checks, telemetry prompts, routing injection, or lake intro.
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.

Artifacts Sync (skill start)

bash

_GSTACK_HOME="${GSTACK_HOME:-$HOME/.gstack}"
# Prefer the v1.27.0.0 artifacts file; fall back to brain file for users
# upgrading mid-stream before the migration script runs.
if [ -f "$HOME/.gstack-artifacts-remote.txt" ]; then
  _BRAIN_REMOTE_FILE="$HOME/.gstack-artifacts-remote.txt"
else
  _BRAIN_REMOTE_FILE="$HOME/.gstack-brain-remote.txt"
fi
# $HOME instead of a quoted "~": a tilde inside double quotes is not expanded.
_BRAIN_SYNC_BIN="$HOME/.claude/skills/gstack/bin/gstack-brain-sync"
_BRAIN_CONFIG_BIN="$HOME/.claude/skills/gstack/bin/gstack-config"

# /sync-gbrain context-load: teach the agent to use gbrain when it's available.
# Per-worktree pin: post-spike redesign uses kubectl-style `.gbrain-source` in the
# git toplevel to scope queries. Look for the pin in the worktree (not a global
# state file) so that opening worktree B without a pin doesn't claim "indexed"
# just because worktree A was synced. Empty string when gbrain is not
# configured (zero context cost for non-gbrain users).
_GBRAIN_CONFIG="$HOME/.gbrain/config.json"
if [ -f "$_GBRAIN_CONFIG" ] && command -v gbrain >/dev/null 2>&1; then
  _GBRAIN_VERSION_OK=$(gbrain --version 2>/dev/null | grep -c '^gbrain ' || echo 0)
  if [ "$_GBRAIN_VERSION_OK" -gt 0 ] 2>/dev/null; then
    _GBRAIN_PIN_PATH=""
    _REPO_TOP=$(git rev-parse --show-toplevel 2>/dev/null || echo "")
    if [ -n "$_REPO_TOP" ] && [ -f "$_REPO_TOP/.gbrain-source" ]; then
      _GBRAIN_PIN_PATH="$_REPO_TOP/.gbrain-source"
    fi
    if [ -n "$_GBRAIN_PIN_PATH" ]; then
      echo "GBrain configured. Prefer \`gbrain search\`/\`gbrain query\` over Grep for"
      echo "semantic questions; use \`gbrain code-def\`/\`code-refs\`/\`code-callers\` for"
      echo "symbol-aware code lookup. See \"## GBrain Search Guidance\" in CLAUDE.md."
      echo "Run /sync-gbrain to refresh."
    else
      echo "GBrain configured but this worktree isn't pinned yet. Run \`/sync-gbrain --full\`"
      echo "before relying on \`gbrain search\` for code questions in this worktree."
      echo "Falls back to Grep until pinned."
    fi
  fi
fi

_BRAIN_SYNC_MODE=$("$_BRAIN_CONFIG_BIN" get artifacts_sync_mode 2>/dev/null || echo off)

# Detect remote-MCP mode (Path 4 of /setup-gbrain). Local artifacts sync is
# a no-op in remote mode; the brain server pulls from GitHub/GitLab on its
# own cadence. Read claude.json directly to keep this preamble fast (no
# subprocess to claude CLI on every skill start).
_GBRAIN_MCP_MODE="none"
if command -v jq >/dev/null 2>&1 && [ -f "$HOME/.claude.json" ]; then
  _GBRAIN_MCP_TYPE=$(jq -r '.mcpServers.gbrain.type // .mcpServers.gbrain.transport // empty' "$HOME/.claude.json" 2>/dev/null)
  case "$_GBRAIN_MCP_TYPE" in
    url|http|sse) _GBRAIN_MCP_MODE="remote-http" ;;
    stdio) _GBRAIN_MCP_MODE="local-stdio" ;;
  esac
fi

if [ -f "$_BRAIN_REMOTE_FILE" ] && [ ! -d "$_GSTACK_HOME/.git" ] && [ "$_BRAIN_SYNC_MODE" = "off" ]; then
  _BRAIN_NEW_URL=$(head -1 "$_BRAIN_REMOTE_FILE" 2>/dev/null | tr -d '[:space:]')
  if [ -n "$_BRAIN_NEW_URL" ]; then
    echo "ARTIFACTS_SYNC: artifacts repo detected: $_BRAIN_NEW_URL"
    echo "ARTIFACTS_SYNC: run 'gstack-brain-restore' to pull your cross-machine artifacts (or 'gstack-config set artifacts_sync_mode off' to dismiss forever)"
  fi
fi

if [ -d "$_GSTACK_HOME/.git" ] && [ "$_BRAIN_SYNC_MODE" != "off" ]; then
  _BRAIN_LAST_PULL_FILE="$_GSTACK_HOME/.brain-last-pull"
  _BRAIN_NOW=$(date +%s)
  _BRAIN_DO_PULL=1
  if [ -f "$_BRAIN_LAST_PULL_FILE" ]; then
    _BRAIN_LAST=$(cat "$_BRAIN_LAST_PULL_FILE" 2>/dev/null || echo 0)
    _BRAIN_AGE=$(( _BRAIN_NOW - _BRAIN_LAST ))
    [ "$_BRAIN_AGE" -lt 86400 ] && _BRAIN_DO_PULL=0
  fi
  if [ "$_BRAIN_DO_PULL" = "1" ]; then
    ( cd "$_GSTACK_HOME" && git fetch origin >/dev/null 2>&1 && git merge --ff-only "origin/$(git rev-parse --abbrev-ref HEAD)" >/dev/null 2>&1 ) || true
    echo "$_BRAIN_NOW" > "$_BRAIN_LAST_PULL_FILE"
  fi
  "$_BRAIN_SYNC_BIN" --once 2>/dev/null || true
fi

if [ "$_GBRAIN_MCP_MODE" = "remote-http" ]; then
  # Remote-MCP mode: local artifacts sync is a no-op (brain admin's server
  # pulls from GitHub/GitLab). Show the user this is by design, not broken.
  _GBRAIN_HOST=$(jq -r '.mcpServers.gbrain.url // empty' "$HOME/.claude.json" 2>/dev/null | sed -E 's|^https?://([^/:]+).*|\1|')
  echo "ARTIFACTS_SYNC: remote-mode (managed by brain server ${_GBRAIN_HOST:-remote})"
elif [ -d "$_GSTACK_HOME/.git" ] && [ "$_BRAIN_SYNC_MODE" != "off" ]; then
  _BRAIN_QUEUE_DEPTH=0
  [ -f "$_GSTACK_HOME/.brain-queue.jsonl" ] && _BRAIN_QUEUE_DEPTH=$(wc -l < "$_GSTACK_HOME/.brain-queue.jsonl" | tr -d ' ')
  _BRAIN_LAST_PUSH="never"
  [ -f "$_GSTACK_HOME/.brain-last-push" ] && _BRAIN_LAST_PUSH=$(cat "$_GSTACK_HOME/.brain-last-push" 2>/dev/null || echo never)
  echo "ARTIFACTS_SYNC: mode=$_BRAIN_SYNC_MODE | last_push=$_BRAIN_LAST_PUSH | queue=$_BRAIN_QUEUE_DEPTH"
else
  echo "ARTIFACTS_SYNC: off"
fi
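Two steps of the detection logic above are easy to check in isolation: the mapping from MCP transport type to mode label, and the `sed` expression that reduces the server URL to a bare host. The helper names `gbrain_mcp_mode` and `gbrain_host` are hypothetical, and the example URL is made up for illustration.

```bash
# Hypothetical helpers mirroring two steps of the sync preamble.

# Map the MCP server type/transport string to the mode label.
gbrain_mcp_mode() {
  case "$1" in
    url|http|sse) echo "remote-http" ;;
    stdio)        echo "local-stdio" ;;
    *)            echo "none" ;;
  esac
}

# Extract the bare host from an http(s) URL, dropping any port and path.
gbrain_host() {
  printf '%s\n' "$1" | sed -E 's|^https?://([^/:]+).*|\1|'
}

gbrain_mcp_mode sse                                # prints remote-http
gbrain_host "https://brain.example.com:8443/mcp"   # prints brain.example.com
```

Any type other than `url`, `http`, `sse`, or `stdio` leaves the mode at `none`, so the status line falls through to the local-sync or `off` branches.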




Privacy stop-gate: if output shows `ARTIFACTS_SYNC: off`, `artifacts_sync_mode_prompted` is `false`, and gbrain is on PATH or `gbrain doctor --fast --json` works, ask once:

> gstack can publish your artifacts (CEO plans, designs, reports) to a private GitHub repo that GBrain indexes across machines. How much should sync?

Options:
- A) Everything allowlisted (recommended)
- B) Only artifacts
- C) Decline, keep everything local

After answer:

```bash
# Chosen mode: full | artifacts-only | off
"$_BRAIN_CONFIG_BIN" set artifacts_sync_mode <choice>
"$_BRAIN_CONFIG_BIN" set artifacts_sync_mode_prompted true
```

If A/B and `~/.gstack/.git` is missing, ask whether to run `gstack-artifacts-init`. Do not block the skill.
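The `<choice>` placeholder can be filled from the user's answer with a small lookup. This sketch assumes options A, B, and C correspond in order to `full`, `artifacts-only`, and `off`; the source lists both sets but does not pair them explicitly, and `sync_mode_for_choice` is a hypothetical name.

```bash
# Hypothetical helper: translate the privacy prompt answer into a sync mode.
# Assumption: A/B/C map in order to full / artifacts-only / off.
sync_mode_for_choice() {
  case "$1" in
    A) echo "full" ;;
    B) echo "artifacts-only" ;;
    C) echo "off" ;;
    *) return 1 ;;   # unrecognized answer: let the caller decide what to do
  esac
}

sync_mode_for_choice A   # prints full
sync_mode_for_choice C   # prints off
```

The printed value would be passed to `"$_BRAIN_CONFIG_BIN" set artifacts_sync_mode`, followed by setting `artifacts_sync_mode_prompted` to `true` so the question is not asked again.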

At skill END before telemetry:

```bash
"$HOME/.claude/skills/gstack/bin/gstack-brain-sync" --discover-new 2>/dev/null || true
"$HOME/.claude/skills/gstack/bin/gstack-brain-sync" --once 2>/dev/null || true
```

Model-Specific Behavioral Patch (claude)

The following nudges are tuned for the claude model family. They are subordinate to skill workflow, STOP points, AskUserQuestion gates, plan-mode safety, and /ship review gates. If a nudge below conflicts with skill instructions, the skill wins. Treat these as preferences, not rules.

Todo-list discipline. When working through a multi-step plan, mark each task complete individually as you finish it. Do not batch-complete at the end. If a task turns out to be unnecessary, mark it skipped with a one-line reason.

Think before heavy actions. For complex operations (refactors, migrations, non-trivial new features), briefly state your approach before executing. This lets the user course-correct cheaply instead of mid-flight.

Dedicated tools over Bash. Prefer Read, Edit, Write, Glob, Grep over shell equivalents (cat, sed, find, grep). The dedicated tools are cheaper and clearer.

Voice

Direct, concrete, builder-to-builder. Name the file, function, command, and user-visible impact. No filler.

No em dashes. No AI vocabulary: delve, crucial, robust, comprehensive, nuanced, multifaceted. Never corporate or academic. Short paragraphs. End with what to do.

The user has context you do not. Cross-model agreement is a recommendation, not a decision. The user decides.

Completion Status Protocol

When completing a skill workflow, report status using one of:

DONE — completed with evidence.
DONE_WITH_CONCERNS — completed, but list concerns.
BLOCKED — cannot proceed; state blocker and what was tried.
NEEDS_CONTEXT — missing info; state exactly what is needed.

Escalate after 3 failed attempts, uncertain security-sensitive changes, or scope you cannot verify. Format: `STATUS`, `REASON`, `ATTEMPTED`, `RECOMMENDATION`.
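An escalation report with those four fields might be emitted as shown below. The `FIELD: value` layout is an assumption, since the source names the fields but not their exact formatting. `escalation_report` is a hypothetical helper and the sample values are invented.

```bash
# Hypothetical helper: print an escalation report with the four named fields.
# Assumption: one "FIELD: value" line per field; the source only names the fields.
escalation_report() {
  printf 'STATUS: %s\nREASON: %s\nATTEMPTED: %s\nRECOMMENDATION: %s\n' "$1" "$2" "$3" "$4"
}

# Invented example values, for illustration only.
escalation_report "BLOCKED" \
  "benchmark run fails before producing output" \
  "re-ran the preamble; retried the run three times" \
  "ask the user how to proceed"
```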

Operational Self-Improvement

Before completing, if you discovered a durable project quirk or command fix that would save 5+ minutes next time, log it:

bash

~/.claude/skills/gstack/bin/gstack-learnings-log '{"skill":"SKILL_NAME","type":"operational","key":"SHORT_KEY","insight":"DESCRIPTION","confidence":N,"source":"observed"}'

Do not log obvious facts or one-time transient errors.
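
A minimal sketch of filling in that template. The `build_learning` helper, the sample values, and the integer confidence of 8 are assumptions for illustration; the source does not state the confidence scale. The sketch also assumes the values contain no double quotes or backslashes, since it does no JSON escaping.

```shell
# Hypothetical helper (not part of gstack): fills the JSON template above.
# Assumes values contain no double quotes or backslashes (no escaping is done).
build_learning() {
  # $1 = skill, $2 = short key, $3 = insight, $4 = confidence (number; scale assumed)
  printf '{"skill":"%s","type":"operational","key":"%s","insight":"%s","confidence":%s,"source":"observed"}\n' \
    "$1" "$2" "$3" "$4"
}

payload=$(build_learning "benchmark-models" "dry-run-first" \
  "Run the dry-run before a real benchmark to see auth status" 8)
echo "$payload"

# To log it (requires the gstack install):
# ~/.claude/skills/gstack/bin/gstack-learnings-log "$payload"
```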

Telemetry (run last)

After workflow completion, log telemetry. Use the skill `name:` from frontmatter. OUTCOME is success/error/abort/unknown.

PLAN MODE EXCEPTION (ALWAYS RUN): This command writes telemetry to `~/.gstack/analytics/`, matching preamble analytics writes.

Run this bash:

bash

_TEL_END=$(date +%s)
_TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
# Session timeline: record skill completion (local-only, never sent anywhere)
~/.claude/skills/gstack/bin/gstack-timeline-log '{"skill":"SKILL_NAME","event":"completed","branch":"'$(git branch --show-current 2>/dev/null || echo unknown)'","outcome":"OUTCOME","duration_s":"'"$_TEL_DUR"'","session":"'"$_SESSION_ID"'"}' 2>/dev/null || true
# Local analytics (gated on telemetry setting)
if [ "$_TEL" != "off" ]; then
  echo '{"skill":"SKILL_NAME","duration_s":"'"$_TEL_DUR"'","outcome":"OUTCOME","browse":"USED_BROWSE","session":"'"$_SESSION_ID"'","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
fi
# Remote telemetry (opt-in, requires binary)
if [ "$_TEL" != "off" ] && [ -x ~/.claude/skills/gstack/bin/gstack-telemetry-log ]; then
  ~/.claude/skills/gstack/bin/gstack-telemetry-log \
    --skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
    --used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
fi

Replace `SKILL_NAME`, `OUTCOME`, and `USED_BROWSE` before running.

Plan Status Footer

In plan mode, before ExitPlanMode: if the plan file lacks `## GSTACK REVIEW REPORT`, run `~/.claude/skills/gstack/bin/gstack-review-read` and append the standard runs/status/findings table. If the output is `NO_REVIEWS` or empty, append a 5-row placeholder with verdict "NO REVIEWS YET — run `/autoplan`". If a richer report exists, skip.

PLAN MODE EXCEPTION — always allowed (it's the plan file).

/benchmark-models — Cross-Model Skill Benchmark

You are running the `/benchmark-models` workflow. It wraps the `gstack-model-benchmark` binary with an interactive flow that picks a prompt, confirms providers, previews auth, and runs the benchmark.

Different from `/benchmark`: that skill measures web page performance (Core Web Vitals, load times). This skill measures AI model performance on gstack skills or arbitrary prompts.

Step 0: Locate the binary

bash

BIN="$HOME/.claude/skills/gstack/bin/gstack-model-benchmark"
[ -x "$BIN" ] || BIN=".claude/skills/gstack/bin/gstack-model-benchmark"
[ -x "$BIN" ] || { echo "ERROR: gstack-model-benchmark not found. Run ./setup in the gstack install dir." >&2; exit 1; }
echo "BIN: $BIN"

If not found, stop and tell the user to reinstall gstack.

Step 1: Choose a prompt

Use AskUserQuestion with the preamble format:

Re-ground: current project + branch.
Simplify: "A cross-model benchmark runs the same prompt through 2-3 AI models and shows you how they compare on speed, cost, and output quality. What prompt should we use?"
RECOMMENDATION: A because benchmarking against a real skill exposes tool-use differences, not just raw generation.
Options:
- A) Benchmark one of my gstack skills (we'll pick which skill next). Completeness: 10/10.
- B) Use an inline prompt — type it on the next turn. Completeness: 8/10.
- C) Point at a prompt file on disk — specify path on the next turn. Completeness: 8/10.

If A: list top-level gstack skills that have SKILL.md files (from `find . -maxdepth 2 -name SKILL.md -not -path './.*'`), then ask the user to pick one via a second AskUserQuestion. Use the picked SKILL.md path as the prompt file.

If B: ask the user for the inline prompt. Use it verbatim via `--prompt "<text>"`.

If C: ask for the path. Verify it exists. Use as positional argument.
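
A minimal sketch of the two file-based branches, assuming it runs from the directory that holds the skill folders. `list_skills` and `check_prompt_file` are hypothetical helper names, not part of gstack; the `find` expression is the one given above.

```shell
# Option A: list SKILL.md files one level down, skipping hidden directories.
# Must run from the directory that contains the skill folders.
list_skills() {
  find . -maxdepth 2 -name SKILL.md -not -path './.*' | sort
}

# Option C: confirm the user-supplied prompt file exists and is readable.
check_prompt_file() {
  if [ -f "$1" ] && [ -r "$1" ]; then
    echo "OK: $1"
  else
    echo "ERROR: prompt file not found or unreadable: $1" >&2
    return 1
  fi
}

list_skills
# Example check (the path is a placeholder):
# check_prompt_file "./prompts/example.md" || echo "ask the user for another path"
```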

Step 2: Choose providers

bash

"$BIN" --prompt "unused, dry-run" --models claude,gpt,gemini --dry-run

Show the dry-run output. The "Adapter availability" section tells the user which providers will actually run (OK) vs skip (NOT READY — remediation hint included).

If ALL three show NOT READY: stop with a clear message that the benchmark can't run without at least one authed provider. Suggest `claude login`, `codex login`, or `gemini login` / `export GOOGLE_API_KEY`.

If at least one is OK: AskUserQuestion:

Simplify: "Which models should we include? The dry-run above showed which are authed. Unauthed ones will be skipped cleanly — they won't abort the batch."
RECOMMENDATION: A (all authed providers) because running as many as possible gives the richest comparison.
Options:
- A) All authed providers. Completeness: 10/10.
- B) Only Claude. Completeness: 6/10 (no cross-model signal — use /ship's review for solo claude benchmarks instead).
- C) Pick two — specify on next turn. Completeness: 8/10.

Step 3: Decide on judge

bash

[ -n "$ANTHROPIC_API_KEY" ] || grep -q 'ANTHROPIC' "$HOME/.claude/.credentials.json" 2>/dev/null && echo "JUDGE_AVAILABLE" || echo "JUDGE_UNAVAILABLE"

If judge is available, AskUserQuestion:

Simplify: "The quality judge scores each model's output on a 0-10 scale using Anthropic's Claude as a tiebreaker. Adds ~$0.05/run. Recommended if you care about output quality, not just latency and cost."
RECOMMENDATION: A — the whole point is comparing quality, not just speed.
Options:
- A) Enable judge (adds ~$0.05). Completeness: 10/10.
- B) Skip judge — speed/cost/tokens only. Completeness: 7/10.

If judge is NOT available, skip this question and omit the `--judge` flag.

Step 4: Run the benchmark

Construct the command from Step 1, 2, 3 decisions:

bash

"$BIN" <prompt-spec> --models <picked-models> [--judge] --output table

Where `<prompt-spec>` is either `--prompt "<text>"` (Step 1B) or a file path (Step 1A or 1C), and `<picked-models>` is the comma-separated list from Step 2.

Stream the output as it arrives. This is slow: each provider runs the prompt fully. Expect 30s-5min depending on prompt complexity and whether `--judge` is on.
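
A minimal sketch of assembling that command from the three decisions without losing quoting on the prompt text. `run_benchmark` and `show_args` are hypothetical helpers, not part of gstack, and `./ship/SKILL.md` is a placeholder path. Only the flags shown in this step are used.

```shell
# Hypothetical wrapper (not part of gstack).
# Usage: run_benchmark CMD MODE VALUE MODELS JUDGE
#   CMD    command to invoke (the benchmark binary, or a previewer)
#   MODE   "inline" (Step 1B) or "file" (Step 1A / 1C)
#   VALUE  prompt text or prompt file path
#   MODELS comma-separated provider list from Step 2
#   JUDGE  "yes" only if the user opted in at Step 3
run_benchmark() {
  rb_cmd=$1 rb_mode=$2 rb_value=$3 rb_models=$4 rb_judge=$5
  case "$rb_mode" in
    inline) set -- --prompt "$rb_value" ;;
    file)   set -- "$rb_value" ;;
    *)      echo "unknown prompt mode: $rb_mode" >&2; return 1 ;;
  esac
  set -- "$@" --models "$rb_models"
  if [ "$rb_judge" = "yes" ]; then set -- "$@" --judge; fi
  set -- "$@" --output table
  "$rb_cmd" "$@"
}

# Previewer: prints each argument in brackets so word boundaries are visible.
show_args() { printf '[%s]' "$@"; printf '\n'; }

run_benchmark show_args inline "Explain this repo" "claude,gpt" no
run_benchmark show_args file ./ship/SKILL.md "claude,gemini" yes

# Real run, with BIN from Step 0:
# run_benchmark "$BIN" file ./ship/SKILL.md "claude,gemini" yes
```

Passing the command as the first argument keeps the preview and the real run on the same code path, so what the user confirms is what gets executed.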

Step 5: Interpret results

After the table prints, summarize for the user:

Fastest: provider with lowest latency.
Cheapest: provider with lowest cost.
Highest quality (if `--judge` ran): provider with highest score.
Best overall: use judgment. If the judge ran, weight by quality. Otherwise, note the tradeoff the user needs to make.

If any provider hit an error (auth/timeout/rate_limit), call it out with the remediation path.

Step 6: Offer to save results

AskUserQuestion:

Simplify: "Save this benchmark as JSON so you can compare future runs against it?"
RECOMMENDATION: A — skill performance drifts as providers update their models; a saved baseline catches quality regressions.
Options:
- A) Save to `~/.gstack/benchmarks/<date>-<skill-or-prompt-slug>.json`. Completeness: 10/10.
- B) Just print, don't save. Completeness: 5/10 (loses trend data).

If A: re-run with `--output json` and tee to the dated file. Print the path so the user can diff future runs against it.
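
A minimal sketch of building the dated file path. `slugify` and `benchmark_path` are hypothetical helpers, not part of gstack, and the `YYYY-MM-DD` date format is an assumption; the source only gives the `<date>-<skill-or-prompt-slug>.json` pattern.

```shell
# Hypothetical helper: lowercase the input and collapse every run of
# non-alphanumeric characters into one hyphen, trimming hyphens at the ends.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | sed 's/^-//; s/-$//'
}

# Hypothetical helper: path under ~/.gstack/benchmarks for today's run.
# The YYYY-MM-DD date format is an assumption.
benchmark_path() {
  printf '%s/.gstack/benchmarks/%s-%s.json\n' \
    "$HOME" "$(date +%Y-%m-%d)" "$(slugify "$1")"
}

out=$(benchmark_path "Ship Skill")
echo "Would save to: $out"

# Real run, with BIN from Step 0 and the same choices as Step 4:
# mkdir -p "$(dirname "$out")"
# "$BIN" <prompt-spec> --models <picked-models> [--judge] --output json | tee "$out"
```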

Important Rules

Never run a real benchmark without Step 2's dry-run first. Users need to see auth status before spending API calls.
Never hardcode model names. Always pass providers from user's Step 2 choice — the binary handles the rest.
Never auto-include `--judge`. It adds real cost; user must opt in.
If zero providers are authed, STOP. Don't attempt the benchmark — it produces no useful output.
Cost is visible. Every run shows per-provider cost in the table. Users should see it before the next run.
