codex

<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly --> <!-- Regenerate: bun run gen:skill-docs -->

Preamble (run first)

```bash
_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true")
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
echo "PROACTIVE: $_PROACTIVE"
_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
echo "LAKE_INTRO: $_LAKE_SEEN"
_TEL=$(~/.claude/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
_TEL_PROMPTED=$([ -f ~/.gstack/.telemetry-prompted ] && echo "yes" || echo "no")
_TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
mkdir -p ~/.gstack/analytics
echo '{"skill":"codex","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}'  >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.claude/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
```
If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills; only invoke them when the user explicitly asks. The user opted out of proactive suggestions.
If the output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured; otherwise use AskUserQuestion with 4 options, and write snooze state if the user declines). If it shows `JUST_UPGRADED <from> <to>`: tell the user "Running gstack v{to} (just updated!)" and continue.
If `LAKE_INTRO` is `no`: before continuing, introduce the Completeness Principle. Tell the user: "gstack follows the Boil the Lake principle: always do the complete thing when AI makes the marginal cost near-zero. Read more: https://garryslist.org/posts/boil-the-ocean" Then offer to open the essay in their default browser:
```bash
open https://garryslist.org/posts/boil-the-ocean
touch ~/.gstack/.completeness-intro-seen
```
Only run `open` if the user says yes. Always run `touch` to mark the intro as seen. This only happens once.
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: after the lake intro is handled, ask the user about telemetry. Use AskUserQuestion:
Help gstack get better! Community mode shares usage data (which skills you use, how long they take, crash info) with a stable device ID so we can track trends and fix bugs faster. No code, file paths, or repo names are ever sent. Change anytime with `gstack-config set telemetry off`.
Options:
  • A) Help gstack get better! (recommended)
  • B) No thanks
If A: run `~/.claude/skills/gstack/bin/gstack-config set telemetry community`
If B: ask a follow-up AskUserQuestion:
How about anonymous mode? We just learn that someone used gstack — no unique ID, no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
  • A) Sure, anonymous is fine
  • B) No thanks, fully off
If B→A: run `~/.claude/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.claude/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
touch ~/.gstack/.telemetry-prompted
```
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
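The mapping from answers to telemetry modes can be sketched as a small shell function. `telemetry_mode` is a hypothetical helper, not part of gstack: it only prints the mode name, and the caller still runs `gstack-config set telemetry <mode>` and the `touch` command shown above.

```bash
# Hypothetical helper: map the telemetry answers to a mode name.
# $1 = answer to the first question (A or B)
# $2 = answer to the follow-up question (only asked when $1 is B)
telemetry_mode() {
  case "$1" in
    A) echo "community" ;;
    B)
      case "$2" in
        A) echo "anonymous" ;;   # B→A
        B) echo "off" ;;         # B→B
        *) return 1 ;;           # follow-up not answered yet
      esac
      ;;
    *) return 1 ;;               # unexpected answer: ask again
  esac
}

# Usage sketch:
# mode=$(telemetry_mode "$first" "$second") &&
#   ~/.claude/skills/gstack/bin/gstack-config set telemetry "$mode"
# touch ~/.gstack/.telemetry-prompted   # always, whatever was chosen
```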

AskUserQuestion Format

ALWAYS follow this structure for every AskUserQuestion call:
  1. Re-ground: State the project, the current branch (use the `_BRANCH` value printed by the preamble, NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
  2. Simplify: Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called.
  3. Recommend: `RECOMMENDATION: Choose [X] because [one-line reason]`. Always prefer the complete option over shortcuts (see Completeness Principle). Include `Completeness: X/10` for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers the happy path but skips some edges, 3 = shortcut that defers significant work. If both options are 8+, pick the higher; if one is ≤5, flag it.
  4. Options: lettered options, `A) ... B) ... C) ...`. When an option involves effort, show both scales: `(human: ~X / CC: ~Y)`.
Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
Per-skill instructions may add additional formatting rules on top of this baseline.
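A minimal sketch of the two-option calibration rule from step 3, assuming integer scores from 0 to 10. `completeness_check` is a hypothetical helper, and recommending A on a tie is an assumption the rule does not specify.

```bash
# Hypothetical helper: apply the calibration rule to two options.
# $1 = Completeness score for option A, $2 = score for option B (0-10).
completeness_check() {
  local a="$1" b="$2"
  # Flag any option scoring 5 or below.
  if [ "$a" -le 5 ]; then echo "FLAG: A is $a/10"; fi
  if [ "$b" -le 5 ]; then echo "FLAG: B is $b/10"; fi
  # If both options score 8+, recommend the higher one (A on a tie, by assumption).
  if [ "$a" -ge 8 ] && [ "$b" -ge 8 ]; then
    if [ "$b" -gt "$a" ]; then echo "RECOMMEND: B"; else echo "RECOMMEND: A"; fi
  fi
}
```

Scores between 6 and 7 on both sides produce no output: the rule leaves that case to judgment.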

Completeness Principle — Boil the Lake

AI-assisted coding makes the marginal cost of completeness near-zero. When you present options:
  • If Option A is the complete implementation (full parity, all edge cases, 100% coverage) and Option B is a shortcut that saves modest effort — always recommend A. The delta between 80 lines and 150 lines is meaningless with CC+gstack. "Good enough" is the wrong instinct when "complete" costs minutes more.
  • Lake vs. ocean: A "lake" is boilable — 100% test coverage for a module, full feature implementation, handling all edge cases, complete error paths. An "ocean" is not — rewriting an entire system from scratch, adding features to dependencies you don't control, multi-quarter platform migrations. Recommend boiling lakes. Flag oceans as out of scope.
  • When estimating effort, always show both scales: human team time and CC+gstack time. The compression ratio varies by task type — use this reference:
| Task type | Human team | CC+gstack | Compression |
|---|---|---|---|
| Boilerplate / scaffolding | 2 days | 15 min | ~100x |
| Test writing | 1 day | 15 min | ~50x |
| Feature implementation | 1 week | 30 min | ~30x |
| Bug fix + regression test | 4 hours | 15 min | ~20x |
| Architecture / design | 2 days | 4 hours | ~5x |
| Research / exploration | 1 day | 3 hours | ~3x |
  • This principle applies to test coverage, error handling, documentation, edge cases, and feature completeness. Don't skip the last 10% to "save time" — with AI, that 10% costs seconds.
Anti-patterns — DON'T do this:
  • BAD: "Choose B — it covers 90% of the value with less code." (If A is only 70 lines more, choose A.)
  • BAD: "We can skip edge case handling to save time." (Edge case handling costs minutes with CC.)
  • BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
  • BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")

Search Before Building

Before building infrastructure, adopting unfamiliar patterns, or writing anything the runtime might already provide as a built-in, search first. Read `~/.claude/skills/gstack/ETHOS.md` for the full philosophy.
Three layers of knowledge:
  • Layer 1 (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
  • Layer 2 (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
  • Layer 3 (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
Eureka moment: When first-principles reasoning reveals conventional wisdom is wrong, name it: "EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace `SKILL_NAME` and `ONE_LINE_SUMMARY`. This runs inline; don't stop the workflow.
WebSearch fallback: If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."

Contributor Mode

If `_CONTRIB` is `true`: you are in contributor mode. You're a gstack user who also helps make it better.
At the end of each major workflow step (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
Calibration (this is the bar): for example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in an async context. Small, but the input was reasonable and gstack should have handled it; that's the kind of thing worth filing. Ignore anything less consequential than this.
NOT worth filing: user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
To file: write `~/.gstack/contributor-logs/{slug}.md` with all sections below (do not truncate; include every section through the Date/Version footer):

{Title}

Hey gstack team — ran into this while using /{skill-name}:
What I was trying to do: {what the user/agent was attempting}
What happened instead: {what actually happened}
My rating: {0-10} ({one sentence on why it wasn't a 10})

Steps to reproduce

  1. {step}

Raw output

{paste the actual error or unexpected output here}

What would make this a 10

{one sentence: what gstack should have done differently}
Date: {YYYY-MM-DD} | Version: {gstack version} | Skill: /{skill}

Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if the file already exists. Max 3 reports per session. File inline and continue; don't stop the workflow. Tell the user: "Filed gstack field report: {title}"
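The slug rules can be approximated with standard text tools. This is a sketch: `slugify` is a hypothetical helper, and collapsing other characters into single hyphens and trimming stray hyphens are assumptions beyond the stated rules.

```bash
# Hypothetical helper: turn a report title into a slug
# (lowercase, hyphens only, at most 60 characters).
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
    | cut -c1-60 \
    | sed -E 's/-+$//'   # cutting at 60 may leave a trailing hyphen
}

# Usage sketch (write_report is hypothetical): skip if the file exists.
# f=~/.gstack/contributor-logs/$(slugify "$title").md
# [ -e "$f" ] || write_report "$f"
```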

Completion Status Protocol

When completing a skill workflow, report status using one of:
  • DONE — All steps completed successfully. Evidence provided for each claim.
  • DONE_WITH_CONCERNS — Completed, but with issues the user should know about. List each concern.
  • BLOCKED — Cannot proceed. State what is blocking and what was tried.
  • NEEDS_CONTEXT — Missing information required to continue. State exactly what you need.

Escalation

It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
Bad work is worse than no work. You will not be penalized for escalating.
  • If you have attempted a task 3 times without success, STOP and escalate.
  • If you are uncertain about a security-sensitive change, STOP and escalate.
  • If the scope of work exceeds what you can verify, STOP and escalate.
Escalation format:
```
STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]
```
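The three-attempt rule can be illustrated with a retry wrapper. This is an illustration only. `try_or_escalate` is hypothetical, and it treats one command run as one "attempt", while the rule itself is about task attempts. The REASON, ATTEMPTED, and RECOMMENDATION text it prints is placeholder wording.

```bash
# Hypothetical wrapper: run a command up to 3 times; if every attempt fails,
# print an escalation block in the format above and return non-zero.
try_or_escalate() {
  local i
  for i in 1 2 3; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i of 3 failed: $*" >&2
  done
  printf 'STATUS: BLOCKED\n'
  printf 'REASON: %s failed on all 3 attempts.\n' "$*"
  printf 'ATTEMPTED: ran the same command 3 times\n'
  printf 'RECOMMENDATION: review the error output and choose a different approach\n'
  return 1
}

# Usage: try_or_escalate npm test || exit 1   # stop instead of a 4th attempt
```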

Telemetry (run last)

After the skill workflow completes (success, error, or abort), log the telemetry event. Determine the skill name from the `name:` field in this file's YAML frontmatter. Determine the outcome from the workflow result (success if completed normally, error if it failed, abort if the user interrupted).
PLAN MODE EXCEPTION (ALWAYS RUN): This command writes telemetry to `~/.gstack/analytics/` (the user config directory, not project files). The skill preamble already writes to the same directory, so this follows the same pattern. Skipping this command loses session duration and outcome data.
Run this command:
```bash
_TEL_END=$(date +%s)
_TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.claude/skills/gstack/bin/gstack-telemetry-log \
  --skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
  --used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from the frontmatter, `OUTCOME` with success/error/abort, and `USED_BROWSE` with true/false depending on whether `$B` was used. If you cannot determine the outcome, use "unknown". The command runs in the background and never blocks the user.
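If you want to read the `name:` field programmatically rather than by eye, here is a minimal sketch. `skill_name_from_frontmatter` is hypothetical. It assumes the frontmatter starts on line 1 between two `---` lines and that `name:` holds a plain scalar; it is not a YAML parser.

```bash
# Hypothetical helper: print the `name:` value from a file's YAML frontmatter.
skill_name_from_frontmatter() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # frontmatter opens on line 1
    in_fm && $0 == "---"   { exit }              # closed without a name field
    in_fm && /^name:/ {
      sub(/^name:[[:space:]]*/, "")              # drop the key
      sub(/[[:space:]]+$/, "")                   # drop trailing whitespace
      gsub(/"/, "")                              # drop double quotes, if any
      print
      exit
    }
  ' "$1"
}

# Usage sketch: --skill "$(skill_name_from_frontmatter SKILL.md)"
```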

Step 0: Detect base branch

Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps.
  1. Check if a PR already exists for this branch: `gh pr view --json baseRefName -q .baseRefName`. If this succeeds, use the printed branch name as the base branch.
  2. If no PR exists (the command fails), detect the repo's default branch: `gh repo view --json defaultBranchRef -q .defaultBranchRef.name`
  3. If both commands fail, fall back to `main`.
Print the detected base branch name. In every subsequent `git diff`, `git log`, `git fetch`, `git merge`, and `gh pr create` command, substitute the detected branch name wherever the instructions say "the base branch."
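The three-step fallback can be written as a single function. This is a sketch: `detect_base_branch` is a hypothetical name, the `gh` commands are the ones listed above, and treating empty output the same as a failure is an added assumption.

```bash
# Hypothetical helper implementing the base-branch fallback chain.
detect_base_branch() {
  local base
  # 1. Base branch of an existing PR for the current branch.
  base=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null)
  # 2. Otherwise, the repository's default branch.
  if [ -z "$base" ]; then
    base=$(gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null)
  fi
  # 3. Otherwise, fall back to main.
  if [ -z "$base" ]; then
    base=main
  fi
  echo "$base"
}

# Usage: BASE=$(detect_base_branch); echo "Base branch: $BASE"
```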

/codex — Multi-AI Second Opinion

You are running the `/codex` skill. It wraps the OpenAI Codex CLI to get an independent, brutally honest second opinion from a different AI system.
Codex is the "200 IQ autistic developer" — direct, terse, technically precise, challenges assumptions, catches things you might miss. Present its output faithfully, not summarized.

Step 0: Check codex binary

```bash
# `command -v` is the portable (POSIX) way to locate an executable on PATH
CODEX_BIN=$(command -v codex 2>/dev/null || echo "")
[ -z "$CODEX_BIN" ] && echo "NOT_FOUND" || echo "FOUND: $CODEX_BIN"
```
If `NOT_FOUND`: stop and tell the user: "Codex CLI not found. Install it: `npm install -g @openai/codex` or see https://github.com/openai/codex"

Step 1: Detect mode

Parse the user's input to determine which mode to run:
  1. `/codex review` or `/codex review <instructions>` → Review mode (Step 2A)
  2. `/codex challenge` or `/codex challenge <focus>` → Challenge mode (Step 2B)
  3. `/codex` with no arguments → Auto-detect:
    • Check for a diff, falling back to the local base branch if `origin` isn't available:
      `{ git diff origin/<base> --stat 2>/dev/null || git diff <base> --stat 2>/dev/null; } | tail -1`
      Both `git diff` commands are grouped before `tail` because a pipeline's exit status is that of its last command, so `git diff origin/<base> --stat | tail -1 || ...` would never reach the fallback.
    • If a diff exists, use AskUserQuestion:
      Codex detected changes against the base branch. What should it do?
      A) Review the diff (code review with pass/fail gate)
      B) Challenge the diff (adversarial: try to break it)
      C) Something else (I'll provide a prompt)
    • If there is no diff, check for plan files scoped to the current project:
      `ls -t ~/.claude/plans/*.md 2>/dev/null | xargs grep -l "$(basename "$(pwd)")" 2>/dev/null | head -1`
      If no project-scoped match is found, fall back to:
      `ls -t ~/.claude/plans/*.md 2>/dev/null | head -1`
      but warn the user: "Note: this plan may be from a different project."
    • If a plan file exists, offer to review it.
    • Otherwise, ask: "What would you like to ask Codex?"
  4. `/codex <anything else>` → Consult mode (Step 2C), where the remaining text is the prompt.
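Here is a minimal sketch of the mode dispatch, assuming the text after `/codex` arrives as a single string. Both functions are hypothetical helpers. Matching requires a space after `review` or `challenge`, so a consult prompt such as "reviewer notes?" is not mistaken for review mode.

```bash
# Hypothetical helper: decide the mode from the text typed after `/codex`.
detect_codex_mode() {
  case "$1" in
    "")                      echo "auto" ;;       # bare /codex: auto-detect
    review|"review "*)       echo "review" ;;     # Step 2A
    challenge|"challenge "*) echo "challenge" ;;  # Step 2B
    *)                       echo "consult" ;;    # Step 2C
  esac
}

# Hypothetical helper: the extra text for each mode (review instructions,
# challenge focus, or the whole input as the consult prompt).
codex_mode_args() {
  case "$1" in
    "review "*)          printf '%s\n' "${1#"review "}" ;;
    "challenge "*)       printf '%s\n' "${1#"challenge "}" ;;
    ""|review|challenge) ;;                       # no extra text
    *)                   printf '%s\n' "$1" ;;
  esac
}
```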

Step 2A: Review Mode

Run Codex code review against the current branch diff.
  1. Create a temp file to capture stderr:
     ```bash
     TMPERR=$(mktemp /tmp/codex-err-XXXXXX)
     ```
  2. Run the review (5-minute timeout):
     ```bash
     codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
     ```
     Use `timeout: 300000` on the Bash call. If the user provided custom instructions (e.g., `/codex review focus on security`), pass them as the prompt argument:
     ```bash
     codex review "focus on security" --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
     ```
  3. Capture the output. Then parse the token usage (for the cost estimate) from stderr:
     ```bash
     grep "tokens used" "$TMPERR" 2>/dev/null || echo "tokens: unknown"
     ```
  4. Determine the gate verdict by checking the review output for critical findings. If the output contains `[P1]`, the gate is FAIL. If no `[P1]` markers are found (only `[P2]`, or no findings), the gate is PASS.
  5. Present the output:
     ```
     CODEX SAYS (code review):
     ════════════════════════════════════════════════════════════
     <full codex output, verbatim; do not truncate or summarize>
     ════════════════════════════════════════════════════════════
     GATE: PASS                    Tokens: 14,331 | Est. cost: ~$0.12
     ```
     or
     ```
     GATE: FAIL (N critical findings)
     ```
  6. Cross-model comparison: if `/review` (Claude's own review) was already run earlier in this conversation, compare the two sets of findings:
     ```
     CROSS-MODEL ANALYSIS:
       Both found: [findings that overlap between Claude and Codex]
       Only Codex found: [findings unique to Codex]
       Only Claude found: [findings unique to Claude's /review]
       Agreement rate: X% (N/M total unique findings overlap)
     ```
  7. Persist the review result:
     ```bash
     ~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N}'
     ```
     Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail"), findings (count of `[P1]` + `[P2]` markers), findings_fixed (count of findings that were addressed or fixed before shipping).
  8. Clean up the temp file:
     ```bash
     rm -f "$TMPERR"
     ```
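Steps 4, 6, and 7 involve simple counting. The helpers below are hypothetical and assume the Codex output has been saved to a file, with findings tagged by literal `[P1]`/`[P2]` markers. For the agreement rate, each input file holds one finding per line. Counting two findings as overlapping only when their text matches exactly is a simplification; in practice you would judge overlap by meaning.

```bash
# Hypothetical helpers for Step 2A; $1 is a file holding the saved Codex output.

# Gate verdict: FAIL if any [P1] marker appears, otherwise PASS.
codex_gate() {
  local p1
  p1=$(grep -o '\[P1\]' "$1" | wc -l | tr -d ' ')
  if [ "$p1" -gt 0 ]; then
    echo "FAIL ($p1 critical findings)"
  else
    echo "PASS"
  fi
}

# The "findings" value for gstack-review-log: number of [P1] + [P2] markers.
count_findings() {
  grep -oE '\[P[12]\]' "$1" | wc -l | tr -d ' '
}

# Agreement rate = overlapping findings / total unique findings (rounded down).
# $1 and $2 hold one normalized finding per line (Claude's and Codex's).
agreement_rate() {
  local a b both total
  a=$(mktemp); b=$(mktemp)
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  both=$(comm -12 "$a" "$b" | wc -l | tr -d ' ')
  total=$(sort -u "$1" "$2" | wc -l | tr -d ' ')
  rm -f "$a" "$b"
  if [ "$total" -eq 0 ]; then
    echo "0% (0/0)"
  else
    echo "$(( 100 * both / total ))% ($both/$total)"
  fi
}
```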

Plan File Review Report

After displaying the Review Readiness Dashboard in conversation output, also update the plan file itself so review status is visible to anyone reading the plan.

Detect the plan file

  1. Check if there is an active plan file in this conversation (the host provides plan file paths in system messages — look for plan file references in the conversation context).
  2. If not found, skip this section silently — not every review runs in plan mode.

Generate the report

Read the review log output you already have from the Review Readiness Dashboard step above. Parse each JSONL entry. Each skill logs different fields:
  • plan-ceo-review: `status`, `unresolved`, `critical_gaps`, `mode`, `scope_proposed`, `scope_accepted`, `scope_deferred`, `commit` → Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred" → If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
  • plan-eng-review: `status`, `unresolved`, `critical_gaps`, `issues_found`, `mode`, `commit` → Findings: "{issues_found} issues, {critical_gaps} critical gaps"
  • plan-design-review: `status`, `initial_score`, `overall_score`, `unresolved`, `decisions_made`, `commit` → Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
  • codex-review: `status`, `gate`, `findings`, `findings_fixed` → Findings: "{findings} findings, {findings_fixed}/{findings} fixed"
All fields needed for the Findings column are now present in the JSONL entries. For the review you just completed, you may use richer details from your own Completion Summary. For prior reviews, use the JSONL fields directly — they contain all required data.
Produce this markdown table:
```markdown
## GSTACK REVIEW REPORT

| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | `/plan-ceo-review` | Scope & strategy | {runs} | {status} | {findings} |
| Codex Review | `/codex review` | Independent 2nd opinion | {runs} | {status} | {findings} |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | {runs} | {status} | {findings} |
| Design Review | `/plan-design-review` | UI/UX gaps | {runs} | {status} | {findings} |
```
Below the table, add these lines (omit any that are empty/not applicable):
  • CODEX: (only if codex-review ran) — one-line summary of codex fixes
  • CROSS-MODEL: (only if both Claude and Codex reviews exist) — overlap analysis
  • UNRESOLVED: total unresolved decisions across all reviews
  • VERDICT: list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement"). If Eng Review is not CLEAR and not skipped globally, append "eng review required".

Write to the plan file

PLAN MODE EXCEPTION (ALWAYS RUN): This writes to the plan file, which is the one file you are allowed to edit in plan mode. The plan file review report is part of the plan's living status.
  • Search the plan file for a `## GSTACK REVIEW REPORT` section anywhere in the file (not just at the end — content may have been added after it).
  • If found, replace it entirely using the Edit tool. Match from `## GSTACK REVIEW REPORT` through either the next `## ` heading or end of file, whichever comes first. This ensures content added after the report section is preserved, not eaten. If the Edit fails (e.g., concurrent edit changed the content), re-read the plan file and retry once.
  • If no such section exists, append it to the end of the plan file.
  • Always place it as the very last section in the plan file. If it was found mid-file, move it: delete the old location and append at the end.
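
The matching rule above can be sketched in Python as a pure string transform. The section runs from the `## GSTACK REVIEW REPORT` heading through the next `## ` heading or end of file, and is then re-appended at the end. In practice the skill performs this with the Edit tool, and `upsert_report` is a hypothetical helper. It treats every line starting with `## ` as a heading, so it would misfire on such lines inside fenced code blocks.

```python
HEADING = "## GSTACK REVIEW REPORT"


def upsert_report(plan_text: str, report: str) -> str:
    """Drop any existing report section, then append `report` as the last section."""
    lines = plan_text.splitlines(keepends=True)
    start = next((i for i, line in enumerate(lines) if line.strip() == HEADING), None)
    if start is not None:
        # The section ends at the next "## " heading or at end of file, whichever comes first.
        # "### " subheadings do not match: their third character is '#', not a space.
        end = next(
            (j for j in range(start + 1, len(lines)) if lines[j].startswith("## ")),
            len(lines),
        )
        lines = lines[:start] + lines[end:]  # content after the old section is kept
    body = "".join(lines).rstrip("\n")
    return (body + "\n\n" if body else "") + report.rstrip("\n") + "\n"
```
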

Step 2B: Challenge (Adversarial) Mode

Codex tries to break your code: it hunts for edge cases, race conditions, security holes, and failure modes that a normal review would miss.

  1. Construct the adversarial prompt. If the user provided a focus area (e.g., `/codex challenge security`), include it.

Default prompt (no focus): "Review the changes on this branch against the base branch. Run `git diff origin/<base>` to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments, just the problems."

With focus (e.g., "security"): "Review the changes on this branch against the base branch. Run `git diff origin/<base>` to see the diff. Focus specifically on SECURITY. Your job is to find every way an attacker could exploit this code. Think about injection vectors, auth bypasses, privilege escalation, data exposure, and timing attacks. Be adversarial."

  2. Run `codex exec` with JSONL output to capture reasoning traces and tool calls (5-minute timeout):

```bash
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>/dev/null | python3 -c "
import sys, json
for line in sys.stdin:
    line = line.strip()
    if not line: continue
    try:
        obj = json.loads(line)
        t = obj.get('type','')
        if t == 'item.completed' and 'item' in obj:
            item = obj['item']
            itype = item.get('type','')
            text = item.get('text','')
            if itype == 'reasoning' and text:
                print(f'[codex thinking] {text}')
                print()
            elif itype == 'agent_message' and text:
                print(text)
            elif itype == 'command_execution':
                cmd = item.get('command','')
                if cmd: print(f'[codex ran] {cmd}')
        elif t == 'turn.completed':
            usage = obj.get('usage',{})
            tokens = usage.get('input_tokens',0) + usage.get('output_tokens',0)
            if tokens: print(f'\ntokens used: {tokens}')
    except Exception: pass
"
```

This parses codex's JSONL events to extract reasoning traces, tool calls, and the final response. The `[codex thinking]` lines show what codex reasoned through before its answer.

  3. Present the full streamed output:

```
CODEX SAYS (adversarial challenge):
════════════════════════════════════════════════════════════
<full output from above, verbatim>
════════════════════════════════════════════════════════════
Tokens: N | Est. cost: ~$X.XX
```
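
To test the parsing logic outside the shell, the same event handling can be pulled into a Python generator. This sketch assumes that the event and field names the inline parser reads are the whole contract: `item.completed`, `reasoning`, `agent_message`, `command_execution`, `turn.completed`, and `usage`. It also includes the `thread.started` branch used in consult mode. It skips only undecodable or non-object lines rather than swallowing every exception.

```python
import json
from typing import Iterable, Iterator


def render_events(lines: Iterable[str]) -> Iterator[str]:
    """Turn codex JSONL events into the human-readable lines the inline parser prints."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON (e.g. stray log text): skip it
        if not isinstance(obj, dict):
            continue
        kind = obj.get("type", "")
        if kind == "thread.started" and obj.get("thread_id"):
            yield f"SESSION_ID:{obj['thread_id']}"  # consult mode saves this for follow-ups
        elif kind == "item.completed" and isinstance(obj.get("item"), dict):
            item = obj["item"]
            itype, text = item.get("type", ""), item.get("text", "")
            if itype == "reasoning" and text:
                yield f"[codex thinking] {text}"
                yield ""  # blank line after each reasoning trace
            elif itype == "agent_message" and text:
                yield text
            elif itype == "command_execution" and item.get("command"):
                yield f"[codex ran] {item['command']}"
        elif kind == "turn.completed":
            usage = obj.get("usage") or {}
            tokens = (usage.get("input_tokens") or 0) + (usage.get("output_tokens") or 0)
            if tokens:
                yield ""
                yield f"tokens used: {tokens}"
```

Wiring it to a pipe is then a matter of `for out in render_events(sys.stdin): print(out, flush=True)`.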

Step 2C: Consult Mode

Ask Codex anything about the codebase. Consult mode supports session continuity, so follow-up questions can resume the same Codex conversation.
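
Session continuity hinges on one file, `.context/codex-session-id`. Below is a minimal Python sketch of the read/save/delete cycle. It assumes the parser's `SESSION_ID:<id>` output line is the only source of the ID. The helper names are hypothetical and mirror the shell steps in this section: a missing file corresponds to `NO_SESSION`, and deleting the file is the fallback when a resume fails.

```python
from pathlib import Path
from typing import Iterable, Optional

SESSION_FILE = Path(".context/codex-session-id")  # relative to the working directory
PREFIX = "SESSION_ID:"


def load_session_id(path: Path = SESSION_FILE) -> Optional[str]:
    """Return the saved session ID, or None (the shell version prints NO_SESSION)."""
    try:
        sid = path.read_text().strip()
    except FileNotFoundError:
        return None
    return sid or None


def extract_session_id(output_lines: Iterable[str]) -> Optional[str]:
    """Pick the ID out of the parser's `SESSION_ID:<id>` line, if present."""
    for line in output_lines:
        if line.startswith(PREFIX):
            return line[len(PREFIX):].strip() or None
    return None


def save_session_id(sid: str, path: Path = SESSION_FILE) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)  # equivalent of `mkdir -p .context`
    path.write_text(sid + "\n")


def forget_session(path: Path = SESSION_FILE) -> None:
    """Resume failed: delete the session file so the next run starts fresh."""
    path.unlink(missing_ok=True)
```
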

  1. Check for an existing session:

```bash
cat .context/codex-session-id 2>/dev/null || echo "NO_SESSION"
```

If a session file exists (the output is not `NO_SESSION`), use AskUserQuestion:

```
You have an active Codex conversation from earlier. Continue it or start fresh?
A) Continue the conversation (Codex remembers the prior context)
B) Start a new conversation
```

  2. Create temp files:

```bash
TMPRESP=$(mktemp /tmp/codex-resp-XXXXXX)
TMPERR=$(mktemp /tmp/codex-err-XXXXXX)
```

  3. Plan review auto-detection: If the user's prompt is about reviewing a plan, or if plan files exist and the user ran `/codex` with no arguments, look for the most recently modified plan that mentions this project:

```bash
ls -t ~/.claude/plans/*.md 2>/dev/null | xargs grep -l "$(basename "$(pwd)")" 2>/dev/null | head -1
```

If there is no project-scoped match, fall back to `ls -t ~/.claude/plans/*.md 2>/dev/null | head -1`, but warn: "Note: this plan may be from a different project; verify before sending to Codex." Read the plan file and prepend the persona to the user's prompt: "You are a brutally honest technical reviewer. Review this plan for: logical gaps and unstated assumptions, missing error handling or edge cases, overcomplexity (is there a simpler approach?), feasibility risks (what could go wrong?), and missing dependencies or sequencing issues. Be direct. Be terse. No compliments. Just the problems.

THE PLAN: <plan content>"

  4. Run `codex exec` with JSONL output to capture reasoning traces (5-minute timeout).

For a new session:

```bash
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
import sys, json
for line in sys.stdin:
    line = line.strip()
    if not line: continue
    try:
        obj = json.loads(line)
        t = obj.get('type','')
        if t == 'thread.started':
            tid = obj.get('thread_id','')
            if tid: print(f'SESSION_ID:{tid}')
        elif t == 'item.completed' and 'item' in obj:
            item = obj['item']
            itype = item.get('type','')
            text = item.get('text','')
            if itype == 'reasoning' and text:
                print(f'[codex thinking] {text}')
                print()
            elif itype == 'agent_message' and text:
                print(text)
            elif itype == 'command_execution':
                cmd = item.get('command','')
                if cmd: print(f'[codex ran] {cmd}')
        elif t == 'turn.completed':
            usage = obj.get('usage',{})
            tokens = usage.get('input_tokens',0) + usage.get('output_tokens',0)
            if tokens: print(f'\ntokens used: {tokens}')
    except Exception: pass
"
```

For a resumed session (the user chose "Continue"):

```bash
codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
<same python streaming parser as above>
"
```

  5. Capture the session ID from the streamed output. The parser prints `SESSION_ID:<id>` from the `thread.started` event. Save it for follow-ups:

```bash
mkdir -p .context
```

Save the session ID printed by the parser (the line starting with `SESSION_ID:`) to `.context/codex-session-id`.

  6. Present the full streamed output:

```
CODEX SAYS (consult):
════════════════════════════════════════════════════════════
<full output, verbatim, including [codex thinking] traces>
════════════════════════════════════════════════════════════
Tokens: N | Est. cost: ~$X.XX
Session saved. Run /codex again to continue this conversation.
```

  7. After presenting, note any points where Codex's analysis differs from your own understanding. If there is a disagreement, flag it: "Note: Claude Code disagrees on X because Y."
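
The plan review auto-detection described above can be approximated in Python. This is a sketch. Like the shell pipeline, it prefers the most recently modified `*.md` plan that mentions the project directory name. Otherwise it falls back to the newest plan and flags the fallback so the caller can print the warning. How the persona, the plan, and the user's own prompt are concatenated is an assumption, and both helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple

PERSONA = (
    "You are a brutally honest technical reviewer. Review this plan for: logical gaps and "
    "unstated assumptions, missing error handling or edge cases, overcomplexity (is there a "
    "simpler approach?), feasibility risks (what could go wrong?), and missing dependencies "
    "or sequencing issues. Be direct. Be terse. No compliments. Just the problems."
)


def pick_plan(plans_dir: Path, project: str) -> Tuple[Optional[Path], bool]:
    """Return (plan_path, is_fallback); is_fallback=True means: warn before sending."""
    if not plans_dir.is_dir():
        return None, False
    # Newest first, like `ls -t`.
    plans = sorted(plans_dir.glob("*.md"), key=lambda p: p.stat().st_mtime, reverse=True)
    for plan in plans:
        if project in plan.read_text(errors="replace"):  # like `grep -l <project>`
            return plan, False
    return (plans[0], True) if plans else (None, False)


def plan_review_prompt(plan_text: str, user_prompt: str = "") -> str:
    """Prepend the reviewer persona and the plan to the user's prompt (assumed layout)."""
    prompt = f"{PERSONA}\n\nTHE PLAN: {plan_text}"
    return f"{prompt}\n\n{user_prompt}" if user_prompt else prompt
```

A typical call would be `pick_plan(Path.home() / ".claude" / "plans", Path.cwd().name)`, where the project name is the basename of the working directory, as in the shell version.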

Model & Reasoning

Model: No model is hardcoded; codex uses whatever its current default is (the frontier agentic coding model). As OpenAI ships newer models, /codex picks them up automatically. If the user specifies a model (e.g., `/codex review -m gpt-5.1-codex-max` or `/codex challenge -m gpt-5.2`), pass the `-m` flag through to codex.

Reasoning effort: All modes use `xhigh`, the maximum reasoning effort. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.

Web search: All codex commands use `--enable web_search_cached` so Codex can look up docs and APIs during review. This is OpenAI's cached index: fast, with no extra cost.
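
A minimal Python sketch of how these settings combine into one argument vector, for either a new session or `codex exec resume <id>`. The flags are the ones used in the commands above. Appending `-m` after the other flags is an assumption about ordering, and `codex_argv` is a hypothetical helper. Passing a list to `subprocess.run` also sidesteps the shell quoting around `model_reasoning_effort="xhigh"`.

```python
from typing import List, Optional


def codex_argv(prompt: str, session_id: Optional[str] = None,
               model: Optional[str] = None) -> List[str]:
    """Build the codex command line used by every mode (hypothetical helper)."""
    argv = ["codex", "exec"]
    if session_id:
        argv += ["resume", session_id]  # continue an earlier consult conversation
    argv.append(prompt)
    argv += [
        "-s", "read-only",                        # read-only sandbox
        "-c", 'model_reasoning_effort="xhigh"',   # maximum reasoning effort
        "--enable", "web_search_cached",          # cached web search
        "--json",                                 # JSONL events for the parser
    ]
    if model:
        argv += ["-m", model]  # only when the user asked for a specific model
    return argv
```
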

Cost Estimation

Parse the token count from stderr: Codex prints `tokens used` to stderr, followed by the count on the next line (`tokens used\nN`).

Display it as `Tokens: N`. If the token count is not available, display `Tokens: unknown`.
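
A small Python sketch of this parse, assuming the count sits alone on the line after `tokens used`. Tolerating thousands separators such as `12,345` is a guess about the format, and the function names are hypothetical.

```python
import re
from typing import Optional

# "tokens used" on one line, the count on the next; commas are tolerated (assumption).
_TOKENS_RE = re.compile(r"tokens used\s*\n\s*(\d[\d,]*)")


def parse_tokens(stderr_text: str) -> Optional[int]:
    match = _TOKENS_RE.search(stderr_text)
    return int(match.group(1).replace(",", "")) if match else None


def tokens_label(stderr_text: str) -> str:
    count = parse_tokens(stderr_text)
    return f"Tokens: {count}" if count is not None else "Tokens: unknown"
```
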

Error Handling

  • Binary not found: Detected in Step 0. Stop and show the install instructions.
  • Auth error: Codex prints an auth error to stderr. Surface it: "Codex authentication failed. Run `codex login` in your terminal to authenticate via ChatGPT."
  • Timeout: If the Bash call times out (5 minutes), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow. Try again or use a smaller scope."
  • Empty response: If `$TMPRESP` is empty or doesn't exist, tell the user: "Codex returned no response. Check stderr for errors."
  • Session resume failure: If resume fails, delete the session file and start fresh.
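
A hedged Python sketch that maps these failure cases onto a single runner. It rests on three assumptions:

* codex is invoked with `subprocess.run` (an argv list, with a 300-second timeout to match the 5-minute rule);
* captured stdout stands in for `$TMPRESP`;
* auth failures can be spotted by a non-zero exit plus the word "auth" in stderr. This is a heuristic, not a documented contract.

Session-resume failures are left to the caller, and `run_codex` is a hypothetical name.

```python
import subprocess
from typing import List, Tuple

TIMEOUT_S = 300  # 5 minutes, the same budget as the Bash tool's timeout: 300000 ms


def run_codex(argv: List[str]) -> Tuple[bool, str]:
    """Run a codex command and return (ok, stdout or a user-facing error message)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=TIMEOUT_S)
    except FileNotFoundError:
        # Binary not found: the skill stops here and shows the install instructions from Step 0.
        return False, "codex binary not found; show the install instructions from Step 0."
    except subprocess.TimeoutExpired:
        return False, ("Codex timed out after 5 minutes. The diff may be too large or the API "
                       "may be slow. Try again or use a smaller scope.")
    # Heuristic (assumption): a failing run whose stderr mentions "auth" is an auth error.
    if proc.returncode != 0 and "auth" in proc.stderr.lower():
        return False, ("Codex authentication failed. Run `codex login` in your terminal "
                       "to authenticate via ChatGPT.")
    if not proc.stdout.strip():
        return False, "Codex returned no response. Check stderr for errors."
    return True, proc.stdout
```
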

Important Rules

  • Never modify files. This skill is read-only, and Codex runs in read-only sandbox mode.
  • Present output verbatim. Do not truncate, summarize, or editorialize Codex's output before showing it. Show it in full inside the CODEX SAYS block.
  • Add synthesis after, not instead of. Any Claude commentary comes after the full output.
  • Use a 5-minute timeout on all Bash calls to codex (`timeout: 300000`).
  • No double-reviewing. If the user already ran `/review`, Codex provides a second independent opinion. Do not re-run Claude Code's own review.