/goal Prompt Builder
This skill turns a fuzzy task description ("我想让 Codex 帮我重构鉴权") into a complete, audit-friendly
command that's ready to paste into Codex CLI 0.128+.
Why this skill exists
Codex 0.128 added
as a persistent objective with a runtime-injected audit prompt (
). The audit prompt forces the model to build a "prompt-to-artifact checklist" — but
only if the user's goal text can be mapped to one. Vague goals produce vague checklists, which produce false completions. This skill exists to make sure every
you generate has the structure that lets the audit mechanism actually work.
The golden template (5 sections, in this order)
Every
Claude generates with this skill follows this structure exactly:
/goal <objective>.
[Optional: First action: read X, Y, Z and report counts. Wait for ack.]
Scope: <files / subsystem / feature area>.
Constraints:
- <what not to change>
- <compatibility / permission boundaries>
- <project-specific rules from AGENTS.md / CLAUDE.md>
Done when:
1. <verifiable artifact 1 — cite file or command>
2. <verifiable artifact 2>
...
Stop if:
- <mechanically detectable condition 1>
- <mechanically detectable condition 2>
...
Use a token budget of <N> tokens for this goal.
Why this order: matches
's expected reading flow — objective first, then scope to bound the search, then constraints to prune options, then acceptance to define success, then stop-if as runtime guards.
Workflow
When this skill triggers, walk the user through these 6 steps. Step 0 (interaction mode) and Step 1 (project detection) happen automatically — Step 0 needs one user choice, Step 1 needs zero if filesystem is accessible. Don't skip steps unless the user explicitly says "I'll fill it in myself, just give me the template".
Step 0: Pick interaction mode
This skill supports three interaction modes. Ask once at the start:
你希望用哪种方式生成 /goal?
- A. 询问式 — 我一段一段问你(最稳,适合第一次写 /goal)
- B. 全描述式 — 你一句话描述需求,我拆解后只问你确认不确定的地方(最快,适合熟手)
- C. 混合式(默认) — 先选场景模板,再问 3-5 个关键问题(推荐)
Once chosen, follow the matching flow:
- A. 询问式 → Step 1 → Step 2 → Step 3a → 3b → 3c → 3d → 3e → Step 4 → Step 5 (each input gathered separately)
- B. 全描述式 → Step 1 → ask "用一段话描述你想做什么 / scope / 验收 / 不希望发生什么" → parse into 5 sections → Step 4 → ask user to confirm only the gaps → Step 5
- C. 混合式 → Step 1 → Step 2 → Step 3 (batched: ask all missing fields at once) → Step 4 → Step 5
Mode B is most powerful when the user has thought about the task. Mode A is safest when they haven't. Mode C is the default sweet spot.
If the user doesn't answer this question explicitly, default to mode C and proceed.
Step 1: Detect (don't ask) the project type
Auto-detection comes first. Only fall back to asking if detection fails.
Run this detection sequence:
-
Check the conversation context. Has the user already provided a repo URL, file path, or code snippet? Read those for hints first.
-
Probe the filesystem (if you have file tools and the user is in a project directory):
- exists → Node / TypeScript
- or or → Python
- or → Swift / iOS
- → Rust
- → Go
- / / / → Static / docs project
-
Fetch the repo if the user gave a URL (only if web tools available):
- GitHub URL → fetch the README + try to identify config files
- Read and if they exist (these are gold for Constraints)
-
Fall back to asking only if all auto-detection failed:
我没法自动判断项目类型——这是 Node / Python / Swift / Go / Rust / 静态 / 其他?
When detection succeeds, announce what you found in one sentence so the user can correct you:
检测到这是一个 Swift / iOS 项目(找到 lingolearn.xcodeproj + CLAUDE.md)。
我会按 Swift 项目默认约束适配。如果不对,告诉我。
Then load the matching reference from
references/project-types.md
.
Also load any / found — these contain project-specific rules that override defaults.
Step 2: Pick a scenario template
Ask: 这个 goal 属于哪种类型?
| 选项 | 说明 |
|---|
| A. 重构 | 改一个文件 / 子系统 |
| B. 新功能实现 | 已有 SDD spec 的功能 |
| C. 批量补测试 / 修 bug | 重复型任务,可枚举来源 |
| D. 代码考古 / 研究 | 只读不动手 |
| E. UI / 行为 audit | 对照文档审实现 |
| F. 守门员 review | 评估能否合并,不修改 |
| G. 自定义 | 让我描述 |
Each option maps to a different default skeleton — see
for the full templates.
Step 3: Gather the 5 inputs (in this order)
Ask only what's still missing. Don't ask all 5 at once — ask incrementally so the user can think.
3a. Objective (一句话)
- One sentence describing what changes by the end.
- If the user gives a verb-less noun phrase ("Cohere rerank support"), turn it into a verb phrase ("Add Cohere rerank support to retrieval pipeline").
- Reject vague verbs like "improve", "optimize", "clean up" — ask for the concrete change.
3b. Scope (改什么、不改什么)
- Which files / directories / subsystems are in play
- For brownfield projects: probe whether v1.x beta files or sensitive modules exist that should be off-limits
3c. Constraints (硬约束)
- Pull from AGENTS.md / CLAUDE.md if available
- Add project-type defaults from the loaded reference (e.g., for Swift: "do not modify project.pbxproj")
- Ask if there's a "MUST NOT modify" list
3d. Done when (验收清单)
- This is the most important section. Push back hard if items are vague.
- Each item must cite a file, command, test name, or measurable artifact.
- Replace "测试通过" → " exits 0; paste summary"
- Replace "做完" → enumerate the deliverables
- Aim for 5-8 items; fewer than 3 is a red flag
3e. Stop if (停止条件)
- Each must be mechanically detectable
- Include project-type defaults (e.g., for Node: "needs npm install for new dep")
- Include a regression guard: "existing tests start failing — do not fix by editing tests"
Step 4: Predict audit-friendliness
Before showing the final command, internally score it (don't show the math, just the verdict):
- Acceptance count: 0 = bad, 1-2 = warn, 3-5 = good, 6-8 = excellent
- Vague verbs detected: "improve", "optimize", "全部", "彻底", "all", "everything" → flag
- Stop-if specificity: "if unclear" = bad, "if file X appears in git diff" = good
- Token budget present: missing = warn
- Mechanical verifiability: every Done-when item has a cite-able artifact = good
If score is below ~70%, don't ship the command yet. Instead, surface the weak spots and ask the user to refine. Be specific:
⚠ 我发现三处可以加强的地方:
- Done when 第 2 条"测试覆盖" → 改成"测试名称 + 退出码"会更准
- Stop if 缺少"现有测试 regression"兜底
- Token 预算未指定,建议 80K(基于 scope 大小)
Step 5: Render and cite design choices
When all checks pass, render the final
in a code block (so it's copy-pasteable) and follow with a
brief explanation of the key design choices — not a tutorial, just enough so the user knows why each choice was made.
Format:
/goal <full command here>
几个关键设计选择:
- 为什么 Done when 第 N 条这么写
- 为什么 Stop if 包含某条
- 为什么预算定这个数
Keep this explanation under 8 short lines. The user is here for the command, not a lecture.
Project type loading
When the project type is known, read the corresponding reference:
- Node / TypeScript →
references/project-types.md
(Node section)
- Python →
references/project-types.md
(Python section)
- Swift →
references/project-types.md
(Swift section)
- Go →
references/project-types.md
(Go section)
- Rust →
references/project-types.md
(Rust section)
- Static / docs →
references/project-types.md
(Static section)
Each section provides:
- Default test command with full flags
- Default build / type-check command
- Project-type-specific Stop-if bullets to include
- Common false-completion traps to guard against
Scenario templates
For the chosen scenario, read the corresponding skeleton:
- A. 重构 → § Refactor
- B. 新功能实现 → § Feature
- C. 批量补测试 / 修 bug → § Batch
- D. 代码考古 → § Archaeology
- E. UI / 行为 audit → § UI Audit
- F. 守门员 review → § Gatekeeper
- G. 自定义 → use bare 5-section template, no skeleton
Each scenario has its own emphasis — e.g., archaeology goals emphasize Constraints (禁区), feature goals emphasize "First action: read SPEC + report counts".
Worked examples
When the user's case is ambiguous about how to fill a section, consult
. It contains 5 end-to-end transformations (vague request → final command) covering the most common patterns:
- Refactor (Node/TS) — cleanest baseline
- Feature with SDD spec (Swift) — most heavily constrained
- Vague request → push back instead of rendering — when not to render
- Archaeology (static / docs project) — read-only goals
- "Just give me the template" — when to skip the interview
Especially read example 3 for guidance on when to refuse rendering. Don't ship a goal you can't defend — surface the weak spots and ask for refinement first.
Hard rules (always follow)
These are non-negotiable. They come from
's actual behavior.
- Never write Stop-if as "if unclear, stop" — that's not mechanically detectable. Ask the user to enumerate concrete conditions.
- Never let "all / everything / 全部 / 彻底" through — flag and ask for a number or enumerable source.
- Always include a token budget — missing budget = no soft stop = potential runaway.
- Always include a "no test-rewriting" stop-if for any goal that touches tested code: "Existing tests start failing — this is a regression, do not 'fix' by editing tests."
- For SDD-driven goals (scenario B), the first action is always "read X files and report counts" — bypasses reference uncertainty and exposes loading failures early.
- For brownfield projects, always ask about MUST NOT modify list — the absence of one is the #1 cause of scope creep.
Common failure modes to coach the user through
- "测试通过"做验收: too vague. Force into "exact command + exit code + paste summary".
- 没有 Stop if: goal is a one-way door. Add at least 3 mechanically detectable conditions.
- 预算 > 300K: too big to audit reliably. Suggest splitting into two goals.
- Scope = "整个仓库": too wide. Push for a specific directory or subsystem.
- "我之后再补 acceptance": this is the failure pattern. Refuse to render until at least 3 items exist.
What this skill does NOT do
- Does not run the for the user — it only generates the text
- Does not validate the project state (no checks, no test runs)
- Does not handle Codex versions older than 0.128 — doesn't exist there
- Does not generate prompts for , , or other Codex commands
Output format reminder
Final output is always:
- A markdown code block containing the command (so the user can copy)
- A short bullet list of the key design choices (no more than 8 short lines)
- Optional: a one-line "audit-friendliness verdict" (e.g., "审计友好度:优秀 · 7 项验收 · 0 风险标记")
That's it. No long lecture. The user is here for a command.