Agent Spec Tool-First Workflow
Version: 3.1.0 | Last Updated: 2026-03-08
You are an expert at using
as a CLI tool for contract-driven AI coding. Help users by:
- Planning: Render task contracts before coding with
- Implementing: Follow contract Intent, Decisions, Boundaries
- Verifying: Run / to check code against specs
- Reviewing: Use for human-readable summaries, for git trailers
- Debugging: Interpret verification failures and fix code accordingly
Core Mental Model
The key shift: Review point displacement. Human attention moves from "reading code diffs" to "writing contracts".
Traditional: Write Issue (10%) → Agent codes (0%) → Read diff (80%) → Approve (10%)
agent-spec: Write Contract (60%) → Agent codes (0%) → Read explain (30%) → Approve (10%)
Humans define "what is correct" (Contract). Machines verify "is the code correct" (lifecycle). Humans do final "Contract Acceptance" — not Code Review.
Quick Reference
| Command | Purpose | When to Use |
|---|
| Scaffold new spec | Starting a new task |
agent-spec contract <spec>
| Render Task Contract | Before coding - read the execution plan |
| Spec quality check | After writing spec, before giving to Agent |
agent-spec lifecycle <spec> --code .
| Full lint + verify pipeline | After edits - main quality gate |
agent-spec guard --spec-dir specs --code .
| Repo-wide check | Pre-commit / CI - all specs at once |
agent-spec explain <spec> --format markdown
| PR-ready review summary | Contract Acceptance - paste into PR |
agent-spec explain <spec> --history
| Execution history | See how many retries the Agent needed |
agent-spec stamp <spec> --dry-run
| Preview git trailers | Before committing - traceability |
agent-spec verify <spec> --code .
| Raw verification only | When you want verify without lint gate |
agent-spec checkpoint status
| VCS-aware status | Check uncommitted state |
Documentation
Refer to the local files for detailed command patterns:
- - Complete CLI command reference with all flags
IMPORTANT: Documentation Completeness Check
Before answering questions, Claude MUST:
- Read for exact command syntax
- If file read fails: Inform user "references/commands.md is missing, answering from SKILL.md patterns"
- Still answer based on SKILL.md patterns + built-in knowledge
The Seven-Step Workflow
Step 1: Human writes Task Contract (human attention: 60%)
Not a vague Issue — a structured Contract with Intent, Decisions, Boundaries, Completion Criteria.
bash
agent-spec init --level task --lang zh --name "用户注册API"
# Then fill in the four elements in the generated .spec file
Key principle: Exception scenarios >= happy path scenarios. 1 happy + 3 error paths forces you to think through edge cases before coding begins.
Step 2: Contract quality gate
Check Contract quality before handing to Agent. Like "code review" but for the Contract itself.
bash
agent-spec lint specs/user-registration.spec --min-score 0.7
Catches: vague verbs, unquantified constraints, non-deterministic wording, missing test selectors, sycophancy bias, uncovered constraints.
Optional: team "Contract Review" — review 50-80 lines of natural language instead of 500 lines of code diff.
Step 3: Agent reads Contract and codes
Agent consumes the structured contract:
bash
agent-spec contract specs/user-registration.spec
Agent is triple-constrained:
- Decisions tell it "how to do it" (no technology shopping)
- Boundaries tell it "what to touch" (no unauthorized file changes)
- Completion Criteria tell it "when it's done" (all bound tests must pass)
Step 4: Agent self-checks with lifecycle (automatic retry loop)
bash
agent-spec lifecycle specs/user-registration.spec \
--code . --change-scope worktree --format json --run-log-dir .agent-spec/runs
Four verification layers run in sequence:
- lint — re-check Contract quality (prevent spec tampering)
- StructuralVerifier — pattern match Must NOT constraints against code
- BoundariesVerifier — check changed files are within Allowed Changes
- TestVerifier — execute tests bound to each scenario
Agent retry loop (no human needed):
Code → lifecycle → FAIL (2/5) → read failure_summary → fix → lifecycle → FAIL (4/5) → fix → lifecycle → PASS (5/5) ✓
Run logs record this history — "this Contract took 3 tries to pass".
Retry Protocol
When lifecycle fails, follow this exact sequence:
- Run:
agent-spec lifecycle <spec> --code . --format json
- Parse JSON output, find each scenario's and
- For : the bound test ran and failed — read evidence to understand why, fix code
- For : the bound test was not found — check selector matches a real test name
- For : AI verification pending — review manually or enable AI backend
- Fix code based on evidence. Do NOT modify the spec file — changing the Contract to make verification pass is sycophancy, not a fix
- Re-run lifecycle
- After 3 consecutive failures on the same scenario, stop and escalate to the human
Critical rule: The spec defines "what is correct". If the code doesn't match, fix the code. If the spec itself is wrong, switch to authoring mode and update the Contract explicitly — never silently weaken acceptance criteria.
Step 5: Guard gate (pre-commit / CI)
bash
# Pre-commit hook
agent-spec guard --spec-dir specs --code . --change-scope staged
# CI (GitHub Actions)
agent-spec guard --spec-dir specs --code . --change-scope worktree
Runs lint + verify on ALL specs against current changes. Blocks commit/PR if any spec fails.
Step 6: Contract Acceptance replaces Code Review (human attention: 30%)
Human reviews a Contract-level summary, not a code diff:
bash
agent-spec explain specs/user-registration.spec --code . --format markdown
Reviewer judges two questions:
- Is the Contract definition correct? (Intent, Decisions, Boundaries make sense?)
- Did all verifications pass? (4/4 pass including error paths?)
If both "yes" → approve. This is 10x faster than reading code diffs.
Check retry history if needed:
bash
agent-spec explain specs/user-registration.spec --code . --history
Assisting Contract Acceptance
When helping a human review a completed task:
- Run
agent-spec explain <spec> --code . --format markdown
and present the output
- If human asks about retry history: run with flag
- If human asks about specific failures: run
agent-spec lifecycle <spec> --code . --format json
and extract the relevant scenario results
- If human approves: run
agent-spec stamp <spec> --code . --dry-run
and present the trailers
Step 7: Stamp and archive
bash
agent-spec stamp specs/user-registration.spec --dry-run
# Output: Spec-Name: 用户注册API
# Spec-Passing: true
# Spec-Summary: 4/4 passed, 0 failed, 0 skipped, 0 uncertain
Establishes Contract → Commit traceability chain.
Verdict Interpretation
| Verdict | Meaning | Action |
|---|
| Scenario verified | No action needed |
| Scenario failed verification | Read evidence, fix code |
| Test not found or not run | Add missing test or fix selector |
| AI stub / manual review needed | Review manually or enable AI backend |
Key rule: != . All four verdicts are distinct.
VCS Awareness
agent-spec auto-detects the VCS from the project root. Behavior differs between git and jj:
| Condition | Behavior |
|---|
| exists (even with ) | Use instead of |
| jj repo | Do NOT run or — jj auto-snapshots all changes |
| jj repo | output includes trailer with jj change ID |
| jj repo | shows file-level diffs between runs (via operation IDs) |
| Only | Use standard git commands ( or ) |
| Neither | Change scope detection unavailable; use explicitly |
Change Set Options
| Flag | Behavior | Default |
|---|
| Explicit file/dir for boundary checking | (none) |
| Git staged files | guard default |
| All git working tree changes | (none) |
| Jujutsu VCS changes | (none) |
| No change detection | lifecycle/verify default |
Advanced Features
Verification Layers
bash
# Run only specific layers
agent-spec lifecycle specs/task.spec --code . --layers lint,boundary,test
# Available: lint, boundary, test, ai
Run Logging
bash
agent-spec lifecycle specs/task.spec --code . --run-log-dir .agent-spec/runs
agent-spec explain specs/task.spec --history
AI Mode
bash
agent-spec verify specs/task.spec --code . --ai-mode off # default - no AI
agent-spec verify specs/task.spec --code . --ai-mode stub # testing only
agent-spec lifecycle specs/task.spec --code . --ai-mode caller # agent-as-verifier
AI Verification: Caller Mode
When
is used, the calling Agent acts as the AI verifier. This is a two-step protocol:
Step 1: Emit AI requests
bash
agent-spec lifecycle specs/task.spec --code . --ai-mode caller --format json
If any scenarios are skipped (no mechanical verifier covered them), the output JSON includes:
"ai_requests_file": ".agent-spec/pending-ai-requests.json"
The pending requests file contains
objects with scenario context, code paths, contract intent, and constraints.
Step 2: Resolve with external decisions
The Agent reads the pending requests, analyzes each scenario, then writes decisions:
json
[
{
"scenario_name": "场景名称",
"model": "claude-agent",
"confidence": 0.92,
"verdict": "pass",
"reasoning": "All steps verified by code analysis"
}
]
Then merges them back:
bash
agent-spec resolve-ai specs/task.spec --code . --decisions decisions.json
This produces a final merged report where Skip verdicts are replaced with the Agent's AI decisions.
When to use caller mode:
- When the calling Agent (Claude, Codex, etc.) can read and reason about code
- For scenarios that can't be verified by tests alone (design intent, code quality)
- When you want the Agent to be both implementor and verifier
When to Use / When NOT to Use
| Scenario | Use agent-spec? | Why |
|---|
| Clear feature with defined inputs/outputs | Yes | Contract can express deterministic acceptance criteria |
| Bug fix with reproducible steps | Yes | Great for "given bug X, when fixed, then Y" |
| Exploratory prototyping | No | You don't know "what is done" yet - vibe code first |
| Large architecture refactor | No | Boundaries hard to define, "better architecture" isn't testable |
| Security/compliance rules | Yes (org.spec) | Encode rules once, enforce mechanically everywhere |
Gradual Adoption
Week 1-2: Pick 2-3 clear bug fixes, write Contracts for them
Week 3-4: Expand to new feature development
Week 5-8: Create project.spec with team coding standards
Month 3+: Consider org.spec for cross-project governance
Common Errors
| Error | Cause | Solution |
|---|
| Guard reports N specs failing | Specs have lint or verify issues | Run on each failing spec individually |
| verdict on scenario | Test selector doesn't match any test | Check / / in spec |
| Quality score below threshold | Too many lint warnings | Fix vague verbs, add quantifiers, improve testability |
| Boundary violation detected | Changed file outside allowed paths | Either update Boundaries or revert the change |
| on all AI scenarios | Using or no backend | Expected — review manually |
| Agent keeps failing lifecycle | Contract criteria too vague or too strict | Improve Completion Criteria specificity |
Command Priority
| Preference | Use | Instead of |
|---|
| Render task contract | (legacy alias) |
| Full pipeline | alone (misses lint) |
| Repo-wide | Multiple individual calls |
| Explicit paths known | when paths are known |
| CLI commands | Tool-first approach | library API |
When to Switch to Authoring Mode
During implementation, if you discover:
- A missing exception path that should be in Completion Criteria
- A Boundary that's too restrictive (need to modify more files than allowed)
- A Decision that needs to change (technology choice was wrong)
Switch to
skill, update the Contract FIRST, re-run
to validate the change, then resume implementation. Do NOT silently work outside the Contract's boundaries.
Escalation
Switch to library integration only when:
- Embedding into another Rust agent runtime
- Testing internals
- Injecting a host via
verify_with_backend(Arc<dyn AiBackend>)