# C/C++ Security Review
Runs in the main conversation (the skill is invoked directly; there is no command wrapper). The orchestrator owns the task ledger as bookkeeping for retries; workers and judges have no Task tools. Workers and judges are named plugin subagents (`c-review:c-review-worker`, `c-review:c-review-dedup-judge`, `c-review:c-review-fp-judge`); tool sets are declared in `plugins/c-review/agents/*.md`. Findings are exchanged via markdown-with-YAML files in a shared output directory.
## When to Use
Native C/C++ application security review: memory safety, integer overflow, races, type confusion, Linux/macOS daemons, Windows userspace services.
## When NOT to Use
- Kernel drivers/modules (Linux, Windows, macOS).
- Managed languages (Java, C#, Python, Go, Rust).
- Embedded/bare-metal code without libc.
## Subagents

| Subagent type | Purpose | Tool set |
|---|---|---|
| `c-review:c-review-worker` | Run assigned cluster, write findings | Read, Write, Edit, Grep, Glob, Bash |
| `c-review:c-review-dedup-judge` | Merge duplicates (runs first) | Read, Write, Edit, Glob |
| `c-review:c-review-fp-judge` | FP + severity + final reports (runs second) | Read, Write, Edit, Grep, Glob, Bash |
Tools come from each agent's frontmatter at spawn time. The orchestrator's own Read/Bash/Glob/etc. come from this skill's own frontmatter.
## Architecture

```
coordinator: write context.md → build_run_plan.py → TaskCreate × M
  → spawn primer (foreground) → spawn M workers (parallel)
  → classify Phase-7 outcomes + write findings-index.txt
  → dedup-judge → fp-judge → SARIF safety net → return REPORT.md
```
Output directory contains: `context.md`, the run plan, `worker-prompts/*.txt`, `findings/*.md`, `findings-index.d/` (per-worker shards), `findings-index.txt`, `run-summary.md`, `REPORT.md`, and `REPORT.sarif`.
Path convention: set `${C_REVIEW_PLUGIN_ROOT}=${CLAUDE_PLUGIN_ROOT}` if that resolves (`Bash: ls "${CLAUDE_PLUGIN_ROOT}/prompts/clusters/buffer-write-sinks.md"`), otherwise `Bash: find ~/.claude -path '*/plugins/c-review/prompts/clusters/buffer-write-sinks.md' -print -quit`.
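A minimal sketch of that resolution as one shell step; both probes are taken directly from the convention above, and the fallback strips the probe path back off to recover the plugin root:

```bash
probe="prompts/clusters/buffer-write-sinks.md"
if [ -n "${CLAUDE_PLUGIN_ROOT:-}" ] && ls "${CLAUDE_PLUGIN_ROOT}/${probe}" >/dev/null 2>&1; then
  C_REVIEW_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}"
else
  hit="$(find ~/.claude -path "*/plugins/c-review/${probe}" -print -quit)"
  C_REVIEW_PLUGIN_ROOT="${hit%/${probe}}"   # strip the probe suffix to get the root
fi
```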
Scope convention: keep two scopes separate throughout the run:
- `finding_scope_root` — the user-requested audit subtree. Workers may only file findings whose vulnerable location is inside this subtree.
- `context_roots` — read-only repo roots/files workers and judges may inspect to verify reachability, callers, wrappers, build flags, mitigations, and threat-model details. Default to `.` (the whole repo) unless the user explicitly forbids broader context. Reading context outside `finding_scope_root` is allowed; filing findings there is not (see the sketch after this list).
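For instance, a run narrowed to a subdirectory keeps the two scopes like this (the directory name is illustrative):

```bash
finding_scope_root="src/net"   # findings may only point inside this subtree
context_roots="."              # the whole repo stays readable for reachability checks
```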
## Rationalizations to Reject
- "Background spawns parallelize the workers." They do not — `Agent` calls in a single assistant message already run concurrently. Backgrounding defeats the Phase 6a primer cache, so every worker pays full cache-creation (`cache_read_input_tokens=0`) and the ~15 K-token primer is wasted M times. This is the single most common defect — multiple recent runs spawned 7-of-8 (or all) workers in the background. Default: omit the background flag from worker spawns.
- "I'll re-derive the cluster list / paths / pass prefixes inline instead of running `build_run_plan.py`." The script is the only authority for selection and rendering. Paraphrasing it drops fields that the worker self-check requires, producing `worker-N abort: spawn prompt malformed`. Always run the script and pass its rendered prompts verbatim.
- "The run partially succeeded — I'll just write `REPORT.md` from what completed." Hiding partial runs behind a successful report is a correctness bug. If any Phase-5 cluster task is not `completed`, surface it prominently in `run-summary.md` and the final response.
- "Zero findings — skip Phase 8." Always run both judges and Phase 8b: dedup-judge writes a minimal no-op on an empty index, fp-judge writes empty `REPORT.md`/`REPORT.sarif`, and Phase 8b's SARIF generator emits a valid empty SARIF for the empty case. SARIF consumers depend on a stable artifact set.
- "A `Bash` `ls` glob is fine for the preflight." Under zsh, an unmatched glob aborts the whole compound command before `ls` runs. Use the `Glob` tool (preferred) or `find` (never fails on no-match).
## Orchestration Workflow
Run these phases in the main conversation.
### Phase 0: Parameter Collection
Entry: skill invoked.
Exit: `threat_model`, `model`, `severity_filter` resolved; `scope_subpath` resolved or left empty; `finding_scope_root="${scope_subpath:-.}"` set; `context_roots` resolved.
The skill is invoked directly (no command wrapper). Parse any free-text arguments the user passed on the invocation line (e.g. "remote attacker", "opus", "high only") and pre-fill the answers they imply — then ask for any missing required parameters with one `AskUserQuestion` call. Never silently default the required parameters.
Required parameters:

| Parameter | Values | How to infer from args |
|---|---|---|
| `threat_model` | `remote` / `local` | Words like "remote", "network", "attacker" → `remote`; "local", "unprivileged" → `local`; otherwise ask. |
| `model` | `haiku` / `sonnet` / `opus` | Explicit model name in args. Otherwise ask (no silent default). |
| `severity_filter` | `all` / `medium` / `high` | "all", "every", "noisy" → `all`; "medium and above" → `medium`; "high only", "criticals only" → `high`. Otherwise ask — no silent default. |
| `scope_subpath` | repo-relative directory (optional) | Phrases like "X only", "just audit X/", "review subdirectory X" → `X` or the matching subdir. Apply fuzzy matching against top-level subdirectories of the repo. If absent, leave empty (so `finding_scope_root` becomes `.`); if ambiguous, ask. |
Call `AskUserQuestion` exactly once with only the unresolved required parameters (`threat_model`, `model`, `severity_filter`) plus `scope_subpath` only when the user explicitly requested a narrowed scope but it is ambiguous. If the required parameters were all pre-filled and scope is absent or resolved, skip the question.
After resolving `scope_subpath`, set `finding_scope_root="${scope_subpath:-.}"`. Set `context_roots="."` by default so workers can verify callers/build settings outside a narrowed subtree without filing out-of-scope findings. If the user explicitly asks to forbid broader context, set `context_roots="${finding_scope_root}"` and note that reachability confidence may be lower.
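A compact sketch of that derivation, including the forbid-broader-context branch; `user_forbids_context` is an illustrative placeholder, the rest are the variables defined above:

```bash
finding_scope_root="${scope_subpath:-.}"
if [ "${user_forbids_context:-no}" = yes ]; then
  context_roots="${finding_scope_root}"   # note: reachability confidence may be lower
else
  context_roots="."                       # default: whole repo readable for context
fi
```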
### Phase 1: Prerequisites
Entry: Phase 0 complete.
Exit: `is_cpp`, `is_posix`, `is_windows` flags determined.
Probe within `finding_scope_root`. Prefer `Grep`/`Glob` when available in the orchestrator's tool set; some sessions only expose `Bash`, so fall back to the equivalents below — both forms produce identical signals (non-empty output ⇒ flag true):
```bash
# is_cpp
find "${finding_scope_root:-.}" -type f \( -name '*.cpp' -o -name '*.cxx' -o -name '*.cc' -o -name '*.hpp' -o -name '*.hh' \) -print -quit
# is_posix
grep -rlE '#include[[:space:]]*<(pthread|signal|sys/(socket|stat|types|wait)|unistd|errno)\.h>' \
  --include='*.c' --include='*.h' \
  --include='*.cpp' --include='*.cxx' --include='*.cc' --include='*.hpp' --include='*.hh' \
  "${finding_scope_root:-.}" | head -1
# is_windows
grep -rlE '#include[[:space:]]*<(windows|winbase|winnt|winuser|winsock|ntdef|ntstatus)\.h>' \
  --include='*.c' --include='*.h' \
  --include='*.cpp' --include='*.cxx' --include='*.cc' --include='*.hpp' --include='*.hh' \
  "${finding_scope_root:-.}" | head -1
```
The `compile_commands.json` probe is informational (no agent currently uses LSP), but it is mandatory so the run summary records whether richer local tooling is available. Probe via `Glob: **/compile_commands.json` under `context_roots`. If `Glob` is unavailable, use:
```bash
printf '%s\n' "${context_roots:-.}" | tr ',' '\n' | while IFS= read -r root; do
  [ -n "$root" ] && find "$root" -name compile_commands.json -print -quit
done | head -1
# `"$root"` is quoted intentionally so a context root containing spaces
# (e.g. "/Users/me/My Repo") survives word-splitting. Do not unquote it.
```
If absent, suggest CMake's `-DCMAKE_EXPORT_COMPILE_COMMANDS=ON`, Bear, or compiledb to the user, but continue.
### Phase 2: Output Directory
Entry: Phase 1 flags set.
Exit: absolute `output_dir` resolved; `${output_dir}/findings` exists.
Resolve an absolute path for `output_dir` (default: `$(pwd)/.c-review-results/$(date -u +%Y%m%dT%H%M%SZ)/`):

```bash
mkdir -p "${output_dir}/findings"
```
### Phase 3: Codebase Context
Entry: `output_dir` exists.
Exit: `${output_dir}/context.md` written.
Skim `README*` and any build file (`Makefile`, `CMakeLists.txt`, `meson.build`) — preflight with the `Glob` tool before any `Read` (a `Read` on a missing file aborts the turn). Do not use `Bash` `ls` for the preflight: under zsh, an unmatched glob aborts the whole compound command before `ls` runs (observed: a Phase-3 `Bash` call failed with `zsh: no matches found` and dropped the entire preflight). If you must use `Bash`, use `find . -maxdepth 2 -name 'README*' -o -name 'Makefile' -o -name 'CMakeLists.txt' -o -name 'meson.build'`, which never fails on no-match.
Write `${output_dir}/context.md` with: YAML frontmatter (`threat_model`, `model`, `severity_filter`, `scope_subpath`, `finding_scope_root`, `context_roots`, `is_cpp`, `is_posix`, `is_windows`, and compile-commands status as present/absent plus the path when present), then a short markdown body with five sections:
- Purpose (1-3 sentences),
- Scope (what's in `finding_scope_root`, and that findings outside it are out of scope),
- Entry points (where untrusted data enters: network, files, CLI, IPC),
- Trust boundaries (sandboxed vs trusted peers vs arbitrary remote),
- Existing hardening (fuzzing corpora, sanitizers, privilege separation).
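A minimal sketch of the artifact's shape; spelling the frontmatter keys exactly like the Phase-0/Phase-1 variables is an assumption about the schema:

```bash
cat > "${output_dir}/context.md" <<EOF
---
threat_model: ${threat_model}
model: ${model}
severity_filter: ${severity_filter}
finding_scope_root: ${finding_scope_root}
context_roots: ${context_roots}
is_cpp: ${is_cpp}
is_posix: ${is_posix}
is_windows: ${is_windows}
compile_commands: absent
---
## Purpose
One to three sentences on what the program does.
EOF
```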
### Phase 4: Build Run Plan (deterministic)
Entry: language flags and scope known; `context.md` exists.
Exit: the run plan and `${output_dir}/worker-prompts/*.txt` written; M (the worker count) known.
Selection, filtering, path resolution, and spawn-prompt rendering are delegated to the script to prevent the "orchestrator paraphrases the spawn template and drops fields" failure mode:
bash
python3 "${C_REVIEW_PLUGIN_ROOT}/scripts/build_run_plan.py" \
--plugin-root "${C_REVIEW_PLUGIN_ROOT}" --output-dir "${output_dir}" \
--threat-model "${threat_model}" --severity-filter "${severity_filter}" \
--scope-subpath "${finding_scope_root:-.}" --context-roots "${context_roots:-.}" \
--is-cpp "${is_cpp}" --is-posix "${is_posix}" --is-windows "${is_windows}"
The script writes the run plan + `worker-prompts/worker-N.txt` + (if the cache primer is enabled, the default) `worker-prompts/cache-primer.txt`, and prints a JSON summary on stdout. It exits non-zero on any missing prompt — surface the message and stop. Typical M: 7 (C POSIX), 8 (C++ POSIX), 10 (C POSIX + Windows), 11 (C++ POSIX + Windows). After it returns, read the plan for the structured selection — never re-derive filtering or paths.
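A sketch of consuming that stdout summary. The only key this document pins down is `run.cache_primer` (referenced in Phase 6a); `run.worker_count` is an assumed key name:

```bash
summary="$(python3 "${C_REVIEW_PLUGIN_ROOT}/scripts/build_run_plan.py" \
  --plugin-root "${C_REVIEW_PLUGIN_ROOT}" --output-dir "${output_dir}" \
  --threat-model "${threat_model}" --severity-filter "${severity_filter}" \
  --scope-subpath "${finding_scope_root:-.}" --context-roots "${context_roots:-.}" \
  --is-cpp "${is_cpp}" --is-posix "${is_posix}" --is-windows "${is_windows}")"
cache_primer="$(jq -r '.run.cache_primer' <<<"$summary")"  # gates Phase 6a
M="$(jq -r '.run.worker_count' <<<"$summary")"             # key name is an assumption
```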
### Phase 5: Create Bookkeeping Tasks (orchestrator-internal)
Entry: the run plan exists; M known.
Exit: M tasks created (1:1 with `worker-prompts/worker-N.txt`), all `pending`.
The task ledger is orchestrator bookkeeping only (TUI visibility + Phase-7 retry tracking) — workers never read or write it. One `TaskCreate` per worker, populating the task metadata with the cluster name, finding-ID prefix, model, and spawn-prompt path — all values copied verbatim from the run plan. Track task IDs in worker-N order.
### Phase 6: Spawn workers (optional cache-primer first, then M in parallel)
Entry: task ledger populated; per-worker spawn prompt files exist at `${output_dir}/worker-prompts/worker-N.txt`.
Exit: all M `Agent` calls have returned (the parallel spawn block completed).
#### Phase 6a: Cache primer (gated on `plan.run.cache_primer`)
A parallel batch from cold start cannot share cache (all M requests dispatch simultaneously; none has finished writing). To warm the prefix, spawn a tiny primer first — foreground (background spawns don't share cache with subsequent foreground spawns).
If `plan.run.cache_primer == true`, `build_run_plan.py` has written `${output_dir}/worker-prompts/cache-primer.txt`. Spawn it in its own assistant message: `Read` the file, pass it verbatim as the prompt with `subagent_type=c-review:c-review-worker`, the run's resolved `model`, `description="C review cache primer"`, and no background flag. The script wrote the primer's prefix byte-identical to the worker prompts — that byte-identity is what gives the parallel workers their cache hit. The primer trailer contains a primer-mode marker, which the worker system prompt treats as a first-class mode: the worker returns exactly `worker-PRIMER abort: cache primer (no analysis performed)` in one text response with zero tool calls. Discard the abort line — Phase 7 ignores it (no cluster id).
Foreground spawning already serializes — no explicit wait is needed before Phase 6b. Skip Phase 6a entirely if `plan.run.cache_primer == false`.
#### Phase 6b: Spawn M real workers in ONE message
STOP — read this before composing the spawn message.
Workers MUST be spawned foreground (no background flag, or the flag explicitly set to false).
"Parallel" here means one assistant message containing M `Agent` calls — that already runs them concurrently. Background spawns are NOT how you parallelize this skill.
Background spawns defeat Phase 6a's primer cache: every worker pays full cache-creation on its first turn (`cache_read_input_tokens=0`), and the primer's ~15 K tokens are wasted M times over. Two real runs (audit logs available) had exactly this symptom — every worker started with `cache_read_input_tokens=0`.
Before sending the spawn message, audit your draft: every `Agent` call must have no background key. If you wrote one, delete it.
Required spawn shape: emit a single assistant message containing M `Agent` tool invocations. Sequential spawning serializes the review and is also wrong, but that failure is loud (timing); the background-spawn failure is silent (cost).
- `Read ${output_dir}/worker-prompts/worker-N.txt`
- Pass the file contents verbatim as the tool's prompt argument:

| Parameter | Value |
|---|---|
| `subagent_type` | `c-review:c-review-worker` |
| `model` | the resolved model (haiku / sonnet / opus) |
| `description` | a short per-worker label |
| `prompt` | the full text of `worker-N.txt` (no edits) |
| background flag | MUST be omitted, OR set to false. Never true. See the foreground-spawn warning above. |

The spawn prompt is the single authority. Pass it verbatim — every field is required by the worker's self-check; any deviation triggers `worker-N abort: spawn prompt malformed`.
Anti-patterns to reject:
- Spawning workers in the background (the dominant historical defect — see the warning above).
- Hand-typing the spawn prompt instead of reading `worker-N.txt`.
- Inserting Task-related instructions ("first call TaskList", "Assigned task id: <N>"). Workers have no Task tools.
- Editing the rendered prompt before passing it (trimming "redundant" fields, collapsing pass lists).
### Phase 7: Wait for Workers and Classify Outcomes
Entry: all M Phase-6 `Agent` calls have returned.
Exit: every cluster has either succeeded or been retried up to the cap; `${output_dir}/findings-index.txt` written.
The Phase-6 `Agent` invocations block until each worker returns. Inspect each worker's return text and apply this classifier in order — first match wins:

| # | Match (in return text) | Outcome | Action |
|---|---|---|---|
| 1 | the worker's success token | success | Set the cluster's task to `completed`. |
| 2 | `abort: spawn prompt malformed`, `abort: pre-work budget exceeded`, or `abort: TaskList unavailable` (legacy) | non-retryable orchestrator bug | Stop the run; surface the abort + spawn-prompt path. Re-running the same prompt repeats the failure — pre-work-budget exhaustion always means the worker couldn't pass its self-check, which a retry won't fix. |
| 3 | any other abort | retryable | Mark the cluster retryable, record the failure reason, increment the attempt count. |
| 4 | `Agent` call errored, or no success/abort token | retryable | Same as #3 (transient worker crash). |

If any outcome is non-retryable, stop. Otherwise re-spawn each retryable cluster with the same verbatim spawn prompt in one parallel block (cap = 2 attempts per cluster). Replacement workers can safely overwrite partial files — finding IDs are deterministic per prefix.
#### Sanity-check + write index
For every `completed` cluster, list `${output_dir}/findings/${prefix}-*.md` for each `prefix` (from the run plan). A worker that says "wrote N finding files" with N>0 but zero files on disk is suspicious — treat it as retryable (classifier row #4). Zero claimed + zero on disk is fine.
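A sketch of that check; pulling prefixes out of the run plan with `jq` assumes a plan filename (`run-plan.json`) and shape (`.workers[].prefix`) that this document does not pin down:

```bash
while IFS= read -r prefix; do
  n=$(find "${output_dir}/findings" -maxdepth 1 -type f -name "${prefix}-*.md" | wc -l)
  echo "${prefix}: ${n} finding file(s) on disk"   # compare against the worker's claim
done < <(jq -r '.workers[].prefix' "${output_dir}/run-plan.json")
```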
Then build the index — workers wrote per-worker shards under `${output_dir}/findings-index.d/`; prefer those:

```bash
# Use `find` rather than a `worker-*.txt` glob: zsh aborts the compound command on no-match
# even with `2>/dev/null`, so an empty findings-index.d would otherwise drop the index file.
# `awk 1` (vs `cat`) normalizes a missing trailing newline on any shard, so a future
# worker that writes shards via Write/printf instead of `ls -1 | sort` can't silently glue
# the last path of one shard onto the first of the next when sort -u dedupes.
if [ -d "${output_dir}/findings-index.d" ]; then
  find "${output_dir}/findings-index.d" -maxdepth 1 -type f -name 'worker-*.txt' -exec awk 1 {} + 2>/dev/null \
    | sort -u > "${output_dir}/findings-index.txt"
else
  find "${output_dir}/findings" -maxdepth 1 -type f -name '*.md' 2>/dev/null | sort > "${output_dir}/findings-index.txt"
fi
```
`sort -u` collapses duplicates from Phase-7 retries. An empty file is the unambiguous "zero findings" signal. Cross-check the line count against the sum of worker claims; log mismatches but don't abort.
After task updates and index creation, run `TaskList` and write `${output_dir}/run-summary.md` with:
- resolved parameters (`threat_model`, `model`, `severity_filter`, scope/context roots, language/platform flags, compile-commands status)
- worker outcome table (cluster, prefix, claimed finding count, shard line count, task status, retry/abort state)
- `findings-index.txt` line count and any mismatch against worker claims
- judge status once Phase 8 finishes, or the reason a judge was skipped/failed
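A minimal scaffolding sketch for the summary file; the layout is illustrative, the fields mirror the bullets above:

```bash
{
  printf '# C review run summary\n'
  printf -- '- threat_model=%s model=%s severity_filter=%s\n' "$threat_model" "$model" "$severity_filter"
  printf -- '- scope=%s context_roots=%s\n' "$finding_scope_root" "$context_roots"
  printf -- '- findings-index lines: %s\n' "$(grep -c . "${output_dir}/findings-index.txt")"
} > "${output_dir}/run-summary.md"
```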
If any Phase-5 cluster task is not `completed`, include it prominently in `run-summary.md` and the final response. Do not hide a partial run behind a successful report.
Always run Phase 8 even on zero findings — both judges short-circuit on an empty index: dedup-judge writes a minimal no-op report, and fp-judge writes empty `REPORT.md`/`REPORT.sarif` so SARIF consumers get a stable artifact set.
### Phase 8: Judge Pipeline (sequential, dedup → fp+severity)
Entry: `${output_dir}/findings-index.txt` exists.
Exit: dedup-judge and fp-judge have returned; the annotated finding files, `REPORT.md`, and ideally `REPORT.sarif` are written.
Each judge's full protocol is its system prompt (`agents/c-review-{dedup,fp}-judge.md`); spawn prompts pass only per-run variables. Do not reference any separate judge protocol files — those files don't exist.
Spawn sequentially (dedup first, so fp-judge sees only merged primaries):

```
Agent(subagent_type="c-review:c-review-dedup-judge", description="Dedup judge", prompt=f"output_dir: {output_dir}")
Agent(subagent_type="c-review:c-review-fp-judge", description="FP + severity judge", prompt=f"output_dir: {output_dir}\nsarif_generator_path: {sarif_generator_path}")
```

`sarif_generator_path` — resolve to `${C_REVIEW_PLUGIN_ROOT}/scripts/generate_sarif.py`.
Judge failure handling. Same shape as Phase 7's classifier, applied to judge return text:
- Success token → success.
- Abort token → non-retryable. Surface the abort line plus `ls -l ${output_dir}/findings-index.txt`; stop.
- No token (help message / error / question) → retryable once. Retry via `SendMessage(to=<agentId>, …)` rather than a fresh spawn (the agent already paid the protocol-parse cost). Include the explicit finding paths from `findings-index.txt`. If the second try still fails, surface the transcript and continue to Phase 8b.
### Phase 8b: SARIF safety net
Entry: fp-judge returned, or the run aborted early.
Exit: `${output_dir}/REPORT.sarif` exists.

```bash
test -d "${output_dir}/findings" && python3 "${C_REVIEW_PLUGIN_ROOT}/scripts/generate_sarif.py" "${output_dir}"
```

Run it unconditionally whenever `${output_dir}/findings` exists — the generator is idempotent (full overwrite), emits a valid empty SARIF for zero-survivor runs, and handles partial runs (findings without judge annotations are still emitted rather than being silently dropped). Always overwriting protects against the case where fp-judge crashed mid-write and left a corrupt `REPORT.sarif` on disk. Skip only if `${output_dir}` doesn't exist (Phase 2 failed). After this phase, update `${output_dir}/run-summary.md` with judge/SARIF status.
### Phase 9: Return Report
Entry: Phase 8b complete.
Exit: every item in Success Criteria verified true; `REPORT.md` content returned to the caller.
Before composing the response, walk the Success Criteria checklist below and confirm each bullet against on-disk artifacts (`TaskList` for cluster tasks, `ls`/`Read` for the files). If any criterion fails, surface the failure prominently in the response — do not hide a partial run behind a successful report.
Then `Read ${output_dir}/REPORT.md` and return its content to the caller. Append an Artifacts list pointing at `REPORT.md`, `REPORT.sarif`, `run-summary.md`, `findings/`, `findings-index.txt`, `context.md`, and the run plan.
## Finding file frontmatter — three stages
Authoritative schema: `agents/c-review-worker.md` ("Finding File Format"). Three-stage write:
- Worker — the base frontmatter fields defined by the schema, plus seven body sections.
- Dedup-judge — adds a duplicate marker on duplicates, or merge metadata on primaries that absorbed duplicates.
- FP+Severity judge — adds a verdict on every primary; on surviving findings it also adds the severity fields.
## Bug classes / clusters
Authoritative: `prompts/clusters/manifest.json`. 47 always-on bug classes, up to 64 with all conditional clusters enabled. Each cluster prompt is fully consolidated (its sub-prompts are not re-read at runtime).
## Success Criteria
The phase exits already cover most of this; the orchestrator-visible end-state is:
- Every Phase-5 cluster task is `completed` (verify via `TaskList`).
- `${output_dir}/run-summary.md` exists and records resolved scope/context, compile-commands probe result, worker claims vs index count, task status, and judge/SARIF status.
- Every primary finding (one not marked as a duplicate) has the fp-judge's verdict fields; every surviving finding also has the severity fields.
- `REPORT.md` exists, severity-filtered per `severity_filter`.
- `REPORT.sarif` exists (Phase 8b safety net guarantees this).
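A mechanical spot-check of that end-state, honoring the fact that an empty `findings-index.txt` is legitimate (zero findings) while the other artifacts must be non-empty:

```bash
for f in run-summary.md REPORT.md REPORT.sarif; do
  [ -s "${output_dir}/${f}" ] || echo "FAIL: ${f} missing or empty"
done
[ -e "${output_dir}/findings-index.txt" ] || echo "FAIL: findings-index.txt missing"
```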