Bug Review v2
Multi-pass PR review agent with 5 parallel review passes, majority voting, independent Opus validation, and resolution rate learning. Posts inline PR comments and optionally generates autofix commits. Tracks whether findings get resolved at merge time and uses that signal to improve future reviews.
When to Apply
- User asks to review a pull request for bugs or correctness issues
- User runs
/bug-review <PR-number-or-URL>
- User runs to classify resolutions after merge
- User runs for resolution rate statistics
- User asks for code review focused on logic errors, edge cases, or security
- User wants to find bugs in a diff or set of changes
Setup
On first run, verify:
- CLI is installed and authenticated ()
- Current directory is a git repo with a GitHub remote
- is installed (for JSON processing)
- is installed (for resolution rate calculations; pre-installed on most systems)
Read config.json for configuration (passes, vote threshold, models, category weights).
Workflow Overview
/bug-review <PR>
|
v
Fetch PR context + gather-context.sh
|
v
5 parallel passes (shuffled diffs, Sonnet) --> Aggregate & vote (3/5 majority)
|
v
Independent Opus validator --> Dedup --> Present findings --> Post + store
|
(later, after merge)
v
/bug-review:resolve <PR> --> Classify resolutions --> Update category weights
Command: /bug-review <PR>
Step 1: Parse Input & Fetch Context
- Parse the PR identifier (number, URL, or branch name)
- Check cache: Look for
${CLAUDE_PLUGIN_DATA}/bug-review/cache/pr-{N}/
— if cache exists for the same head commit, offer to resume from the last checkpoint
- Run
scripts/fetch-pr.sh <pr-identifier>
to get PR diff + metadata as JSON
- Save the diff to a temp file for shuffling
- Run
scripts/gather-context.sh <changed-files-json>
to get prioritized context (callers, types, tests, repo rules)
- Read from repo root if it exists
- Save checkpoint: Write context to
${CLAUDE_PLUGIN_DATA}/bug-review/cache/pr-{N}/context.json
Step 2: Run 5 Parallel Review Passes
For each pass (1-5), prepare a shuffled diff:
bash
scripts/shuffle-diff.sh <pass-number> < pr.diff > pass-<N>.diff
Launch 5 Agent subprocesses in parallel. Read review-passes.md for the exact prompt for each pass.
- Pass 1: Logic & Edge Cases (seed 1)
- Pass 2: Security & Data Integrity (seed 2)
- Pass 3: Error Handling & API Contracts (seed 3)
- Pass 4: Concurrency & State (seed 4)
- Pass 5: Data Flow & Contracts (seed 5)
Use
from config.json
(default:
).
Each agent returns a JSON array of findings.
Save checkpoint: Write all pass results to
${CLAUDE_PLUGIN_DATA}/bug-review/cache/pr-{N}/pass-results.json
Step 3: Aggregate & Vote
- Collect findings from all 5 passes
- Group findings by similarity: same file + line within +/-5 + same or related category
- Count votes per group
- Keep only findings with 3+ votes (majority of 5, configurable via )
- Apply category weights from config.json:
final_score = votes × severity_weight × category_weight
- Categories with weight < 0.1 are suppressed entirely
- Rank by final_score descending
If only 1-2 passes found bugs and the others found none, present findings but note they lack consensus.
Save checkpoint: Write voted findings to cache.
Step 4: Independent Validation (Opus)
Launch a
separate Agent using
from config.json (default:
).
This agent has NOT seen the review passes. It receives only the voted findings and the original code. Read the Validator section in review-passes.md for the prompt.
For each finding, the validator outputs:
{id, verdict: "KEEP"|"DISCARD", confidence, reasoning}
Remove DISCARDed findings. Multiply each finding's score by the validator's confidence.
Compute each finding's final
field:
confidence = (votes / total_passes) × validator_confidence
Findings with confidence < 0.5 are shown with a "low confidence" warning.
Save checkpoint: Write validated findings to cache.
Step 5: Dedup Against Prior Reviews
Run
scripts/dedup.sh <pr-number>
to get existing
comments.
Match by location proximity (file + line within +/-10) and category — not text similarity.
Step 6: Present Findings to User
Display a table:
| # | Severity | Confidence | File | Line | Title | Votes |
|---|
For each finding, show full description, trigger scenario, suggested fix, and validator reasoning.
Ask the user (using AskUserQuestion with multiSelect):
- Which findings to post as PR comments (default: all)
- Which findings to autofix (default: none)
If no findings survived voting + validation: "No bugs found across 5 review passes. The changes look clean."
Step 7a: Post PR Review
Write approved findings to a temporary JSON file, then run:
bash
scripts/post-review.sh <pr-number> <findings-json-file>
Then persist findings for resolution tracking:
bash
scripts/store-findings.sh <pr-number> <findings-json-file> <head-commit-sha>
Step 7b: Autofix (User-Selected Findings)
For each finding selected for autofix:
- Read the file and understand surrounding context
- Generate a minimal fix (smallest possible change)
- Apply the fix using the Edit tool
- Scope check: Run — verify only the finding's file was modified and diff is under 20 lines. If exceeded, revert and warn.
- Run existing tests if available (, , , etc.)
- If tests pass: commit with
fix: {title} [bug-review]
- If tests fail: revert the fix () and report to user
- After all fixes: push to the PR branch
Safety: one commit per fix, run tests between fixes, never force-push, scope-validate every fix.
Command: /bug-review:resolve <PR>
Run after a PR is merged to classify whether findings were resolved.
- Run
scripts/classify-resolutions.sh <pr-number>
- Loads stored findings from
${CLAUDE_PLUGIN_DATA}/bug-review/findings/pr-{N}.json
- Checks if PR is merged
- For each finding: diffs code between review commit and merge commit
- Classifies each as RESOLVED, UNRESOLVED, or INCONCLUSIVE
- Updates the stored findings file with resolution data
- Display resolution summary to user
- If enough data accumulated (10+ findings, 3+ PRs): run
scripts/update-weights.sh
to adjust category weights
Command: /bug-review:report
Display resolution rate statistics across all tracked PRs.
Run
scripts/resolution-report.sh
which outputs:
- Overall resolution rate
- Resolution rate by severity
- Resolution rate by category (sorted worst-first to highlight noisy categories)
- Suppressed categories (weight < 0.1)
Repo-Specific Rules (.bug-review.md)
Teams can create
at their repo root:
markdown
## Focus Areas
- Pay special attention to authentication flows
- Check all database queries for SQL injection
## Ignore
- Don't flag issues in generated files (*.generated.ts)
- Ignore style-only concerns
## Invariants
- All API endpoints must check req.user before accessing user data
- Database migrations must be reversible
## Severity Overrides
- Treat any auth bypass as CRITICAL regardless of category default
How to Use
Read workflow.md for detailed step-by-step with error handling.
Read review-passes.md for all 5 review pass prompts and the validator.
Read categories.md for bug categories and learned weights.
Related Skills
- Consider creating a Runbook skill for investigating bugs found by this review
- Consider creating a CI/CD skill to run this review automatically on PR open