/cheat-bump — Rubric / Bucket Upgrade

Two modes:

| Mode | Trigger | What it does | Validation Strength |
|---|---|---|---|
| Full rubric bump | `--propose "<new formula>"` | Modify formula / dimensions / weights | Mandatory 5-step process + cross-model audit |
| Bucket-only recalibration | `--bucket-only` | Only re-derive bucket boundaries | Automatic data derivation, no audit |

Full rubric bump strictly follows the 5 steps in shared-references/bump-validation-protocol.md. Bucket-only follows the lightweight path — see Phase B below.

Overview

Entry: User triggers /cheat-bump
  ↓
[Phase A0: Detect Call Mode]
  ↓
  ├─ --bucket-only  →  [Phase B: Lightweight Bucket Recalibration]
  └─ --propose      →  [Phase 0~8: Full Rubric Bump]

Phase A0: Call Mode Diversion (Do First)

Read user parameters:

- Contains `--bucket-only` → proceed to Phase B (lightweight recalibration)
- Contains `--propose "<...>"` → proceed to Phase 0~8 (full rubric bump)
- Neither → Ask user: "What do you want to do? 1) Adjust rubric formula / add/remove dimensions → --propose; 2) Only re-derive bucket boundaries → --bucket-only"

If user says "I think ER is too low and want to adjust" → it's the `--propose` path. If user says "My account has grown, buckets are no longer accurate" → it's the `--bucket-only` path. The two paths cannot be mixed: only one type of operation per action.
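
The diversion rule can be sketched in a few lines of Python. This is only an illustration: the function name `detect_mode` and the return labels are hypothetical, and the skill itself is driven by prose instructions rather than a parser.

```python
import shlex


def detect_mode(raw_args: str) -> str:
    """Map the raw /cheat-bump arguments to one of the two paths.

    Returns "phase_b", "full_bump", or "ask_user" (hypothetical labels).
    """
    tokens = shlex.split(raw_args)
    bucket_only = "--bucket-only" in tokens
    propose = "--propose" in tokens
    if bucket_only and propose:
        # The two paths cannot be mixed: one type of operation per action.
        raise ValueError("use either --propose or --bucket-only, not both")
    if bucket_only:
        return "phase_b"
    if propose:
        return "full_bump"
    return "ask_user"
```

For example, `detect_mode("--bucket-only --scheme ratio")` selects the lightweight branch, while an empty argument string falls through to the clarifying question.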

Full Rubric Bump Workflow

[User: upgrade rubric --propose "ER×1.5→2.0, remove NA, add MS"]
  ↓
[Phase 0: Pre-threshold Check]
  ↓
[Phase 1: Write Complete New Formula Equation]
  ↓
[Phase 2: Full Re-scoring of Calibration Pool]
  ↓
[Phase 3: Calculate Ranking Consistency]
  ↓
[Phase 4: Mandatory Cross-Model Independent Audit]
  ↓
[Phase 5: Implementation + Cleanup Pass]
  ↓
[Phase 6: Append Re-scored Line to Bottom of All Calibration Sample Prediction Files]
  ↓
[Phase 7: Update State File]
  ↓
[Phase 8: Console Report]

Constants

READINESS_HEURISTIC:
- Default reference: calibration pool ≥5 samples + at least 1 cross-sample observation supported by ≥3 samples
- Claude can still propose a bump with fewer samples if the observation signals are exceptionally strong:
  - N=3, but one strong counterexample completely overturns the current rubric assumptions (e.g., composite 8.5 vs actual performance 50k, a ≥3x deviation)
  - A single post shows an extreme phenomenon (e.g., a single meme with ≥2000 likes in the comment section)
- Claude can also reject a bump despite sufficient samples if the evidence is weak:
  - N=10, but the observations are low-confidence, fragmented patterns with no clear direction
  - User reviews contain numerous non-serious judgments like "just glanced at it"
- The prediction header or cheat-bump output must state whether the proposal is default-aligned or judgment-driven, so the user has a basis for review

THRESHOLD = 0.8: Minimum consistency between the new ranking and the actual performance ranking (4/5). Hard-coded to keep bump validation statistically rigid
CROSS_MODEL_AUDIT = true: Call an external LLM for an independent audit. `false` is only used for offline scenarios
REQUIRE_CONFIRM = true: Require explicit user confirmation ("yes, bump") before implementation

Inputs

| Required | Source |
|---|---|
| `--propose` text | User parameters; ask if missing |
| `rubric_notes.md` | User project root |
| All `predictions/*.md` files | Calibration pool data |
| `.cheat-state.json` | State file |

Workflow

Phase 0: Pre-threshold Check

Check item by item according to the "When to Prohibit" section in bump-validation-protocol.md:

| Check | Failure Handling |
|---|---|
| Total calibration pool samples vs observation strength | Claude's judgment: follow READINESS_HEURISTIC (default ≥5 samples, with exceptions for strong counterexamples / strong memes). If the default is not met, Claude must explicitly explain why a bump is still proposed ("Although only N=3 samples, entry X has composite Y vs actual performance Z, a W-fold deviation"), allowing user review |
| Number of new calibrations since last bump vs observation maturity | Claude's judgment: default suggests ≥3 new samples, but if 3 consecutive samples all provide strong evidence pointing in the same direction → no need to wait |
| `in_progress_session == null` | Reject: "You have an in-progress prediction that is not completed. Finish that process first or clear the state" |
| Trigger conditions are met (systematic deviation / new cross-sample observations / sufficient evidence for new dimensions) | Warn but do not block; ask user why they want to bump now |

Pass → proceed to Phase 1.

Phase 1: Write Complete New Formula Equation

Do not only accept brief user descriptions. Expand it into a complete equation:

Current: v2  composite = (ER×1.5 + SR×1.5 + HP×1.5 + QL + NA + AB + SAT) / 8.5 × 2.0
Proposed: v2.1  composite = (ER×2.0 + HP×1.5 + MS×1.5 + QL + SR + TS + SAT) / 9.0 × 2.0

Summary of changes:
- ER ×1.5 → ×2.0 (increased)
- SR ×1.5 → ×1.0 (decreased)
- Added MS ×1.5 (Memetic Shareability)
- Added TS ×1.0 (Topic Shareability)
- Removed NA (overlaps with HP)
- Removed AB (replaced by TS)
- Normalization constant 8.5 → 9.0
- Total number of formula dimensions: 7 → 7 (net change 0)

If user's proposal is vague (e.g., "Increase ER weight a bit") → ask for specific values, do not guess.
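
As a sanity check on the expanded equation, the two formulas can be written as weight tables. The sketch below assumes each dimension is scored on a 0 to 5 scale (the source does not state the range) and relies on the fact that both normalization constants, 8.5 and 9.0, equal the sum of their weights; the function name `composite` is illustrative.

```python
from typing import Dict

# Weights copied from the v2 and v2.1 formulas; dimensions written
# without a multiplier have weight 1.0.
V2_WEIGHTS = {"ER": 1.5, "SR": 1.5, "HP": 1.5, "QL": 1.0, "NA": 1.0, "AB": 1.0, "SAT": 1.0}
V2_1_WEIGHTS = {"ER": 2.0, "HP": 1.5, "MS": 1.5, "QL": 1.0, "SR": 1.0, "TS": 1.0, "SAT": 1.0}


def composite(scores: Dict[str, float], weights: Dict[str, float], scale: float = 2.0) -> float:
    """Weighted sum divided by the sum of weights, then multiplied by the scale."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    normalizer = sum(weights.values())  # 8.5 for v2, 9.0 for v2.1
    weighted = sum(scores[dim] * w for dim, w in weights.items())
    return weighted / normalizer * scale
```

Writing the weights out this way makes a mismatched normalization constant easy to spot: if a dimension is added or removed without updating the divisor, the sum of the weights no longer matches it.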

Phase 2: Full Re-scoring of Calibration Pool (Mandatory Blind Sub-agent)

Glob all files in `predictions/*.md` with complete review sections → calibration pool.

Bump is the highest-risk action of the tool — all re-scoring must go through cheat-score-blind sub-agent. Inline re-scoring means the main Claude has already seen actual performance, making rank consistency overfitting rather than real signals.

Mandatory Constraints

- No self-scored fallback accepted: `/cheat-predict` has a `--skip-blind` flag, but `/cheat-bump` does not. If Task tool is unavailable → abort bump, report to user "Resolve Task tool issues first before bumping"
- No "I only recalculate composite without re-scoring dimensions": even if the new formula only adjusts weights without adding dimensions, all dimensions of each prediction must be re-reviewed by the sub-agent. Reason: old dimension scores may already be contaminated; weight changes cannot guarantee old dimensions are still valid

For Each Prediction:

- Parse the prediction file to get the corresponding `scripts/<id>.md` path (from the `Script Path` header field)
- Verify the script file exists and its hash matches the `Script Hash` header; if not → warn (script has been modified) but still spawn sub-agent

Spawn cheat-score-blind sub-agent via Task tool:

Spawn cheat-score-blind sub-agent.

Input:
  script_path: <Script Path from prediction header>
  rubric_notes_path: rubric_notes.md
  sidecar_path: .cheat-cache/bump-rescores/<prediction-id>.json

Task: Score the script according to the current formula in rubric_notes (already updated to new version vN+1).
Return strict JSON. Write to sidecar file for batch reading by the main bump process.

Do not read state file / predictions/ / videos/ or any other files.
Do not ask user — you have no user.
Do not read this prediction file itself — only look at script + rubric.

- Wait for sub-agent to complete → read sidecar JSON → main process calculates composite using new formula
- Write "re-score table" to `.cheat-cache/bump-rescores.json` (summary). Mark each entry with `blind: true`; during the Phase 6 batch update, write this field along with the new score to the `Re-scored under v<N+1>` line in the prediction file
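
The script verification step (file exists, hash matches the `Script Hash` header) might look like the following. The source does not name the hash algorithm, so SHA-256 is an assumption here, as are the function name `check_script` and its three return labels.

```python
import hashlib
from pathlib import Path


def check_script(script_path: str, expected_hash: str) -> str:
    """Return "missing", "modified", or "ok" for one prediction's script.

    Assumes the header stores a hex SHA-256 digest of the script bytes.
    """
    path = Path(script_path)
    if not path.is_file():
        return "missing"  # entry is excluded from the calibration pool
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    if actual != expected_hash.strip().lower():
        return "modified"  # warn, but still spawn the sub-agent
    return "ok"
```

Keeping "missing" and "modified" as separate outcomes mirrors the text: a missing script removes the entry from the pool, while a changed script only produces a warning.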

Honest Labeling of Contamination

Even with sub-agent, two types of residual contamination must be honestly labeled in the bump report:

| Type | Source | Label Field |
|---|---|---|
| Model prior contamination | Sub-agent is still Claude, sharing RLHF | `model_prior_warning: true` (default true, cannot be turned off) |
| User's own rubric design bias | rubric_notes.md is written by user, naturally fitting their own content | `rubric_self_designed: true` (default true, cannot be turned off) |

These two remind users that channel C (cross-model audit) is indispensable. The end of the bump report must state: "The above rank consistency is within channel A. Final decision must wait for channel C audit approval."

Failure Modes

| Symptom | Handling |
|---|---|
| Script file for a prediction is missing | Sub-agent skips this entry, main process summarizes "N entries excluded due to missing script". If remaining valid pool < MIN_SAMPLES → abort bump |
| Sub-agent returns `refusal != null` | Resend Task up to 3 times; if still failed → mark this entry as `rescore_failed: true` and exclude from calibration pool |
| Task tool is completely unavailable | Abort bump, prompt user "Task tool is a hard dependency for bump. If in offline environment, run `/cheat-bump --bucket-only` to use lightweight branch" |
| Sub-agent output contains contamination_signal | Mark as `suspicious: true` but do not exclude; list these suspicious entries at the end of bump report for user review |

Phase 3: Calculate Ranking Consistency

For each sample:
  new_composite_rank: Rank sorted by new formula
  actual_plays_rank: Rank sorted by actual plays
  delta: |new_rank - actual_rank|

Output comparison table:
| Sample | composite (v2) | composite (v2.1) | rank (new) | actual | rank (actual) | delta |
|---|---|---|---|---|---|---|
| Hamster | 9.41 | 9.55 | 1 | 1.248M | 1 | 0 |
| Stop Expecting | 8.24 | 9.11 | 2 | 711K | 2 | 0 |
| Boss Nonsense | 7.65 | 8.11 | 4 | 396K | 3 | 1 |
| Job Hunting Paradox | 8.47 | 7.56 | 5 | 168K | 4 | 1 |
| Who Asked You | 8.24 | 7.00 | 6 | 117K | 5 | 1 |

Ranking consistency: 4/5 with |delta| ≤ 1
Pairwise no-regression: All pairs correctly ranked by old formula are not reversed under new formula ✓

Judgment:

Ranking consistency < THRESHOLD (default 0.8) → Local rejection; report the failure explicitly and stop instead of proceeding to Phase 4
Pairwise regression occurs → Local rejection

`THRESHOLD` is hard-coded in the protocol. Temporary lowering is not allowed (that itself is another meta-decision requiring a bump).
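
A minimal sketch of the two local checks, assuming consistency is the share of samples whose rank delta is at most 1 (as the "4/5 with |delta| ≤ 1" line suggests) and that ties are broken by input order. The function names are illustrative and not part of the protocol.

```python
from itertools import combinations
from typing import Dict, List, Tuple

THRESHOLD = 0.8  # hard-coded in the protocol


def ranks(values: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 is the highest value; ties keep input order (an assumption)."""
    ordered = sorted(values, key=lambda name: values[name], reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def ranking_consistency(new_scores: Dict[str, float], actual_plays: Dict[str, float]) -> float:
    """Fraction of samples with |new_rank - actual_rank| <= 1."""
    new_rank, actual_rank = ranks(new_scores), ranks(actual_plays)
    hits = sum(1 for s in new_scores if abs(new_rank[s] - actual_rank[s]) <= 1)
    return hits / len(new_scores)


def pairwise_regressions(old_scores: Dict[str, float], new_scores: Dict[str, float],
                         actual_plays: Dict[str, float]) -> List[Tuple[str, str]]:
    """Pairs the old formula ordered correctly but the new formula reverses."""
    regressions = []
    for a, b in combinations(old_scores, 2):
        actual = actual_plays[a] - actual_plays[b]
        old_correct = (old_scores[a] - old_scores[b]) * actual > 0
        new_reversed = (new_scores[a] - new_scores[b]) * actual < 0
        if old_correct and new_reversed:
            regressions.append((a, b))
    return regressions
```

A bump passes locally only when `ranking_consistency(...) >= THRESHOLD` and `pairwise_regressions(...)` is empty.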

Phase 4: Mandatory Cross-Model Independent Audit (Mandatory, except escape hatch)

`CROSS_MODEL_AUDIT=true` (default):

Call `mcp__llm-chat__chat` with prompt:
You are an independent reviewer. Below is a rubric formula that a content creator is preparing to upgrade.
Please independently judge two things:
1. Ranking consistency: Is the ranking of samples by the new formula consistent with the ranking of actual performance in ≥80% of samples?
2. Explanatory power: Does the new formula better explain the actual performance distribution of the calibration pool compared to the old formula?

Data:
Old formula: (ER×1.5 + SR×1.5 + HP×1.5 + QL + NA + AB + SAT) / 8.5 × 2.0
New formula: (ER×2.0 + HP×1.5 + MS×1.5 + QL + SR + TS + SAT) / 9.0 × 2.0

Calibration pool:
[Full JSON of re-score table from Phase 2]

Ranking comparison:
[Full JSON of table from Phase 3]

Output format:
- Judgment: PASS or REJECT
- Reason: ≥100 words
- Key risks: [List potential issues of new formula if any]

Receive external LLM response → parse judgment.

Judgment logic:

Local PASS + External PASS → Pass, proceed to Phase 5
Local PASS + External REJECT → Treat as REJECT. Conflict means at least one party's interpretation is unstable
Local REJECT → Already terminated in Phase 3
mcp__llm-chat__chat unavailable → Gracefully degrade to `CROSS_MODEL_AUDIT=false`, mark `last_bump_self_audited: true` in state file

`CROSS_MODEL_AUDIT=false`:

Only rely on local judgment
Keep the mark in the state file; cheat-status keeps prompting the user "This bump was self-audited, it is recommended to configure mcp__llm-chat__chat"
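
The judgment logic can be condensed into one small function. This is a sketch under the assumption that the external verdict has already been parsed into "PASS" or "REJECT"; `combine_audit` and its return shape are hypothetical.

```python
from typing import Optional, Tuple


def combine_audit(local_pass: bool, external: Optional[str]) -> Tuple[str, bool]:
    """Return (decision, self_audited).

    external is "PASS" or "REJECT", or None when mcp__llm-chat__chat is
    unavailable or CROSS_MODEL_AUDIT=false.
    """
    if not local_pass:
        return "REJECT", False  # already terminated in Phase 3
    if external is None:
        return "PASS", True     # local judgment only; mark last_bump_self_audited
    if external == "PASS":
        return "PASS", False
    return "REJECT", False      # a conflict is treated as REJECT
```

The second element of the result is what would be written to `last_bump_self_audited` in the state file.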

Phase 5: Implementation + Cleanup Pass

After passing audit, REQUIRE_CONFIRM=true → Ask user: "New formula passed local and external audits. Final confirmation: Execute bump implementation? This will modify rubric_notes.md + rubric-memo.md and delete several absorbed observations. Only execute if you answer 'yes, bump'."

After user confirmation:

5a. Update `rubric_notes.md` (Only use general language, no video names / actual performance data)

Update top metadata:

- `**Current Version**: vN+1`
- `**Last bumped at**: <ISO 8601>`
- `**Upgrade memos**: See [rubric-memo.md](rubric-memo.md)` (pointer, do not copy memo content)

Add a line to version quick reference table (only include version number + formula signature, no evidence samples)
Update "Current Scoring Dimensions" section (remove NA / AB, add MS / TS)
Derived evidence section if new dimensions need anchor explanation → Use general language:
- ✅ Allowed: "Derived evidence: High abstract density samples → CC=1 → Low reach"
- ❌ Prohibited: "Derived evidence: 'Stop Expecting' CC=1 → Actual performance 137K" (video name + actual performance number)
- If prohibited pattern is hit → move this section to "Derived Evidence" sub-section in rubric-memo.md, replace with general language in place

5b. Write Memo to `rubric-memo.md` (Append mode, do not overwrite history)

Append a memo section to the end of the file according to Step 5 in bump-validation-protocol.md + templates/rubric-memo.template.md:

Trigger observation (include real observation ID)
Evidence data (Full re-score table + ranking comparison of calibration pool, include real video names + actual performance)
Derived evidence (Include real sample names + actual performance)
Diagnosis
New formula
Cross-model audit conclusion reference (include model name + judgment + reason excerpt)
Known limitations

Never overwrite existing content in rubric-memo.md — bump memos accumulate in chronological order.

5c. Cleanup Pass (According to "Mandatory Cleanup Pass Timing" in observation-lifecycle.md)

Execute within `rubric_notes.md` (do not modify rubric-memo.md):

Observations that have been absorbed into new dimensions → Delete (e.g., Observation E absorbed into MS → delete Observation E)
Observations overturned by new data → Delete
Unresolved observations → Move to "Pending Validation Hypotheses" section of new version
Validated "rules" → Move to "Rule Precipitation Area"

5d. Organize + Self-Check

Re-read the entire `rubric_notes.md` to ensure readers can understand the current rules within 60 seconds → trigger additional cleanup if it exceeds 600 lines
Self-check leak guard: Run `grep -E '[0-9]+[[:space:]]*[wWmMkK]|plays|actual performance|actual' rubric_notes.md` → if any hits → abort bump + rollback, prompt user "rubric_notes.md contains prohibited content (actual performance / play counts)". This content belongs in rubric-memo.md, not rubric_notes.md
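
Where a shell is not available, the same leak guard can be approximated in Python, whose `re` module supports `\d` and `\s` directly. The helper name `find_leaks` is hypothetical; the pattern mirrors the alternatives listed for the grep check.

```python
import re
from typing import List, Tuple

# Digits followed by a magnitude suffix (w/W/m/M/k/K), or performance wording.
LEAK_PATTERN = re.compile(r"\d+\s*[wWmMkK]|plays|actual performance|actual")


def find_leaks(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every line that trips the guard."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if LEAK_PATTERN.search(line)
    ]
```

Any non-empty result would mean aborting the bump and rolling back. The pattern is deliberately broad, so ordinary words such as "actually" also count as hits and need a human look.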

Phase 6: Batch Update of Calibration Samples

For each calibration sample's prediction file, append to the bottom (do not modify prediction section or review section):

---
**Re-scored under v2.1 on 2026-05-04**: composite=8.24 → 9.11 (blind: true)
(Full re-calculated during rubric bump, independently scored by cheat-score-blind sub-agent; see v2 → v2.1 upgrade memo in rubric-memo.md)

The `blind: true` field is required: it tells future readers of this record "This is channel B isolated scoring, not self-scored by main Claude". If a prediction was excluded in Phase 2 due to sub-agent failure → no Re-scored line will be added (keep as is).

Use Edit tool to match the end of each file.
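
For illustration, the appended block can be generated from the re-score table as below. The helper names are hypothetical, and the actual append is performed with the Edit tool rather than by a script.

```python
def rescored_footer(old_version: str, new_version: str, date: str,
                    old_composite: float, new_composite: float) -> str:
    """Build the block appended to the bottom of one prediction file."""
    return (
        "\n---\n"
        f"**Re-scored under {new_version} on {date}**: "
        f"composite={old_composite:.2f} → {new_composite:.2f} (blind: true)\n"
        "(Full re-calculated during rubric bump, independently scored by "
        f"cheat-score-blind sub-agent; see {old_version} → {new_version} "
        "upgrade memo in rubric-memo.md)\n"
    )


def append_footer(prediction_text: str, footer: str) -> str:
    """Append only; the prediction and review sections are left untouched."""
    return prediction_text.rstrip("\n") + "\n" + footer
```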

Phase 7: Update State File

{
  "rubric_version": "v2.1",
  "last_bump_at": "<ISO timestamp>",
  "last_bump_self_audited": false,
  "consecutive_directional_errors": [],
  "calibration_samples_at_last_bump": <current value>
}

Clear `consecutive_directional_errors`: the new rubric starts counting again.
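
A sketch of the state update, assuming the state file has already been loaded into a dict and that `calibration_samples_at_last_bump` records the current pool size. The function name `apply_bump_to_state` is illustrative.

```python
from datetime import datetime, timezone


def apply_bump_to_state(state: dict, new_version: str,
                        self_audited: bool, pool_size: int) -> dict:
    """Return a copy of the state with the bump fields refreshed."""
    updated = dict(state)  # leave the caller's dict untouched
    updated.update({
        "rubric_version": new_version,
        "last_bump_at": datetime.now(timezone.utc).isoformat(),
        "last_bump_self_audited": self_audited,
        "consecutive_directional_errors": [],  # new rubric starts counting again
        "calibration_samples_at_last_bump": pool_size,
    })
    return updated
```

Fields not related to the bump, such as `baseline_plays`, pass through unchanged.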

Phase 8: Console Report

✅ Rubric upgraded from v2 → v2.1

Changes:
- ER ×1.5 → ×2.0
- SR ×1.5 → ×1.0
- Added MS / TS
- Removed NA / AB

Calibration pool re-scoring: 5/5 passed ranking check (4/5 consistent + 0 pairwise regression)
Cross-model audit: ✅ PASS
Cleanup pass: Deleted Observations D and E (absorbed into QL redefinition and MS dimension)

Scoring will use v2.1 formula starting from next prediction.
All historical prediction files have been appended with Re-scored marks.

Phase B: Bucket-Only Recalibration (Lightweight Branch)

/cheat-bump --bucket-only [--scheme ratio|absolute|percentile]

Essential difference from full bump: Bucket boundaries are not part of the rules, they are data-derived quantities. Re-deriving them does not require cross-model audit — the derivation algorithm is deterministic with no "judgment" component.

B1: Select Algorithm (Automatically Derived Based on Available Sample Count, Scheme Not Stored in State)

| Algorithm | Applicable | Boundary Derivation Method |
|---|---|---|
| `ratio` (default for N=1-4) | Small sample size | Median of last 1 / last 3 samples × {0.3 / 1 / 3 / 10} |
| `absolute` (default for N=5-9) | Medium sample size | Median of entire calibration pool × {0.3 / 1 / 3 / 10}, fixed boundaries |
| `percentile` (default for N≥10) | Large sample size | Actual performance percentiles of calibration pool {30 / 60 / 85 / 95 / 100} |

The `--scheme` parameter allows users to explicitly override the default:

- `--scheme ratio` forces use of ratio (even if N≥5)
- `--scheme absolute` forces use of absolute
- `--scheme percentile` forces use of percentile (requires N≥3, otherwise error)

If `--scheme` is not specified → automatically derived according to the table above.

The old design had a `bucket_scheme` state field, removed in v1.1. All skills derive the algorithm in real-time based on calibration_samples, so there is no need to persist "which one is currently used". This avoids state inconsistency issues like "forgot to sync after switching scheme".
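
Because the scheme is derived rather than stored, the selection reduces to a pure function of the sample count and the optional override. A minimal sketch (the name `select_scheme` is hypothetical):

```python
from typing import Optional

SCHEMES = ("ratio", "absolute", "percentile")


def select_scheme(n_samples: int, override: Optional[str] = None) -> str:
    """Pick the bucket algorithm from the sample count, honoring --scheme."""
    if n_samples < 1:
        raise ValueError("at least one sample with actual_plays is required")
    if override is not None:
        if override not in SCHEMES:
            raise ValueError(f"unknown scheme: {override}")
        if override == "percentile" and n_samples < 3:
            raise ValueError("percentile requires N>=3")
        return override
    if n_samples <= 4:
        return "ratio"
    if n_samples <= 9:
        return "absolute"
    return "percentile"
```

Since nothing is persisted, every skill that calls this with the same calibration pool arrives at the same answer.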

B2: Derive New Boundaries

Read all samples with `actual_plays` from `predictions/*.md`.

Ratio Mode:

baseline = median(last 3 actual_plays)
buckets = {
  "Decline": (-inf, baseline * 0.3),
  "Stable": (baseline * 0.3, baseline * 1),
  "Hit": (baseline * 1, baseline * 3),
  "Small Viral": (baseline * 3, baseline * 10),
  "Big Viral": (baseline * 10, +inf),
}

Absolute Mode:

baseline = median(all calibration pool actual_plays)
buckets = {
  "Bottom": (-inf, baseline * 0.3),
  "Base Audience": (baseline * 0.3, baseline * 1),
  "Hit": (baseline * 1, baseline * 3),
  "Viral": (baseline * 3, baseline * 10),
  "Phenomenal": (baseline * 10, +inf),
}

Percentile Mode:

sorted_plays = sorted(all calibration pool actual_plays)
buckets = {
  "Bottom":   ≤ p30,
  "Base Audience": p30 - p60,
  "Hit":   p60 - p85,
  "Small Viral":   p85 - p95,
  "Big Viral":   ≥ p95,
}
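
The three modes above can be made runnable as follows. Several details are assumptions not fixed by the source: percentiles use linear interpolation between sorted values, a value exactly on a boundary falls into the higher bucket, and the function names are illustrative.

```python
from bisect import bisect_right
from statistics import median
from typing import List, Sequence

MULTIPLIERS = (0.3, 1, 3, 10)  # four boundaries separate five buckets


def ratio_boundaries(plays_chronological: Sequence[float]) -> List[float]:
    """Baseline is the median of the last 3 samples (fewer if N < 3)."""
    baseline = median(plays_chronological[-3:])
    return [baseline * m for m in MULTIPLIERS]


def absolute_boundaries(plays: Sequence[float]) -> List[float]:
    """Baseline is the median of the entire calibration pool."""
    baseline = median(plays)
    return [baseline * m for m in MULTIPLIERS]


def percentile_boundaries(plays: Sequence[float]) -> List[float]:
    """p30 / p60 / p85 / p95 with linear interpolation (an assumption)."""
    ordered = sorted(plays)

    def pct(p: int) -> float:
        pos = (len(ordered) - 1) * p / 100
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return [pct(p) for p in (30, 60, 85, 95)]


def bucket_index(value: float, boundaries: Sequence[float]) -> int:
    """0 = lowest bucket, 4 = highest; ties go to the higher bucket."""
    return bisect_right(boundaries, value)
```

With the absolute-mode labels in order (Bottom, Base Audience, Hit, Viral, Phenomenal), `bucket_index` returns the position of the matching label.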

B3: Report Changes + User Confirmation

Current bucket scheme: ratio
Proposed scheme: absolute
Baseline: 42K median (based on 5 calibration samples)

New boundaries:
- Bottom:   < 12.6K
- Base Audience: 12.6K - 42K
- Hit:   42K - 126K
- Viral:   126K - 420K
- Phenomenal: > 420K

Derivation explanation:
- 5 actual performances: 15K / 38K / 42K / 56K / 180K
- Median is 42K, new buckets derived by ×{0.3, 1, 3, 10}

Confirm application? (yes / no)

B4: Implementation

After user confirmation:

Edit the "Bucket Scheme" section in `rubric_notes.md`, replace with new table
Update `baseline_plays` field in `.cheat-state.json` (bucket scheme is not persisted; it is derived in real-time during the next cheat-predict)

Append a change record to the top of the bucket section in `rubric_notes.md`:

v2 buckets recalibrated on YYYY-MM-DD: scheme=absolute, baseline=42K (based on N=5 samples)

Do not modify any prediction files — bucket tags in historical predictions remain as they are (judgments made under the scheme at the time of writing the sample)

B5: Impact on Future Predictions

Starting from the next `/cheat-predict`, new buckets will be derived. Bucket tags in historical prediction files will not be recalculated: buckets are semantic judgments made at prediction time, and post-hoc rewriting would destroy blindness.

What Phase B Does Not Do

No re-calculation of composite (formula remains unchanged)
No re-review of observation sections (rubric remains unchanged)
No cross-model audit (deterministic derivation requires no judgment)
No strict sample count threshold (judged by Claude according to READINESS_HEURISTIC; ratio mode can run with N=1)

Key Rules

5 steps cannot be skipped (only for full rubric bump). Reject any request to "run a simplified version first"
THRESHOLD is hard-coded (only for full rubric bump). Dynamic adjustment is not allowed
Cross-model audit is default (only for full rubric bump). Turning off audit requires explicit marking in state file
Cleanup pass is part of bump (only for full rubric bump). Bump cannot be completed without cleaning observation sections
REQUIRE_CONFIRM (both modes). Must get explicit user confirmation "yes, bump" or "yes, recalibrate" before final implementation
Bucket recalibration does not modify historical predictions. Buckets are prediction-time semantics, post-hoc rewriting destroys blindness

Refusals

"Skip calibration pool re-scoring, directly change formula" → Reject. Rule #2
"Skip cheat-score-blind sub-agent, main Claude can re-score directly" → Reject. Bump does not accept any self-scored fallback — if sub-agent is unavailable → abort bump, do not accept "self-audit"
"Skip external LLM audit" → Only allowed if
`CROSS_MODEL_AUDIT=false` is explicitly set
"Adjust THRESHOLD to 3/5 this time to let it pass" → Reject. Changing THRESHOLD is a meta-level bump
"Keep all old observations as history" → Violates Rule #3
"Bump first, do cleanup next time" → Reject. Cleanup is part of bump
"Only recalculate composite without re-scoring dimensions" → Reject. New weights × old dimensions are still old contamination. Each dimension must be re-reviewed by sub-agent
"Write full memo into top of rubric_notes.md for easy reading" → Reject. rubric_notes.md is whitelisted for blind sub-agent — containing video names / actual performance → leaks through whitelist. Memo is written in rubric-memo.md (outside whitelist), rubric_notes.md only contains formula + general language dimension definitions + pointer
"Keep real video names in derived evidence section to make rubric more specific" → Reject. Must use general language in rubric_notes.md ("high abstract density samples"); derived evidence with video names is written in rubric-memo.md

Integration

Upstream: `/cheat-retro` detects ≥3 same-direction deviations → propose running `/cheat-bump`
Dependencies: `mcp__llm-chat__chat` (if configured) + Task tool (spawn cheat-score-blind)
Modifications:
- `rubric_notes.md` (structural update, never write real video names / actual performance)
- `rubric-memo.md` (new; append full memo, including evidence + derived evidence)
- All `predictions/*.md` (append Re-scored line, do not modify prediction section)
- `.cheat-state.json`
Downstream: Next `/cheat-predict` automatically uses new rubric_version for scoring
