MANDATORY PREPARATION
Invoke {{command_prefix}}agent-workflow — it contains workflow principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding — if no workflow context exists yet, you MUST run {{command_prefix}}teach-maestro first.
Consult the feedback-loops reference in the agent-workflow skill for evaluation patterns and self-correction strategies.
Set up feedback loops that make workflows self-correcting and continuously improving. Iteration transforms one-shot gambles into convergent, reliable systems.
Feedback Loop Design
Step 1: Define Quality Criteria
What does "good output" look like? Score dimensions:
| Dimension | Weight | Threshold | Measurement |
|---|---|---|---|
| Accuracy | 0.4 | ≥ 0.8 | Factual correctness check |
| Completeness | 0.3 | ≥ 0.7 | Required fields present |
| Format | 0.2 | ≥ 0.9 | Schema compliance |
| Tone | 0.1 | ≥ 0.6 | Appropriate for audience |
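The weighted criteria above can be sketched as a small scoring helper. This is a minimal illustration, not a prescribed implementation; the scoring functions that would produce the per-dimension scores are assumed to exist elsewhere.

```python
# Sketch of weighted scoring against the criteria table above.
# Weights and thresholds mirror the table; how each dimension is
# actually measured is left to hypothetical scoring functions.

CRITERIA = {
    # name: (weight, per-dimension threshold)
    "accuracy":     (0.4, 0.8),
    "completeness": (0.3, 0.7),
    "format":       (0.2, 0.9),
    "tone":         (0.1, 0.6),
}

def aggregate(scores: dict[str, float]) -> tuple[float, list[str]]:
    """Return the weighted total and the dimensions that failed their threshold."""
    total = sum(CRITERIA[d][0] * s for d, s in scores.items())
    failed = [d for d, s in scores.items() if s < CRITERIA[d][1]]
    return total, failed

total, failed = aggregate(
    {"accuracy": 0.9, "completeness": 0.6, "format": 1.0, "tone": 0.7}
)
# completeness (0.6) is below its 0.7 threshold, so it appears in `failed`
```

Reporting *which* dimensions failed, not just the total, is what makes the later correction loop useful: the evaluator feedback has to name the problem.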
Step 2: Choose Evaluator Type
Match evaluator to requirements:
- Rule-based: Schema validation, field presence, value ranges (fast, free)
- Self-check: Same model evaluates own output (fast, cheap, less reliable)
- Cross-model: Different model evaluates (slower, more reliable)
- Human-in-the-loop: Human review (slowest, most reliable, doesn't scale)
- Hybrid: Rules first, then model check for what rules can't catch
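A hybrid evaluator might look like the following sketch. The field names and ranges are illustrative, and `model_check` is a hypothetical stand-in for whatever model-based evaluator you use.

```python
# Hybrid evaluation sketch: cheap rule checks run first and short-circuit;
# only outputs that pass the rules reach the (more expensive) model check.

def rule_check(output: dict) -> list[str]:
    """Fast, free structural checks: required fields and value ranges."""
    problems = []
    for field in ("summary", "confidence"):
        if field not in output:
            problems.append(f"missing field: {field}")
    if not 0.0 <= output.get("confidence", 0.0) <= 1.0:
        problems.append("confidence out of [0, 1]")
    return problems

def evaluate(output: dict, model_check) -> list[str]:
    problems = rule_check(output)
    if problems:
        return problems            # don't pay for a model call on obvious failures
    return model_check(output)     # catch what rules can't (tone, factuality)
```

The ordering is the point of the hybrid approach: structural failures never cost a model call.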
Step 3: Design the Correction Loop
```text
generate(input) → evaluate(output) → score
if score ≥ threshold → return output
if score < threshold AND attempts < max →
    enrich input with evaluator feedback
    generate again (with feedback)
if attempts ≥ max → fallback or escalate
```
Critical: The retry input MUST be different from the original. Include:
- The evaluator's specific feedback
- What was wrong and why
- A suggestion for how to fix it
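One way to wire this up, assuming `generate` and `evaluate` callables with the shapes shown; the exact feedback-enrichment format is illustrative.

```python
# Correction-loop sketch: retry with evaluator feedback folded into the
# prompt, never with the original input verbatim. Threshold and attempt
# limit are placeholders.

MAX_ATTEMPTS = 3
THRESHOLD = 0.8

def generate_with_retry(generate, evaluate, task: str) -> str:
    prompt = task
    for attempt in range(MAX_ATTEMPTS):
        output = generate(prompt)
        score, feedback = evaluate(output)
        if score >= THRESHOLD:
            return output
        # The retry input MUST differ: include what was wrong and why.
        prompt = (
            f"{task}\n\n"
            f"Previous attempt (scored {score:.2f}, below {THRESHOLD}):\n{output}\n"
            f"What was wrong and how to fix it:\n{feedback}"
        )
    raise RuntimeError("max attempts reached - fall back or escalate")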
Step 4: Set Up Regression Detection
When changing prompts, models, or tools:
- Run golden test set with OLD config → baseline scores
- Run golden test set with NEW config → new scores
- Compare: improvement ≥ 5% → accept; regression ≥ 5% → reject
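A regression gate over the two score sets can be as simple as comparing means. The ±5% bands follow the rule of thumb above; treating the in-between band as "use judgment" is my assumption, not stated in the rule.

```python
# Regression-gate sketch: compare mean golden-set scores between the old
# and new configs. Score lists are assumed to come from running the same
# golden test set under each config.

def regression_gate(baseline: list[float], candidate: list[float]) -> str:
    old = sum(baseline) / len(baseline)
    new = sum(candidate) / len(candidate)
    delta = (new - old) / old          # relative change
    if delta >= 0.05:
        return "accept"                # >= 5% improvement
    if delta <= -0.05:
        return "reject"                # >= 5% regression
    return "neutral"                   # within noise band: use judgment
```

Per-example diffs (which golden cases got worse) are usually more actionable than the aggregate, but the aggregate is what gates the change.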
Step 5: Continuous Monitoring
For production workflows:
- Sample 1-5% of outputs for automated evaluation
- Track quality scores over time
- Alert on downward trends
- A/B test changes before full rollout
Iteration Checklist
Recommended Next Step
After setting up feedback loops, run {{command_prefix}}evaluate to validate the loop with real scenarios, then for final polish.
NEVER:
- Retry with the exact same input (definition of insanity)
- Use the same weak model to both generate and evaluate
- Skip the max attempts limit (infinite loops are real)
- Deploy changes without regression testing against golden set
- Monitor only errors — track quality scores over time