MANDATORY PREPARATION
Invoke {{command_prefix}}agent-workflow — it contains workflow principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding — if no workflow context exists yet, you MUST run {{command_prefix}}teach-maestro first.
Consult the feedback-loops reference in the agent-workflow skill for evaluation patterns and self-correction strategies.
Set up feedback loops that make workflows self-correcting and continuously improving. Iteration transforms one-shot gambles into convergent, reliable systems.
Feedback Loop Design
Step 1: Define Quality Criteria
What does "good output" look like? Score dimensions:
| Dimension | Weight | Threshold | Measurement |
|---|---|---|---|
| Accuracy | 0.4 | ≥ 0.8 | Factual correctness check |
| Completeness | 0.3 | ≥ 0.7 | Required fields present |
| Format | 0.2 | ≥ 0.9 | Schema compliance |
| Tone | 0.1 | ≥ 0.6 | Appropriate for audience |
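The weighted criteria above can be sketched as a small scoring helper. This is a minimal illustration, not a prescribed implementation; the scoring functions that would produce the per-dimension scores are assumed to exist elsewhere.

```python
# Sketch of weighted scoring against the criteria table above.
# Weights and thresholds mirror the table; how each dimension is
# actually measured is left to hypothetical scoring functions.

CRITERIA = {
    # name: (weight, per-dimension threshold)
    "accuracy":     (0.4, 0.8),
    "completeness": (0.3, 0.7),
    "format":       (0.2, 0.9),
    "tone":         (0.1, 0.6),
}

def aggregate(scores: dict[str, float]) -> tuple[float, list[str]]:
    """Return the weighted total and the dimensions that failed their threshold."""
    total = sum(CRITERIA[d][0] * s for d, s in scores.items())
    failed = [d for d, s in scores.items() if s < CRITERIA[d][1]]
    return total, failed

total, failed = aggregate(
    {"accuracy": 0.9, "completeness": 0.6, "format": 1.0, "tone": 0.7}
)
# completeness (0.6) is below its 0.7 threshold, so it appears in `failed`
```

Reporting *which* dimensions failed, not just the total, is what makes the later correction loop useful: the evaluator feedback has to name the problem.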
Step 2: Choose Evaluator Type
Match evaluator to requirements:
- Rule-based: Schema validation, field presence, value ranges (fast, free)
- Self-check: Same model evaluates own output (fast, cheap, less reliable)
- Cross-model: Different model evaluates (slower, more reliable)
- Human-in-the-loop: Human review (slowest, most reliable, doesn't scale)
- Hybrid: Rules first, then model check for what rules can't catch
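A hybrid evaluator might look like the following sketch. The field names and ranges are illustrative, and `model_check` is a hypothetical stand-in for whatever model-based evaluator you use.

```python
# Hybrid evaluation sketch: cheap rule checks run first and short-circuit;
# only outputs that pass the rules reach the (more expensive) model check.

def rule_check(output: dict) -> list[str]:
    """Fast, free structural checks: required fields and value ranges."""
    problems = []
    for field in ("summary", "confidence"):
        if field not in output:
            problems.append(f"missing field: {field}")
    if not 0.0 <= output.get("confidence", 0.0) <= 1.0:
        problems.append("confidence out of [0, 1]")
    return problems

def evaluate(output: dict, model_check) -> list[str]:
    problems = rule_check(output)
    if problems:
        return problems            # don't pay for a model call on obvious failures
    return model_check(output)     # catch what rules can't (tone, factuality)
```

The ordering is the point of the hybrid approach: structural failures never cost a model call.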
Step 3: Design the Correction Loop
```text
generate(input) → evaluate(output) → score
if score ≥ threshold → return output
if score < threshold AND attempts < max →
    enrich input with evaluator feedback
    generate again (with feedback)
if attempts ≥ max → fallback or escalate
```
Critical: The retry input MUST be different from the original. Include:
- The evaluator's specific feedback
- What was wrong and why
- A suggestion for how to fix it
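One way to wire this up, assuming `generate` and `evaluate` callables with the shapes shown; the exact feedback-enrichment format is illustrative.

```python
# Correction-loop sketch: retry with evaluator feedback folded into the
# prompt, never with the original input verbatim. Threshold and attempt
# limit are placeholders.

MAX_ATTEMPTS = 3
THRESHOLD = 0.8

def generate_with_retry(generate, evaluate, task: str) -> str:
    prompt = task
    for attempt in range(MAX_ATTEMPTS):
        output = generate(prompt)
        score, feedback = evaluate(output)
        if score >= THRESHOLD:
            return output
        # The retry input MUST differ: include what was wrong and why.
        prompt = (
            f"{task}\n\n"
            f"Previous attempt (scored {score:.2f}, below {THRESHOLD}):\n{output}\n"
            f"What was wrong and how to fix it:\n{feedback}"
        )
    raise RuntimeError("max attempts reached - fall back or escalate")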
Step 4: Set Up Regression Detection
When changing prompts, models, or tools:
- Run golden test set with OLD config → baseline scores
- Run golden test set with NEW config → new scores
- Compare: improvement ≥ 5% → accept; regression ≥ 5% → reject
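A regression gate over the two score sets can be as simple as comparing means. The ±5% bands follow the rule of thumb above; treating the in-between band as "use judgment" is my assumption, not stated in the rule.

```python
# Regression-gate sketch: compare mean golden-set scores between the old
# and new configs. Score lists are assumed to come from running the same
# golden test set under each config.

def regression_gate(baseline: list[float], candidate: list[float]) -> str:
    old = sum(baseline) / len(baseline)
    new = sum(candidate) / len(candidate)
    delta = (new - old) / old          # relative change
    if delta >= 0.05:
        return "accept"                # >= 5% improvement
    if delta <= -0.05:
        return "reject"                # >= 5% regression
    return "neutral"                   # within noise band: use judgment
```

Per-example diffs (which golden cases got worse) are usually more actionable than the aggregate, but the aggregate is what gates the change.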
Step 5: Continuous Monitoring
For production workflows:
- Sample 1-5% of outputs for automated evaluation
- Track quality scores over time
- Alert on downward trends
- A/B test changes before full rollout
Iteration Checklist
Recommended Next Step
After setting up feedback loops, run {{command_prefix}}evaluate to validate the loop with real scenarios, then for final polish.
NEVER:
- Retry with the exact same input (definition of insanity)
- Use the same weak model to both generate and evaluate
- Skip the max attempts limit (infinite loops are real)
- Deploy changes without regression testing against golden set
- Monitor only errors — track quality scores over time