Agent Room
Meta — Stochastic Multi-Agent Discussion. View a problem through multiple expert perspectives via debate or polling.
Core Question: "What do multiple perspectives converge on — and where do they genuinely disagree?"
This is the centralized multi-perspective analysis capability. When any skill needs debate, consensus, or multiple viewpoints on a decision, it invokes agent-room. Structured decomposition work (like task-breakdown) may retain specialized agents for their domain.
Two Entry Points
1. Standalone (user invokes directly)
User runs /agent-room "Should we use a monorepo or polyrepo?" and the skill runs a full debate or poll session.
2. Sub-routine (another skill invokes mid-flow)
The invoking skill hits a complex decision during conversation. It invokes agent-room with the specific decision framed, waits for the result, then continues the conversation.
Sub-routine protocol (for invoking skills):
1. Frame the specific decision as a clear problem statement
2. Include relevant context gathered so far
3. Invoke agent-room with mode (debate/poll) and agent count
4. Receive the report: consensus, disagreements, recommendation
5. Integrate the recommendation into the ongoing conversation
6. The agent-room report is ephemeral — it lives in context, not necessarily on disk
When invoked as a sub-routine, skip writing the report to disk unless the user asks. The value is the insight, not the artifact.
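The sub-routine protocol above can be sketched as a small request/report pair. This is an illustration only: `RoomRequest`, `RoomReport`, `invoke_as_subroutine`, and the `run_room` callable are hypothetical names, not part of the skill, and the actual room runner is passed in rather than implemented here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RoomRequest:
    """Hypothetical container for what an invoking skill hands to agent-room."""
    problem: str            # step 1: the decision framed as a clear problem statement
    context: str = ""       # step 2: relevant context gathered so far
    mode: str = "debate"    # step 3: "debate" or "poll"
    agents: int = 3         # step 3: agent count


@dataclass
class RoomReport:
    """Hypothetical container for the report received in step 4."""
    consensus: List[str] = field(default_factory=list)
    disagreements: List[str] = field(default_factory=list)
    recommendation: str = ""


def invoke_as_subroutine(request: RoomRequest,
                         run_room: Callable[[RoomRequest], RoomReport]) -> str:
    """Run the room and return text to fold into the conversation (step 5).

    Nothing is written to disk: the report stays in context (step 6).
    """
    report = run_room(request)
    lines = [f"Recommendation: {report.recommendation}"]
    if report.disagreements:
        lines.append("Open disagreements: " + "; ".join(report.disagreements))
    return "\n".join(lines)
```

Passing the runner in as a callable keeps the sketch independent of how agents are actually spawned.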
Critical Gates
- Choose the right mode — debate for trade-off decisions, poll for filtering hallucinations and finding consensus. Default to debate (richer output for fewer agents).
- Problem must be specific — N agents on a fuzzy prompt wastes tokens. If vague, ask the user to sharpen before spawning.
- Agents must produce structured output — freeform prose can't be aggregated.
- Cost scales with agent count — 3 debate agents x 3 rounds ~ $0.30-0.50. 10 poll agents ~ $0.30-0.50. Default to sonnet unless user requests opus.
Mode Routing
| Keywords | Mode |
|---|---|
| "debate", "argue", "discuss", "chatroom", "trade-off" | Debate |
| "consensus", "poll", "vote", "what do agents think", "multiple opinions" | Poll |
| Ambiguous | Default to Debate |
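The routing table can be read as a keyword lookup with a debate fallback. The sketch below is one possible reading, not the skill's actual matcher: `route_mode` is a hypothetical helper, it uses naive substring matching (so "poll" would also match "pollution"), and it assumes a request that matches both columns counts as ambiguous.

```python
DEBATE_KEYWORDS = ("debate", "argue", "discuss", "chatroom", "trade-off")
POLL_KEYWORDS = ("consensus", "poll", "vote", "what do agents think", "multiple opinions")


def route_mode(request: str) -> str:
    """Return "poll" only when poll keywords appear alone; otherwise "debate"."""
    text = request.lower()
    wants_debate = any(keyword in text for keyword in DEBATE_KEYWORDS)
    wants_poll = any(keyword in text for keyword in POLL_KEYWORDS)
    if wants_poll and not wants_debate:
        return "poll"
    # No keywords, debate keywords only, or both: fall back to the default mode.
    return "debate"
```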
Mode A: Debate
Spawn N agents (default 3) into a shared conversation. Each reads the full chat history before responding — building on, challenging, or refining previous contributions.
Why it works: Sequential handoffs lose context. A shared conversation preserves reasoning chains and enables genuine debate. When Agent A says "this needs a queue" and Agent B says "a simple loop is fine," that disagreement is more valuable than either agent's solo answer.
A1. Parse the Request
Extract:
- Problem/question to debate
- Agent count N — default 3 (override: "have 5 agents debate")
- Round count R — default 3 (override: "debate for 5 rounds")
- Agent roles — user may specify. If not, assign diverse defaults.
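As a rough illustration of the extraction step, the override phrases quoted above can be picked out with two regular expressions. `parse_overrides` is a hypothetical helper; it assumes counts are written as digits and would misread numbers that belong to the problem statement itself (for example "we run 5 agents in production").

```python
import re
from typing import Dict


def parse_overrides(request: str) -> Dict[str, int]:
    """Extract agent count N and round count R, falling back to the defaults of 3."""
    agents = re.search(r"(\d+)\s+agents?\b", request, re.IGNORECASE)
    rounds = re.search(r"(\d+)\s+rounds?\b", request, re.IGNORECASE)
    return {
        "agents": int(agents.group(1)) if agents else 3,
        "rounds": int(rounds.group(1)) if rounds else 3,
    }
```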
A2. Assign Agent Roles
Each agent gets a distinct perspective to maximize productive disagreement.
Software engineering:
- Architect — systems, interfaces, scalability, long-term maintainability
- Pragmatist — shipping fast, minimal complexity, "good enough" solutions
- Critic — edge cases, failure modes, security holes, unstated assumptions
Product/design:
- User advocate — UX, simplicity, delight
- Business strategist — revenue, growth, competitive advantage
- Engineer — technical feasibility and cost
Strategy/decisions:
- Optimist — opportunity, upside, reasons to act
- Skeptic — risk, downside, reasons to wait
- Synthesizer — middle path, integrates both perspectives
For N > 3, add roles that create productive tension with existing ones.
Constraint-assignment for divergence — when the debate is about design or architecture (not strategy), assign each agent a structural constraint instead of (or in addition to) a perspective. This mechanically forces different solutions rather than hoping for them:
- Agent 1: "Minimize surface area — aim for the fewest possible methods/endpoints"
- Agent 2: "Maximize flexibility — support the widest range of use cases"
- Agent 3: "Optimize for the most common case — make the 80% path trivially simple"
- Agent 4 (if N > 3): "Take inspiration from [specific paradigm/library the user knows]"
Constraint-assigned agents produce genuinely different designs. Perspective-assigned agents tend to converge on similar designs with different justifications.
A3. Run Debate Rounds
Round 1 — Opening positions. Agent prompt:
You are {role}: {role_description}
PROBLEM:
{problem}
CONTEXT:
{context}
This is Round 1 of a multi-agent debate. State your initial position.
Be specific — propose actual solutions, not vague principles. Take a clear stance.
Other agents will challenge you in subsequent rounds.
Communication discipline:
- No performative agreement: never open with "Great point" or "I appreciate X's perspective"
- State disagreements directly: "That approach fails because [X]" not "While that has merit..."
- No hedging: "This will break under load" not "This might potentially have scaling concerns"
Respond in this format:
POSITION: [One-sentence stance]
REASONING: [3-5 key points]
PROPOSAL: [Concrete recommendation]
CONCERNS: [What could go wrong with your approach]
Write your response directly — do not write to any files.
Rounds 2+ — Debate. Agent prompt:
You are {role}: {role_description}
PROBLEM:
{problem}
PREVIOUS DISCUSSION:
{all previous round entries}
This is Round {round_number}. Read the previous discussion carefully.
1. Respond to the strongest counterargument against your position
2. Identify where you AGREE with other agents (concede good points)
3. Identify where you still DISAGREE and why
4. Refine your proposal based on the discussion
Do NOT repeat your previous position. Engage with what others said.
Change your mind if they made a better argument.
Do NOT soften disagreements with praise. "I appreciate Agent A's point, but..." is sycophancy disguised as discourse. State the disagreement directly.
Respond in this format:
AGREEMENTS: [What other agents got right]
DISAGREEMENTS: [Where you still differ and why]
REFINED PROPOSAL: [Updated recommendation]
CONFIDENCE: [1-10]
Write your response directly — do not write to any files.
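Because both prompts ask for labeled fields, the responses can be split mechanically before aggregation. The parser below is a sketch under the assumption that each label starts its own line and is followed by a colon; `parse_fields` and the two label lists are hypothetical names introduced for illustration.

```python
import re
from typing import Dict, List

ROUND_1_LABELS = ["POSITION", "REASONING", "PROPOSAL", "CONCERNS"]
ROUND_N_LABELS = ["AGREEMENTS", "DISAGREEMENTS", "REFINED PROPOSAL", "CONFIDENCE"]


def parse_fields(response: str, labels: List[str]) -> Dict[str, str]:
    """Split an agent response into {label: text}, keeping multi-line field bodies."""
    alternatives = "|".join(re.escape(label) for label in labels)
    # Anchor at line start so "AGREEMENTS" does not match inside "DISAGREEMENTS".
    pattern = re.compile(r"^(%s):\s*" % alternatives, re.MULTILINE)
    matches = list(pattern.finditer(response))
    fields: Dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(response)
        fields[match.group(1)] = response[match.end():end].strip()
    return fields
```

A response with a missing label simply yields a dictionary without that key, which the orchestrator can treat as malformed output.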
After each round:
- Collect all agent responses
- Check for convergence: if all agents agree (confidence 8+, proposals aligned), stop early
- Otherwise continue to next round
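The early-stop check can be sketched as follows. Whether proposals are "aligned" is a semantic judgment the text leaves to the orchestrator, so this hypothetical `has_converged` helper takes that judgment as a boolean input and only automates the confidence threshold.

```python
from typing import Iterable


def has_converged(confidences: Iterable[int],
                  proposals_aligned: bool,
                  threshold: int = 8) -> bool:
    """True when every agent reports confidence >= threshold and proposals align."""
    scores = list(confidences)
    # An empty round never counts as convergence.
    return bool(scores) and proposals_aligned and all(c >= threshold for c in scores)
```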
A4. Synthesize
After the last round, you (the orchestrator) read the full debate and synthesize:
- Where did agents converge? — high-confidence conclusions
- Where did they remain split? — genuine trade-offs the user must decide
- What concerns were raised but unresolved? — risks to monitor
- Did any agent change their mind? — mind-changes are strong signals
Mode B: Poll
Spawn N agents (default 10) with identical context and varied framings. Each independently analyzes and produces structured output. Aggregate by consensus, divergence, and outlier.
Why it works: Exploits stochastic variation. Like polling 10 experts separately. Filters hallucinations and individual biases. Divergences reveal genuine judgment calls.
B1. Design Structured Output Schema
Each agent must return structured output that can be mechanically compared:
| Output Type | When | Schema |
|---|
| Ranking | Predefined options | "Rank these 5 options 1-5" |
| Recommendation | Open-ended | "Top 3 recommendations with confidence 1-10" |
| Binary | Yes/no decision | "YES or NO, top 3 reasons" |
| Scoring | Multi-criteria | "Score each option 1-10 on [criteria]" |
B2. Generate Framing Variations
N slightly different prompts. Core problem + schema identical — only framing varies:
- Neutral baseline
- Risk-averse analyst
- Growth-oriented strategist
- Contrarian (challenge conventional wisdom)
- First-principles reasoner
- User-empathy focus
- Resource-constrained optimizer
- Long-term (5-year) optimizer
- Data-driven (measurable only)
- Systems thinker (second/third-order effects)
For N < 10, use the first N. For N > 10, cycle through the list again from the top.
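Selecting framings for an arbitrary N can be sketched with `itertools`. The `FRAMINGS` list copies the ten entries above; `pick_framings` is a hypothetical helper name.

```python
from itertools import cycle, islice
from typing import List

FRAMINGS = [
    "Neutral baseline",
    "Risk-averse analyst",
    "Growth-oriented strategist",
    "Contrarian (challenge conventional wisdom)",
    "First-principles reasoner",
    "User-empathy focus",
    "Resource-constrained optimizer",
    "Long-term (5-year) optimizer",
    "Data-driven (measurable only)",
    "Systems thinker (second/third-order effects)",
]


def pick_framings(n: int) -> List[str]:
    """Return the first n framings, wrapping around to the start when n > 10."""
    return list(islice(cycle(FRAMINGS), n))
```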
B3. Spawn All N Agents in Parallel
One-pass — no convergence detection. Independent samples give better statistical signal than iterative refinement.
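A one-pass parallel fan-out might look like the sketch below. It assumes agents are reachable through some async callable, here the hypothetical `ask_agent` parameter; how an agent is actually spawned is outside this example.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_poll(prompts: List[str],
                   ask_agent: Callable[[str], Awaitable[str]]) -> List[str]:
    """Send every prompt concurrently and return responses in prompt order.

    There is no second pass: each agent answers once, independently of the others.
    """
    return await asyncio.gather(*(ask_agent(prompt) for prompt in prompts))
```

`asyncio.gather` preserves input order, so response `i` always corresponds to framing `i`.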
B4. Aggregate Results
Rankings: Borda count. With K options, 1st place earns K points, 2nd earns K-1, and so on; sum each option's points across agents.
Recommendations: Group similar, count occurrences. Consensus (70%+), Divergence (40-69%), Outlier (<40%).
Scoring: Mean, median, standard deviation. Flag high-variance options.
Binary: Count YES/NO, summarize strongest arguments from each side.
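The aggregation rules above can be sketched in a few helpers. The function names are hypothetical, the Borda points are based on the number of ranked options, and the text does not say whether the standard deviation is population or sample, so this sketch assumes population (`pstdev`).

```python
from collections import defaultdict
from statistics import mean, median, pstdev
from typing import Dict, List, Sequence


def borda(rankings: List[Sequence[str]]) -> Dict[str, int]:
    """Sum Borda points: in a ranking of k options, place 1 earns k, place 2 earns k-1."""
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        k = len(ranking)
        for place, option in enumerate(ranking):
            scores[option] += k - place
    return dict(scores)


def classify(count: int, n_agents: int) -> str:
    """Label a recommendation by the share of agents that made it."""
    # Integer arithmetic avoids floating-point edge cases at exactly 70% and 40%.
    if count * 100 >= 70 * n_agents:
        return "consensus"
    if count * 100 >= 40 * n_agents:
        return "divergence"
    return "outlier"


def score_stats(scores: Sequence[float]) -> Dict[str, float]:
    """Mean, median, and (population) standard deviation for one option's scores."""
    return {"mean": mean(scores), "median": median(scores), "stdev": pstdev(scores)}
```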
Report
When standalone (or when explicitly requested), write to .agents/meta/agent-room-report.md using this markdown template:
---
skill: agent-room
version: 1
date: {YYYY-MM-DD}
status: final
---
# Agent Room Report
**Problem**: {problem}
**Mode**: {debate | poll}
**Agents**: {N} | **Rounds**: {R, debate only}
Debate sections: Participants, Consensus, Key Disagreements, Recommended Action, Unresolved Risks, Debate Highlights.
Poll sections: Consensus (X+/N agreed), Divergences (split X/Y), Outliers (Z/N), Raw Rankings/Scores.
When invoked as sub-routine: return the synthesis inline, skip disk write.
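Filling in the report header can be sketched as a plain string template. `render_header` is a hypothetical helper, and printing "n/a" for the round count in poll mode is an assumption, since the template only says rounds apply to debates.

```python
from datetime import date
from typing import Optional


def render_header(problem: str, mode: str, agents: int,
                  rounds: Optional[int] = None,
                  today: Optional[date] = None) -> str:
    """Render the front matter and summary lines of the report template."""
    day = (today or date.today()).isoformat()  # YYYY-MM-DD
    rounds_text = str(rounds) if mode == "debate" and rounds is not None else "n/a"
    return "\n".join([
        "---",
        "skill: agent-room",
        "version: 1",
        f"date: {day}",
        "status: final",
        "---",
        "# Agent Room Report",
        f"**Problem**: {problem}",
        f"**Mode**: {mode}",
        f"**Agents**: {agents} | **Rounds**: {rounds_text}",
    ])
```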
Configuration
| Parameter | Default | Override |
|---|---|---|
| mode | debate | "poll this" / "debate this" |
| N | 3 (debate) / 10 (poll) | "5 agents" / "15 agents" |
| R | 3 | "debate for 5 rounds" (debate only) |
| model | sonnet | "use opus" |
| roles | auto | "have a DBA, a frontend dev, and a DevOps engineer debate" |
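The defaults in the table depend on the mode, which a small resolver can make explicit. `RoomConfig` and `resolve_config` are hypothetical names; the sketch treats a missing or zero override as "use the default" and leaves the round count unset for polls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoomConfig:
    mode: str = "debate"
    agents: int = 3
    rounds: Optional[int] = 3   # None in poll mode, where rounds do not apply
    model: str = "sonnet"
    roles: str = "auto"


def resolve_config(mode: str = "debate",
                   agents: Optional[int] = None,
                   rounds: Optional[int] = None,
                   model: str = "sonnet") -> RoomConfig:
    """Apply mode-dependent defaults: 3 agents for debate, 10 for poll."""
    if mode == "poll":
        return RoomConfig(mode="poll", agents=agents or 10, rounds=None, model=model)
    return RoomConfig(mode="debate", agents=agents or 3, rounds=rounds or 3, model=model)
```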
Edge Cases
- Vague problem: Ask user to sharpen before spawning. Don't burn tokens on vagueness.
- N < 2 (debate) or N < 3 (poll): Warn user — debate needs 2+, poll needs 3+.
- Unanimous agreement round 1: Stop early. Report consensus. Valid and cheap.
- Deadlock after R rounds: Report honestly. The finding IS that no dominant answer exists.
- Even poll split: Report the split. No forced tiebreaker.
- Agent goes off-topic: Exclude from synthesis, note effective N.
- Existing report: Overwrite — these are ephemeral analysis artifacts.
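The agent-count edge case is the easiest of these to check mechanically. A minimal sketch, with `MIN_AGENTS` and `agent_count_warning` as hypothetical names:

```python
from typing import Optional

# Minimum agent counts from the edge case above: debate needs 2+, poll needs 3+.
MIN_AGENTS = {"debate": 2, "poll": 3}


def agent_count_warning(mode: str, agents: int) -> Optional[str]:
    """Return a warning for the user when N is too small for the mode, else None."""
    minimum = MIN_AGENTS[mode]
    if agents < minimum:
        return f"{mode} needs at least {minimum} agents, got {agents}"
    return None
```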
Cost Considerations
- 3 sonnet agents x 3 rounds (debate): ~$0.30-0.50
- 10 sonnet agents (poll): ~$0.30-0.50
- Opus costs roughly 10x more, so use it only when explicitly requested
- Early convergence saves cost
- For binary decisions, 5 poll agents usually suffice
Chain Position
Standalone skill that any other skill can invoke as a sub-routine for multi-perspective decisions.