# Deep Research
Hypothesis-driven research swarm. Spawns specialist agents to investigate a task,
grades every finding by evidence quality, then adversarially challenges the emerging
conclusion before delivering a structured verdict.
## When This Skill Activates
Trigger on explicit research requests:
- User says: "research", "investigate", "discover", "how should I approach..."
- User asks: "what's the best way to...", "explore options for...", "deep research"
- User wants prior art, feasibility analysis, or approach comparison
Do NOT activate automatically on every task. This is an on-demand tool, not a gate.
## Phase 1: Hypothesis Formation
Before spawning agents, frame the research:
1. Parse the task: Extract the core question or goal. If it is ambiguous, ask the user
   to clarify before proceeding. Identify technology keywords (languages, frameworks,
   libraries mentioned or implied by the codebase).
2. Identify repo context: Run `git rev-parse --show-toplevel` to get the repository
   root. If this fails (not a git repo), use the current working directory instead.
   Check for package manifests (e.g. `package.json`, `pyproject.toml`, `Cargo.toml`,
   `go.mod`) to identify the language/framework stack, and pass the result along as
   the `tech_stack`. If no manifests are found, fall back to the primary file
   extensions present, or ask the user.
3. speak-memory: If the speak-memory story file exists and an active story matches
   the current work, read it for context. If it does not exist, skip this step.
4. Form hypotheses: State 1-3 hypotheses to investigate. For each:
   - Question: The specific question being answered
   - Prior belief: What you expect the answer to be (best guess before research)
   - Disconfirming evidence: What evidence would change the answer

   If the task is open-ended ("how should I build X?") and you have no prior belief,
   that's fine — set the prior belief to "unknown" and frame the hypothesis as:
   "There is an established approach for X in this codebase/ecosystem," with
   disconfirming evidence: "No established patterns exist; this is novel."
5. Set scope budget: Declare upfront: "Budget: 5 research agents + 1 adversarial
   challenge = 6 agent calls, investigating N hypotheses" (substitute the actual
   hypothesis count for N). When the budget is exhausted, synthesize what you have
   rather than expanding.
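The repo-context step above can be sketched in Python. This is a minimal sketch, not part of the skill itself: the manifest-to-stack mapping and the helper name are illustrative assumptions.

```python
import subprocess
from pathlib import Path

# Common manifest files mapped to the stack they imply (illustrative, not exhaustive).
MANIFESTS = {
    "package.json": "node",
    "pyproject.toml": "python",
    "Cargo.toml": "rust",
    "go.mod": "go",
}

def detect_repo_context() -> tuple[Path, list[str]]:
    """Return (repo_root, tech_stack) using the Phase 1 fallbacks."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            capture_output=True, text=True, check=True,
        )
        root = Path(out.stdout.strip())
    except (subprocess.CalledProcessError, FileNotFoundError):
        root = Path.cwd()  # not a git repo: fall back to the working directory

    stack = [lang for name, lang in MANIFESTS.items() if (root / name).exists()]
    if not stack:
        # No manifests found: fall back to the file extensions actually present.
        stack = sorted({p.suffix.lstrip(".") for p in root.glob("*.*") if p.suffix})
    return root, stack
```

The final "ask the user" fallback is deliberately left out: that is a conversational step, not something the helper can decide.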
Present the hypotheses to the user before proceeding:

```
Hypotheses:
1. [Question] — Prior belief: [X]. Would change if: [Y].
2. ...

Budget: 6 agents, [hypothesis count] questions. Proceed?
```
If the user declines or asks to revise, update the hypotheses and re-present.
Do not spawn agents until the user confirms.
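As a minimal sketch, the hypothesis record and the confirmation prompt above could be modeled like this; the `Hypothesis` dataclass and its field names are hypothetical, chosen only to mirror the template:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One Phase 1 research hypothesis (field names are illustrative)."""
    question: str        # the specific question being answered
    prior_belief: str    # best guess before research, or "unknown"
    disconfirming: str   # evidence that would change the answer

def render_hypotheses(hypotheses: list[Hypothesis], budget: int = 6) -> str:
    """Format the confirmation prompt shown to the user before spawning agents."""
    lines = ["Hypotheses:"]
    for i, h in enumerate(hypotheses, 1):
        lines.append(f"{i}. {h.question} — Prior belief: {h.prior_belief}. "
                     f"Would change if: {h.disconfirming}.")
    lines.append(f"Budget: {budget} agents, {len(hypotheses)} questions. Proceed?")
    return "\n".join(lines)
```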
## Phase 2: Evidence Gathering
Spawn all 5 agents in a single message (parallel Agent tool calls). Each agent
returns a JSON findings array — see `references/agent-roles.md` for full prompts.
| Agent | Subagent Type | Model | Focus |
|---|---|---|---|
| codebase | Explore | opus | Existing patterns, utilities, similar implementations, conventions |
| web-research | general-purpose | opus | Solutions, libraries, best practices, documentation |
| tools-mcp | general-purpose | opus | Available MCP servers, tools, and resources |
| skills | general-purpose | opus | Installed skills and marketplace matches |
| dependencies | general-purpose | opus | Installed packages, version constraints, compatibility |
Agent tool grants:
- codebase: read-only codebase exploration
- web-research: external web research
- tools-mcp: ListMcpResourcesTool, ReadMcpResourceTool (tool discovery)
- skills: read installed skills and search the marketplace
- dependencies: read manifests and query the package manager
Pass each agent the hypotheses so they can focus their search, but instruct them
to also report unexpected relevant findings outside the hypothesis scope.
Required output format (every finding must include an `evidence_tier`):

```json
[
  {
    "evidence_tier": "primary|secondary|speculative",
    "relevance": "high|medium|low",
    "source": "where this was found (file path, URL, tool name)",
    "finding": "what was discovered",
    "supports_hypothesis": "which hypothesis this relates to, or 'unexpected'",
    "recommendation": "how this finding applies to the task",
    "references": ["file paths or URLs for further reading"]
  }
]
```
Evidence tiers:
- primary: Direct from authoritative source — code you read, API response,
official documentation, test output
- secondary: Reputable third-party — blog post with evidence, SO answer with
code examples, well-maintained library README
- speculative: Inference, analogy, or "I think" — no direct source confirms this
Return an empty array (`[]`) if there are no relevant findings in that domain.
Instruct each agent: "Return your top 10 findings maximum, prioritized by relevance.
Tag every finding with an evidence_tier. Never report speculative findings as primary."
Error handling: If an agent returns non-JSON output, strip code fences and attempt
JSON extraction. If extraction fails or the agent times out, treat as empty array
and log a warning. Continue with results from agents that succeeded.
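The fence-stripping fallback described above might look like this in Python. It is a sketch; the helper name and the exact regexes are assumptions:

```python
import json
import re

def parse_agent_findings(raw: str) -> list[dict]:
    """Parse an agent's findings, tolerating code fences and stray prose.

    Returns [] (and the caller logs a warning) when extraction fails,
    per the Phase 2 error-handling rule.
    """
    # Strip ```json ... ``` fences if present.
    text = re.sub(r"^```(?:json)?\s*|```\s*$", "", raw.strip(), flags=re.MULTILINE)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the first bracketed array anywhere in the output.
        m = re.search(r"\[.*\]", raw, flags=re.DOTALL)
        if not m:
            return []
        try:
            data = json.loads(m.group(0))
        except json.JSONDecodeError:
            return []
    return data if isinstance(data, list) else []
```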
## Phase 3: Synthesis
Merge all 5 finding arrays and apply the following synthesis rules:
1. Merge: Collect all 5 JSON arrays into a single flat array. Tag each finding
   with its source agent name. Note which agents failed or returned empty.
2. Deduplicate: Merge identical findings; note corroborating sources.
3. Resolve conflicts: Flag when agents disagree; note the resolution.
4. Grade evidence: Flag any conclusion that rests entirely on speculative evidence.
   A conclusion needs at least one primary or secondary source to be credible.
5. Rank: Sort by evidence_tier (primary first), then relevance, then corroboration.
   Cap the merged array at 30 findings — drop low-relevance speculative findings
   first to keep context manageable.
6. Form preliminary conclusion: Based on the ranked findings, form a preliminary
   answer to each hypothesis. State whether the prior belief was confirmed or changed.
7. Generate approaches: Propose 2-3 approaches with trade-offs. Each approach
   must reference the evidence that supports it, with tier tags.
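The Rank step above can be sketched as a sort key. This is a sketch only; the `corroborations` count is an assumed field attached during deduplication, not one defined by the findings schema:

```python
TIER_ORDER = {"primary": 0, "secondary": 1, "speculative": 2}
REL_ORDER = {"high": 0, "medium": 1, "low": 2}

def rank_and_cap(findings: list[dict], cap: int = 30) -> list[dict]:
    """Rank merged findings: tier first, then relevance, then corroboration.

    Low-relevance speculative findings sort to the bottom, so the cap
    drops them first, as the synthesis rules require.
    """
    ranked = sorted(
        findings,
        key=lambda f: (
            TIER_ORDER.get(f.get("evidence_tier"), 3),   # unknown tiers sink
            REL_ORDER.get(f.get("relevance"), 3),
            -f.get("corroborations", 0),  # more corroborating sources rank higher
        ),
    )
    return ranked[:cap]
```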
## Phase 4: Adversarial Challenge
Before delivering findings, actively try to disprove the emerging conclusion.
Spawn one devil's advocate agent (`subagent_type: "general-purpose"`). Give it:
- The preliminary conclusion from Phase 3
- The recommended approach
- The hypotheses and their current status (confirmed/changed)
The devil's advocate agent's job:
- Search for disconfirming evidence using WebSearch, WebFetch, Read, Glob, and Grep
- Find reasons the recommended approach would fail
- Identify assumptions that haven't been tested
- Look for alternatives the research may have missed
See `references/agent-roles.md` for the full devil's advocate prompt.
Handle the devil's advocate result:
- If the conclusion survives, it's stronger. Note any speculative counterarguments
  as dissent but don't change the recommendation.
- If the devil's advocate found credible disconfirming evidence (primary or
  secondary tier), revise the conclusion, note the revision, and re-run synthesis
  Step 7 (approach generation) to reflect the updated position.
- If the recommended approach is fundamentally flawed, discard it, revise all
  affected hypothesis conclusions, and re-generate approaches from scratch based
  on the combined original plus adversarial evidence.
Error handling: If the devil's advocate agent fails (non-JSON output, timeout, or
invalid structure), log a warning: "Devil's advocate failed — conclusion not adversarially
tested. Reduce confidence by one level." Continue to Phase 5 with the untested conclusion.
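One way to sketch the outcome handling above is a small dispatcher. The outcome labels and action strings here are illustrative placeholders, not names fixed by this skill; the fallback branch implements the "reduce confidence by one level" rule:

```python
def apply_challenge_result(outcome: str, confidence: str) -> dict:
    """Map a devil's-advocate outcome to the next synthesis action.

    Labels ("upheld", "revised", "overturned") are hypothetical placeholders.
    Any unrecognized outcome is treated as a failed, untested challenge.
    """
    downgrade = {"high": "medium", "medium": "low", "low": "low"}
    if outcome == "upheld":
        return {"action": "keep_conclusion", "confidence": confidence}
    if outcome == "revised":
        return {"action": "revise_and_regenerate_approaches", "confidence": confidence}
    if outcome == "overturned":
        return {"action": "rebuild_from_combined_evidence", "confidence": confidence}
    # Agent failed or returned invalid output: the conclusion was never tested.
    return {"action": "proceed_untested", "confidence": downgrade[confidence]}
```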
## Phase 5: Verdict
Output a structured verdict (use `assets/report-template.md` for formatting):
```
VERDICT: [one sentence answer]
CONFIDENCE: high|medium|low — [justification]
EVIDENCE:
1. [primary] source — finding
2. [secondary] source — finding
3. ...
DISSENT: [strongest counterargument from devil's advocate]
ACTION: [what to do next — implement X, investigate Y further, do nothing]
```
After the verdict, output the structured artifact for downstream skills:
```json
{
  "task": "original task description",
  "tech_stack": ["identified technologies"],
  "hypotheses": [
    {
      "question": "...",
      "prior_belief": "...",
      "conclusion": "confirmed|changed|inconclusive",
      "conclusion_detail": "what we found"
    }
  ],
  "metadata": {
    "agents_completed": ["codebase", "web-research", "tools-mcp", "skills", "dependencies", "devils-advocate"],
    "agents_failed": [],
    "total_findings": 0,
    "speculative_only_conclusions": 0,
    "timestamp": "ISO-8601"
  },
  "findings": [... merged findings array (high/medium only, with evidence_tier) ...],
  "conflicts": [
    {
      "description": "what conflicts",
      "agents": ["which agents disagree"],
      "resolution": "how resolved, or null if user must decide"
    }
  ],
  "verdict": {
    "summary": "one sentence",
    "confidence": "high|medium|low",
    "confidence_justification": "why this confidence level",
    "dissent": "strongest counterargument"
  },
  "approaches": [
    {
      "name": "Approach A",
      "summary": "one-line description",
      "pros": ["..."],
      "cons": ["..."],
      "recommended": true,
      "evidence": ["[primary] source — finding", "..."],
      "relevant_files": ["paths from codebase agent"],
      "relevant_tools": ["MCP tools/skills discovered"],
      "dependencies_needed": ["new packages, if any"],
      "estimated_complexity": "low|medium|high"
    }
  ]
}
```
No-findings edge case: If all agents return empty arrays, output the structured
artifact with empty `findings` and `approaches` arrays and `verdict.confidence: "low"`.
Set the recommended action to: "Research found no directly relevant prior art.
Recommend exploratory implementation or breaking the task into smaller sub-problems."
speak-memory: If an active story was loaded in Phase 1, use Write/Edit to update
the story file — append to Recent Activity and update Current Context.
## Key Constraints
- All research agents run on Opus, for maximum research quality.
- Agent execution caps: enforce a fixed execution cap on each research agent and
  on the devil's advocate.
- Max 2 levels of nesting: orchestrator → specialist. Specialists never spawn agents.
- Scope budget: 5 research agents + 1 devil's advocate = 6 total. Do not expand.
- Findings cap: Max 30 findings enter synthesis (after merge + dedup). Drop
low-relevance speculative findings first.
- All sub-agents are read-only — no code modifications, no git changes. The
  orchestrator may write only to the speak-memory story file.
- Bash (sub-agents only) is limited to read-only dependency queries (e.g. `npm ls`,
  `pip show`, `cargo tree`) and skill search. No other Bash commands.
- Conclusions resting entirely on speculative evidence must be flagged as low confidence.
- The structured artifact stays in conversation context — no file writing.
## Closing Checklist
Do not declare the research done until all boxes are checked:
- [ ] User confirmed the hypotheses and budget before any agents were spawned
- [ ] All 5 research agents were spawned in parallel; failures were logged
- [ ] Every finding carries an evidence_tier; speculative-only conclusions are flagged
- [ ] The devil's advocate challenge ran (or its failure was noted and confidence reduced)
- [ ] The structured verdict and JSON artifact were delivered
- [ ] The speak-memory story was updated, if one was loaded in Phase 1
## Reference Files
Load only when needed:
- `references/agent-roles.md` — Full prompt templates for each research agent and the devil's advocate
- Synthesis rules reference — Evidence grading, dedup, conflict resolution, approach generation
- `assets/report-template.md` — Verdict and report format