agent-development

Use when the user needs to build AI agents — tool use patterns, memory management, planning strategies, multi-agent coordination, evaluation, and safety guardrails. Triggers: user says "agent", "build an agent", "tool use", "agent loop", "multi-agent", "memory management", "guardrails", "agent evaluation".

NPX Install

npx skill4agent add pixel-process-ug/superkit-agents agent-development

Agent Development

Overview

Design and build AI agents that effectively use tools, manage memory, plan multi-step tasks, coordinate with other agents, and operate within safety guardrails. This skill covers the full agent development lifecycle from architecture through evaluation, with emphasis on observable, testable, and safe agent behavior.

Phase 1: Agent Design

  1. Define the agent's purpose and scope
  2. Identify required tools and capabilities
  3. Design memory architecture (short-term, long-term)
  4. Plan agent loop structure (observe, think, act)
  5. Define safety boundaries and guardrails
STOP — Present agent design to user for approval before implementation.

Agent Architecture Decision Table

| Agent Type | When to Use | Loop Pattern | Complexity |
| --- | --- | --- | --- |
| Single-turn tool user | Simple queries with tool calls | Request -> Tool -> Response | Low |
| ReAct agent | Multi-step reasoning tasks | Thought -> Action -> Observation -> loop | Medium |
| Plan-and-execute | Complex tasks with dependencies | Plan -> Execute steps -> Validate | Medium-High |
| Multi-agent orchestrator | Parallel/specialized sub-tasks | Dispatch -> Collect -> Synthesize | High |
| Autonomous loop (Ralph-style) | Long-running iterative development | Plan -> Build -> Verify -> Exit gate | High |

Phase 2: Implementation

  1. Build the agent loop with tool dispatch
  2. Implement memory management (context window, persistence)
  3. Add planning and decomposition logic
  4. Integrate error recovery and retry patterns
  5. Implement output validation
STOP — Run smoke tests on the agent loop before adding complexity.

Tool Use Patterns

Tool Definition Best Practices

| Principle | Rule | Example |
| --- | --- | --- |
| Clear naming | verb-noun format | `search_documents`, `create_file` |
| Detailed descriptions | Include when to use AND when NOT to use | "Use for keyword search. Do NOT use for semantic similarity." |
| Well-typed parameters | Descriptions and examples on every param | `query: string // "e.g., 'user authentication'"` |
| Predictable returns | Consistent format across tools | Always return `{ success, data, error }` |
| Self-correcting errors | Help agent recover | "Invalid date format. Expected ISO 8601: YYYY-MM-DD" |
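
A minimal Python sketch of one tool that follows these principles. The `ToolResult` class, the `published_after` parameter, and the in-memory `DOCUMENTS` list are hypothetical names introduced for illustration; only the tool name, the description text, and the error message come from the examples above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass
class ToolResult:
    """Uniform return shape shared by every tool: success, data, error."""
    success: bool
    data: Any = None
    error: Optional[str] = None


# Hypothetical in-memory corpus standing in for a real document store.
DOCUMENTS = [
    {"title": "User authentication guide", "published": "2024-03-01"},
    {"title": "Billing FAQ", "published": "2023-11-20"},
]


def search_documents(query: str, published_after: str = "1970-01-01") -> ToolResult:
    """Keyword search over document titles.

    Use for keyword search. Do NOT use for semantic similarity.

    query: keywords to match, e.g. "user authentication"
    published_after: ISO 8601 date, e.g. "2024-01-01"
    """
    try:
        cutoff = date.fromisoformat(published_after)
    except ValueError:
        # Self-correcting error: tell the agent exactly what format to send.
        return ToolResult(False, error="Invalid date format. Expected ISO 8601: YYYY-MM-DD")
    hits = [
        doc for doc in DOCUMENTS
        if query.lower() in doc["title"].lower()
        and date.fromisoformat(doc["published"]) > cutoff
    ]
    return ToolResult(True, data=hits)
```

Returning a result object instead of raising lets the agent read the error text and retry with corrected arguments.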

Tool Selection Strategy

Given a task:
1. Identify required information and actions
2. Map to available tools
3. Determine tool call order (dependencies)
4. Execute with result validation
5. Retry or try alternative tool on failure
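
Steps 4 and 5 can be sketched as a small dispatcher that validates each result and falls back to the next tool. This is one possible shape, not a prescribed one; `primary_search`, `fallback_search`, and `run_with_fallback` are hypothetical names, and the tools return the `{ success, data, error }` shape as plain dictionaries.

```python
from typing import Any, Callable, Dict, List

Tool = Callable[[str], Dict[str, Any]]


def primary_search(query: str) -> Dict[str, Any]:
    # Hypothetical tool that always fails, to exercise the fallback path.
    return {"success": False, "data": None, "error": "index unavailable"}


def fallback_search(query: str) -> Dict[str, Any]:
    # Hypothetical alternative tool.
    return {"success": True, "data": [f"result for {query}"], "error": None}


def run_with_fallback(tools: List[Tool], query: str, retries: int = 2) -> Dict[str, Any]:
    """Try each tool in order; call each up to `retries` times before moving on."""
    errors = []
    for tool in tools:
        for attempt in range(retries):
            result = tool(query)
            # Validate the result before trusting it.
            if result.get("success") and result.get("data"):
                return result
            errors.append(f"{tool.__name__} attempt {attempt + 1}: {result.get('error')}")
    # Every tool failed: report all failures so the caller can re-plan.
    return {"success": False, "data": None, "error": "; ".join(errors)}
```

Calling `run_with_fallback([primary_search, fallback_search], "orders")` returns the fallback tool's result after the primary tool has failed twice.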

Tool Design Principles

  • Composable: small tools that combine for complex tasks
  • Idempotent: retrying a call has the same effect as making it once (where possible)
  • Observable: return enough context for the agent to verify success
  • Bounded: timeouts and size limits on all operations
  • Documented: every parameter and return value described

Memory Management

Memory Type Decision Table

| Type | Duration | Storage | Use Case |
| --- | --- | --- | --- |
| Working Memory | Current turn | Context window | Active reasoning |
| Short-term Memory | Current session | In-context or buffer | Recent conversation |
| Long-term Memory | Across sessions | Database/file | Learned patterns, user prefs |
| Episodic Memory | Specific events | Indexed store | Past task outcomes |
| Semantic Memory | Knowledge | Vector DB | Domain knowledge retrieval |

Context Window Management

Strategy: Sliding window with importance-based retention

1. Always retain: system prompt, tool definitions, current task
2. Summarize: older conversation turns into compressed summaries
3. Evict: least relevant context when approaching limit
4. Retrieve: pull relevant long-term memory on demand

Budget allocation:
  System prompt + tools: ~20%
  Current task context:  ~40%
  Conversation history:  ~25%
  Retrieved memory:      ~15%
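
The budget split and the sliding window can be sketched as follows. Two assumptions are made for illustration: tokens are approximated by a whitespace word count rather than a real tokenizer, and the "summary" is a placeholder string where a real agent would call a summarizer. The function names are hypothetical.

```python
from typing import Dict, List

# Percentages taken from the budget allocation above.
BUDGET_PERCENT = {"system": 20, "task": 40, "history": 25, "memory": 15}


def allocate_budget(context_limit: int) -> Dict[str, int]:
    """Split a token limit into per-section budgets."""
    return {name: context_limit * pct // 100 for name, pct in BUDGET_PERCENT.items()}


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a crude stand-in for a tokenizer.
    return len(text.split())


def trim_history(turns: List[str], budget: int) -> List[str]:
    """Keep the most recent turns that fit; replace older ones with a summary stub."""
    kept: List[str] = []
    used = 0
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    dropped = len(turns) - len(kept)
    if dropped:
        # Placeholder only; the stub's own token cost is not counted here.
        kept.insert(0, f"[summary of {dropped} earlier turns]")
    return kept
```

With an 8,000-token limit, `allocate_budget(8000)` gives the conversation history 2,000 tokens, which is the value to pass as `budget` to `trim_history`.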

Memory Update Triggers

| Trigger | Action |
| --- | --- |
| User correction | Update learned patterns |
| Task completion | Store outcome and approach |
| Error recovery | Record what failed and what worked |
| New domain knowledge | Index for future retrieval |

Planning Strategies

Hierarchical Task Decomposition

1. Break high-level goal into sub-goals
2. For each sub-goal, identify required actions
3. Order actions by dependencies
4. Execute with checkpoints between phases
5. Re-plan if intermediate results change the approach
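
Ordering actions by dependencies (step 3) and stopping at a failed checkpoint (steps 4 and 5) can be sketched with the standard-library `graphlib` module (Python 3.9+). The action names in `plan` and the `execute` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each action maps to the set of actions it depends on.
plan = {
    "fetch_orders": set(),
    "fetch_refunds": set(),
    "compute_totals": {"fetch_orders", "fetch_refunds"},
    "write_report": {"compute_totals"},
}


def ordered_actions(dependencies):
    """Return actions in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def execute(dependencies, run_action):
    """Run actions in dependency order, stopping at the first failed checkpoint."""
    done = []
    for action in ordered_actions(dependencies):
        if not run_action(action):
            # Checkpoint failed: hand back progress so the caller can re-plan.
            return done, action
        done.append(action)
    return done, None
```

`execute` returns the completed actions together with the action that failed (or `None`), which gives a re-planner the current state to start from.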

ReAct Pattern (Reason + Act)

Thought: I need to find the user's recent orders to answer their question.
Action: search_orders(user_id="123", limit=5)
Observation: Found 5 orders, most recent is #456 from yesterday.
Thought: The user asked about order #456. I have the details now.
Action: respond with order details
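
The trace above can be produced by a loop like the following sketch. The model is replaced by a hypothetical `scripted_policy` function so the example runs without an LLM, and `search_orders` returns canned text; a real agent would call a model and real tools at those two points.

```python
from typing import Callable, Dict, List, Tuple


def search_orders(user_id: str, limit: int) -> str:
    # Hypothetical tool returning canned data.
    return f"Found {limit} orders, most recent is #456 from yesterday."


TOOLS: Dict[str, Callable[..., str]] = {"search_orders": search_orders}


def scripted_policy(history: List[str]) -> Tuple[str, str, dict]:
    """Stand-in for the model: returns (thought, action, arguments)."""
    if not any(line.startswith("Observation:") for line in history):
        return ("I need to find the user's recent orders.", "search_orders",
                {"user_id": "123", "limit": 5})
    return ("I have the details now.", "respond",
            {"text": "Order #456 was placed yesterday."})


def react_loop(policy, max_steps: int = 5) -> Tuple[str, List[str]]:
    """Alternate Thought -> Action -> Observation until the policy responds."""
    history: List[str] = []
    for _ in range(max_steps):
        thought, action, args = policy(history)
        history.append(f"Thought: {thought}")
        if action == "respond":
            return args["text"], history
        history.append(f"Action: {action}({args})")
        history.append(f"Observation: {TOOLS[action](**args)}")
    # Step limit reached without a final answer.
    return "Stopped: step limit reached.", history
```

The `max_steps` bound keeps a policy that never responds from looping forever.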

Plan-and-Execute Pattern

1. Create a complete plan before any action
2. Execute each step, checking preconditions
3. After each step, validate the result
4. If a step fails, re-plan from current state
5. Never modify the plan mid-step (finish or abort first)
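
A compact sketch of this pattern, assuming each step is a function that mutates shared state and returns whether its result validated. All names (`plan_and_execute`, `make_plan`, `load`, `flaky`) are hypothetical. The plan is only rebuilt between steps, never while one is running.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Step = Callable[[State], bool]  # returns True when the step's result validates


def plan_and_execute(make_plan: Callable[[State], List[Step]], state: State,
                     max_replans: int = 2) -> bool:
    """Build a full plan, run it step by step, re-plan from current state on failure."""
    for _ in range(max_replans + 1):
        plan = make_plan(state)  # complete plan before any action
        for step in plan:
            if not step(state):  # run the step and validate its result
                break            # abandon this plan and re-plan
        else:
            return True          # every step validated
    return False


def load(state: State) -> bool:
    state["loaded"] = 1
    return True


def flaky(state: State) -> bool:
    # Hypothetical step that fails on its first attempt only.
    state["attempts"] = state.get("attempts", 0) + 1
    return state["attempts"] > 1


def make_plan(state: State) -> List[Step]:
    # Re-planning from current state: skip work that is already done.
    return ([] if state.get("loaded") else [load]) + [flaky]
```

Running `plan_and_execute(make_plan, {})` fails once at `flaky`, re-plans without repeating `load`, and then succeeds.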

Reflection Pattern

After completing a task:
1. Was the result correct?
2. Was the approach efficient?
3. What could be improved?
4. Should any memory be updated?

Phase 3: Evaluation and Safety

  1. Build evaluation harness with test scenarios
  2. Measure accuracy, efficiency, and safety metrics
  3. Test edge cases and adversarial inputs
  4. Add monitoring and logging
  5. Implement circuit breakers for runaway behavior
STOP — All safety guardrails must be tested before deployment.

Multi-Agent Coordination

Coordination Pattern Decision Table

| Pattern | Description | Use When |
| --- | --- | --- |
| Orchestrator | Central agent delegates to specialists | Clear task hierarchy |
| Pipeline | Agents process in sequence | Linear workflows |
| Debate | Agents propose and critique | Need diverse perspectives |
| Voting | Multiple agents, majority wins | Uncertainty in approach |
| Supervisor | One agent monitors others | Safety-critical tasks |
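
As one illustration, the Voting pattern can be sketched in a few lines. Agents are modeled as plain functions from a task string to an answer string, which is an assumption made for brevity; requiring a strict majority (rather than a plurality) is also a design choice of this sketch, not something the table specifies.

```python
from collections import Counter
from typing import Callable, List, Optional


def majority_vote(agents: List[Callable[[str], str]], task: str) -> Optional[str]:
    """Ask every agent; return the answer with a strict majority, else None."""
    if not agents:
        return None
    answers = [agent(task) for agent in agents]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes > len(answers) / 2 else None
```

Returning `None` on a tie gives the caller a clear signal to escalate or gather more votes.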

Communication Protocol

Agent-to-Agent message:
{
  "from": "planner",
  "to": "executor",
  "type": "task_assignment",
  "content": { "task": "...", "context": "...", "constraints": "..." },
  "priority": "high",
  "deadline": "2025-01-15T10:00:00Z"
}
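
A sketch of how such a message might be typed and validated before sending. The `AgentMessage` class and the allowed priority values are assumptions; the source only shows `"high"`. Because `from` is a reserved word in Python, the fields are named `sender` and `recipient` and mapped back to the wire names on serialization.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime

# Assumption: the source shows only "high"; the other values are illustrative.
ALLOWED_PRIORITIES = {"low", "medium", "high"}


@dataclass
class AgentMessage:
    sender: str
    recipient: str
    type: str
    content: dict
    priority: str
    deadline: str  # ISO 8601 timestamp

    def validate(self) -> None:
        if self.priority not in ALLOWED_PRIORITIES:
            raise ValueError(f"priority must be one of {sorted(ALLOWED_PRIORITIES)}")
        # fromisoformat before Python 3.11 rejects a trailing "Z", so normalize it.
        datetime.fromisoformat(self.deadline.replace("Z", "+00:00"))

    def to_json(self) -> str:
        self.validate()
        body = asdict(self)
        # Map Python-safe field names back to the wire names shown above.
        body["from"] = body.pop("sender")
        body["to"] = body.pop("recipient")
        return json.dumps(body)
```

Validating at the sender keeps malformed messages out of the inter-agent log.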

Coordination Rules

  • Define clear ownership boundaries
  • Use structured messages between agents
  • Implement deadlock detection
  • Set timeouts for inter-agent communication
  • Log all inter-agent messages for debugging
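
Deadlock detection can be approached as cycle detection in a wait-for graph, where an edge from one agent to another means the first is blocked on the second. The sketch below uses a depth-first search; the graph representation and the `find_deadlock` name are assumptions for illustration.

```python
from typing import Dict, List, Optional


def find_deadlock(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of agents waiting on each other, or None if there is none."""
    visiting: List[str] = []
    finished = set()

    def visit(agent: str) -> Optional[List[str]]:
        if agent in finished:
            return None
        if agent in visiting:
            # Back edge: the cycle is the path from the first visit to here.
            return visiting[visiting.index(agent):] + [agent]
        visiting.append(agent)
        for other in waits_for.get(agent, []):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        finished.add(agent)
        return None

    for agent in waits_for:
        cycle = visit(agent)
        if cycle:
            return cycle
    return None
```

A coordinator could run this check periodically and break the cycle by timing out one of the agents it returns.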

Evaluation Framework

Metrics Decision Table

| Metric | What It Measures | How to Measure | Target |
| --- | --- | --- | --- |
| Task Success Rate | Correct completions / total | Automated + human eval | > 90% |
| Efficiency | Steps vs optimal path | Step count comparison | < 2x optimal |
| Tool Accuracy | Correct tool calls / total | Log analysis | > 95% |
| Safety | Violations / total interactions | Guardrail checks | 0 violations |
| Latency | Time to complete task | Wall clock | < SLA |
| Cost | Token usage per task | API usage tracking | Within budget |
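
Several of these metrics can be computed directly from run logs. The record fields and sample values below are hypothetical, invented to exercise the arithmetic; latency and cost are omitted because their targets depend on an SLA and budget the source does not specify.

```python
# Hypothetical run log: one record per completed task.
runs = [
    {"success": True, "steps": 4, "optimal_steps": 3,
     "tool_calls": 4, "correct_tool_calls": 4, "violations": 0},
    {"success": True, "steps": 6, "optimal_steps": 2,
     "tool_calls": 6, "correct_tool_calls": 5, "violations": 0},
    {"success": False, "steps": 5, "optimal_steps": 5,
     "tool_calls": 5, "correct_tool_calls": 5, "violations": 0},
]


def summarize(records):
    """Aggregate per-task records into the metrics from the table."""
    total = len(records)
    calls = sum(r["tool_calls"] for r in records)
    return {
        "task_success_rate": sum(r["success"] for r in records) / total,
        "tool_accuracy": sum(r["correct_tool_calls"] for r in records) / calls,
        # Worst-case ratio of steps taken to the optimal path.
        "max_step_ratio": max(r["steps"] / r["optimal_steps"] for r in records),
        "violations": sum(r["violations"] for r in records),
    }


def meets_targets(metrics):
    """Check the computed metrics against the targets in the table."""
    return (metrics["task_success_rate"] > 0.90
            and metrics["max_step_ratio"] < 2
            and metrics["tool_accuracy"] > 0.95
            and metrics["violations"] == 0)
```

For the sample log, `meets_targets(summarize(runs))` is false: two of three tasks succeeded, and one task took three times the optimal number of steps.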

Evaluation Dataset Structure

```json
{
  "test_cases": [
    {
      "id": "tc_001",
      "input": "Find all orders over $100 from last week",
      "expected_tools": ["search_orders"],
      "expected_output_contains": ["order_id", "amount"],
      "category": "retrieval",
      "difficulty": "easy"
    }
  ]
}
```
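
A minimal harness that consumes this dataset structure might look like the sketch below. The agent interface (a function returning the list of tools it called and its final output) and the `fake_agent` stub are assumptions; a real harness would wrap the actual agent and would likely use looser matching on tool order.

```python
import json

DATASET = json.loads("""
{"test_cases": [{
  "id": "tc_001",
  "input": "Find all orders over $100 from last week",
  "expected_tools": ["search_orders"],
  "expected_output_contains": ["order_id", "amount"],
  "category": "retrieval",
  "difficulty": "easy"}]}
""")


def fake_agent(prompt):
    # Hypothetical agent stub: returns (tools called, final output).
    return ["search_orders"], '{"order_id": 456, "amount": 120.0}'


def evaluate(agent, dataset):
    """Return a pass/fail flag per test case id."""
    results = {}
    for case in dataset["test_cases"]:
        tools, output = agent(case["input"])
        tools_ok = tools == case["expected_tools"]
        output_ok = all(s in output for s in case["expected_output_contains"])
        results[case["id"]] = tools_ok and output_ok
    return results
```

The per-case results can then be grouped by `category` and `difficulty` to see where the agent is weakest.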

Safety Guardrails

Input Guardrails

  • Detect and reject prompt injection attempts
  • Validate all user inputs before processing
  • Rate limit requests per user/session
  • Content filtering for harmful requests

Output Guardrails

  • Validate tool call arguments before execution
  • Check outputs for sensitive information (PII, secrets)
  • Enforce response format constraints
  • Prevent infinite tool call loops
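
The sensitive-information check can be sketched with regular expressions. The two patterns below are illustrative only and far from exhaustive: a loose email matcher and the `AKIA`-prefixed AWS access key ID format. The function names are hypothetical, and a production system would need broader, tested detectors.

```python
import re

# Illustrative patterns only; real deployments need broader, tested detectors.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_sensitive(text):
    """Return the names of every pattern that matches the text."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)]


def redact(text):
    """Replace every match with a labeled placeholder."""
    for name, pattern in SENSITIVE_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {name}]", text)
    return text
```

Running `redact` on the agent's output before it is returned or logged keeps matched values out of both the response and the audit trail.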

Operational Guardrails

  • Maximum tool calls per task (circuit breaker)
  • Maximum tokens per response
  • Timeout for total task duration
  • Escalation to human when confidence is low
  • Audit logging for all actions

Circuit Breaker Thresholds

| Condition | Threshold | Action |
| --- | --- | --- |
| Max tool calls per task | 20 | Stop execution, return error |
| Max consecutive errors | 3 | Stop, log, return graceful error |
| Max task duration | 5 minutes | Timeout, return partial result |
| Max tokens generated | 10,000 | Stop generation |
| Pattern repeats | 5 identical errors | Open circuit, alert operator |
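
Three of these thresholds can be tracked by a small class like the sketch below. The `CircuitBreaker` name and its interface are assumptions; the default limits come from the table. The clock is injectable so the duration check can be tested without waiting.

```python
import time


class CircuitBreaker:
    """Tracks tool calls, consecutive errors, and elapsed time for one task."""

    def __init__(self, max_tool_calls=20, max_consecutive_errors=3,
                 max_seconds=300, clock=time.monotonic):
        self.max_tool_calls = max_tool_calls
        self.max_consecutive_errors = max_consecutive_errors
        self.max_seconds = max_seconds
        self.clock = clock
        self.started = clock()
        self.tool_calls = 0
        self.consecutive_errors = 0

    def record(self, success):
        """Call once after every tool call."""
        self.tool_calls += 1
        self.consecutive_errors = 0 if success else self.consecutive_errors + 1

    def tripped(self):
        """Return the reason to stop, or None to keep going."""
        if self.tool_calls >= self.max_tool_calls:
            return "max tool calls reached"
        if self.consecutive_errors >= self.max_consecutive_errors:
            return "too many consecutive errors"
        if self.clock() - self.started >= self.max_seconds:
            return "task duration exceeded"
        return None
```

An agent loop would call `record` after each tool call and check `tripped()` before starting the next one.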

Prompt Engineering for Agents

System Prompt Structure

1. Identity and purpose (who the agent is)
2. Available tools (what it can do)
3. Constraints (what it must not do)
4. Output format (how to respond)
5. Examples (few-shot demonstrations)
6. Error handling (what to do when stuck)

Key Prompt Patterns

  • Scratchpad: encourage step-by-step reasoning before action
  • Self-correction: "If your first approach fails, try..."
  • Confidence calibration: "Only proceed if you are confident"
  • Graceful degradation: "If you cannot complete the task, explain why"

Anti-Patterns / Common Mistakes

| Anti-Pattern | Why It Is Wrong | What to Do Instead |
| --- | --- | --- |
| Calling tools without reasoning | Wastes calls, misses context | Use ReAct pattern (think first) |
| No max iteration limit | Infinite loops, runaway costs | Set circuit breaker thresholds |
| Trusting all tool outputs | Corrupted data propagates | Validate tool results |
| Hardcoded tool sequences | No adaptability to failures | Dynamic tool selection based on state |
| No error recovery strategy | Agent gets stuck on first failure | Implement retry with alternatives |
| Apologizing instead of acting | Wastes user time | Take corrective action, then report |
| Over-reliance on single tool | Fragile if that tool fails | Provide fallback tools |
| No evaluation framework | Shipping blind, no quality signal | Build eval harness before deployment |
| Unlimited context growth | Context overflow, degraded quality | Implement memory management |

Integration Points

| Skill | Integration |
| --- | --- |
| `mcp-builder` | MCP servers provide tools for agents |
| `planning` | Agent planning uses structured plan generation |
| `autonomous-loop` | Ralph-style loops are a specialized agent pattern |
| `dispatching-parallel-agents` | Multi-agent coordination pattern |
| `circuit-breaker` | Operational safety for agent loops |
| `verification-before-completion` | Agent output validation |
| `test-driven-development` | TDD for agent tool implementations |

Skill Type

FLEXIBLE — Adapt the agent architecture, memory strategy, and coordination patterns to the specific use case. Safety guardrails and evaluation frameworks are strongly recommended for all production agents.