Loading...
Loading...
Found 18 Skills
TDD-style testing methodology for skills using fresh subagent instances to prevent priming bias and validate skill effectiveness. Use when validating skill improvements, testing skill effectiveness, preventing priming bias, measuring skill impact on behavior. Do not use when implementing skills (use skill-authoring instead), creating hooks (use hook-authoring instead).
Design and implement comprehensive evaluation systems for AI agents. Use when building evals for coding agents, conversational agents, research agents, or computer-use agents. Covers grader types, benchmarks, 8-step roadmap, and production integration.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates hive-* skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance.
Use when creating or editing any prompt (commands, hooks, skills, subagent instructions) to verify it produces desired behavior - applies RED-GREEN-REFACTOR cycle to prompt engineering using subagents for isolated testing
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
Creates, modifies, and manages Claude Code subagents by writing agent files with YAML frontmatter, system prompts, and tool configurations. Use when you need to "create an agent", "modify an agent", "set up a specialist", "I need an agent for [task]", "agent to handle [domain]", or "configure agent tools". Covers agent file format, YAML frontmatter, system prompts, tool restrictions, MCP integration, model selection, and testing.
Deprecated alias for tdd skill
Use when facing 2 or more independent tasks that can be completed without shared state or sequential dependencies
Implementation guidance for creating individual agents in the Arcanea system with proper structure, capabilities, and integration.
RED-GREEN-REFACTOR testing for agents: dispatch subagents with known inputs, capture verbatim outputs, verify against expectations. Use when creating, modifying, or validating agents and skills. Use for "test agent", "validate agent", "verify agent works", or pre-deployment checks. Do NOT use for feature requests, simple prompt edits without behavioral impact, or agents with no structured output to verify.