Found 21 Skills
Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
Expert prompt optimization for LLMs and AI systems. Use PROACTIVELY when building AI features, improving agent performance, or crafting system prompts. Masters prompt patterns and techniques.
Expert data analysis and manipulation for customer support operations using pandas.
Query cross-project usage analytics. Use when reviewing agent, skill, hook, or team performance across OrchestKit projects. Also replay sessions, estimate costs, and view model delegation trends.
Designs multi-agent system architectures with orchestration patterns, tool schemas, and performance evaluation. Use when building AI agent systems, designing agent workflows, creating tool schemas, or evaluating agent performance.
Evaluate and improve Claude Code commands, skills, and agents. Use when testing prompt effectiveness, validating context engineering choices, or measuring improvement quality.
Use this when you need to EVALUATE, IMPROVE, or OPTIMIZE an existing LLM agent's output quality, including improving tool selection accuracy and answer quality, reducing costs, or fixing cases where the agent gives wrong or incomplete responses. Evaluates agents systematically using MLflow evaluation with datasets, scorers, and tracing. Covers the end-to-end evaluation workflow or individual components (tracing setup, dataset creation, scorer definition, evaluation execution).
Review recent work, identify process gaps and repeated mistakes, and produce specific file edits to prevent them. Not a reflection exercise: it outputs config and identity changes. Trigger manually after sprints, or automate weekly.
Paired benchmark orchestration for comparing coding-agent performance with recursive-mode off and on. Use when the user wants to benchmark recursive-mode, compare recursive vs non-recursive execution on the same project, generate disposable benchmark repos, capture timing/build-test logs, or write a benchmark report.