Found 1,927 Skills
Use when the user asks to "create a metric", "write a metric", "design a metric", "build a metric for", "evaluate agent performance", "measure call quality", "track a KPI", "add a workflow metric", "improve my metric", "fix a metric", "debug metric results", "set up quality scoring", or "what metrics do I need". Also relevant when discussing LLM judge prompts, custom code metrics, evaluation triggers, VALID_SKIP patterns, section extraction, or metric best practices for Cekura voice AI agents. Covers both creating new metrics and reviewing, iterating on, or troubleshooting existing ones.
AI Agent learning roadmap and curated resources for building production-ready agents with modern patterns like Claude Code, OpenClaw, skills, MCP, and evaluation
Design and build multi-agent harness architectures for long-running AI application development. GAN-inspired Generator-Evaluator pattern, Sprint Contract negotiation, context management, quality criteria calibration. Based on Anthropic Engineering patterns. Use when: "build a harness", "multi-agent architecture", "agent orchestration", "generator-evaluator", "long-running app", "harness design", "agent pipeline", "quality evaluation loop", "sprint contract", "build app with agents", "Claude Agent SDK architecture", or when building complex full-stack apps that need planning → generation → evaluation cycles. Also use when discussing context degradation, self-evaluation bias, or assumption testing in AI workflows.
Writes, refactors, and evaluates prompts for LLMs — generating optimized prompt templates, structured output schemas, evaluation rubrics, and test suites. Use when designing prompts for new LLM applications, refactoring existing prompts for better accuracy or token efficiency, implementing chain-of-thought or few-shot learning, creating system prompts with personas and guardrails, building JSON/function-calling schemas, or developing prompt evaluation frameworks to measure and improve model performance.
Analyzes multi-round evaluation score data, tallies the relevant indicators, and calculates rating levels. Suited to analyzing score trends and assigning S/A/B ratings.
Force critical evaluation of proposals, requirements, or decisions by analyzing from multiple adversarial perspectives. Triggers on: accepting a proposal without pushback, 'sounds good', 'let's go with', design decisions with unstated tradeoffs, unchallenged assumptions, premature consensus. Invoke with /challenge-that.
Generate objective reference check reports about the user from real AI collaboration data: session history, git logs, GitHub profile, and memory files. Like a colleague writing a professional reference, but grounded in actual shared work. Use whenever the user asks to be evaluated as a developer, or wants a reference letter, a work style analysis, "introduced by my agents" content, interview prep from collaboration history, or blog topics from past discussions. Triggers on: write a reference, analyze my work patterns, what do you think of me, 나에 대한 레퍼런스 써줘 ("write a reference about me"), 내 작업 스타일 분석해줘 ("analyze my work style"). Not for general code review, architecture docs, cover letters, or codebase-only analysis.
Creates structured hiring scorecards for any role. Takes job title, requirements, and team context. Generates comprehensive scorecard with weighted scoring rubric, interview questions per competency, evaluation matrix, red/green flags, and reference check questions.
Reviews pitch decks and investor presentations. Reads slide content, evaluates narrative flow, problem/solution clarity, market sizing, competitive positioning, financial projections, team credibility, and ask clarity. Generates a scored pitch-review.md with slide-by-slide feedback, overall score, top improvements, investor objection predictions, and comparisons to successful decks. Use when reviewing fundraising materials, investor decks, or pitch presentations.
Evaluate the reproducibility of technical articles. Dispatch a subagent to simulate a first-time reader reproducing the work locally and list missing information. Use as the final check on a draft before publication.
Compares peer candidate companies side by side on business quality, growth, profitability, valuation, and catalysts, and draws conclusions about their relative strengths and weaknesses. Applicable when choosing between two candidate stocks, selecting the best among industry peers, or setting a priority order for tracking.
Build quick IRR/MOIC sensitivity tables for PE deal evaluation. Models returns across entry multiple, leverage, exit multiple, growth, and hold period scenarios. Use when sizing up a deal, stress-testing assumptions, or preparing IC returns exhibits. Triggers on "returns analysis", "IRR sensitivity", "MOIC table", "what's the return at", "model the returns", or "back of the envelope".