Loading...
Loading...
Found 1,928 Skills
Critically review terminal user interfaces for UX quality, responsiveness, visual design, and interactivity. Use when asked to "review my TUI", "test my TUI UX", "audit my terminal UI", "check TUI responsiveness", "review TUI keybindings", "check interactivity", or any request to evaluate the user experience quality of a ratatui/crossterm/ncurses-based terminal application. Launches the TUI in tmux, systematically tests 10 dimensions (responsiveness, input conflicts, visual clarity, navigation, feedback loops, error states, layout, keyboard design, permission flows, visual design & color), and produces a graded report with screenshots and specific findings. Benchmarks against Claude Code, OpenCode, and Codex — the three best-in-class AI terminal UIs.
Markets orchestration — connects ESPN live schedules with Kalshi & Polymarket prediction markets. Unified dashboards, odds comparison, entity search, and bet evaluation across platforms. Use when: user wants to see prediction market odds alongside ESPN game schedules, compare odds across platforms, search for a team/player on Kalshi or Polymarket, check for arbitrage between ESPN odds and prediction markets, or evaluate a specific game's market value. Don't use when: user wants raw prediction market data without ESPN context — use polymarket or kalshi directly. For pure odds math (conversion, de-vigging, Kelly) — use betting. For live scores without market data — use the sport-specific skill.
Evaluate and validate Claude Code rules in .claude/rules/ directories. Use when auditing rule file quality, validating frontmatter and glob patterns, or checking rules organization before deployment. Do not use when writing new rules from scratch - use rule authoring guides instead. Do not use when evaluating skills or hooks - use skills-eval or hooks-eval instead.
Perform a PESTLE analysis covering Political, Economic, Social, Technological, Legal, and Environmental factors. Use when assessing the macro environment, doing strategic planning, or evaluating external factors affecting your business.
Evaluate a vendor — cost analysis, risk assessment, and recommendation. Use when reviewing a new vendor proposal, deciding whether to renew or replace a contract, comparing two vendors side-by-side, or building a TCO breakdown and negotiation points before procurement sign-off.
This skill provides the 12 critical story questions for screenplay evaluation. Covers concept, theme, audience reaction, beginning, ending, rising tensions, characters, protagonist, motivation, antagonist, and believability. Use when: evaluating a screenplay draft, identifying story weaknesses, preparing for rewrites, or validating fundamentals before submission.
Opik observability for LLM agents — Agent Configuration, Local Runner (opik connect), Evaluation Suites, threads, integrations. Use for "configure my agent", "connect my agent", "evaluate my agent" or "integrate with Opik".
Iterative code refinement through plan → code → evaluate → refine cycles. Runs lint checks (ruff), tests (pytest), and structured self-evaluation each cycle, then diagnoses failures and refines. Decomposes complex tasks into sequential phases, iterates up to 3 times per phase (10 total). Use when: the main agent delegates a code task with 'MODE: MORE_EFFORT', the user selects 'More Effort' code generation mode, or the task explicitly requests iterative refinement for higher code quality. Do NOT use for single-pass code generation (Lite mode), experiment pipeline orchestration (use experiment-pipeline), or diagnosing a specific experiment failure (use experiment-craft).
Apply structured critical thinking — identifying claims, evidence, reasoning chains, hidden assumptions, and logical fallacies — to evaluate or construct specific written arguments rigorously. Use this skill when the user presents a concrete argument, claim, op-ed, research finding, or piece of reasoning to be analyzed for logical validity or flaws, even if they say 'is this argument valid', 'what logical fallacies are in this', or 'what assumptions am I making in this thesis'. Do NOT use for casual plan review, trip planning, project risk brainstorming, or pre-mortems — 'poke holes in my plan' requests are red-team / risk review, not argument analysis.
Use when measuring or improving agent quality and performance — set up evaluators, online monitoring, CI/CD quality gates, observability, or cost optimization. Triggers on: "evaluate my agent", "add evaluator", "measure quality", "quality gate", "run evals", "agent too slow", "why is it slow", "reduce latency", "set up observability", "CloudWatch dashboard", "how much does my agent cost", "cost optimization", "logs not showing up", "logs missing", "spans not found", "eval failing", "eval error", "dev traces", "local traces", "agentcore dev traces", "traces to CloudWatch". Not for debugging errors or crashes — use agents-debug. Slow but correct routes here; broken routes to debug.
Filesystem RAG benchmarks: corpus/, train.json, evaluate_rag.py (RAGAS quality). Not for prod monitoring, latency/throughput benchmarking (use rag-perf), or evals outside this repo layout.
Evaluate UX/UI using Don Norman's 7 fundamental design principles from The Design of Everyday Things. Audit discoverability, affordances, signifiers, feedback, mapping, constraints and conceptual models.