Loading...
Loading...
Found 9 Skills
Structured self-debugging workflow for AI agent failures using capture, diagnosis, contained recovery, and introspection reports.
Diagnose and fix GitHub Actions CI failures. Inspects workflow runs and logs, identifies root causes, implements minimal fixes, and pushes to a fix branch. Use when CI is failing, red, broken, or needs diagnosis.
Fetch CI build results and diagnose failures. Auto-detects provider from project files or URLs. Supports GitHub Actions, Buildkite, and CircleCI.
Evaluates RAG (Retrieval-Augmented Generation) pipeline quality across retrieval and generation stages. Measures precision, recall, MRR for retrieval; groundedness, completeness, and hallucination rate for generation. Diagnoses failure root causes and recommends chunk, retrieval, and prompt improvements. Triggers on: "audit RAG", "RAG quality", "evaluate retrieval", "hallucination detection", "retrieval precision", "why is RAG failing", "RAG diagnosis", "retrieval quality", "RAG evaluation", "chunk quality", "RAG pipeline review", "grounding check". Use this skill when diagnosing or evaluating a RAG pipeline's quality.
Use this skill when the user wants to debug, diagnose, or systematically iterate on an experiment that already exists, or when they need a structured experiment log for tracking runs, hypotheses, failures, results, and next steps during active research. Apply it to underperforming methods, training that will not converge, regressions after a change, inconsistent results across datasets, aimless experimentation without progress, and questions like 'why doesn't this work?', 'no progress after many attempts', or 'how should I investigate this failure?'. Also use it for setting up practical experiment logging/record-keeping that supports debugging and iteration. Do not use it for designing a brand-new experiment pipeline or full experiment program (use experiment-pipeline), generating research ideas, fixing isolated coding/syntax errors, or writing retrospective summaries into research memory/notes/knowledge bases.
Analyze MSBuild binary logs to diagnose build failures by replaying binlogs to searchable text logs. Only activate in MSBuild/.NET build context. USE FOR: build errors that are unclear from console output, diagnosing cascading failures across multi-project builds, tracing MSBuild target execution order, investigating common errors like CS0246 (type not found), MSB4019 (imported project not found), NU1605 (package downgrade), MSB3277 (version conflicts), and ResolveProjectReferences failures. Requires an existing .binlog file. DO NOT USE FOR: generating binlogs (use binlog-generation), build performance analysis (use build-perf-diagnostics), non-MSBuild build systems. INVOKES: dotnet msbuild binlog replay, grep, cat, head, tail for log analysis.
Detects MSBuild projects with conflicting OutputPath or IntermediateOutputPath. Only activate in MSBuild/.NET build context. USE FOR: builds failing with 'Cannot create a file when that file already exists', 'The process cannot access the file because it is being used by another process', intermittent build failures that succeed on retry, missing outputs in multi-project builds, multi-targeting builds where project.assets.json conflicts. Diagnoses when multiple projects or TFMs write to the same bin/obj directories due to shared OutputPath, missing AppendTargetFrameworkToOutputPath, or extra global properties like PublishReadyToRun creating redundant evaluations. DO NOT USE FOR: file access errors unrelated to MSBuild (OS-level locking), single-project single-TFM builds, non-MSBuild build systems. INVOKES: dotnet msbuild binlog replay, grep for output path analysis.
Activate this when users need to understand extreme events (bubbles, crashes, mass hysteria, cults, mob behavior), diagnose systemic organizational failures, or assess the risk of multiple psychological/market/institutional forces aligning in the same direction. Typical trigger signals: the phenomenon described by the user "far exceeds what any single factor can explain"; the user attempts to explain an extreme outcome with a single cause; the user is concerned about "multiple adverse factors erupting simultaneously". Not applicable to conventional single-factor decision analysis or assessment of mild incremental changes.
Diagnose pytest or CI failures, identify root cause, and implement the minimal fix. Use when tests fail or CI reports errors.