Found 23 Skills
An absolute must for debugging and inspecting LLM/AI agent traces using PostHog's MCP tools. Use when the user pastes a trace URL (e.g. /llm-observability/traces/<id>), asks to debug a trace, figure out what went wrong, check if an agent used a tool correctly, verify context/files were surfaced, inspect subagent behavior, investigate LLM decisions, or analyze token usage and costs.
Use when your agent or environment is broken — wrong answers, errors, timeouts, tool failures, or CLI issues. Reads traces and logs to diagnose root causes. Also checks prerequisites when the CLI itself isn't working. Triggers on: "agent not working", "wrong answer", "agent error", "tool call failing", "debug agent", "check logs", "read traces", "broken", "500 error", "424 error", "model access denied", "command not found", "stuck in DELETING", "maxVms exceeded", "cold start diagnosis", "cold start slow", "agentcore create error", "create failed", "exit code 7", "connection refused local dev". Not for deploy failures — use agents-deploy. Not for performance tuning without errors — use agents-optimize. Not for VPC configuration — use agents-build. Not for observability setup or missing logs — use agents-optimize.
Retrieve and analyze simulation results from a Coval run. Use when user wants to review evaluation outcomes or debug agent behavior.
Use when diagnosing agent failures, debugging lost-in-middle issues, or understanding context poisoning, or when the user asks about "context degradation", "lost in middle", "context poisoning", "attention patterns", "context clash", or "agent performance drops".
Debug problems by investigating multiple hypotheses in parallel. Use when you have a bug, unexpected behaviour, or mystery where the root cause is unclear. Spawns parallel investigator agents each pursuing a different theory, then compares evidence to identify the most likely cause and fix.
Agent tracing CLI for inspecting agent execution snapshots. Use when user mentions 'agent-tracing', 'trace', 'snapshot', wants to debug agent execution, inspect LLM calls, view context engine data, or analyze agent steps. Triggers on agent debugging, trace inspection, or execution analysis tasks.
This skill should be used when the user asks to "diagnose context problems", "fix lost-in-middle issues", "debug agent failures", "understand context poisoning", or mentions context degradation, attention patterns, context clash, context confusion, or agent performance degradation. A core context engineering skill — also activates when the user mentions "context engineering" or "context-engineering" in the context of diagnosing and mitigating context failures.
Instruments Python and TypeScript code with MLflow Tracing for observability. Triggers on questions about adding tracing, instrumenting agents/LLM apps, getting started with MLflow tracing, or tracing specific frameworks (LangGraph, LangChain, OpenAI, DSPy, CrewAI, AutoGen). Examples - "How do I add tracing?", "How to instrument my agent?", "How to trace my LangChain app?", "Getting started with MLflow tracing", "Trace my TypeScript app"
This skill should be used when inspecting, analyzing, or querying Claude Code session logs. Use when users ask about session history, want to find sessions, analyze context usage, extract tool call patterns, debug agent execution, or understand what happened in previous sessions. Essential for understanding Claude Code's ~/.claude/projects/ structure, JSONL session format, and the erk extraction pipeline.
Debugging toolkit for AI agents. Diagnose symptoms via memory cache -> behavior cache -> codebase search, trace data flow, git-bisect bad commits, and compare output directories.
Inspect Claude Code session logs, tool calls, token usage, subagents, and the context window using the claude-devtools visual UI.