Found 277 Skills
Run post-deployment smoke checks with Makefile targets (`remote-status`, `remote-logs`) plus optional health URL checks. Use after deployment to verify runtime state before final acceptance.
Structured JSON logging with correlation IDs, request context propagation across async boundaries, performance timing decorators, and worker metrics collection.
Parses OpenTelemetry-formatted logs to reconstruct execution traces, extract errors with call chains, and provide AI-powered root cause analysis. Use when investigating errors, checking logs, debugging issues, viewing traces, or analyzing execution flow. Triggers on "check the logs", "analyze errors", "what's failing", "debug this issue", "show me the traces", or "investigate the error".
Instrument, trace, evaluate, and monitor LLM applications and AI agents with LangSmith. Use when setting up observability for LLM pipelines, running offline or online evaluations, managing prompts in the Prompt Hub, creating datasets for regression testing, or deploying agent servers. Triggers on: langsmith, langchain tracing, llm tracing, llm observability, llm evaluation, trace llm calls, @traceable, wrap_openai, langsmith evaluate, langsmith dataset, langsmith feedback, langsmith prompt hub, langsmith project, llm monitoring, llm debugging, llm quality, openevals, langsmith cli, langsmith experiment, annotate llm, llm judge.
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.
Implement distributed tracing with Jaeger and Tempo to track requests across microservices and identify performance bottlenecks. Use when debugging microservices, analyzing request flows, or implementing observability for distributed systems.
Expert DevOps troubleshooter specializing in rapid incident response, advanced debugging, and modern observability. Masters log analysis, distributed tracing, Kubernetes debugging, performance optimization, and root cause analysis. Handles production outages, system reliability, and preventive monitoring. Use PROACTIVELY for debugging, incident response, or system troubleshooting.
Complete reference for the Portkey AI Gateway Python SDK with unified API access to 200+ LLMs, automatic fallbacks, caching, and full observability. Use when building Python applications that need LLM integration with production-grade reliability.
Operate long-lived agent workloads with observability, security boundaries, and lifecycle management.
Creates Elastic Cloud Serverless projects (Elasticsearch, Observability, or Security) via the REST API, saves credentials to file, and bootstraps a scoped Elasticsearch API key. Use when creating a new serverless project, provisioning a search or observability environment, or spinning up a new Elastic Cloud project.
Production-ready patterns for building LLM applications. Covers RAG pipelines, agent architectures, prompt IDEs, and LLMOps monitoring. Use when designing AI applications, implementing RAG, building agents, or setting up LLM observability.
Use when building cloud-native apps. Keywords: kubernetes, k8s, docker, container, grpc, tonic, microservice, service mesh, observability, tracing, metrics, health check, cloud, deployment, 云原生 (cloud native), 微服务 (microservices), 容器 (containers)