Monitoring and observability patterns for Prometheus metrics, Grafana dashboards, Langfuse LLM tracing, and drift detection. Use when adding logging, metrics, distributed tracing, LLM cost tracking, or quality drift monitoring.
```bash
npx skill4agent add yonatangross/orchestkit monitoring-observability
```

rules/

| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Infrastructure Monitoring | 3 | CRITICAL | Prometheus metrics, Grafana dashboards, alerting rules |
| LLM Observability | 3 | HIGH | Langfuse tracing, cost tracking, evaluation scoring |
| Drift Detection | 3 | HIGH | Statistical drift, quality regression, drift alerting |
| Silent Failures | 3 | HIGH | Tool skipping, quality degradation, loop/token spike alerting |
```python
# Prometheus metrics with RED method
from prometheus_client import Counter, Histogram

http_requests = Counter('http_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
http_duration = Histogram('http_request_duration_seconds', 'Request latency',
                          buckets=[0.01, 0.05, 0.1, 0.5, 1, 2, 5])
```

```python
# Langfuse LLM tracing
from langfuse import observe, get_client

@observe()
async def analyze_content(content: str):
    get_client().update_current_trace(
        user_id="user_123", session_id="session_abc",
        tags=["production", "orchestkit"],
    )
    return await llm.generate(content)  # llm: your model client, assumed in scope
```

```python
# PSI drift detection
import numpy as np

psi_score = calculate_psi(baseline_scores, current_scores)  # helper covered in references/
if psi_score >= 0.25:
    alert("Significant quality drift detected!")
```

| Rule | File | Key Pattern |
|---|---|---|
| Prometheus Metrics | | RED method, counters, histograms, cardinality |
| Grafana Dashboards | | Golden Signals, SLO/SLI, health checks |
| Alerting Rules | | Severity levels, grouping, escalation, fatigue prevention |
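Defining the RED metrics is only half of the instrumentation; they also have to be recorded on every request. A minimal sketch of wiring them into a handler — the handler itself and its always-success status are illustrative, not part of the skill:

```python
import time
from prometheus_client import Counter, Histogram, REGISTRY

http_requests = Counter('http_requests_total', 'Total requests',
                        ['method', 'endpoint', 'status'])
http_duration = Histogram('http_request_duration_seconds', 'Request latency',
                          buckets=[0.01, 0.05, 0.1, 0.5, 1, 2, 5])

def handle_request(method: str, endpoint: str) -> str:
    """Hypothetical handler recording Rate, Errors (via status label), Duration."""
    start = time.perf_counter()
    status = "200"  # sketch assumes success; real code would map exceptions to 5xx
    try:
        return "ok"
    finally:
        http_requests.labels(method=method, endpoint=endpoint, status=status).inc()
        http_duration.observe(time.perf_counter() - start)

handle_request("GET", "/health")
```

The `finally` block keeps errors and latency recorded even when the handler raises, which is what makes the Errors and Duration signals trustworthy.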
| Rule | File | Key Pattern |
|---|---|---|
| Langfuse Traces | | @observe decorator, OTEL spans, agent graphs |
| Cost Tracking | | Token usage, spend alerts, Metrics API |
| Eval Scoring | | Custom scores, evaluator tracing, quality monitoring |
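Cost tracking ultimately reduces to token counts times per-token prices. A minimal sketch of a spend alert, independent of Langfuse's Metrics API — the model name, prices, and budget here are illustrative placeholders, not real rates:

```python
# Illustrative per-1M-token prices; real values come from your provider's price sheet.
PRICING = {"example-model": {"input": 2.50, "output": 10.00}}

def usage_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call from its token usage."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def check_spend(daily_costs: list[float], budget_usd: float) -> bool:
    """True when accumulated spend breaches the budget and should page someone."""
    return sum(daily_costs) >= budget_usd

cost = usage_cost_usd("example-model", input_tokens=120_000, output_tokens=30_000)
```

In practice the per-call costs would be read off Langfuse traces rather than computed by hand; the threshold check is the same either way.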
| Rule | File | Key Pattern |
|---|---|---|
| Statistical Drift | | PSI, KS test, KL divergence, EWMA |
| Quality Drift | | Score regression, baseline comparison, canary prompts |
| Drift Alerting | | Dynamic thresholds, correlation, anti-patterns |
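The PSI snippet earlier assumes a `calculate_psi` helper. A minimal binned implementation — the bin count and the small floor for empty bins follow common convention, not a specific library:

```python
import math

def calculate_psi(baseline, current, bins: int = 10) -> float:
    """Population Stability Index over equal-width bins of the baseline range."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0

    def distribution(values):
        counts = [0] * bins
        for v in values:
            i = min(max(int((v - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Small floor avoids log(0) when a bin is empty in one distribution.
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = distribution(baseline), distribution(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))
```

Identical distributions score 0; the 0.25 cutoff used above is the widely cited "significant shift" threshold.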
| Rule | File | Key Pattern |
|---|---|---|
| Tool Skipping | | Expected vs actual tool calls, Langfuse traces |
| Quality Degradation | | Heuristics + LLM-as-judge, z-score baselines |
| Silent Alerting | | Loop detection, token spikes, escalation workflow |
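The z-score baseline mentioned above can be sketched with the standard library — the 3-sigma cutoff is a conventional default, not a value the skill prescribes:

```python
from statistics import mean, stdev

def is_quality_regression(baseline_scores, current_score,
                          z_threshold: float = 3.0) -> bool:
    """Flag current_score if it sits more than z_threshold std-devs below baseline."""
    mu, sigma = mean(baseline_scores), stdev(baseline_scores)
    if sigma == 0:
        return current_score < mu  # flat baseline: any drop is a regression
    z = (current_score - mu) / sigma
    return z < -z_threshold
```

Because the threshold is expressed in standard deviations of the team's own baseline, noisy evaluators widen the band automatically instead of paging on every fluctuation.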
| Decision | Recommendation | Rationale |
|---|---|---|
| Metric methodology | RED method (Rate, Errors, Duration) | Industry standard, covers essential service health |
| Log format | Structured JSON | Machine-parseable, supports log aggregation |
| Tracing | OpenTelemetry | Vendor-neutral, auto-instrumentation, broad ecosystem |
| LLM observability | Langfuse (not LangSmith) | Open-source, self-hosted, built-in prompt management |
| LLM tracing API | | OTEL-native, automatic span creation |
| Drift method | PSI for production, KS for small samples | PSI is stable for large datasets, KS more sensitive |
| Threshold strategy | Dynamic (95th percentile) over static | Reduces alert fatigue, context-aware |
| Alert severity | 4 levels (Critical, High, Medium, Low) | Clear escalation paths, appropriate response times |
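The dynamic-threshold recommendation can be sketched as alerting on values above the trailing 95th percentile instead of a fixed number — windowing of the history is simplified here to a plain list:

```python
import math

def dynamic_threshold(history, percentile: float = 95.0) -> float:
    """Nearest-rank percentile of recent observations."""
    ordered = sorted(history)
    rank = max(0, math.ceil(percentile / 100 * len(ordered)) - 1)
    return ordered[rank]

def should_alert(history, value) -> bool:
    """Alert only when the new value exceeds its own recent distribution."""
    return value > dynamic_threshold(history)
```

A static threshold tuned for quiet hours fires constantly at peak; recomputing the cutoff from recent data is what keeps it context-aware and reduces fatigue.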
| Resource | Description |
|---|---|
| references/ | Logging, metrics, tracing, Langfuse, drift analysis guides |
| checklists/ | Implementation checklists for monitoring and Langfuse setup |
| examples/ | Real-world monitoring dashboard and trace examples |
| scripts/ | Templates: Prometheus, OpenTelemetry, health checks, Langfuse |
defense-in-depth · devops-deployment · resilience-patterns · llm-evaluation · caching