loki-label-analyzer
Expert evaluator for Grafana Loki label strategy. Audits, designs, and improves label schemas using cardinality scoring, access-pattern alignment, static vs. dynamic label rules, and consistency checks. Use when the user asks to evaluate, audit, design, or improve a Loki label strategy — or asks why their Loki queries are slow.
Source: grafana/skills
NPX Install
npx skill4agent add grafana/skills loki-label-analyzer
SKILL.md Content
Loki Label Strategy Evaluator
You are an expert in Grafana Loki label strategy. When asked to evaluate, audit, design, or improve a Loki label strategy — or when a user asks why their Loki queries are slow — use this guide to provide structured, actionable advice.
Core Concepts
Streams are the fundamental unit in Loki. Each unique combination of label key-value pairs creates a new stream. Too many streams = performance problems. Too few = broad, slow queries.
Cardinality = the number of unique values a label can have. High-cardinality labels (like `pod`, `user_id`, `request_id`) dramatically increase stream count and hurt performance, especially when those labels are not specified in every query.
The dual impact rule: High-cardinality labels hurt on both paths:
- Ingestion path: More streams → larger index, higher storage costs
- Query path: If a high-cardinality label exists but isn't in the query selector, Loki must scan ALL streams matching the other selectors — catastrophic for performance
The key question for any dynamic label: "Will this label be used in 9 out of 10 queries?" If no → it should NOT be a label.
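To make the stream arithmetic concrete, here is a minimal sketch that computes an upper bound on stream count as the product of per-label cardinalities. The label names and counts are illustrative assumptions, and real deployments usually sit below this bound because not every combination of values occurs.

```python
from math import prod

def max_streams(cardinalities):
    """Upper bound on stream count: product of distinct values per label."""
    return prod(cardinalities.values())

# Hypothetical label set for one workload
base = {"workload": 1, "container": 2, "level": 4}
# Same label set with a per-pod label added (5 pods assumed)
with_pod = {**base, "pod": 5}

print(max_streams(base))      # 8
print(max_streams(with_pod))  # 40
```

Every additional label multiplies the bound, which is why a single transient label can dominate the stream count.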
Label Evaluation Framework
When auditing a label strategy, assess each label against these criteria.
Cardinality Scoring
| Label Example | Cardinality | Verdict |
|---|---|---|
| `env` | 2–5 values | ✅ Good |
| `level` | 3–6 values | ✅ Good |
| `app` | Tens | ✅ Acceptable |
| `instance` | Hundreds–thousands | ⚠️ Evaluate access patterns |
| `pod` | Thousands + transient | ❌ Avoid as label |
| `user_id`, `request_id` | Unbounded | ❌ Never use as label |
Access Pattern Alignment
For each label, ask:
- Is this label used as a selector in most queries targeting these logs?
- Does this label logically segment data in the way users think about it?
- Would removing this label force users to scan dramatically more data?
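The cardinality and access-pattern checks can be combined into one rough decision helper. This is only a sketch: the function name, the cutoff of 100 distinct values, and the verdict strings are assumptions for illustration; the 0.9 threshold mirrors the "9 out of 10 queries" question from Core Concepts.

```python
from typing import Optional

def label_verdict(distinct_values: Optional[int], query_share: float) -> str:
    """Classify a candidate label.

    distinct_values: estimated cardinality, or None when unbounded.
    query_share: fraction of queries (0.0 to 1.0) that select on this label.
    """
    if distinct_values is None:
        return "never use as label"
    if distinct_values < 100:  # assumed cutoff for "tens"
        return "keep"
    # Hundreds or more: only justified if nearly every query selects on it.
    if query_share >= 0.9:
        return "keep, but monitor stream count"
    return "move to structured metadata or the log line"

print(label_verdict(3, 0.5))      # keep
print(label_verdict(5000, 0.05))  # move to structured metadata or the log line
print(label_verdict(None, 1.0))   # never use as label
```

The sketch ignores whether values are transient; a transient label in the thousands should be treated as "avoid" regardless of query share.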
Static vs. Dynamic Label Values
- Static labels (values don't change per log line, e.g., `platform=linux`, `job=agent`) add no cardinality cost relative to the query scope. Use freely for LBAC, exploration, and alert routing.
- Dynamic labels (values change per log line) must be bounded. Keep possible values in the single digits or low tens.
Consistency Check
- Are label names consistent across services? (case-sensitive: `Level` ≠ `level`)
- Are label values normalized? (`INFO`, `Info`, and `info` should all become `info`)
- Is there a naming convention? (pick one: `snake_case` or `camelCase`, and be consistent)
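A consistency check like this is easy to script against an exported list of label names or values. The helper below is a hypothetical sketch (the function name and sample data are assumptions) that groups strings differing only by letter case.

```python
from collections import defaultdict

def find_case_variants(names):
    """Group strings that differ only by letter case, e.g. 'Level' vs 'level'."""
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    # Keep only groups where more than one spelling was seen
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}

label_names = ["level", "Level", "app", "env"]
level_values = ["INFO", "Info", "info", "error"]

print(find_case_variants(label_names))   # {'level': ['Level', 'level']}
print(find_case_variants(level_values))  # {'info': ['INFO', 'Info', 'info']}
```

Each reported group corresponds to streams that would be split needlessly until the spelling is normalized at collection time.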
Evaluation Output Format
When auditing a label set, produce a report in this structure:
## Loki Label Strategy Audit
### Summary
[1-2 sentence overall assessment]
### Label Analysis
| Label | Cardinality | Used in Queries? | Verdict | Action |
|---|---|---|---|---|
| app | Low (tens) | Always | ✅ Keep | — |
| pod | Very High (transient)| Rarely | ❌ Remove | Move to structured metadata or embed in log line |
### Estimated Impact
- Stream count reduction: [X streams → Y streams]
- Query performance: [describe improvement]
- Storage impact: [if log line changes are involved]
### Recommended Label Set
[Final recommended labels]
### Migration Notes
[How to implement changes via Alloy/Agent pipeline stages]
Recommended Common Labels
Every log source should consider these base labels — all low cardinality, high query value:
| Label | Purpose |
|---|---|
| `app` | Identifying the generating application |
| `env` | Environment (prod, staging, dev) |
| `cluster` | Multi-cluster differentiation |
| `region` | Geographic region |
| `level` | Log severity; normalize to: `debug`, `info`, `warn`, `error` |
| `job` | Collector job name |
| `team` | Ownership (also useful for LBAC) |
| `source` | Log origin type |
| `classification` | Data sensitivity level, for LBAC policies |
Kubernetes Pod Logs
Recommended Labels
| Label | Description |
|---|---|
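| `namespace` | K8s namespace; delineates isolation boundaries |
| `container` | Container name; low cardinality, differentiates log formats |
| `app` | K8s service generating logs |
| `workload` | `{{controller_kind}}/{{controller_name}}`, e.g. `ReplicaSet/payment-api` |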
Why `workload` beats `app` for K8s: Derived from `{{controller_kind}}/{{controller_name}}`, static values that never change the way pod names do. Unlike `app` (which may aggregate multiple workload types), `workload` is precise and predictable. Users always know exactly what value to query.
Labels to AVOID in Kubernetes
`pod`
- Highly transient: pod names change on every restart/rollout
- Very high cardinality: 5 pods × 2 containers = 10 streams; add `pod` → 10 × N streams
- Users almost never query for a specific pod; they query for the workload
- Solution: Use `workload` as the label; store `pod` in structured metadata or embed in the log line
`filename`
- K8s log paths contain pod UID: `/var/log/pods/{namespace}_{pod}_{pod_id}/{container}/{rotation}.log`
- The `pod_id` component makes this unbounded
- Solution: Normalize to `/var/log/pods/{namespace}/{controller_name}/{container}.log` or drop entirely
alloy
// Normalize K8s filename to remove pod UID
stage.replace {
source = "filename"
expression = "/var/log/pods/([^/]+)_[^_]+_[^/]+/([^/]+)/\\d+\\.log"
replace = "/var/log/pods/$1/$2/current.log"
}
Host / VM / Bare Metal Labels
In addition to common labels, add:
| Label | Description | Notes |
|---|---|---|
| `instance` | Hostname of the machine | Cardinality = number of machines; acceptable for fixed infrastructure |
| `filename` | Full path to the file being tailed | Normalize rotating filenames; strip date suffixes |
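The date-suffix normalization noted in the table can be sanity-checked offline before it goes into a collector pipeline. The sketch below uses Python's `re` module purely to illustrate the intended rewrite; it assumes the pattern behaves the same under RE2 and does not model how the collector's replace stage applies capture groups.

```python
import re

# -YYYYMMDD directly before a .log or .txt extension
DATE_SUFFIX = re.compile(r"-\d{8}(\.log|\.txt)$")

def strip_date_suffix(path: str) -> str:
    """Drop the date suffix, keeping the file extension."""
    return DATE_SUFFIX.sub(r"\1", path)

rotated = [
    "/var/log/myapp/logfile-20230926.txt",
    "/var/log/myapp/logfile-20230927.txt",
    "/var/log/myapp/logfile-20230928.txt",
]

print(strip_date_suffix(rotated[1]))                # /var/log/myapp/logfile.txt
print(len({strip_date_suffix(p) for p in rotated}))  # 1
```

Three rotated files collapse to a single `filename` value, so the label stops growing by one value per day.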
alloy
// Remove date suffixes from rotating log file names
// /var/log/myapp/logfile-20230927.txt → /var/log/myapp/logfile.txt
stage.replace {
source = "filename"
expression = "-\\d{8}(\\.log|\\.txt)$"
replace = "$1"
}
Journal Logs
When collecting via `loki.source.journal`, many labels are auto-discovered under `__journal__*`:
`boot_id`, `cap_effective`, `cmdline`, `comm`, `exe`, `gid`, `hostname`, `machine_id`, `pid`, `stream_id`, `systemd_cgroup`, `systemd_invocation_id`, `systemd_slice`, `systemd_unit`, `transport`, `uid`
Almost all are high-cardinality. Keep only:
- `instance`: hostname where journal logs were collected
- `unit`: the `systemd_unit` name (e.g., `nginx.service`)
Drop everything else:
alloy
loki.process "journal_labels" {
forward_to = [...]
stage.label_keep {
values = ["instance", "unit", "env", "cluster"]
}
}
Structured Metadata
Structured metadata attaches key-value pairs to log entries without making them index labels. It is the ideal home for high-cardinality values that users only occasionally need.
Requires: Loki 2.9+, Grafana Agent/Alloy. Enable via `limits_config`:
yaml
limits_config:
  allow_structured_metadata: true
Good candidates for structured metadata (not labels):
- `pod`: K8s pod name
- `node`: K8s worker node
- `version` / `image` / `tag`
- `trace_id` / `user_id`
- `process_id`
- `restarted`: pod restart timestamp
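To show how structured metadata travels separately from index labels, the sketch below builds a push body by hand. It assumes Loki's JSON push format, in which each entry in `values` may carry an optional third element holding the structured metadata; the function name, label values, and timestamp are illustrative, and nothing is sent over the network.

```python
import json
import time

def build_push_payload(labels, line, metadata, ts_ns=None):
    """Build a JSON push body with one entry carrying structured metadata."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    return {
        "streams": [
            {
                "stream": labels,  # index labels: keep these low-cardinality
                "values": [[str(ts_ns), line, metadata]],  # third element: structured metadata
            }
        ]
    }

payload = build_push_payload(
    labels={"app": "payment-api", "env": "prod"},
    line="payment accepted",
    metadata={"pod": "payment-api-7f9d4b-xk2r9"},
    ts_ns=1700000000000000000,
)
print(json.dumps(payload, indent=2))
```

The pod name never appears in `stream`, so it does not create a new stream, yet it stays attached to the entry for filtering.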
Query structured metadata at query time without a parser:
logql
{app="payment-api"} | pod="payment-api-7f9d4b-xk2r9"
Embedding Metadata in Log Lines
When structured metadata isn't available, embed high-cardinality values into the log line rather than using them as labels.
Method 1: stage.template (append to log line)
alloy
loki.process "embed_pod" {
forward_to = [...]
// For JSON logs
stage.match {
selector = "{} |~ \"^\\s*\\{\""
stage.replace {
expression = "\\}$"
replace = ""
}
stage.template {
source = "log_line"
template = "{{ .Entry }},\"_pod\":\"{{ .pod }}\"}"
}
}
// For text logs
stage.match {
selector = "{} !~ \"^\\s*\\{\""
stage.template {
source = "log_line"
template = "{{ .Entry }} _pod={{ .pod }}"
}
}
stage.output { source = "log_line" }
}
Result:
ts=... msg="..." _pod=agent-logs-cqhfk
Query by aggregate (normal use):
logql
sum(count_over_time({workload="ReplicaSet/payment-api", level="error"}[1m]))
Query a specific pod (edge case debugging):
logql
{workload="ReplicaSet/payment-api", level="error"} |= `_pod=payment-api-3`
Method 2: stage.pack (JSON envelope)
alloy
loki.process "pack_pod" {
forward_to = [...]
stage.pack {
labels = ["pod"]
ingest_timestamp = false
}
}
Packed result:
{"_entry": "original log line", "pod": "agent-logs-cqhfk"}
Unpack at query time:
logql
{workload="ReplicaSet/payment-api", level="error"}
|= `agent-logs-cqhfk`
| unpack
Performance Bottleneck Diagnosis
When a user reports slow queries, identify where time is spent using Querier `metrics.go` logs.
Four Query Stages
| Stage | Metric | High Value Means | Fix |
|---|---|---|---|
| Queue | `queue_time` | Not enough Queriers | Add Queriers or reduce parallelism |
| Index | `chunk_refs_fetch_time` | Need more Index Gateway instances | Scale index-gateways; check CPU |
| Storage | `store_chunks_download_time` | Chunks too small OR storage bottleneck | Check avg chunk size: `total_bytes / cache_chunk_req` |
| Execution | `duration` − `chunk_refs_fetch_time` − `store_chunks_download_time` | CPU-intensive regex, or too many tiny log lines | Reduce regex; add CPU; increase parallelism |
Ideally, the majority of time is spent in Execution. If not, that indicates infrastructure or label design problems.
Checking Chunk Size
avg chunk size = total_bytes / cache_chunk_req
If the result is a few hundred bytes or a few kilobytes (instead of megabytes), chunks are too small. This means labels are over-splitting data into too many streams. Revisit and reduce label cardinality.
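The chunk-size check above is a single division, sketched here with hypothetical numbers. The 1 MB threshold is an assumption standing in for "megabytes"; pick a value that matches your own chunk targets.

```python
def avg_chunk_size(total_bytes: int, cache_chunk_req: int) -> float:
    """Average bytes per chunk: total_bytes / cache_chunk_req."""
    if cache_chunk_req <= 0:
        raise ValueError("cache_chunk_req must be positive")
    return total_bytes / cache_chunk_req

def chunks_look_too_small(total_bytes: int, cache_chunk_req: int,
                          threshold_bytes: int = 1_000_000) -> bool:
    """True when the average chunk is below the assumed threshold."""
    return avg_chunk_size(total_bytes, cache_chunk_req) < threshold_bytes

# Hypothetical query: 2 GB read across 400,000 chunks
print(avg_chunk_size(2_000_000_000, 400_000))         # 5000.0
print(chunks_look_too_small(2_000_000_000, 400_000))  # True
```

An average of about 5 KB per chunk would point to over-split streams rather than a storage bottleneck.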
Common Label-Related Performance Problems
Problem: Query scans too many streams
- Cause: High-cardinality labels exist but aren't specified in the query selector
- Fix: Remove the label, or ensure queries always include it as a filter
Problem: High discard ratio (`post_filter_lines` << `total_lines`)
- Cause: Insufficient label selectivity; query scans and discards most logs
- Fix: Add labels matching user access patterns (`level`, `workload`, `container`)
Problem: Small chunks
- Cause: Too many labels creating too many fine-grained streams
- Fix: Remove high-cardinality labels to consolidate streams
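The discard ratio from the second problem can be computed directly from the `total_lines` and `post_filter_lines` values. The sketch below uses made-up numbers; the function name is an assumption.

```python
def discard_ratio(total_lines: int, post_filter_lines: int) -> float:
    """Fraction of scanned lines that the query filtered out."""
    if total_lines <= 0:
        return 0.0
    return 1 - post_filter_lines / total_lines

# Hypothetical query: 1,000,000 lines scanned, 1,200 returned
print(round(discard_ratio(1_000_000, 1_200), 4))  # 0.9988
```

A ratio close to 1 means nearly everything read was thrown away, which is the signal to add a selective label such as `level` to the stream selector.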
Query Optimization Quick Wins
- Add `container` or `workload` to narrow scope before line filters
- Add `level` label + always use it in queries (filters out 94%+ of logs when searching for errors)
- Remove `pod` label → reduces stream count by ~5× in typical K8s deployments
- Replace regex line filters (`|~`) with exact filters (`|=`) where possible
Alloy / Agent Configuration Patterns
Normalize Log Level
alloy
loki.process "normalize_level" {
forward_to = [...]
// Patterns are anchored so only a value that is entirely a level name or
// abbreviation is rewritten; an unanchored "I" would also match inside "warning".
stage.replace { source = "level"; expression = "(?i)^(i(?:nfo)?)$"; replace = "info" }
stage.replace { source = "level"; expression = "(?i)^(w(?:arn(?:ing)?)?)$"; replace = "warn" }
stage.replace { source = "level"; expression = "(?i)^(e(?:rr(?:or)?)?)$"; replace = "error" }
stage.replace { source = "level"; expression = "(?i)^(d(?:ebug)?)$"; replace = "debug" }
stage.labels { values = { level = "" } }
}
Conditional Meta-Label Extraction
alloy
// Only extract when the relevant field is present — avoids unnecessary cardinality
loki.process "conditional_extraction" {
forward_to = [...]
stage.match {
selector = "{app=\"loki\"} |= \"component\""
stage.logfmt { mapping = { "component" = "" } }
stage.labels { values = { component = "" } }
}
}
Enforce Approved Label Set (always use as final stage)
alloy
loki.process "enforce_labels" {
forward_to = [loki.write.default.receiver]
// ... other stages ...
stage.label_keep {
values = ["app", "env", "cluster", "level", "namespace", "workload", "container"]
}
}
Soft Enforcement (inject "unknown" for missing labels)
alloy
stage.template {
source = "team"
template = "{{ if .Value }}{{ .Value }}{{ else }}unknown{{ end }}"
}
stage.labels { values = { team = "" } }
Log Line Optimization
These reduce storage costs. Establish a cost-per-GB baseline before implementing.
Remove Timestamps from Log Lines
Each log entry already has a metadata timestamp — the inline timestamp is redundant (~30–34 bytes each, ~6% of a typical log line).
alloy
loki.process "drop_timestamp" {
forward_to = [...]
// logfmt timestamps
stage.replace {
expression = "(?i)((?:time_?(?:stamp)?|ts|logdate|start_?time)=[^ \\n]+(?: |$))"
replace = " "
}
// JSON timestamps
stage.replace {
expression = "(\"@?(?:time_?(?:stamp)?|ts|logdate|start_?time)\"\\s*:\\s*\"[^\"]+\",?)"
replace = " "
}
// ISO-8601 at start of line
stage.replace {
expression = "^(\\d{4}-\\d{2}-\\d{2})T\\d{2}:\\d{2}(?::\\d{2}(?:\\.\\d{1,9})?Z?)?"
replace = ""
}
}
The original timestamp is still accessible at query time:
| line_format `{{ __timestamp__ | date "2006-01-02T15:04:05Z" }}`
Remove ANSI Color Codes
alloy
loki.process "decolorize" {
forward_to = [...]
stage.decolorize {}
}
Remove Duplicate Level Field (when `level` is already a label)
alloy
stage.replace { expression = "(level=[^ ]+ )"; replace = "" }
JSON Optimizations
alloy
// Remove null values
stage.replace {
expression = "(\\s*(\"[^\"]+\"\\s*:\\s*null)(?:\\s*,)?\\s*)"
replace = ""
}
// Remove placeholder values ("-", "undefined", "null" strings)
stage.replace {
expression = "(\\s*(\"[^\"]+\"\\s*:\\s*\"(?:-|null|undefined)\")(?:\\s*,)?\\s*)"
replace = ""
}
// Remove empty values ("", [], {})
stage.replace {
expression = "(\\s*,\\s*(\"[^\"]+\"\\s*:\\s*(\\[\\s*\\]|\\{\\s*\\}|\"\\s*\"))|(\"[^\"]+\"\\s*:\\s*(\\[\\s*\\]|\\{\\s*\\}|\"\\s*\"))\\s*,\\s*)"
replace = ""
}
Practical savings (Istio access log example):
Starting at 753 bytes (minified) → after removing nulls, placeholders, unused fields, normalizing keys: 464 bytes — 38% reduction.
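The cleanup rules and the savings figure can be reproduced offline. The sketch below parses the line with Python's `json` module instead of the regex stages shown above, handles top-level fields only, and uses a made-up sample line; it is an illustration of the idea, not the pipeline itself.

```python
import json

PLACEHOLDERS = {"-", "null", "undefined"}

def is_removable(value) -> bool:
    """True for null, placeholder strings, and empty strings, lists, or objects."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == "" or value in PLACEHOLDERS
    if isinstance(value, (list, dict)):
        return len(value) == 0
    return False

def slim(line: str) -> str:
    """Drop removable top-level fields and re-serialize without extra whitespace."""
    record = json.loads(line)
    kept = {k: v for k, v in record.items() if not is_removable(v)}
    return json.dumps(kept, separators=(",", ":"))

def reduction_pct(before: int, after: int) -> float:
    """Percentage saved going from `before` bytes to `after` bytes."""
    return 100 * (before - after) / before

line = '{"method":"GET","path":"/pay","user_agent":"-","upstream":null,"tags":[],"status":200}'
print(slim(line))                       # {"method":"GET","path":"/pay","status":200}
print(round(reduction_pct(753, 464)))   # 38
```

Running the helper over a sample of real lines gives a per-source estimate before any pipeline change is deployed.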
Security & LBAC
Grafana Enterprise Logs (GEL) supports Label-Based Access Control (LBAC). Any label can serve as an access control selector.
Best labels for LBAC:
- `classification`: data sensitivity (`public`, `restricted`, `confidential`, `top-secret`)
- `source`: controls which teams can see which log origins
- `team` / `squad`: ownership-based access
- `env`: environment-level restrictions
Static aggregate labels like `owner=sysadmins` or `category=database` are particularly effective: one label value gates access to many log files, rather than requiring a long allowlist of filenames or streams.
The 80/20 Rule
The most impactful improvements almost always come from these four changes:
- Remove `pod` as a label: biggest stream reduction in K8s environments
- Add `level` as a label AND always specify it in queries: can eliminate 94%+ of scanned data when searching for errors
- Normalize label values: eliminates phantom duplicate streams from inconsistent casing
- Remove or normalize `filename` in K8s: highly variable paths inflate stream count significantly
Focus on these before anything else.
Labels to Avoid — Quick Reference
| Label | Why | Alternative |
|---|---|---|
| `pod` | Transient, unbounded | `workload` |
| `request_id` | Unbounded | Keep only in log content |
| `user_id`, `trace_id` | Unbounded | Structured metadata |
| `filename` | Contains pod UID | Normalize or drop |
| Unnormalized `level` | `INFO` ≠ `info` | Normalize at collection time |
| Any dynamically-named label key | Cannot be bounded | Use fixed keys with bounded values |