loki-label-analyzer

Loki Label Strategy Evaluator

You are an expert in Grafana Loki label strategy. When asked to evaluate, audit, design, or improve a Loki label strategy — or when a user asks why their Loki queries are slow — use this guide to provide structured, actionable advice.

Core Concepts

**Streams** are the fundamental unit in Loki. Each unique combination of label key-value pairs creates a new stream. Too many streams cause performance problems; too few force broad, slow queries.
**Cardinality** is the number of unique values a label can have. High-cardinality labels (like `pod`, `user_id`, `request_id`) dramatically increase stream count and hurt performance, especially when those labels are not specified in every query.
**The dual impact rule:** high-cardinality labels hurt on both paths:
  • **Ingestion path:** more streams → larger index, more (and smaller) chunks, higher storage costs
  • **Query path:** if a high-cardinality label exists but isn't in the query selector, Loki must scan ALL streams matching the other selectors, which is catastrophic for performance
**The key question** for any dynamic label: "Will this label be used in 9 out of 10 queries?" If not, it should NOT be a label.
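
Every distinct combination of label values is its own stream, so the worst-case stream count is the product of each label's cardinality. Here is a minimal Python sketch of that arithmetic. The label names and counts are illustrative assumptions, not measurements.

```python
from math import prod


def max_streams(cardinalities: dict) -> int:
    """Worst-case stream count: assumes every combination of label values occurs."""
    return prod(cardinalities.values())


base = {"env": 3, "level": 4, "namespace": 20}
with_pod = {**base, "pod": 500}  # hypothetical: 500 pod names seen in the time range

print(max_streams(base))      # 240
print(max_streams(with_pod))  # 120000
```

Real counts are usually lower because not every combination occurs. The multiplication still explains why a single unbounded label outweighs every other labeling choice.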

Label Evaluation Framework

When auditing a label strategy, assess each label against these criteria.

Cardinality Scoring

| Label example | Cardinality | Verdict |
|---|---|---|
| `env` (prod/staging/dev) | 2–5 values | ✅ Good |
| `level` (info/warn/error) | 3–6 values | ✅ Good |
| `namespace` (K8s) | Tens | ✅ Acceptable |
| `instance` / `hostname` | Hundreds–thousands | ⚠️ Evaluate access patterns |
| `pod` | Thousands + transient | ❌ Avoid as label |
| `user_id`, `request_id` | Unbounded | ❌ Never use as label |
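
The bands in the table can be turned into a rough first-pass check. In this sketch, the cut-offs of 10 and 100 distinct values are assumptions that approximate "single digits", "tens", and "hundreds–thousands". The `transient` and `bounded` flags must come from your own knowledge of the label, because a value count alone cannot reveal them.

```python
def cardinality_verdict(distinct_values: int, *, transient: bool = False, bounded: bool = True) -> str:
    """Map a label's observed value count to the verdicts in the table above."""
    if not bounded:
        return "never"       # e.g. user_id, request_id
    if transient:
        return "avoid"       # e.g. pod: values churn on every restart/rollout
    if distinct_values <= 10:
        return "good"        # e.g. env, level
    if distinct_values < 100:
        return "acceptable"  # e.g. namespace
    return "evaluate"        # e.g. instance/hostname: check access patterns


print(cardinality_verdict(3))                    # good
print(cardinality_verdict(5000, transient=True)) # avoid
```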

Access Pattern Alignment

For each label, ask:
  • Is this label used as a selector in most queries targeting these logs?
  • Does this label logically segment data in the way users think about it?
  • Would removing this label force users to scan dramatically more data?
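
You can answer the first question from real query text. This simplified Python sketch assumes you already have a list of recent LogQL queries; how you export them is up to you. The regex only understands a plain `{...}` selector with double-quoted values. It is not a full LogQL parser.

```python
import re

# A label matcher inside a stream selector: name, operator, opening quote.
MATCHER = re.compile(r'(\w+)\s*(?:=~|!~|!=|=)\s*"')


def selector_labels(query: str) -> set:
    """Label names used in the first {...} stream selector of a LogQL query (simplified)."""
    start = query.find("{")
    end = query.find("}", start + 1)
    if start == -1 or end == -1:
        return set()
    return set(MATCHER.findall(query[start + 1:end]))


def usage_ratio(label: str, queries: list) -> float:
    """Fraction of queries whose selector mentions the label."""
    if not queries:
        return 0.0
    return sum(label in selector_labels(q) for q in queries) / len(queries)


def passes_nine_in_ten(label: str, queries: list) -> bool:
    return usage_ratio(label, queries) >= 0.9


queries = [
    '{app="api", level="error"} |= "timeout"',
    'sum(count_over_time({app="api", level=~"warn|error"}[5m]))',
    '{app="api"}',
]
print(passes_nine_in_ten("app", queries), passes_nine_in_ten("level", queries))  # True False
```

A label that fails `passes_nine_in_ten` is a candidate for structured metadata or the log line rather than the index.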

Static vs. Dynamic Label Values

  • **Static labels** (values don't change per log line, e.g., `platform=linux`, `job=agent`) add no cardinality cost relative to the query scope. Use them freely for LBAC, exploration, and alert routing.
  • **Dynamic labels** (values change per log line) must be bounded. Keep possible values in the single digits or low tens.

Consistency Check

  • Are label names consistent across services? Names are case-sensitive, so `Level` and `level` are two different labels.
  • Are label values normalized? `INFO`, `info`, and `Info` should all become `info`.
  • Is there a naming convention? Pick one (`snake_case` or `camelCase`) and apply it consistently.
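
The first two checks are mechanical once you have the label names and the values of a given label (for example from `logcli labels`). Here is a Python sketch with made-up inputs.

```python
from collections import defaultdict


def case_collisions(label_names):
    """Label names that differ only by case (e.g. Level vs level) are separate labels in Loki."""
    groups = defaultdict(set)
    for name in label_names:
        groups[name.lower()].add(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


def value_variants(values):
    """Spellings of the same value (INFO, info, Info) that each create their own stream."""
    groups = defaultdict(set)
    for value in values:
        groups[value.lower()].add(value)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


print(case_collisions(["level", "Level", "app", "env"]))
print(value_variants(["INFO", "info", "Info", "error"]))
```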

Evaluation Output Format

When auditing a label set, produce a report in this structure:

Loki Label Strategy Audit

Summary

[1-2 sentence overall assessment]

Label Analysis

| Label | Cardinality | Used in Queries? | Verdict | Action |
|---|---|---|---|---|
| `app` | Low (tens) | Always | ✅ Keep | |
| `pod` | Very High (transient) | Rarely | ❌ Remove | Move to structured metadata or embed in log line |

Estimated Impact

  • Stream count reduction: [X streams → Y streams]
  • Query performance: [describe improvement]
  • Storage impact: [if log line changes are involved]
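
To fill in the stream-count line, count the distinct label sets in a sample of existing streams, first as they are and then without the labels you plan to remove. Below is a minimal Python sketch. The sample (one namespace, 2 containers, 5 pods) is hypothetical.

```python
def stream_count(label_sets, drop=()):
    """Distinct label sets after removing the labels in `drop`; each distinct set is one stream."""
    dropped = set(drop)
    return len({
        frozenset((k, v) for k, v in labels.items() if k not in dropped)
        for labels in label_sets
    })


# Hypothetical sample: one namespace, 2 containers, 5 pods.
sample = [
    {"namespace": "payments", "container": c, "pod": f"payment-api-{i}"}
    for c in ("api", "sidecar")
    for i in range(5)
]
before = stream_count(sample)
after = stream_count(sample, drop=["pod"])
print(f"{before} streams → {after} streams")  # 10 streams → 2 streams
```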

Recommended Label Set

[Final recommended labels]

Migration Notes

[How to implement changes via Alloy/Agent pipeline stages]

---

Recommended Common Labels

Every log source should consider these base labels, all low cardinality with high query value:

| Label | Purpose |
|---|---|
| `app` / `service` | Identifying the generating application |
| `env` | Environment (prod, staging, dev) |
| `cluster` | Multi-cluster differentiation |
| `region` | Geographic region |
| `level` | Log severity; normalize to `info`, `warn`, `error`, `debug` |
| `job` | Collector job name |
| `team` / `squad` | Ownership (also useful for LBAC) |
| `source` | Log origin type (`file`, `k8s-events`, `journal`, `syslog`, etc.) |
| `classification` | Data sensitivity level, for LBAC policies |

Kubernetes Pod Logs

Recommended Labels

| Label | Description |
|---|---|
| `namespace` | K8s namespace; delineates isolation boundaries |
| `container` | Container name; low cardinality, differentiates log formats |
| `service` | K8s service generating logs |
| `workload` | `{controller_kind}/{controller_name}`, e.g. `ReplicaSet/payment-api` (strongly recommended) |

**Why `workload` beats `app` for K8s:** it is derived from `{controller_kind}/{controller_name}`, whose values stay the same across pod restarts instead of changing the way pod names do. Unlike `app` (which may aggregate multiple workload types), `workload` is precise and predictable, so users always know exactly what value to query.
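
For Deployment-managed pods, the owning controller is a ReplicaSet whose name ends in a pod-template-hash. For example, the pod `payment-api-7f9d4b-xk2r9` is owned by the ReplicaSet `payment-api-7f9d4b`. That hash changes on every rollout, so the stable `ReplicaSet/payment-api` value requires stripping the suffix.

In Alloy, the inputs would typically come from the `__meta_kubernetes_pod_controller_kind` and `__meta_kubernetes_pod_controller_name` discovery labels. The Python below only sketches the string logic. The suffix heuristic (5–10 characters from Kubernetes' vowel-free random alphabet) is an assumption, and it can misfire on a name whose last segment happens to fit that pattern.

```python
import re

# Assumed heuristic: the pod-template-hash suffix uses Kubernetes' "safe" alphabet
# (no vowels and no 0, 1 or 3), so "-7f9d4b" matches but "-service" does not.
RS_HASH_SUFFIX = re.compile(r"-[bcdfghjklmnpqrstvwxz2456789]{5,10}$")


def workload_label(controller_kind: str, controller_name: str) -> str:
    """Build a {controller_kind}/{controller_name} value that survives rollouts."""
    name = controller_name
    if controller_kind == "ReplicaSet":
        name = RS_HASH_SUFFIX.sub("", name)
    return f"{controller_kind}/{name}"


print(workload_label("ReplicaSet", "payment-api-7f9d4b"))  # ReplicaSet/payment-api
print(workload_label("StatefulSet", "postgres"))           # StatefulSet/postgres
```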

Labels to AVOID in Kubernetes

**`pod` label**
  • Highly transient: pod names change on every restart/rollout
  • Very high cardinality: all else equal, a 2-container workload produces 2 streams; add `pod` and 5 pods × 2 containers = 10 streams, with a fresh set every time pods are replaced
  • Users almost never query for a specific pod; they query for the workload
  • Solution: use `workload` as the label; store `pod` in structured metadata or embed it in the log line
**`filename` label (raw K8s path)**
  • K8s log paths contain the pod name and UID: `/var/log/pods/{namespace}_{pod}_{pod_id}/{container}/{restart_count}.log`
  • The `pod_id` component makes this unbounded
  • Solution: normalize to a stable path with no pod name or UID (e.g. `/var/log/pods/{namespace}/{controller_name}/{container}.log`, or `/var/log/pods/{namespace}/{container}/current.log` as in the pipeline below), or drop it entirely
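```alloy
// Drop "_{pod}_{pod_id}" and the restart counter; stage.replace rewrites capture groups only.
stage.replace {
  source     = "filename"
  expression = "^/var/log/pods/[^_/]+(_[^/]+)/"
  replace    = ""
}
stage.replace {
  source     = "filename"
  expression = "/(\\d+)\\.log$"
  replace    = "current"
}
stage.labels { values = { filename = "" } }
```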
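
Before deploying, you can check the normalization offline against sample paths. This Python sketch applies the equivalent rewrite in one step, since Python's `re.sub` supports `\1`-style backreferences. It shows two pods of the same workload, including a restarted container, collapsing to one `filename` value. The namespace, pod names, and UIDs are made up.

```python
import re

# /var/log/pods/{namespace}_{pod}_{pod_id}/{container}/{restart_count}.log
K8S_LOG_PATH = re.compile(r"^/var/log/pods/([^_/]+)_[^/]+/([^/]+)/\d+\.log$")


def normalize_k8s_log_path(path: str) -> str:
    """Drop the pod name, pod UID and restart counter; other paths pass through unchanged."""
    return K8S_LOG_PATH.sub(r"/var/log/pods/\1/\2/current.log", path)


paths = [
    "/var/log/pods/payments_payment-api-7f9d4b-xk2r9_0b1c2d3e-1111-2222-3333-444455556666/api/0.log",
    "/var/log/pods/payments_payment-api-7f9d4b-q8zzm_9f8e7d6c-aaaa-bbbb-cccc-ddddeeeeffff/api/1.log",
]
print({normalize_k8s_log_path(p) for p in paths})  # {'/var/log/pods/payments/api/current.log'}
```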

Host / VM / Bare Metal Labels

In addition to the common labels, add:

| Label | Description | Notes |
|---|---|---|
| `instance` | Hostname of the machine | Cardinality = number of machines; acceptable for fixed infrastructure |
| `filename` | Full path to the file being tailed | Normalize rotating filenames by stripping date suffixes |
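```alloy
// Remove date suffixes from rotating log file names:
// /var/log/myapp/logfile-20230927.txt → /var/log/myapp/logfile.txt
// stage.replace rewrites capture groups only, so only the date is captured.
stage.replace {
  source     = "filename"
  expression = "(-\\d{8})(?:\\.log|\\.txt)$"
  replace    = ""
}
// Write the normalized value back to the filename label.
stage.labels { values = { filename = "" } }
```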

Journal Logs

When collecting via `loki.source.journal`, many labels are auto-discovered under `__journal__*`: `boot_id`, `cap_effective`, `cmdline`, `comm`, `exe`, `gid`, `hostname`, `machine_id`, `pid`, `stream_id`, `systemd_cgroup`, `systemd_invocation_id`, `systemd_slice`, `systemd_unit`, `transport`, `uid`.
Most of these are high-cardinality or rarely useful as selectors. Keep only:
  • `instance`: the hostname where journal logs were collected
  • `unit`: the `systemd_unit` name (e.g., `nginx.service`)
Drop everything else:
```alloy
loki.process "journal_labels" {
  forward_to = [...]
  // `unit` only exists if a relabel rule mapped __journal__systemd_unit to it.
  stage.label_keep {
    values = ["instance", "unit", "env", "cluster"]
  }
}
```

Structured Metadata

Structured metadata attaches key-value pairs to log entries without making them index labels. It is the ideal home for high-cardinality values that users occasionally need.
**Requires:** Loki 2.9+ and Grafana Agent/Alloy. Enable via `limits_config`:
```yaml
limits_config:
  allow_structured_metadata: true
```
Good candidates for structured metadata (not labels):
  • `pod`: K8s pod name
  • `node`: K8s worker node
  • `version` / `image` / `tag`
  • `trace_id` / `user_id`
  • `process_id`
  • `restarted`: pod restart timestamp
Query structured metadata at query time without a parser:
```logql
{app="payment-api"} | pod="payment-api-7f9d4b-xk2r9"
```

Embedding Metadata in Log Lines

When structured metadata isn't available, embed high-cardinality values into the log line rather than using them as labels.

Method 1: stage.template (append to log line)

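```alloy
loki.process "embed_pod" {
  forward_to = [...]

  // For JSON logs: drop the closing brace, then re-append it after a "_pod" field.
  // stage.replace rewrites capture groups only, so the brace itself is captured.
  // The selector avoids backslashes: LogQL double-quoted strings reject escapes like \s.
  stage.match {
    selector = "{} |~ \"^[[:space:]]*[{]\""
    stage.replace {
      expression = "(\\})\\s*$"
      replace    = ""
    }
    stage.template {
      source   = "log_line"
      template = "{{ .Entry }},\"_pod\":\"{{ .pod }}\"}"
    }
  }

  // For text logs: append _pod=<name>
  stage.match {
    selector = "{} !~ \"^[[:space:]]*[{]\""
    stage.template {
      source   = "log_line"
      template = "{{ .Entry }} _pod={{ .pod }}"
    }
  }

  stage.output { source = "log_line" }
}
```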
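Result (text log line):
```
ts=... msg="..." _pod=agent-logs-cqhfk
```
Query by aggregate (normal use):
```logql
sum(count_over_time({workload="ReplicaSet/payment-api", level="error"}[1m]))
```
Query a specific pod (edge-case debugging). The `|=` filter below matches the text form. JSON lines carry `"_pod":"payment-api-3"` instead, so filter on that string or parse first with `| json | _pod="payment-api-3"`:
```logql
{workload="ReplicaSet/payment-api", level="error"} |= `_pod=payment-api-3`
```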
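
The JSON branch edits text rather than parsing it, so it is worth checking that the result still parses. This Python sketch mirrors the intent of the two branches; it is not how Alloy executes them. It also handles an empty object `{}`, which plain text substitution would turn into invalid JSON.

```python
import json


def embed_pod(line: str, pod: str) -> str:
    """Append the pod name: as a "_pod" field for JSON objects, as _pod=... for text lines."""
    if line.lstrip().startswith("{"):
        body = line.rstrip()
        if not body.endswith("}"):
            return line  # not a complete JSON object; leave it untouched
        inner = body[:-1].rstrip()
        separator = "" if inner.endswith("{") else ","
        return inner + separator + '"_pod":' + json.dumps(pod) + "}"
    return f"{line} _pod={pod}"


print(embed_pod('ts=1 msg="timeout"', "agent-logs-cqhfk"))
# ts=1 msg="timeout" _pod=agent-logs-cqhfk
print(embed_pod('{"msg":"timeout"}', "payment-api-3"))
# {"msg":"timeout","_pod":"payment-api-3"}
```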

Method 2: stage.pack (JSON envelope)

```alloy
loki.process "pack_pod" {
  forward_to = [...]
  // Wraps the line and the pod label in a JSON envelope, taking pod off the label set.
  stage.pack {
    labels           = ["pod"]
    ingest_timestamp = false
  }
}
```
Packed result:
```json
{"_entry": "original log line", "pod": "agent-logs-cqhfk"}
```
Unpack at query time:
```logql
{workload="ReplicaSet/payment-api", level="error"}
  |= `agent-logs-cqhfk`
  | unpack
```

Performance Bottleneck Diagnosis

When a user reports slow queries, identify where time is spent using the Querier's `metrics.go` log lines.

Four Query Stages

| Stage | Metric | High value means | Fix |
|---|---|---|---|
| Queue | `queue_time` | Not enough Queriers | Add Queriers or reduce parallelism |
| Index | `chunk_refs_fetch_time` | Need more Index Gateway instances | Scale index-gateways; check CPU |
| Storage | `store_chunks_download_time` | Chunks too small OR storage bottleneck | Check avg chunk size: `total_bytes / cache_chunk_req` |
| Execution | `duration - chunk_refs_fetch_time - store_chunks_download_time` | CPU-intensive regex, or too many tiny log lines | Reduce regex; add CPU; increase parallelism |
Ideally, the majority of time is spent in Execution. If not, that indicates infrastructure or label design problems.
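
Here is a minimal Python sketch of the breakdown. It assumes you have already converted the `metrics.go` fields to seconds; their exact textual format is not assumed here. `queue_time` is left out because, per the table, execution is computed from `duration` without it, so check queue time on its own.

```python
def stage_shares(duration: float, chunk_refs_fetch_time: float, store_chunks_download_time: float) -> dict:
    """Fraction of `duration` spent in the Index, Storage and Execution stages."""
    execution = duration - chunk_refs_fetch_time - store_chunks_download_time
    return {
        "index": chunk_refs_fetch_time / duration,
        "storage": store_chunks_download_time / duration,
        "execution": execution / duration,
    }


def diagnose(shares: dict) -> str:
    """Healthy when execution dominates; otherwise name the stage that does."""
    if shares["execution"] > 0.5:
        return "healthy: most time is spent in execution"
    dominant = max(shares, key=shares.get)
    return f"investigate {dominant}"


# Hypothetical query: 10 s total, 0.5 s fetching chunk refs, 7 s downloading chunks.
shares = stage_shares(10.0, 0.5, 7.0)
print(diagnose(shares))  # investigate storage
```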

Checking Chunk Size

`avg chunk size = total_bytes / cache_chunk_req`
If the result is a few hundred bytes or a few kilobytes (instead of megabytes), chunks are too small: labels are over-splitting data into too many streams. Revisit and reduce label cardinality.
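
The same check as a Python sketch. The 1 MB threshold is an assumption standing in for "megabytes rather than kilobytes". The inputs are the `total_bytes` and `cache_chunk_req` values from one query's `metrics.go` line, converted to plain numbers.

```python
def avg_chunk_size(total_bytes: int, cache_chunk_req: int) -> float:
    """Average bytes per chunk request (total_bytes / cache_chunk_req)."""
    return total_bytes / cache_chunk_req


def chunks_too_small(total_bytes: int, cache_chunk_req: int, min_bytes: int = 1_000_000) -> bool:
    """True when chunks average below ~1 MB (assumed cut-off)."""
    if cache_chunk_req == 0:
        return False  # no chunks fetched, nothing to judge
    return avg_chunk_size(total_bytes, cache_chunk_req) < min_bytes


print(chunks_too_small(50_000_000, 40_000))    # True  (1250 bytes per chunk)
print(chunks_too_small(2_000_000_000, 1_000))  # False (2 MB per chunk)
```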

Common Label-Related Performance Problems

**Problem: Query scans too many streams**
  • Cause: High-cardinality labels exist but aren't specified in the query selector
  • Fix: Remove the label, or ensure queries always include it as a filter
**Problem: High `post_filter_lines` discard ratio** (`post_filter_lines` << `total_lines`)
  • Cause: Insufficient label selectivity; the query scans and then discards most logs
  • Fix: Add labels matching user access patterns (`level`, `workload`, `container`)
**Problem: Small chunks**
  • Cause: Too many labels creating too many fine-grained streams
  • Fix: Remove high-cardinality labels to consolidate streams
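
To put a number on the discard ratio, here is a Python sketch. The 0.9 cut-off is an assumption for illustration, not a Loki threshold.

```python
def discard_ratio(total_lines: int, post_filter_lines: int) -> float:
    """Share of scanned lines that filters threw away (0.0 to 1.0)."""
    if total_lines == 0:
        return 0.0
    return 1 - post_filter_lines / total_lines


def poor_selectivity(total_lines: int, post_filter_lines: int, cutoff: float = 0.9) -> bool:
    """Assumed rule of thumb: discarding over 90% of scanned lines hints at missing labels."""
    return discard_ratio(total_lines, post_filter_lines) > cutoff


print(round(discard_ratio(1_000_000, 12_000), 3))  # 0.988
print(poor_selectivity(1_000_000, 12_000))         # True
```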

Query Optimization Quick Wins

  1. Add `container` or `workload` to narrow scope before line filters
  2. Add a `level` label and always use it in queries (when errors are a small fraction of volume, this skips 94%+ of logs in error searches)
  3. Remove the `pod` label; stream count drops roughly by the number of replicas (~5× in typical K8s deployments)
  4. Replace regex line filters (`|~`) with exact filters (`|=`) where possible
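
These checks are easy to automate in a review script. Below is a simplified Python sketch. It is regex-based, handles only double-quoted strings, and is not a real LogQL parser. It flags queries that lack a `level` matcher or a `container`/`workload` matcher, and `|~` filters whose pattern contains no regex metacharacters.

```python
import re

SELECTOR = re.compile(r"\{([^}]*)\}")
LABEL = re.compile(r'(\w+)\s*(?:=~|!~|!=|=)\s*"')
REGEX_FILTER = re.compile(r'\|~\s*"((?:[^"\\]|\\.)*)"')
REGEX_META = set(".^$*+?()[]{}|\\")


def quick_win_hints(query: str) -> list:
    """Suggest the quick wins above for a single LogQL query."""
    hints = []
    match = SELECTOR.search(query)
    labels = set(LABEL.findall(match.group(1))) if match else set()
    if "level" not in labels:
        hints.append("add a level matcher to the selector")
    if not labels & {"container", "workload"}:
        hints.append("narrow scope with container or workload")
    for pattern in REGEX_FILTER.findall(query):
        if not any(ch in REGEX_META for ch in pattern):
            hints.append(f'use |= "{pattern}" instead of |~')
    return hints


print(quick_win_hints('{namespace="payments"} |~ "timeout"'))
```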

Alloy / Agent Configuration Patterns

Normalize Log Level

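```alloy
loki.process "normalize_level" {
  forward_to = [...]
  // Assumes `level` is already in the extracted map (a label, or from stage.logfmt/stage.json).
  // stage.replace renders `replace` once per capture group with .Value set to the captured
  // text; the single group here is the whole value, so the template decides the result.
  stage.replace {
    source     = "level"
    expression = "^(.+)$"
    replace    = "{{ $v := ToLower .Value }}{{ if eq $v \"i\" \"info\" }}info{{ else if eq $v \"w\" \"warn\" \"warning\" }}warn{{ else if eq $v \"e\" \"err\" \"error\" }}error{{ else if eq $v \"d\" \"debug\" }}debug{{ else }}{{ $v }}{{ end }}"
  }
  stage.labels { values = { level = "" } }
}
```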
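
To see which raw values would collapse onto the canonical set before changing the pipeline, here is a Python sketch of a level mapping. It uses case-insensitive aliases `i`/`info`, `w`/`warn`/`warning`, `e`/`err`/`error`, and `d`/`debug`, and simply lowercases any other value. The alias list is an assumption; extend it for your own sources (e.g. `fatal`, `trace`).

```python
ALIASES = {
    "info": {"i", "info"},
    "warn": {"w", "warn", "warning"},
    "error": {"e", "err", "error"},
    "debug": {"d", "debug"},
}


def normalize_level(raw: str) -> str:
    """Map a raw level string to info/warn/error/debug; lowercase anything unrecognized."""
    value = raw.lower()
    for canonical, spellings in ALIASES.items():
        if value in spellings:
            return canonical
    return value


print(sorted({normalize_level(v) for v in ["INFO", "info", "Info", "WARNING", "Err", "FATAL"]}))
# ['error', 'fatal', 'info', 'warn']
```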

Conditional Meta-Label Extraction

```alloy
// Only extract when the relevant field is present, to avoid unnecessary cardinality
loki.process "conditional_extraction" {
  forward_to = [...]
  stage.match {
    selector = "{app=\"loki\"} |= \"component\""
    stage.logfmt { mapping = { "component" = "" } }
    stage.labels { values = { component = "" } }
  }
}
```

Enforce Approved Label Set (always use as final stage)

```alloy
loki.process "enforce_labels" {
  forward_to = [loki.write.default.receiver]
  // ... other stages ...
  stage.label_keep {
    values = ["app", "env", "cluster", "level", "namespace", "workload", "container"]
  }
}
```

Soft Enforcement (inject "unknown" for missing labels)

```alloy
// Writes "unknown" into `team` when the label is missing or empty, then sets the label.
stage.template {
  source   = "team"
  template = "{{ if .Value }}{{ .Value }}{{ else }}unknown{{ end }}"
}
stage.labels { values = { team = "" } }
```

Log Line Optimization

These reduce storage costs. Establish a cost-per-GB baseline before implementing.

Remove Timestamps from Log Lines

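Each log entry already carries its own timestamp, so the inline timestamp is redundant (~30–34 bytes each, ~6% of a typical log line). Only strip it after a `stage.timestamp` has set the entry timestamp from that value; otherwise only the collection time remains.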
```alloy
loki.process "drop_timestamp" {
  forward_to = [...]
  // stage.replace rewrites each capture group, so every pattern captures the whole token.
  // logfmt timestamps (\b keeps keys such as response_time intact)
  stage.replace {
    expression = "(?i)(\\b(?:time_?(?:stamp)?|ts|logdate|start_?time)=[^ \\n]+(?: |$))"
    replace    = " "
  }
  // JSON timestamps
  stage.replace {
    expression = "(\"@?(?:time_?(?:stamp)?|ts|logdate|start_?time)\"\\s*:\\s*\"[^\"]+\",?)"
    replace    = " "
  }
  // ISO-8601 at start of line (date, time, optional fraction and Z)
  stage.replace {
    expression = "^(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}(?::\\d{2}(?:\\.\\d{1,9})?Z?)?\\s*)"
    replace    = ""
  }
}
```
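The entry timestamp can still be shown at query time (LogQL templates take double-quoted or backtick strings, not single quotes):
```logql
| line_format `{{ __timestamp__ | date "2006-01-02T15:04:05Z07:00" }} {{ __line__ }}`
```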

Remove ANSI Color Codes

```alloy
loki.process "decolorize" {
  forward_to = [...]
  stage.decolorize {}
}
```

Remove Duplicate Level Field (when `level` is already a label)

```alloy
stage.replace {
  expression = "(\\blevel=[^ ]+ ?)"
  replace    = ""
}
```

JSON Optimizations

```alloy
// stage.replace rewrites every capture group, so only the outer group captures;
// inner groups are non-capturing (?:...).

// Remove null values
stage.replace {
  expression = "(\\s*(?:\"[^\"]+\"\\s*:\\s*null)(?:\\s*,)?\\s*)"
  replace    = ""
}

// Remove placeholder values ("-", "undefined", "null" strings)
stage.replace {
  expression = "(\\s*(?:\"[^\"]+\"\\s*:\\s*\"(?:-|null|undefined)\")(?:\\s*,)?\\s*)"
  replace    = ""
}

// Remove empty values ("", [], {})
stage.replace {
  expression = "(\\s*,\\s*(?:\"[^\"]+\"\\s*:\\s*(?:\\[\\s*\\]|\\{\\s*\\}|\"\\s*\"))|(?:\"[^\"]+\"\\s*:\\s*(?:\\[\\s*\\]|\\{\\s*\\}|\"\\s*\"))\\s*,\\s*)"
  replace    = ""
}
```
**Practical savings (Istio access log example):** the line starts at 753 bytes (minified). Removing nulls, placeholders, and unused fields and normalizing keys brings it to 464 bytes, a 38% reduction.
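
To estimate savings on your own samples before touching the pipeline, you can measure a parse-based version offline. This is a different technique from the regex stages above. It decodes the JSON, recursively drops null, placeholder (`"-"`, `"null"`, `"undefined"`), and empty values, and re-encodes the result minified, so it also catches nested fields the regexes miss. The sample line is made up.

```python
import json

PLACEHOLDERS = {"-", "null", "undefined"}


def droppable(value) -> bool:
    """True for null, placeholder strings, and empty strings/arrays/objects."""
    if value is None:
        return True
    if isinstance(value, str):
        return value == "" or value in PLACEHOLDERS
    return isinstance(value, (list, dict)) and len(value) == 0


def compact(obj):
    """Recursively remove droppable values from JSON objects."""
    if isinstance(obj, dict):
        cleaned = {k: compact(v) for k, v in obj.items()}
        return {k: v for k, v in cleaned.items() if not droppable(v)}
    if isinstance(obj, list):
        return [compact(v) for v in obj]
    return obj


def minified_size(obj) -> int:
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


line = '{"method":"GET","path":"/api","user_agent":null,"referer":"-","tags":[],"extra":{},"note":""}'
before = minified_size(json.loads(line))
after = minified_size(compact(json.loads(line)))
print(before, after, f"{1 - after / before:.0%} smaller")
```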

Security & LBAC

Grafana Enterprise Logs (GEL) supports Label-Based Access Control (LBAC). Any label can serve as an access control selector.
**Best labels for LBAC:**
  • `classification`: data sensitivity (`public`, `restricted`, `confidential`, `top-secret`)
  • `source`: controls which teams can see which log origins
  • `team` / `squad`: ownership-based access
  • `env`: environment-level restrictions
Static aggregate labels like `owner=sysadmins` or `category=database` are particularly effective: one label value gates access to many log files, rather than requiring a long allowlist of filenames or streams.

The 80/20 Rule

The most impactful improvements almost always come from these four changes:
  1. **Remove `pod` as a label**: the biggest stream reduction in K8s environments
  2. **Add `level` as a label AND always specify it in queries**: can eliminate 94%+ of scanned data when searching for errors
  3. **Normalize label values**: eliminates phantom duplicate streams caused by inconsistent casing
  4. **Remove or normalize `filename` in K8s**: highly variable paths inflate stream count significantly
Focus on these before anything else.

Labels to Avoid — Quick Reference

| Label | Why | Alternative |
|---|---|---|
| `pod` | Transient, unbounded | `workload` label + `pod` in structured metadata |
| `user_id` | Unbounded | Keep only in log content |
| `request_id` / `trace_id` | Unbounded | Structured metadata |
| `filename` (raw K8s path) | Contains pod UID | Normalize or drop |
| Unnormalized `level` | `INFO` / `info` / `Info` = 3 streams | Normalize at collection time |
| Any dynamically-named label key | Cannot be bounded | Use fixed keys with bounded values |