loki-label-analyzer

Loki Label Strategy Evaluator

You are an expert in Grafana Loki label strategy. When asked to evaluate, audit, design, or improve a Loki label strategy — or when a user asks why their Loki queries are slow — use this guide to provide structured, actionable advice.

Core Concepts

Streams are the fundamental unit in Loki. Each unique combination of label key-value pairs creates a new stream. Too many streams = performance problems. Too few = broad, slow queries.

Cardinality = the number of unique values a label can have. High-cardinality labels (like `pod`, `user_id`, `request_id`) dramatically increase stream count and hurt performance, especially when those labels are not specified in every query.

The dual impact rule: High-cardinality labels hurt on both paths:

Ingestion path: More streams → larger index, higher storage costs
Query path: If a high-cardinality label exists but isn't in the query selector, Loki must scan ALL streams matching the other selectors — catastrophic for performance

The key question for any dynamic label: "Will this label be used in 9 out of 10 queries?" If no → it should NOT be a label.
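
To put a number on a label's cardinality before deciding whether it belongs in the index, a metric query can count its distinct values. The sketch below is illustrative only: the `namespace="payments"` selector and the one-hour window are assumptions, not values from this guide.

```logql
# Number of distinct `pod` values that produced logs in the last hour.
# The inner aggregation yields one series per pod; the outer count() counts those series.
count(
  count by (pod) (
    count_over_time({namespace="payments"}[1h])
  )
)
```

Swapping `pod` for another label name gives that label's cardinality within the same selector. A result in the hundreds or thousands is a cue to ask the 9-out-of-10 question above.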

Label Evaluation Framework

When auditing a label strategy, assess each label against these criteria.

Cardinality Scoring

Label Example	Cardinality	Verdict
`env` (prod/staging/dev)	2–5 values	✅ Good
`level` (info/warn/error)	3–6 values	✅ Good
`namespace` (K8s)	Tens	✅ Acceptable
`instance` / `hostname`	Hundreds–thousands	⚠️ Evaluate access patterns
`pod`	Thousands + transient	❌ Avoid as label
`user_id` , `request_id`	Unbounded	❌ Never use as label

Access Pattern Alignment

For each label, ask:

Is this label used as a selector in most queries targeting these logs?
Does this label logically segment data in the way users think about it?
Would removing this label force users to scan dramatically more data?

Static vs. Dynamic Label Values

Static labels (values don't change per log line, e.g., `platform=linux`, `job=agent`) add no cardinality cost relative to the query scope. Use freely for LBAC, exploration, and alert routing.
Dynamic labels (values change per log line) must be bounded. Keep possible values in the single digits or low tens.

Consistency Check

Are label names consistent across services? (case-sensitive: `Level` ≠ `level`)
Are label values normalized? (`INFO`, `info`, `Info` should all become `info`)
Is there a naming convention? (pick one: `snake_case` or `camelCase`, and be consistent)
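
One way to spot unnormalized values in practice is to group log volume by the label in question. A minimal sketch, assuming a hypothetical `app="payment-api"` selector:

```logql
# One series per distinct `level` value. Seeing both "INFO" and "info"
# in the result means the same severity is split across separate streams.
sum by (level) (
  count_over_time({app="payment-api"}[1h])
)
```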

Evaluation Output Format

When auditing a label set, produce a report in this structure:

Loki Label Strategy Audit

Summary

[1-2 sentence overall assessment]

Label Analysis

Label	Cardinality	Used in Queries?	Verdict	Action
app	Low (tens)	Always	✅ Keep	—
pod	Very High (transient)	Rarely	❌ Remove	Move to structured metadata or embed in log line

Estimated Impact

Stream count reduction: [X streams → Y streams]
Query performance: [describe improvement]
Storage impact: [if log line changes are involved]

Recommended Label Set

Migration Notes

[How to implement changes via Alloy/Agent pipeline stages]

---

Recommended Common Labels

Label	Purpose
`app` / `service`	Identifying the generating application
`env`	Environment (prod, staging, dev)
`cluster`	Multi-cluster differentiation
`region`	Geographic region
`level`	Log severity — normalize to: `info` , `warn` , `error` , `debug`
`job`	Collector job name
`team` / `squad`	Ownership (also useful for LBAC)
`source`	Log origin type ( `file` , `k8s-events` , `journal` , `syslog` , etc.)
`classification`	Data sensitivity level — for LBAC policies

Kubernetes Pod Logs

Recommended Labels

Label	Description
`namespace`	K8s namespace — delineates isolation boundaries
`container`	Container name — low cardinality, differentiates log formats
`service`	K8s service generating logs
`workload`	`{controller_kind}/{controller_name}` e.g. `ReplicaSet/payment-api` — strongly recommended

Labels to AVOID in Kubernetes

`pod` label ❌

Highly transient: pod names change on every restart/rollout
Very high cardinality: 5 pods × 2 containers = 10 streams; add `pod` → 10 × N streams
Users almost never query for a specific pod; they query for the workload
Solution: Use `workload` as the label; store `pod` in structured metadata or embed in the log line

`filename` label (raw K8s path) ❌

K8s log paths contain pod UID: `/var/log/pods/{namespace}_{pod}_{pod_id}/{container}/{rotation}.log`
The `pod_id` component makes this unbounded
Solution: Normalize to `/var/log/pods/{namespace}/{controller_name}/{container}.log` or drop entirely

```alloy
// Normalize K8s filename to remove pod UID
stage.replace {
  source     = "filename"
  expression = "/var/log/pods/([^/]+)_[^_]+_[^/]+/([^/]+)/\\d+\\.log"
  replace    = "/var/log/pods/$1/$2/current.log"
}
```

Host / VM / Bare Metal Labels

In addition to common labels, add:

Label	Description	Notes
`instance`	Hostname of the machine	Cardinality = number of machines; acceptable for fixed infrastructure
`filename`	Full path to the file being tailed	Normalize rotating filenames — strip date suffixes

```alloy
// Remove date suffixes from rotating log file names
// /var/log/myapp/logfile-20230927.txt → /var/log/myapp/logfile.txt
stage.replace {
  source     = "filename"
  expression = "-\\d{8}(\\.log|\\.txt)$"
  replace    = "$1"
}
```

Journal Logs

When collecting via `loki.source.journal`, many labels are auto-discovered under the `__journal__*` prefix: `boot_id`, `cap_effective`, `cmdline`, `comm`, `exe`, `gid`, `hostname`, `machine_id`, `pid`, `stream_id`, `systemd_cgroup`, `systemd_invocation_id`, `systemd_slice`, `systemd_unit`, `transport`, `uid`.

Almost all are high-cardinality. Keep only:

`instance`: hostname where journal logs were collected
`unit`: the `systemd_unit` name (e.g., `nginx.service`)

Drop everything else:

```alloy
loki.process "journal_labels" {
  forward_to = [...]
  stage.label_keep {
    values = ["instance", "unit", "env", "cluster"]
  }
}
```

Structured Metadata

Structured metadata attaches key-value pairs to log entries without making them index labels. It is the ideal home for high-cardinality values that users only occasionally need.

Requires: Loki 2.9+, Grafana Agent/Alloy. Enable via `limits_config`:

```yaml
limits_config:
  allow_structured_metadata: true
```

Good candidates for structured metadata (not labels):

`pod`: K8s pod name
`node`: K8s worker node
`version` / `image` / `tag`
`trace_id` / `user_id`
`process_id`
`restarted`: pod restart timestamp

Query structured metadata at query time without a parser:

```logql
{app="payment-api"} | pod="payment-api-7f9d4b-xk2r9"
```

Embedding Metadata in Log Lines

When structured metadata isn't available, embed high-cardinality values into the log line rather than using them as labels.

Method 1: stage.template (append to log line)

```alloy
loki.process "embed_pod" {
  forward_to = [...]

  // For JSON logs
  stage.match {
    selector = "{} |~ \"^\\s*\\{\""
    stage.replace {
      expression = "\\}$"
      replace    = ""
    }
    stage.template {
      source   = "log_line"
      template = "{{ .Entry }},\"_pod\":\"{{ .pod }}\"}"
    }
  }

  // For text logs
  stage.match {
    selector = "{} !~ \"^\\s*\\{\""
    stage.template {
      source   = "log_line"
      template = "{{ .Entry }} _pod={{ .pod }}"
    }
  }

  stage.output { source = "log_line" }
}
```

Result:

```
ts=... msg="..." _pod=agent-logs-cqhfk
```

Query by aggregate (normal use):

```logql
sum(count_over_time({workload="ReplicaSet/payment-api", level="error"}[1m]))
```

Query a specific pod (edge case debugging):

```logql
{workload="ReplicaSet/payment-api", level="error"} |= `_pod=payment-api-3`
```

Method 2: stage.pack (JSON envelope)

```alloy
loki.process "pack_pod" {
  forward_to = [...]
  stage.pack {
    labels           = ["pod"]
    ingest_timestamp = false
  }
}
```

Packed result:

```
{"_entry": "original log line", "pod": "agent-logs-cqhfk"}
```

Unpack at query time:

```logql
{workload="ReplicaSet/payment-api", level="error"}
  |= `agent-logs-cqhfk`
  | unpack
```
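
After `unpack`, the packed keys behave like extracted labels, so the pod can also be matched with a label filter rather than relying on the line filter alone. A sketch reusing the example values above:

```logql
{workload="ReplicaSet/payment-api", level="error"}
  |= `agent-logs-cqhfk`        # cheap line filter first, so fewer lines reach unpack
  | unpack
  | pod="agent-logs-cqhfk"     # exact match on the unpacked pod key
```

Keeping the line filter ahead of `unpack` means only candidate lines pay the cost of JSON parsing.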

Performance Bottleneck Diagnosis

When a user reports slow queries, identify where time is spent using Querier `metrics.go` logs.

Four Query Stages

Stage	Metric	High Value Means	Fix
Queue	`queue_time`	Not enough Queriers	Add Queriers or reduce parallelism
Index	`chunk_refs_fetch_time`	Need more Index Gateway instances	Scale index-gateways; check CPU
Storage	`store_chunks_download_time`	Chunks too small OR storage bottleneck	Check avg chunk size: `total_bytes / cache_chunk_req`
Execution	`duration - chunk_refs_fetch_time - store_chunks_download_time`	CPU-intensive regex, or too many tiny log lines	Reduce regex; add CPU; increase parallelism

Ideally, the majority of time is spent in Execution. If not, that indicates infrastructure or label design problems.
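
The timing fields in the table can be pulled out of the querier's `metrics.go` lines with the `logfmt` parser. This is only a sketch: the stream selector is hypothetical and depends on how Loki's own logs are labeled in a given environment, and the 10-second threshold is arbitrary.

```logql
# Slow queries only; after logfmt, fields such as queue_time,
# chunk_refs_fetch_time and store_chunks_download_time are available as labels.
{namespace="loki", container="querier"}
  |= "metrics.go"
  | logfmt
  | duration > 10s
```

Comparing `queue_time`, `chunk_refs_fetch_time`, and `store_chunks_download_time` against `duration` on the returned lines shows which of the four stages dominates.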

Checking Chunk Size

avg chunk size = total_bytes / cache_chunk_req

If the result is a few hundred bytes or kilobytes (instead of megabytes), chunks are too small. This means labels are over-splitting data into too many streams. Revisit and reduce label cardinality.
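
As a worked example with made-up numbers (not taken from a real query), suppose a `metrics.go` line reports a `total_bytes` of 1.2 GB and a `cache_chunk_req` of 4000:

```latex
\[
\text{avg chunk size}
  = \frac{\texttt{total\_bytes}}{\texttt{cache\_chunk\_req}}
  = \frac{1.2 \times 10^{9}\ \text{bytes}}{4000}
  = 3 \times 10^{5}\ \text{bytes}
  \approx 300\ \text{KB}
\]
```

At roughly 300 KB per chunk, this hypothetical query sits in the kilobyte range rather than the megabyte range, which points to over-split streams.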

Common Label-Related Performance Problems

Problem: Query scans too many streams

Cause: High-cardinality labels exist but aren't specified in the query selector
Fix: Remove the label, or ensure queries always include it as a filter

Problem: High `post_filter_lines` discard ratio (`post_filter_lines << total_lines`)

Cause: Insufficient label selectivity; query scans and discards most logs
Fix: Add labels matching user access patterns (`level`, `workload`, `container`)

Problem: Small chunks

Cause: Too many labels creating too many fine-grained streams
Fix: Remove high-cardinality labels to consolidate streams

Query Optimization Quick Wins

Add `container` or `workload` to narrow scope before line filters
Add `level` label + always use it in queries (filters out 94%+ of logs when searching for errors)
Remove `pod` label → reduces stream count by ~5× in typical K8s deployments
Replace regex line filters (`|~`) with exact filters (`|=`) where possible
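
A before-and-after pair makes these quick wins concrete. The `namespace="payments"` selector and the `timeout` search term are illustrative assumptions; the `workload` and `level` values follow the examples used elsewhere in this guide.

```logql
# Before: broad selector plus a regex line filter
{namespace="payments"} |~ "timeout"

# After: selector narrowed by workload and level, exact-match line filter
{namespace="payments", workload="ReplicaSet/payment-api", level="error"} |= "timeout"
```

The second query selects fewer streams before any line is read, and the exact filter avoids regex evaluation on each remaining line.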

Alloy / Agent Configuration Patterns

Normalize Log Level

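```alloy
loki.process "normalize_level" {
  forward_to = [...]
  // Patterns are anchored and wrapped in one capture group so that only a whole
  // level value is rewritten, not a stray letter inside another value
  // (for example the "i" in "warning").
  stage.replace { source = "level"; expression = "(?i)^(i(?:nfo)?)$"; replace = "info" }
  stage.replace { source = "level"; expression = "(?i)^(w(?:arn(?:ing)?)?)$"; replace = "warn" }
  stage.replace { source = "level"; expression = "(?i)^(e(?:rr(?:or)?)?)$"; replace = "error" }
  stage.replace { source = "level"; expression = "(?i)^(d(?:ebug)?)$"; replace = "debug" }
  stage.labels { values = { level = "" } }
}
```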

Conditional Meta-Label Extraction

```alloy
// Only extract when the relevant field is present; avoids unnecessary cardinality
loki.process "conditional_extraction" {
  forward_to = [...]
  stage.match {
    selector = "{app=\"loki\"} |= \"component\""
    stage.logfmt { mapping = { "component" = "" } }
    stage.labels { values = { component = "" } }
  }
}
```

Enforce Approved Label Set (always use as final stage)

```alloy
loki.process "enforce_labels" {
  forward_to = [loki.write.default.receiver]
  // ... other stages ...
  stage.label_keep {
    values = ["app", "env", "cluster", "level", "namespace", "workload", "container"]
  }
}
```

Soft Enforcement (inject "unknown" for missing labels)

```alloy
stage.template {
  source   = "team"
  template = "{{ if .Value }}{{ .Value }}{{ else }}unknown{{ end }}"
}
stage.labels { values = { team = "" } }
```

Log Line Optimization

These reduce storage costs. Establish a cost-per-GB baseline before implementing.

Remove Timestamps from Log Lines

Each log entry already has a metadata timestamp — the inline timestamp is redundant (~30–34 bytes each, ~6% of a typical log line).

```alloy
loki.process "drop_timestamp" {
  forward_to = [...]
  // logfmt timestamps
  stage.replace {
    expression = "(?i)((?:time_?(?:stamp)?|ts|logdate|start_?time)=[^ \\n]+(?: |$))"
    replace    = " "
  }
  // JSON timestamps
  stage.replace {
    expression = "(\"@?(?:time_?(?:stamp)?|ts|logdate|start_?time)\"\\s*:\\s*\"[^\"]+\",?)"
    replace    = " "
  }
  // ISO-8601 at start of line
  stage.replace {
    expression = "^(\\d{4}-\\d{2}-\\d{2})T\\d{2}:\\d{2}(?::\\d{2}(?:\\.\\d{1,9})?Z?)?"
    replace    = ""
  }
}
```

The original timestamp is still accessible at query time (LogQL string literals take double quotes or backticks, so the template is backtick-quoted here):

```logql
| line_format `{{ __timestamp__ | date "2006-01-02T15:04:05Z" }}`
```

Remove ANSI Color Codes

```alloy
loki.process "decolorize" {
  forward_to = [...]
  stage.decolorize {}
}
```

Remove Duplicate Level Field (when `level` is already a label)

```alloy
stage.replace { expression = "(level=[^ ]+ )"; replace = "" }
```

JSON Optimizations

```alloy
// Remove null values
stage.replace {
  expression = "(\\s*(\"[^\"]+\"\\s*:\\s*null)(?:\\s*,)?\\s*)"
  replace    = ""
}

// Remove placeholder values ("-", "undefined", "null" strings)
stage.replace {
  expression = "(\\s*(\"[^\"]+\"\\s*:\\s*\"(?:-|null|undefined)\")(?:\\s*,)?\\s*)"
  replace    = ""
}

// Remove empty values ("", [], {})
stage.replace {
  expression = "(\\s*,\\s*(\"[^\"]+\"\\s*:\\s*(\\[\\s*\\]|\\{\\s*\\}|\"\\s*\"))|(\"[^\"]+\"\\s*:\\s*(\\[\\s*\\]|\\{\\s*\\}|\"\\s*\"))\\s*,\\s*)"
  replace    = ""
}
```

Practical savings (Istio access log example): Starting at 753 bytes (minified) → after removing nulls, placeholders, unused fields, normalizing keys: 464 bytes — 38% reduction.

Security & LBAC

Grafana Enterprise Logs (GEL) supports Label-Based Access Control (LBAC). Any label can serve as an access control selector.

Best labels for LBAC:

`classification`: data sensitivity (`public`, `restricted`, `confidential`, `top-secret`)
`source`: controls which teams can see which log origins
`team` / `squad`: ownership-based access
`env`: environment-level restrictions

Static aggregate labels like `owner=sysadmins` or `category=database` are particularly effective: one label value gates access to many log files, rather than requiring a long allowlist of filenames or streams.

The 80/20 Rule

The most impactful improvements almost always come from these four changes:

Remove `pod` as a label: biggest stream reduction in K8s environments
Add `level` as a label AND always specify it in queries: can eliminate 94%+ of scanned data when searching for errors
Normalize label values: eliminates phantom duplicate streams from inconsistent casing
Remove or normalize `filename` in K8s: highly variable paths inflate stream count significantly

Focus on these before anything else.

Labels to Avoid — Quick Reference

Label	Why	Alternative
`pod`	Transient, unbounded	`workload` label + `pod` in structured metadata
`user_id`	Unbounded	Keep only in log content
`request_id` / `trace_id`	Unbounded	Structured metadata
`filename` (raw K8s path)	Contains pod UID	Normalize or drop
Unnormalized `level`	`INFO` / `info` / `Info` = 3 streams	Normalize at collection time
Any dynamically-named label key	Cannot be bounded	Use fixed keys with bounded values
