loki-label-analyzer
Loki Label Strategy Evaluator
You are an expert in Grafana Loki label strategy. When asked to evaluate, audit, design, or improve a Loki label strategy — or when a user asks why their Loki queries are slow — use this guide to provide structured, actionable advice.
Core Concepts
Streams are the fundamental unit in Loki. Each unique combination of label key-value pairs creates a new stream. Too many streams = performance problems. Too few = broad, slow queries.
Cardinality = the number of unique values a label can have. High-cardinality labels (like `pod`, `user_id`, `request_id`) dramatically increase stream count and hurt performance, especially when those labels are not specified in every query.
The dual impact rule: High-cardinality labels hurt on both paths:
- Ingestion path: More streams → larger index, higher storage costs
- Query path: If a high-cardinality label exists but isn't in the query selector, Loki must scan ALL streams matching the other selectors — catastrophic for performance
The key question for any dynamic label: "Will this label be used in 9 out of 10 queries?" If no → it should NOT be a label.
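To make the stream multiplication concrete, here is a minimal Python sketch that estimates the worst-case stream count as the product of per-label cardinalities. The label names and value counts are hypothetical, and real deployments usually see fewer streams because not every combination occurs.

```python
from math import prod


def max_streams(cardinalities: dict[str, int]) -> int:
    """Upper bound on stream count: the product of unique values per label."""
    return prod(cardinalities.values())


# Hypothetical label sets; the value counts are illustrative only.
base = {"app": 20, "env": 3, "level": 4}
with_pod = {**base, "pod": 500}

print(max_streams(base))
print(max_streams(with_pod))
```

Adding one label with 500 values multiplies the upper bound by 500, which is why a single high-cardinality label dominates the total.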
Label Evaluation Framework
When auditing a label strategy, assess each label against these criteria.
Cardinality Scoring
| Label Example | Cardinality | Verdict |
|---|---|---|
| `env` | 2–5 values | ✅ Good |
| `level` | 3–6 values | ✅ Good |
| `app` | Tens | ✅ Acceptable |
|  | Hundreds–thousands | ⚠️ Evaluate access patterns |
| `pod` | Thousands + transient | ❌ Avoid as label |
| `user_id`, `request_id` | Unbounded | ❌ Never use as label |
Access Pattern Alignment
For each label, ask:
- Is this label used as a selector in most queries targeting these logs?
- Does this label logically segment data in the way users think about it?
- Would removing this label force users to scan dramatically more data?
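The first question can be answered from a sample of real queries. The Python sketch below is one rough way to do that: it extracts label names from the first stream selector of each LogQL query and reports the fraction of queries that use each label. The regexes, the function names, the sample queries, and the 0.9 threshold (taken from the "9 out of 10 queries" rule) are assumptions for illustration, not a full LogQL parser.

```python
import re
from collections import Counter

# Quoted strings are blanked first so that "=" or "}" inside values cannot confuse the scan.
QUOTED = re.compile(r'"(?:\\.|[^"\\])*"|`[^`]*`')
SELECTOR = re.compile(r"\{([^}]*)\}")
LABEL_NAME = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*(?:=~|!~|!=|=)")


def selector_labels(query: str) -> set[str]:
    """Label names used in the first stream selector of a LogQL query."""
    stripped = QUOTED.sub('""', query)
    match = SELECTOR.search(stripped)
    return set(LABEL_NAME.findall(match.group(1))) if match else set()


def label_usage(queries: list[str]) -> dict[str, float]:
    """Fraction of queries whose stream selector mentions each label."""
    if not queries:
        return {}
    counts = Counter()
    for query in queries:
        counts.update(selector_labels(query))
    return {name: count / len(queries) for name, count in counts.items()}


def belongs_as_label(name: str, queries: list[str], threshold: float = 0.9) -> bool:
    """Apply the 9-out-of-10 rule to one label."""
    return label_usage(queries).get(name, 0.0) >= threshold


# Hypothetical query sample.
queries = [
    '{app="api", level="error"} |= "timeout"',
    '{app="api"}',
    '{app="api", pod="api-7f9d4b-xk2r9"}',
    'sum(count_over_time({app="web", level=~"error|warn"}[5m]))',
]
print(label_usage(queries))
```

In this sample `app` appears in every selector while `pod` appears in one of four, so `pod` fails the rule.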
Static vs. Dynamic Label Values
- Static labels (values don't change per log line, e.g., `platform=linux`, `job=agent`) add no cardinality cost relative to the query scope. Use freely for LBAC, exploration, and alert routing.
- Dynamic labels (values change per log line) must be bounded. Keep possible values in the single digits or low tens.
Consistency Check
- Are label names consistent across services? (case-sensitive: `Level` ≠ `level`)
- Are label values normalized? (`INFO`, `info`, `Info` should all become `info`)
- Is there a naming convention? (pick one: `snake_case` or `camelCase`, and be consistent)
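These checks are easy to automate against an exported list of label names and values. The following Python sketch shows one possible approach; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def case_collisions(label_names: list[str]) -> dict[str, list[str]]:
    """Group label names that differ only by letter case."""
    groups = defaultdict(set)
    for name in label_names:
        groups[name.lower()].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


def normalize_value(value: str) -> str:
    """Trim whitespace and lowercase a label value."""
    return value.strip().lower()


# Hypothetical label names collected from several services.
print(case_collisions(["level", "Level", "app", "LEVEL"]))
print({normalize_value(v) for v in ["INFO", "Info", "info"]})
```

Each collision group found this way corresponds to streams that look identical to users but are stored and queried separately.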
Evaluation Output Format
When auditing a label set, produce a report in this structure:
Loki Label Strategy Audit
Summary
[1-2 sentence overall assessment]
Label Analysis
| Label | Cardinality | Used in Queries? | Verdict | Action |
|---|---|---|---|---|
| app | Low (tens) | Always | ✅ Keep | — |
| pod | Very High (transient) | Rarely | ❌ Remove | Move to structured metadata or embed in log line |
Estimated Impact
- Stream count reduction: [X streams → Y streams]
- Query performance: [describe improvement]
- Storage impact: [if log line changes are involved]
Recommended Label Set
[Final recommended labels]
Migration Notes
[How to implement changes via Alloy/Agent pipeline stages]

---

Recommended Common Labels
Every log source should consider these base labels — all low cardinality, high query value:
| Label | Purpose |
|---|---|
| `app` | Identifying the generating application |
| `env` | Environment (prod, staging, dev) |
| `cluster` | Multi-cluster differentiation |
| `region` | Geographic region |
| `level` | Log severity; normalize to lowercase values such as `info`, `warn`, `error`, `debug` |
| `job` | Collector job name |
| `team` / `squad` | Ownership (also useful for LBAC) |
| `source` | Log origin type |
| `classification` | Data sensitivity level, for LBAC policies |
Kubernetes Pod Logs
Recommended Labels
| Label | Description |
|---|---|
| `namespace` | K8s namespace; delineates isolation boundaries |
| `container` | Container name; low cardinality, differentiates log formats |
|  | K8s service generating logs |
| `workload` | Derived from `{{controller_kind}}/{{controller_name}}` |
Why `workload` beats `app` for K8s: It is derived from `{{controller_kind}}/{{controller_name}}`, static values that never change the way pod names do. Unlike `app` (which may aggregate multiple workload types), `workload` is precise and predictable. Users always know exactly what value to query.
Labels to AVOID in Kubernetes
`pod`
- Highly transient: pod names change on every restart/rollout
- Very high cardinality: 5 pods × 2 containers = 10 streams; add `pod` → 10 × N streams
- Users almost never query for a specific pod; they query for the workload
- Solution: Use `workload` as the label; store `pod` in structured metadata or embed in the log line
`filename`
- K8s log paths contain pod UID: `/var/log/pods/{namespace}_{pod}_{pod_id}/{container}/{rotation}.log`
- The `pod_id` component makes this unbounded
- Solution: Normalize to `/var/log/pods/{namespace}/{controller_name}/{container}.log` or drop entirely
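The effect of normalizing `filename` can be checked outside the collector with a small Python sketch. It assumes the path layout shown above and maps every pod of a namespace/container pair to one value. The function name and sample paths are hypothetical, and the controller name is left out because it cannot be derived from the path alone.

```python
import re

# Mirrors the layout /var/log/pods/{namespace}_{pod}_{pod_id}/{container}/{rotation}.log
POD_LOG = re.compile(r"^/var/log/pods/([^/_]+)_[^/_]+_[^/]+/([^/]+)/\d+\.log$")


def normalize_pod_log_path(path: str) -> str:
    """Drop the pod name, pod UID, and rotation index from a K8s log path."""
    match = POD_LOG.match(path)
    if not match:
        return path  # leave unrecognized paths untouched
    namespace, container = match.groups()
    return f"/var/log/pods/{namespace}/{container}.log"


# Two hypothetical pods of the same workload collapse to one filename value.
paths = [
    "/var/log/pods/prod_payment-api-7f9d4b-xk2r9_0a1b2c3d-1111-2222-3333-444455556666/app/0.log",
    "/var/log/pods/prod_payment-api-7f9d4b-qq8zt_9f8e7d6c-aaaa-bbbb-cccc-ddddeeeeffff/app/1.log",
]
print({normalize_pod_log_path(p) for p in paths})
```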
```alloy
// Normalize the K8s filename to remove the pod name and pod UID.
// stage.replace substitutes each capture group with `replace` and does not
// expand $1-style references, so only the part to remove is captured.
// Result: /var/log/pods/{namespace}/{container}/{rotation}.log
stage.replace {
  source     = "filename"
  expression = "^/var/log/pods/[^/_]+(_[^/_]+_[^/]+)/"
  replace    = ""
}
// Copy the rewritten value back onto the filename label
stage.labels { values = { filename = "" } }
```
Host / VM / Bare Metal Labels
In addition to common labels, add:
| Label | Description | Notes |
|---|---|---|
| `instance` | Hostname of the machine | Cardinality = number of machines; acceptable for fixed infrastructure |
| `filename` | Full path to the file being tailed | Normalize rotating filenames by stripping date suffixes |
```alloy
// Remove date suffixes from rotating log file names
// /var/log/myapp/logfile-20230927.txt → /var/log/myapp/logfile.txt
// Only the captured date suffix is replaced; $1-style references are not expanded.
stage.replace {
  source     = "filename"
  expression = "(-\\d{8})\\.(?:log|txt)$"
  replace    = ""
}
// Copy the rewritten value back onto the filename label
stage.labels { values = { filename = "" } }
```
Journal Logs
When collecting via `loki.source.journal`, many labels are auto-discovered under `__journal__*`:
`boot_id`, `cap_effective`, `cmdline`, `comm`, `exe`, `gid`, `hostname`, `machine_id`, `pid`, `stream_id`, `systemd_cgroup`, `systemd_invocation_id`, `systemd_slice`, `systemd_unit`, `transport`, `uid`
Almost all are high-cardinality. Keep only:
- `instance`: hostname where journal logs were collected
- `unit`: the `systemd_unit` name (e.g., `nginx.service`)
Drop everything else:
```alloy
loki.process "journal_labels" {
  forward_to = [...]
  stage.label_keep {
    values = ["instance", "unit", "env", "cluster"]
  }
}
```
Structured Metadata
Structured metadata attaches key-value pairs to log entries without making them index labels. It is the ideal home for high-cardinality values that users occasionally need.
Requires: Loki 2.9+, Grafana Agent/Alloy. Enable via `limits_config`:
```yaml
limits_config:
  allow_structured_metadata: true
```
Good candidates for structured metadata (not labels):
- `pod`: K8s pod name
- `node`: K8s worker node
- `version` / `image` / `tag`
- `trace_id` / `user_id`
- `process_id`
- `restarted`: pod restart timestamp
Query structured metadata at query time without a parser:
```logql
{app="payment-api"} | pod="payment-api-7f9d4b-xk2r9"
```
Embedding Metadata in Log Lines
When structured metadata isn't available, embed high-cardinality values into the log line rather than using them as labels.
Method 1: stage.template (append to log line)
```alloy
loki.process "embed_pod" {
  forward_to = [...]
  // For JSON logs
  stage.match {
    selector = "{} |~ \"^\\s*\\{\""
    // Capture the closing brace so that stage.replace removes it
    stage.replace {
      expression = "(\\})$"
      replace    = ""
    }
    stage.template {
      source   = "log_line"
      template = "{{ .Entry }},\"_pod\":\"{{ .pod }}\"}"
    }
  }
  // For text logs
  stage.match {
    selector = "{} !~ \"^\\s*\\{\""
    stage.template {
      source   = "log_line"
      template = "{{ .Entry }} _pod={{ .pod }}"
    }
  }
  stage.output { source = "log_line" }
}
```
Result:
`ts=... msg="..." _pod=agent-logs-cqhfk`
Query by aggregate (normal use):
```logql
sum(count_over_time({workload="ReplicaSet/payment-api", level="error"}[1m]))
```
Query a specific pod (edge case debugging):
```logql
{workload="ReplicaSet/payment-api", level="error"} |= `_pod=payment-api-3`
```
Method 2: stage.pack (JSON envelope)
```alloy
loki.process "pack_pod" {
  forward_to = [...]
  stage.pack {
    labels           = ["pod"]
    ingest_timestamp = false
  }
}
```
Packed result:
`{"_entry": "original log line", "pod": "agent-logs-cqhfk"}`
Unpack at query time:
```logql
{workload="ReplicaSet/payment-api", level="error"}
  |= `agent-logs-cqhfk`
  | unpack
```
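The envelope idea can be sketched in a few lines of Python. This is only an approximation of what `stage.pack` and `| unpack` do, based on the packed result shown above: the selected labels move into a JSON object next to the original line under `_entry`, and unpacking reverses the step. The function names are hypothetical.

```python
import json


def pack(line: str, labels: dict[str, str], keys: list[str]) -> tuple[str, dict[str, str]]:
    """Move the chosen labels into a JSON envelope around the log line."""
    packed = {key: labels[key] for key in keys if key in labels}
    packed["_entry"] = line
    remaining = {k: v for k, v in labels.items() if k not in keys}
    return json.dumps(packed), remaining


def unpack(packed_line: str) -> tuple[str, dict[str, str]]:
    """Recover the original line and the embedded key-value pairs."""
    envelope = json.loads(packed_line)
    line = envelope.pop("_entry")
    return line, envelope


labels = {"workload": "ReplicaSet/payment-api", "level": "error", "pod": "agent-logs-cqhfk"}
packed_line, stream_labels = pack("original log line", labels, ["pod"])
print(packed_line)
print(stream_labels)
print(unpack(packed_line))
```

After packing, only `workload` and `level` remain as stream labels, so every pod of the workload shares one stream.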
Performance Bottleneck Diagnosis
When a user reports slow queries, identify where time is spent using Querier `metrics.go` logs.
Four Query Stages
| Stage | Metric | High Value Means | Fix |
|---|---|---|---|
| Queue | | Not enough Queriers | Add Queriers or reduce parallelism |
| Index | | Need more Index Gateway instances | Scale index-gateways; check CPU |
| Storage | | Chunks too small OR storage bottleneck | Check avg chunk size: `total_bytes / cache_chunk_req` |
| Execution | | CPU-intensive regex, or too many tiny log lines | Reduce regex; add CPU; increase parallelism |
Ideally, the majority of time is spent in Execution. If not, that indicates infrastructure or label design problems.
Checking Chunk Size
`avg chunk size = total_bytes / cache_chunk_req`
If the result is a few hundred bytes or kilobytes (instead of megabytes), chunks are too small. This means labels are over-splitting data into too many streams. Revisit and reduce label cardinality.
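As a worked example of the formula, the Python sketch below computes the average chunk size from two values read off a `metrics.go` log line. The input numbers are hypothetical, the byte count is assumed to be already converted to plain bytes, and the 1 MB threshold is an assumption that stands in for "megabytes" in the rule above.

```python
def avg_chunk_size(total_bytes: float, cache_chunk_req: int) -> float:
    """Average bytes per requested chunk; 0.0 when no chunks were requested."""
    if cache_chunk_req == 0:
        return 0.0
    return total_bytes / cache_chunk_req


def chunks_too_small(avg_bytes: float, threshold_bytes: float = 1_000_000) -> bool:
    """Flag averages below the assumed 1 MB threshold."""
    return avg_bytes < threshold_bytes


# Hypothetical query: 50 MiB read across 40,000 chunk requests.
avg = avg_chunk_size(total_bytes=52_428_800, cache_chunk_req=40_000)
print(f"{avg:.0f} bytes per chunk, too small: {chunks_too_small(avg)}")
```

An average of roughly 1.3 KB per chunk, as in this sample, points at over-split streams rather than at storage throughput.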
Common Label-Related Performance Problems
Problem: Query scans too many streams
- Cause: High-cardinality labels exist but aren't specified in the query selector
- Fix: Remove the label, or ensure queries always include it as a filter
Problem: High discard ratio (`post_filter_lines << total_lines`)
- Cause: Insufficient label selectivity; query scans and discards most logs
- Fix: Add labels matching user access patterns (`level`, `workload`, `container`)
Problem: Small chunks
- Cause: Too many labels creating too many fine-grained streams
- Fix: Remove high-cardinality labels to consolidate streams
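The discard ratio from the second problem can be quantified with a short Python sketch. The function name and the sample line counts are hypothetical; `total_lines` and `post_filter_lines` are the field names used above.

```python
def discard_ratio(total_lines: int, post_filter_lines: int) -> float:
    """Share of scanned lines thrown away by filters (0.0 when nothing was scanned)."""
    if total_lines == 0:
        return 0.0
    return 1 - post_filter_lines / total_lines


# Hypothetical query: 2,000,000 lines scanned, 1,200 lines returned.
ratio = discard_ratio(total_lines=2_000_000, post_filter_lines=1_200)
print(f"{ratio:.2%} of scanned lines were discarded")
```

A ratio this close to 100% suggests the stream selector is too broad and a label such as `level` would remove most of the scan.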
Query Optimization Quick Wins
- Add `container` or `workload` to narrow scope before line filters
- Add `level` label + always use it in queries (filters out 94%+ of logs when searching for errors)
- Remove `pod` label → reduces stream count by ~5× in typical K8s deployments
- Replace regex line filters (`|~`) with exact filters (`|=`) where possible
Alloy / Agent Configuration Patterns
Normalize Log Level
```alloy
loki.process "normalize_level" {
  forward_to = [...]
  // stage.replace rewrites capture groups, so each pattern wraps the
  // whole value in one anchored group.
  stage.replace {
    source     = "level"
    expression = "(?i)^(i(?:nfo)?)$"
    replace    = "info"
  }
  stage.replace {
    source     = "level"
    expression = "(?i)^(w(?:arn(?:ing)?)?)$"
    replace    = "warn"
  }
  stage.replace {
    source     = "level"
    expression = "(?i)^(e(?:rr(?:or)?)?)$"
    replace    = "error"
  }
  stage.replace {
    source     = "level"
    expression = "(?i)^(d(?:ebug)?)$"
    replace    = "debug"
  }
  stage.labels { values = { level = "" } }
}
```
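The mapping that the pipeline above aims for can be tested in isolation with a Python sketch. It assumes the four canonical values `info`, `warn`, `error`, and `debug`, and matches the whole value so that letters inside other words are not rewritten. The function name and the lowercase fallback for unknown values are assumptions.

```python
import re

# Each pattern must match the entire value (see fullmatch below).
LEVEL_PATTERNS = [
    (re.compile(r"i(nfo)?", re.IGNORECASE), "info"),
    (re.compile(r"w(arn(ing)?)?", re.IGNORECASE), "warn"),
    (re.compile(r"e(rr(or)?)?", re.IGNORECASE), "error"),
    (re.compile(r"d(ebug)?", re.IGNORECASE), "debug"),
]


def normalize_level(value: str) -> str:
    """Map common level spellings to one canonical lowercase value."""
    cleaned = value.strip()
    for pattern, canonical in LEVEL_PATTERNS:
        if pattern.fullmatch(cleaned):
            return canonical
    return cleaned.lower()  # unknown levels are only lowercased


for raw in ["INFO", "W", "Warning", "Err", "DEBUG", "fatal"]:
    print(raw, "->", normalize_level(raw))
```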
Conditional Meta-Label Extraction
```alloy
// Only extract when the relevant field is present, which avoids unnecessary cardinality
loki.process "conditional_extraction" {
  forward_to = [...]
  stage.match {
    selector = "{app=\"loki\"} |= \"component\""
    stage.logfmt { mapping = { "component" = "" } }
    stage.labels { values = { component = "" } }
  }
}
```
Enforce Approved Label Set (always use as final stage)
```alloy
loki.process "enforce_labels" {
  forward_to = [loki.write.default.receiver]
  // ... other stages ...
  stage.label_keep {
    values = ["app", "env", "cluster", "level", "namespace", "workload", "container"]
  }
}
```
Soft Enforcement (inject "unknown" for missing labels)
```alloy
stage.template {
  source   = "team"
  template = "{{ if .Value }}{{ .Value }}{{ else }}unknown{{ end }}"
}
stage.labels { values = { team = "" } }
```
Log Line Optimization
These reduce storage costs. Establish a cost-per-GB baseline before implementing.
Remove Timestamps from Log Lines
Each log entry already has a metadata timestamp, so the inline timestamp is redundant (~30–34 bytes each, ~6% of a typical log line).
```alloy
loki.process "drop_timestamp" {
  forward_to = [...]
  // logfmt timestamps
  stage.replace {
    expression = "(?i)((?:time_?(?:stamp)?|ts|logdate|start_?time)=[^ \\n]+(?: |$))"
    replace    = " "
  }
  // JSON timestamps
  stage.replace {
    expression = "(\"@?(?:time_?(?:stamp)?|ts|logdate|start_?time)\"\\s*:\\s*\"[^\"]+\",?)"
    replace    = " "
  }
  // ISO-8601 at start of line; the group spans the whole timestamp so all of it is removed
  stage.replace {
    expression = "^(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}(?::\\d{2}(?:\\.\\d{1,9})?Z?)?)"
    replace    = ""
  }
}
```
The original timestamp is still accessible at query time:
```logql
| line_format `{{ __timestamp__ | date "2006-01-02T15:04:05Z" }}`
```
Remove ANSI Color Codes
```alloy
loki.process "decolorize" {
  forward_to = [...]
  stage.decolorize {}
}
```
Remove Duplicate Level Field (when `level` is already a label)
```alloy
stage.replace {
  expression = "(level=[^ ]+ )"
  replace    = ""
}
```
JSON Optimizations
```alloy
// Remove null values
stage.replace {
  expression = "(\\s*(\"[^\"]+\"\\s*:\\s*null)(?:\\s*,)?\\s*)"
  replace    = ""
}
// Remove placeholder values ("-", "undefined", "null" strings)
stage.replace {
  expression = "(\\s*(\"[^\"]+\"\\s*:\\s*\"(?:-|null|undefined)\")(?:\\s*,)?\\s*)"
  replace    = ""
}
// Remove empty values ("", [], {})
stage.replace {
  expression = "(\\s*,\\s*(\"[^\"]+\"\\s*:\\s*(\\[\\s*\\]|\\{\\s*\\}|\"\\s*\"))|(\"[^\"]+\"\\s*:\\s*(\\[\\s*\\]|\\{\\s*\\}|\"\\s*\"))\\s*,\\s*)"
  replace    = ""
}
```
Practical savings (Istio access log example):
Starting at 753 bytes (minified) → after removing nulls, placeholders, and unused fields and normalizing keys: 464 bytes, a 38% reduction.
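The 38% figure follows directly from the two byte counts. The Python sketch below reproduces the arithmetic and, as a purely illustrative extension, projects it onto a hypothetical daily ingest volume; the function names and the 100 GB/day input are assumptions.

```python
def reduction_percent(before_bytes: int, after_bytes: int) -> float:
    """Percentage of bytes saved per log line."""
    return (before_bytes - after_bytes) / before_bytes * 100


def daily_savings_gb(daily_gb: float, before_bytes: int, after_bytes: int) -> float:
    """Projected GB saved per day if every line shrinks by the same ratio."""
    return daily_gb * (before_bytes - after_bytes) / before_bytes


print(f"{reduction_percent(753, 464):.1f}% smaller per line")
print(f"{daily_savings_gb(100, 753, 464):.1f} GB saved per 100 GB ingested")
```

The projection assumes that all ingested lines resemble the sample, which is why the cost-per-GB baseline recommended above should be measured first.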
Security & LBAC
Grafana Enterprise Logs (GEL) supports Label-Based Access Control (LBAC). Any label can serve as an access control selector.
Best labels for LBAC:
- `classification`: data sensitivity (`public`, `restricted`, `confidential`, `top-secret`)
- `source`: controls which teams can see which log origins
- `team` / `squad`: ownership-based access
- `env`: environment-level restrictions
Static aggregate labels like `owner=sysadmins` or `category=database` are particularly effective: one label value gates access to many log files, rather than requiring a long allowlist of filenames or streams.
The 80/20 Rule
The most impactful improvements almost always come from these four changes:
- Remove `pod` as a label: biggest stream reduction in K8s environments
- Add `level` as a label AND always specify it in queries: can eliminate 94%+ of scanned data when searching for errors
- Normalize label values: eliminates phantom duplicate streams from inconsistent casing
- Remove or normalize `filename` in K8s: highly variable paths inflate stream count significantly
Focus on these before anything else.
Labels to Avoid: Quick Reference
| Label | Why | Alternative |
|---|---|---|
| `pod` | Transient, unbounded | `workload` |
| `request_id` | Unbounded | Keep only in log content |
| `user_id` | Unbounded | Structured metadata |
| `filename` | Contains pod UID | Normalize or drop |
| Unnormalized `level` | Inconsistent casing creates duplicate streams | Normalize at collection time |
| Any dynamically-named label key | Cannot be bounded | Use fixed keys with bounded values |