fluentbit-validator
Fluent Bit Config Validator
Overview
This skill provides a comprehensive validation workflow for Fluent Bit configurations, combining syntax validation, semantic checks, security auditing, best practice enforcement, and dry-run testing. Validate Fluent Bit configs with confidence before deploying to production.
Fluent Bit uses an INI-like configuration format with sections ([SERVICE], [INPUT], [FILTER], [OUTPUT], [PARSER]) and key-value pairs. This validator ensures configurations are syntactically correct, semantically valid, secure, and optimized for production use.
When to Use This Skill
Invoke this skill when:
- Validating Fluent Bit configurations before deployment
- Debugging configuration syntax errors
- Testing configurations with fluent-bit --dry-run
- Working with custom plugins that need documentation
- Ensuring configs follow Fluent Bit best practices
- Auditing configurations for security issues
- Optimizing performance settings (buffers, flush intervals)
- The user asks to "validate", "lint", "check", or "test" Fluent Bit configs
- Troubleshooting configuration-related errors
Validation Workflow
Follow this sequential validation workflow. Each stage catches different types of issues.
Recommended: For comprehensive validation, use --check all, which runs all validation stages in sequence:
bash
python3 scripts/validate_config.py --file <config-file> --check all
Individual check modes are available for targeted validation when debugging specific issues.
Stage 1: Configuration File Structure
Verify the basic file structure and format:
bash
python3 scripts/validate_config.py --file <config-file> --check structure
Expected format:
- INI-style sections with [SECTION] headers
- Key-value pairs with proper spacing
- Comments starting with #
- Sections: SERVICE, INPUT, FILTER, OUTPUT, PARSER (or MULTILINE_PARSER)
- Proper indentation (spaces recommended, not tabs)
Common issues caught:
- Missing section headers
- Malformed key-value pairs
- Invalid section names
- Syntax errors (unclosed brackets, etc.)
- Mixed tabs and spaces
- UTF-8 encoding issues
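The structural checks listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the logic of scripts/validate_config.py: the function name `check_structure` and the message strings are hypothetical, and the parser assumes the INI-like layout described in this stage.

```python
import re

# Section names listed in this document.
KNOWN_SECTIONS = {"SERVICE", "INPUT", "FILTER", "OUTPUT", "PARSER", "MULTILINE_PARSER"}
SECTION_RE = re.compile(r"^\[([A-Za-z_]+)\]\s*$")


def check_structure(text):
    """Return a list of (line_number, message) structural findings (hypothetical helper)."""
    findings = []
    current_section = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("["):
            match = SECTION_RE.match(line)
            if match is None:
                findings.append((number, "malformed section header"))
                current_section = None
            elif match.group(1).upper() not in KNOWN_SECTIONS:
                findings.append((number, f"invalid section name '{match.group(1)}'"))
                current_section = match.group(1)
            else:
                current_section = match.group(1).upper()
            continue
        if current_section is None:
            findings.append((number, "key-value pair outside any section"))
        indent = raw[: len(raw) - len(raw.lstrip())]
        if " " in indent and "\t" in indent:
            findings.append((number, "mixed tabs and spaces in indentation"))
        if len(line.split(None, 1)) < 2:
            findings.append((number, "malformed key-value pair (missing value)"))
    return findings


sample = "[SERVICE]\n    Flush 1\n[INPUTS]\n    Name\n[OUTPUT\n    Name stdout\n"
for finding in check_structure(sample):
    print(finding)
```

Reporting a line number with every finding keeps the output compatible with the line-oriented report format used later in this workflow.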
Stage 2: Section Validation
Validate all configuration sections (SERVICE, INPUT, FILTER, OUTPUT, PARSER):
bash
python3 scripts/validate_config.py --file <config-file> --check sections
This single command validates all section types. The checks performed for each section type are detailed below.
SERVICE Section Checks
Checks:
- Required parameters: Flush
- Valid parameter names (no typos)
- Parameter value types (Flush must be numeric)
- Log_Level values: off, error, warn, info, debug, trace
- HTTP_Server values: On/Off
- Parsers_File references (file existence)
Common issues:
- Missing Flush parameter
- Invalid Log_Level value
- Parsers_File path doesn't exist
- Negative or zero Flush interval
Best practices:
- Flush: 1-5 seconds (balance latency vs. efficiency)
- Log_Level: info for production, debug for troubleshooting
- HTTP_Server: On (for health checks and metrics)
- storage.metrics: on (for monitoring)
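As a rough sketch of the SERVICE checks above, assuming the section has already been parsed into a dictionary with lower-cased keys (the helper name `check_service` and the message wording are hypothetical, not taken from the validator script):

```python
VALID_LOG_LEVELS = {"off", "error", "warn", "info", "debug", "trace"}


def check_service(props):
    """Check a SERVICE section given as a dict of lower-cased keys to string values."""
    errors = []
    flush = props.get("flush")
    if flush is None:
        errors.append("missing Flush parameter")
    else:
        try:
            if float(flush) <= 0:
                errors.append("Flush must be greater than zero")
        except ValueError:
            errors.append("Flush must be numeric")
    level = props.get("log_level")
    if level is not None and level.lower() not in VALID_LOG_LEVELS:
        errors.append(f"invalid Log_Level '{level}'")
    http = props.get("http_server")
    if http is not None and http.lower() not in {"on", "off"}:
        errors.append(f"invalid HTTP_Server value '{http}'")
    return errors


print(check_service({"flush": "0", "log_level": "verbose"}))
```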
INPUT Section Checks
Checks:
- Required parameters: Name
- Valid plugin names (tail, systemd, tcp, forward, http, etc.)
- Tag format (no spaces, valid characters)
- File paths exist (for tail plugin)
- Memory limits are set (Mem_Buf_Limit)
- DB file paths are valid
- Port numbers are in valid range (1-65535)
Common issues:
- Missing Name parameter
- Invalid plugin name (typo)
- Missing Tag parameter
- Path doesn't exist
- Missing Mem_Buf_Limit (OOM risk)
- Missing DB file (no position tracking)
- Port conflicts
Best practices:
- Always set Mem_Buf_Limit (50-100MB typical)
- Use DB for tail inputs (crash recovery)
- Set Skip_Long_Lines On (skips lines that exceed Buffer_Max_Size instead of stopping monitoring of the file)
- Use appropriate Tag patterns for routing
- Set Refresh_Interval for tail (10 seconds typical)
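The INPUT checks can be illustrated the same way. The sketch below assumes a parsed section as a dictionary with lower-cased keys; `check_input` is a hypothetical name, and the character set accepted for tags is an assumption, since the document only says "no spaces, valid characters".

```python
import re

# Assumed set of valid tag characters: letters, digits, '_', '.', '*', '-'.
TAG_RE = re.compile(r"^[A-Za-z0-9_.*\-]+$")


def check_input(props):
    """Check an INPUT section given as a dict of lower-cased keys to string values."""
    issues = []
    name = props.get("name")
    if not name:
        issues.append("missing Name parameter")
    tag = props.get("tag")
    if tag is None:
        issues.append("missing Tag parameter")
    elif not TAG_RE.match(tag):
        issues.append(f"invalid Tag '{tag}'")
    port = props.get("port")
    if port is not None and not (port.isdecimal() and 1 <= int(port) <= 65535):
        issues.append(f"Port '{port}' is outside 1-65535")
    if name == "tail":
        # tail-specific recommendations from the list above
        if "mem_buf_limit" not in props:
            issues.append("missing Mem_Buf_Limit (OOM risk)")
        if "db" not in props:
            issues.append("missing DB file (no position tracking)")
    return issues


print(check_input({"name": "tail", "tag": "kube.*"}))
```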
FILTER Section Checks
Checks:
- Required parameters: Name, Match (or Match_Regex)
- Valid filter plugin names
- Match pattern syntax
- Tag pattern wildcards are valid
- Filter-specific parameters
Common issues:
- Missing Match parameter
- Invalid filter plugin name
- Match pattern doesn't match any INPUT tags
- Missing required plugin-specific parameters
Best practices:
- Use specific Match patterns (avoid "*" unless intended)
- Order filters logically (parsers before modifiers)
- Use kubernetes filter in K8s environments
- Parse JSON logs early in pipeline
OUTPUT Section Checks
Checks:
- Required parameters: Name, Match
- Valid output plugin names (including es for Elasticsearch, kafka, loki, s3, cloudwatch_logs, http, forward, file, opentelemetry)
- Host/Port validity
- Retry_Limit is set
- Storage limits are configured
- TLS configuration (if enabled)
- OpenTelemetry-specific: URI endpoints (metrics_uri, logs_uri, traces_uri), authentication headers, resource attributes
Common issues:
- Missing Match parameter
- Invalid output plugin name
- Match pattern doesn't match any INPUT tags
- Missing Retry_Limit (infinite retries risk)
- Missing storage.total_limit_size (disk exhaustion risk)
- Hardcoded credentials (security issue)
Best practices:
- Set Retry_Limit 3-5
- Configure storage.total_limit_size
- Enable TLS in production
- Use environment variables for credentials
- Enable compression when available
PARSER Section Checks
Checks:
- Required parameters: Name, Format
- Valid parser formats: json, regex, logfmt, ltsv
- Regex syntax validity
- Time_Format compatibility with Time_Key
- MULTILINE_PARSER rule syntax
Common issues:
- Invalid regex patterns
- Time_Format doesn't match log timestamps
- Missing Time_Key when using Time_Format
- MULTILINE_PARSER rules don't match
Best practices:
- Test regex patterns with sample logs
- Use built-in parsers when possible
- Set proper Time_Format for timestamp parsing
- Use MULTILINE_PARSER for stack traces
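A minimal sketch of the PARSER checks, again over a parsed section with lower-cased keys. The name `check_parser` is hypothetical. Python's `re` module is used here only as an approximation: Fluent Bit patterns write named groups as `(?<name>...)`, so the sketch rewrites them to Python's `(?P<name>...)` before compiling, and a pattern that compiles in Python is not guaranteed to behave identically in Fluent Bit's own regex engine.

```python
import re

VALID_FORMATS = {"json", "regex", "logfmt", "ltsv"}


def check_parser(props):
    """Check a PARSER section given as a dict of lower-cased keys to string values."""
    issues = []
    if not props.get("name"):
        issues.append("missing Name parameter")
    fmt = props.get("format", "").lower()
    if not fmt:
        issues.append("missing Format parameter")
    elif fmt not in VALID_FORMATS:
        issues.append(f"invalid Format '{fmt}'")
    elif fmt == "regex":
        pattern = props.get("regex")
        if not pattern:
            issues.append("missing Regex parameter")
        else:
            # Rewrite (?<name>...) to (?P<name>...), leaving lookbehinds untouched.
            translated = re.sub(r"\(\?<(?![=!])", "(?P<", pattern)
            try:
                re.compile(translated)
            except re.error as exc:
                issues.append(f"invalid Regex: {exc}")
    if "time_format" in props and "time_key" not in props:
        issues.append("Time_Format set without Time_Key")
    return issues


print(check_parser({"name": "bad", "format": "regex", "regex": "(?<host>[^ ]*"}))
```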
Stage 3: Tag Consistency Check
Validate that tags flow correctly through the pipeline:
bash
python3 scripts/validate_config.py --file <config-file> --check tags
Checks:
- INPUT tags match FILTER Match patterns
- FILTER tags match OUTPUT Match patterns
- No orphaned filters (Match pattern doesn't match any INPUT)
- No orphaned outputs (Match pattern doesn't match any INPUT/FILTER)
- Tag wildcards are used correctly
Common issues:
- FILTER Match pattern doesn't match any INPUT Tag
- OUTPUT Match pattern doesn't match any logs
- Typo in Match pattern
- Incorrect wildcard usage
Example validation:
ini
[INPUT]
    # Produces: kube.var.log.containers.pod.log
    Tag kube.*
[FILTER]
    # Matches: ✅
    Match kube.*
[OUTPUT]
    # Matches: ❌ No logs will reach this output
    Match app.*
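The matching rule behind the example above can be sketched as follows. This assumes that `*` is the only wildcard in a Match pattern and that concrete sample tags are available; `match_tag` and `orphaned_patterns` are hypothetical helper names, not functions of the validator script.

```python
import re


def match_tag(pattern, tag):
    """Return True if a Match pattern with '*' wildcards matches a concrete tag."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, tag) is not None


def orphaned_patterns(sample_tags, match_patterns):
    """Return the Match patterns that match none of the sample tags."""
    return [p for p in match_patterns if not any(match_tag(p, t) for t in sample_tags)]


tags = ["kube.var.log.containers.pod.log"]
print(orphaned_patterns(tags, ["kube.*", "app.*"]))  # ['app.*']
```

Escaping every literal part before joining means the dot in `kube.*` matches only a real dot, so a tag such as `kubeXvar` is not accepted by accident.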
Stage 4: Security Audit
Scan configuration for security issues:
bash
python3 scripts/validate_config.py --file <config-file> --check security
Checks performed:
- Hardcoded credentials:
  - HTTP_User, HTTP_Passwd in OUTPUT
  - AWS_Access_Key, AWS_Secret_Key
  - Passwords in plain text
  - API keys and tokens
- TLS configuration:
  - TLS disabled for production outputs
  - tls.verify Off (man-in-the-middle risk)
  - Missing certificate files
- File permissions:
  - DB files readable/writable
  - Parser files exist and readable
  - Log files have appropriate permissions
- Network exposure:
  - INPUT plugins listening on 0.0.0.0 without auth
  - Open ports without firewall mentions
  - HTTP_Server exposed without auth
Security best practices:
- Use environment variables: HTTP_User ${ES_USER}
- Enable TLS: tls On
- Verify certificates: tls.verify On
- Don't listen on 0.0.0.0 for sensitive inputs
- Use authentication for HTTP endpoints
Auto-fix suggestions:
ini
# Before (insecure)
[OUTPUT]
    HTTP_User admin
    HTTP_Passwd password123
# After (secure)
[OUTPUT]
    HTTP_User ${ES_USER}
    HTTP_Passwd ${ES_PASSWORD}
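One way to detect the insecure form shown above is to flag credential keys whose value is not a `${VAR}` reference. The sketch below is illustrative only: `find_hardcoded_credentials` is a hypothetical name, and the key set covers just the keys named in this stage, not every credential field a plugin may accept.

```python
import re

# Keys named in the checks above; illustrative, not exhaustive.
SENSITIVE_KEYS = {"http_user", "http_passwd", "aws_access_key", "aws_secret_key"}
ENV_REF_RE = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")


def find_hardcoded_credentials(text):
    """Return (line_number, key) for sensitive keys set to a literal value."""
    findings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        parts = raw.strip().split(None, 1)
        if len(parts) != 2 or parts[0].startswith(("#", "[")):
            continue  # blank line, comment, section header, or key without value
        key, value = parts
        if key.lower() in SENSITIVE_KEYS and not ENV_REF_RE.match(value.strip()):
            findings.append((number, key))
    return findings


before = "[OUTPUT]\n    HTTP_User admin\n    HTTP_Passwd password123\n"
after = "[OUTPUT]\n    HTTP_User ${ES_USER}\n    HTTP_Passwd ${ES_PASSWORD}\n"
print(find_hardcoded_credentials(before))
print(find_hardcoded_credentials(after))
```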
Stage 5: Performance Analysis
Analyze configuration for performance issues:
bash
python3 scripts/validate_config.py --file <config-file> --check performance
Checks:
- Buffer limits:
  - Mem_Buf_Limit is set on all tail inputs
  - storage.total_limit_size is set on outputs
  - Limits are reasonable (not too small or too large)
- Flush intervals:
  - Flush interval is appropriate (1-5 sec typical)
  - Not too low (high CPU) or too high (high memory)
- Resource usage:
  - Skip_Long_Lines enabled (prevents hang)
  - Refresh_Interval set (file discovery)
  - Compression enabled on network outputs
- Kubernetes-specific:
  - Buffer_Size 0 for kubernetes filter (recommended)
  - Mem_Buf_Limit not too low for container logs
Performance recommendations:
ini
# Good configuration
[SERVICE]
    # 1 second: good balance
    Flush 1
[INPUT]
    # Prevents OOM
    Mem_Buf_Limit 50MB
    # Prevents hang
    Skip_Long_Lines On
    # File discovery every 10s
    Refresh_Interval 10
[OUTPUT]
    # Disk buffer limit
    storage.total_limit_size 5G
    # Don't retry forever
    Retry_Limit 3
    # Reduce bandwidth
    Compress gzip
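Checking that a limit is "reasonable" first requires turning values such as 50MB or 5G into byte counts. The following sketch shows one way to do that. Several things here are assumptions rather than facts from this document: the helper names `parse_size` and `review_limit`, the use of decimal multiples (adjust `KILO` if your Fluent Bit version interprets units differently), and the 5MB and 1GB thresholds, which are arbitrary placeholders.

```python
import re

KILO = 1000  # assumption: decimal multiples; change to 1024 if appropriate
MULTIPLIERS = {
    "": 1,
    "K": KILO, "KB": KILO,
    "M": KILO**2, "MB": KILO**2,
    "G": KILO**3, "GB": KILO**3,
}
SIZE_RE = re.compile(r"^(\d+)\s*([A-Za-z]*)$")


def parse_size(value):
    """Convert a size string such as '50MB' or '5G' into a number of bytes."""
    match = SIZE_RE.match(value.strip())
    if match is None or match.group(2).upper() not in MULTIPLIERS:
        raise ValueError(f"unrecognized size '{value}'")
    return int(match.group(1)) * MULTIPLIERS[match.group(2).upper()]


def review_limit(value, low="5MB", high="1GB"):
    """Classify a limit as 'too small', 'too large', or 'ok' (thresholds are placeholders)."""
    size = parse_size(value)
    if size < parse_size(low):
        return "too small"
    if size > parse_size(high):
        return "too large"
    return "ok"


print(review_limit("50MB"))
```

Because both the value and the thresholds go through the same parser, the classification does not depend on which base `KILO` uses.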
Stage 6: Best Practice Validation
Check against Fluent Bit best practices:
bash
python3 scripts/validate_config.py --file <config-file> --check best-practices
Checks:
- Required configurations:
  - SERVICE section exists
  - At least one INPUT
  - At least one OUTPUT
  - HTTP_Server enabled (for health checks)
- Kubernetes configurations:
  - kubernetes filter used for K8s logs
  - Proper Kube_URL, Kube_CA_File, Kube_Token_File
  - Exclude_Path to prevent log loops
  - DB file for position tracking
- Reliability:
  - Retry_Limit set on outputs
  - DB file for tail inputs
  - storage.type filesystem for critical logs
- Observability:
  - HTTP_Server enabled
  - storage.metrics enabled
  - Proper Log_Level (info or debug)
Best practice checklist:
- ✅ SERVICE section with Flush parameter
- ✅ HTTP_Server enabled for health checks
- ✅ Mem_Buf_Limit on all tail inputs
- ✅ DB file for tail inputs (position tracking)
- ✅ Retry_Limit on all outputs
- ✅ storage.total_limit_size on outputs
- ✅ TLS enabled for production
- ✅ Environment variables for credentials
- ✅ kubernetes filter for K8s environments
- ✅ Exclude_Path to prevent log loops
Stage 7: Dry-Run Testing
Test configuration with Fluent Bit dry-run (if binary available):
bash
fluent-bit -c <config-file> --dry-run
This catches:
- Configuration parsing errors
- Plugin loading errors
- Parser syntax errors
- File permission issues
- Missing dependencies
Common errors:
- Parser file not found:
  [error] [config] parser file 'parsers.conf' not found
  Fix: Create parser file or update Parsers_File path
- Plugin not found:
  [error] [plugins] invalid plugin 'unknownplugin'
  Fix: Check plugin name spelling or install plugin
- Invalid parameter:
  [error] [input:tail] invalid property 'InvalidParam'
  Fix: Remove invalid parameter or check documentation
- Permission denied:
  [error] cannot open /var/log/containers/*.log
  Fix: Check file permissions or run with appropriate user
If fluent-bit binary is not available:
- Skip this stage
- Document that dry-run testing was skipped
- Recommend testing in development environment
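The run-or-skip decision described above can be sketched as a small wrapper. It uses only the command line shown in this stage. The function name `run_dry_run`, the shape of the returned dictionary, and the assumption that a non-zero exit code signals failure are illustrative choices, not documented behavior.

```python
import shutil
import subprocess


def run_dry_run(config_path, binary="fluent-bit"):
    """Run a dry-run if the binary is on PATH; otherwise report that it was skipped."""
    executable = shutil.which(binary)
    if executable is None:
        return {"status": "skipped", "reason": f"{binary} not found on PATH"}
    result = subprocess.run(
        [executable, "-c", config_path, "--dry-run"],
        capture_output=True,
        text=True,
        check=False,
    )
    output = result.stdout + result.stderr
    # Collect lines carrying the [error] marker seen in the messages above.
    errors = [line for line in output.splitlines() if "[error]" in line]
    # Assumption: a non-zero exit code means the dry-run failed.
    status = "passed" if result.returncode == 0 else "failed"
    return {"status": status, "errors": errors}


if __name__ == "__main__":
    print(run_dry_run("fluent-bit.conf"))
```

Returning an explicit "skipped" status makes it straightforward to document in the final report that dry-run testing did not take place.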
Stage 8: Documentation Lookup (if needed)
If configuration uses unfamiliar plugins or parameters:
Try context7 MCP first:
Use mcp__context7__resolve-library-id with "fluent-bit"
Then use mcp__context7__get-library-docs with:
- context7CompatibleLibraryID: /fluent/fluent-bit-docs
- topic: "<plugin-type> <plugin-name> configuration"
- page: 1
Fallback to WebSearch:
Search query: "fluent-bit <plugin-type> <plugin-name> configuration parameters site:docs.fluentbit.io"
Examples:
- "fluent-bit output elasticsearch configuration parameters site:docs.fluentbit.io"
- "fluent-bit filter kubernetes configuration parameters site:docs.fluentbit.io"
Extract information:
- Required parameters
- Optional parameters and defaults
- Valid value ranges
- Example configurations
Stage 9: Report and Fix Issues
After validation, present comprehensive findings:
1. Summarize all issues:
Validation Report for fluent-bit.conf
=====================================
Errors (3):
- [Line 15] OUTPUT elasticsearch missing required parameter 'Host'
- [Line 25] FILTER Match pattern 'app.*' doesn't match any INPUT tags
- [Line 8] INPUT tail missing Mem_Buf_Limit (OOM risk)
Warnings (2):
- [Line 30] OUTPUT elasticsearch has hardcoded password (security risk)
- [Line 12] INPUT tail missing DB file (no crash recovery)
Info (1):
- [Line 3] SERVICE Flush interval is 10s (consider reducing for lower latency)
Best Practices (2):
- Consider enabling HTTP_Server for health checks
- Consider enabling compression on OUTPUT elasticsearch
2. Categorize by severity:
- Errors (must fix): Configuration won't work, Fluent Bit won't start
- Warnings (should fix): Configuration works but has issues
- Info (consider): Optimization opportunities
- Best Practices: Recommended improvements
3. Propose specific fixes:
ini
# Fix 1: Add missing Host parameter
[OUTPUT]
    Name es
    Match *
    # Added
    Host elasticsearch.logging.svc
    Port 9200
# Fix 2: Add Mem_Buf_Limit to prevent OOM
[INPUT]
    Name tail
    Tag kube.*
    Path /var/log/containers/*.log
    # Added
    Mem_Buf_Limit 50MB
# Fix 3: Use environment variable for password
[OUTPUT]
    Name es
    HTTP_User admin
    # Changed from hardcoded
    HTTP_Passwd ${ES_PASSWORD}
4. Get user approval via AskUserQuestion
5. Apply approved fixes using Edit tool
6. Re-run validation to confirm
7. Provide completion summary:
✅ Validation Complete - 5 issues fixed
Fixed Issues:
- fluent-bit.conf:15 - Added missing Host parameter to OUTPUT elasticsearch
- fluent-bit.conf:8 - Added Mem_Buf_Limit 50MB to INPUT tail
- fluent-bit.conf:30 - Changed hardcoded password to environment variable
- fluent-bit.conf:12 - Added DB file for crash recovery
- fluent-bit.conf:25 - Fixed FILTER Match pattern to match INPUT tags
Validation Status: All checks passed ✅
- Structure: Valid
- Syntax: Valid
- Tags: Consistent
- Security: No issues
- Performance: Optimized
- Best Practices: Compliant
- Dry-run: Passed (if applicable)
8. Report-only summary (when user declines fixes):
If the user chooses not to apply fixes, provide a report-only summary:
📋 Validation Report Complete - No fixes applied
Summary:
- Errors: 2 (must fix before deployment)
- Warnings: 16 (should fix)
- Info: 15 (optimization suggestions)
Critical Issues Requiring Attention:
- [Line 5] Invalid Log_Level 'invalid_level'
- [Line 52] [OUTPUT opentelemetry] missing required parameter 'Host'
Recommendations:
- Review the errors above before deploying this configuration
- Consider addressing warnings to improve reliability and security
- Run validation again after manual fixes: python3 scripts/validate_config.py --file <config> --check all
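The severity-grouped report layout used in this stage can be produced from a flat list of findings. The sketch below is a hypothetical illustration: `format_report` and the tuple layout `(severity, line, message)` are assumptions, not the validator's actual data model.

```python
from collections import defaultdict

# Severity groups in the order used by the sample report in this stage.
SEVERITY_ORDER = ["Errors", "Warnings", "Info", "Best Practices"]


def format_report(filename, findings):
    """Render findings, given as (severity, line_or_None, message), grouped by severity."""
    grouped = defaultdict(list)
    for severity, line, message in findings:
        grouped[severity].append((line, message))
    title = f"Validation Report for {filename}"
    lines = [title, "=" * len(title)]
    for severity in SEVERITY_ORDER:
        items = grouped.get(severity, [])
        if not items:
            continue  # omit empty groups
        lines.append(f"{severity} ({len(items)}):")
        for line, message in items:
            prefix = f"[Line {line}] " if line is not None else ""
            lines.append(f"- {prefix}{message}")
    return "\n".join(lines)


findings = [
    ("Errors", 15, "OUTPUT elasticsearch missing required parameter 'Host'"),
    ("Warnings", 12, "INPUT tail missing DB file (no crash recovery)"),
    ("Best Practices", None, "Consider enabling HTTP_Server for health checks"),
]
print(format_report("fluent-bit.conf", findings))
```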
Common Issues and Solutions
Configuration Errors
Issue: Parser file not found
[error] [config] parser file 'parsers.conf' not found
Solution:
- Verify Parsers_File path in SERVICE section
- Check if file exists at specified location
- Use relative path from config file location
Issue: Missing required parameter
[error] [output:es] property 'Host' not set
Solution:
- Add required parameter to OUTPUT section
- Check documentation for required fields
Issue: Invalid plugin name
[error] [plugins] invalid plugin 'unknownplugin'
Solution:
- Check plugin name spelling
- Verify plugin is available (may need installation)
- Consult documentation for correct plugin names
Tag Routing Issues
Issue: No logs reaching output
Logs are generated but don't appear in output
Debug:
1. Check INPUT Tag matches FILTER Match
2. Check FILTER Match/tag_prefix matches OUTPUT Match
3. Enable debug logging: `Log_Level debug`
4. Check for grep filters excluding all logs
Solution:
```ini
[INPUT]
    Tag kube.*
[FILTER]
    # Must match INPUT Tag
    Match kube.*
[OUTPUT]
    # Must match INPUT or FILTER tag
    Match kube.*
```
Memory Issues
Issue: Fluent Bit OOM killed
Container or process killed due to memory
Solution:
- Add Mem_Buf_Limit to all tail inputs
- Reduce Mem_Buf_Limit values
- Set storage.total_limit_size on outputs
- Lower the Flush interval so buffered records are sent sooner (a high interval holds more data in memory)
- Add log filtering to reduce volume
Security Issues
Issue: Hardcoded credentials in config
[OUTPUT]
    HTTP_Passwd secretpassword
Solution:
- Use environment variables:
ini
[OUTPUT]
    HTTP_Passwd ${ES_PASSWORD}
- Mount secrets in Kubernetes
- Use IAM roles for cloud services (AWS, GCP, Azure)
Issue: TLS disabled or not verified
[OUTPUT]
    tls On
    tls.verify Off
Solution:
- Enable verification for production:
ini
[OUTPUT]
    tls On
    tls.verify On
    tls.ca_file /path/to/ca.crt
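Once credentials are moved into `${VAR}` references as recommended in this section, a variable that is referenced but never set is easy to miss. A small pre-deployment check might look like the sketch below; `missing_env_vars` is a hypothetical helper and is not part of the validator script.

```python
import os
import re

ENV_REF_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text, environ=None):
    """Return the sorted names of ${VAR} references that are not set in the environment."""
    environ = os.environ if environ is None else environ
    names = sorted(set(ENV_REF_RE.findall(config_text)))
    return [name for name in names if name not in environ]


config = "[OUTPUT]\n    HTTP_User ${ES_USER}\n    HTTP_Passwd ${ES_PASSWORD}\n"
print(missing_env_vars(config, environ={"ES_USER": "elastic"}))  # ['ES_PASSWORD']
```

Passing the environment as a parameter keeps the check testable without changing the real process environment.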
Integration with fluentbit-generator
This validator is automatically invoked by the fluentbit-generator skill after generating configurations. It can also be used standalone to validate existing configurations.
Generator workflow:
- Generate configuration using fluentbit-generator
- Automatically validate using fluentbit-validator
- Fix any issues found
- Re-validate until all checks pass
- Deploy with confidence
Resources
scripts/
validate_config.py
- Main validation script with all checks integrated in a single file
- Usage: python3 scripts/validate_config.py --file <config> --check <type>
- Available check types: all, structure, syntax, sections, tags, security, performance, best-practices, dry-run
- Comprehensive 1000+ line validator covering all validation stages
- Includes syntax validation, section validation, tag consistency, security audit, performance analysis, and best practices
- Returns detailed error messages with line numbers
- Supports JSON output format: --json
validate.sh
- Convenience wrapper script for easier invocation
- Usage: bash scripts/validate.sh <config-file>
- Automatically calls validate_config.py with proper Python interpreter
- Simplifies command-line usage
tests/
Test Configuration Files:
- valid-basic.conf - Valid basic Kubernetes logging setup
- valid-multioutput.conf - Valid configuration with multiple outputs
- valid-opentelemetry.conf - Valid OpenTelemetry output configuration (Fluent Bit 2.x+)
- invalid-missing-required.conf - Missing required parameters
- invalid-security-issues.conf - Security vulnerabilities (hardcoded credentials, disabled TLS)
- invalid-opentelemetry.conf - OpenTelemetry configuration errors
- invalid-tag-mismatch.conf - Tag routing issues
Running Tests:
bash
# Test on valid config
python3 scripts/validate_config.py --file tests/valid-basic.conf
# Test on invalid config (should report errors)
python3 scripts/validate_config.py --file tests/invalid-security-issues.conf
# Test all configs
for config in tests/*.conf; do
    echo "Testing $config"
    python3 scripts/validate_config.py --file "$config"
done
Documentation Sources
Based on comprehensive research from:
- Fluent Bit Official Documentation
- Fluent Bit Operations and Best Practices
- Configuration File Format
- Context7 Fluent Bit documentation (/fluent/fluent-bit-docs)