kafka-topic-audit

Kafka Topic Configuration Audit

Audits all topic configurations against production best practices. Misconfigured topics are the #1 cause of Kafka data loss - engineers create topics and forget to tune them.

Workflow

Copy this checklist and track your progress:
Audit Progress:
- [ ] Step 1: Check environment health
- [ ] Step 2: Fetch all topics
- [ ] Step 3: Audit configurations against best practices
- [ ] Step 4: Check metadata completeness
- [ ] Step 5: Detect orphaned topics
- [ ] Step 6: Run consistency checks
- [ ] Step 7: Generate report
  1. Check environment health for a high-level summary
  2. Fetch all topics and their configurations
  3. Audit each topic against best practices (see `references/audit-rules.md`)
  4. Cross-reference metadata for completeness
  5. Detect orphaned topics with no consumers
  6. Run consistency checks across topics in the same domain
  7. Report findings with prioritised recommendations

Step 1: Environment Overview

Use the Lenses MCP `check_environment_health` tool to get a quick summary:
  • Broker count, topic count, consumer count, connector count
  • Any existing issues flagged by Lenses
Expected output: Environment health summary with broker, topic and consumer counts.
Validation: If the environment is unhealthy or unreachable, stop and report the connection issue before proceeding.

Step 2: Fetch All Topics

Use the Lenses MCP `list_topics` tool to retrieve all topics with their configurations in one call.
For topics that need deeper inspection, use:
  • `get_topic` for detailed config including partitions and consumers
  • `get_topic_broker_configs` for broker-level config overrides
  • `get_topic_partitions` for partition-level message counts and bytes
Expected output: Full list of topics with their configurations. If zero topics are returned, report this and stop.

Step 3: Audit Configurations

For each topic, check against the thresholds in `references/audit-rules.md`:
  • Replication factor - RF=1 is critical, RF=2 is a warning in production
  • Retention policies - unbounded growth, or retention that is too short or excessively long
  • Partition count - single-partition bottlenecks or excessive partitions
  • Compaction settings - `compact` on topics without keys, `delete` on state topics
  • Naming conventions - must follow the `{domain}.{entity}.{event}` pattern
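The checks above can be sketched as a small function. This is a minimal illustration, not the contents of `references/audit-rules.md`: only the RF=1 and RF=2 severities come from the list above, while the severities for retention, partition count and naming, the allowed characters in the name pattern, and the `audit_topic` signature are assumptions.

```python
import re
from dataclasses import dataclass

# Pattern for {domain}.{entity}.{event}; the allowed character set is an assumption.
NAME_PATTERN = re.compile(r"^[a-z0-9_-]+\.[a-z0-9_-]+\.[a-z0-9_-]+$")


@dataclass
class Finding:
    topic: str
    severity: str  # "critical", "warning" or "suggestion"
    message: str


def audit_topic(name, partitions, replication_factor, config):
    """Return findings for one topic. `config` maps Kafka config keys to string values."""
    findings = []

    # Replication factor: severities taken from the list above.
    if replication_factor == 1:
        findings.append(Finding(name, "critical", "replication factor is 1"))
    elif replication_factor == 2:
        findings.append(Finding(name, "warning", "replication factor is 2"))

    # Unbounded growth: delete policy with neither a time nor a size limit.
    # The "warning" severity is an assumption.
    policy = config.get("cleanup.policy", "delete")
    if (
        "delete" in policy
        and config.get("retention.ms") == "-1"
        and config.get("retention.bytes", "-1") == "-1"
    ):
        findings.append(Finding(name, "warning", "retention is unbounded"))

    # Single partition: the "suggestion" severity is an assumption.
    if partitions == 1:
        findings.append(Finding(name, "suggestion", "single partition"))

    if not NAME_PATTERN.match(name):
        findings.append(
            Finding(name, "suggestion", "name does not follow {domain}.{entity}.{event}")
        )

    return findings
```

Calling `audit_topic("orders.payment.completed", 6, 3, {"retention.ms": "604800000"})` returns an empty list, since none of the sketched rules fire for that topic.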

Step 4: Metadata Completeness

Use the Lenses MCP `list_topic_metadata` tool to check:
  • Topics missing descriptions
  • Topics missing tags
  • Topics without registered schemas (key or value)
Use `list_datasets` with filters (`is_compacted`, `has_records`) to find anomalies.
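As a rough sketch, the three metadata checks could be expressed as follows. The response shape of `list_topic_metadata` is not specified here, so the keys `description`, `tags`, `keySchema` and `valueSchema` are hypothetical placeholders to adapt to the actual output.

```python
def missing_metadata(entry):
    """List what a topic's metadata entry lacks.

    The keys used here ("description", "tags", "keySchema", "valueSchema")
    are hypothetical; adapt them to the actual tool response.
    """
    gaps = []
    if not (entry.get("description") or "").strip():
        gaps.append("description")
    if not entry.get("tags"):
        gaps.append("tags")
    if not entry.get("keySchema"):
        gaps.append("key schema")
    if not entry.get("valueSchema"):
        gaps.append("value schema")
    return gaps
```

A blank or whitespace-only description counts as missing, so that placeholder text does not pass the check.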

Step 5: Orphan Detection

For each topic, use `list_consumer_groups_by_topic` to check for active consumers.
  • Warning: Topics with zero consumer groups (may be orphaned/dead)
  • Suggestion: Topics with only inactive/empty consumer groups
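The two orphan rules above might be classified like this. The sketch assumes each consumer group is a dict with a `state` key holding a Kafka group state such as `Stable`, `Empty` or `Dead`; whether `list_consumer_groups_by_topic` reports state under that name is an assumption.

```python
# Kafka consumer group states treated as inactive; whether the tool reports
# states under these names is an assumption.
INACTIVE_STATES = {"Empty", "Dead"}


def classify_orphan(groups):
    """Map a topic's consumer groups to a severity, or None if it has active consumers.

    `groups` is a list of dicts with a hypothetical "state" key.
    """
    if not groups:
        return "warning"  # zero consumer groups: may be orphaned/dead
    if all(g.get("state") in INACTIVE_STATES for g in groups):
        return "suggestion"  # only inactive/empty consumer groups
    return None
```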

Step 6: Consistency Checks

  • Flag topics in the same domain with different retention policies
  • Flag topics in the same domain with different replication factors
  • Flag topics with inconsistent serialisation formats within a domain
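One way to sketch these checks is to group topics by domain and report any domain whose topics disagree on a given field. The domain is taken to be the first dot-separated segment of the topic name, following the `{domain}.{entity}.{event}` convention; the dict shape and field names such as `replication_factor` are hypothetical.

```python
from collections import defaultdict


def inconsistent_domains(topics, field):
    """Return {domain: {value: [topic names]}} for domains whose topics disagree on `field`.

    `topics` is a list of dicts with a "name" key plus the field being compared;
    the domain is the first dot-separated segment of the name.
    """
    by_domain = defaultdict(lambda: defaultdict(list))
    for t in topics:
        domain = t["name"].split(".", 1)[0]
        by_domain[domain][t.get(field)].append(t["name"])
    return {
        domain: {value: names for value, names in values.items()}
        for domain, values in by_domain.items()
        if len(values) > 1
    }
```

The same function covers retention, replication factor and serialisation format by passing a different `field` each time.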

Success Criteria

Quantitative

  • Triggers on 90% of topic-related queries (test with 10-20 varied phrasings)
  • Completes full audit in under 15 MCP tool calls
  • 0 failed MCP calls per run
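The call-count and failure targets can be measured with a thin wrapper around whatever function issues MCP tool calls. This is a hypothetical test-harness sketch: `call_tool` stands in for the real client function and is not part of the Lenses MCP API.

```python
class CallBudget:
    """Wrap a tool-calling function and count calls and failures."""

    def __init__(self, call_tool, max_calls=15):
        self._call_tool = call_tool  # hypothetical callable: (tool_name, **arguments)
        self.max_calls = max_calls
        self.calls = 0
        self.failures = 0

    def __call__(self, tool_name, **arguments):
        self.calls += 1
        try:
            return self._call_tool(tool_name, **arguments)
        except Exception:
            self.failures += 1
            raise

    def within_targets(self):
        # "under 15 MCP tool calls" and "0 failed MCP calls per run"
        return self.calls < self.max_calls and self.failures == 0
```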

Qualitative

  • Report is actionable without follow-up questions from the user
  • Consistent severity categorisation (critical/warning/suggestion) across runs
  • Every finding includes a concrete remediation step

Examples

Example 1: Routine weekly audit

User says: "Run a topic audit on the staging environment"
Actions:
  1. Check staging environment health via Lenses MCP
  2. Fetch all topics and configs
  3. Audit each topic against rules in `references/audit-rules.md`
  4. Check metadata completeness and orphaned topics
Result: Full audit report with prioritised findings

Example 2: Pre-deployment check

User says: "Check if my topic configs are production-ready"
Actions:
  1. Audit all topics for RF < 3, unbounded retention, single partitions
  2. Flag any critical issues that would block a production deployment
Result: Report highlighting critical issues that must be fixed before go-live

Example 3: Investigate a specific topic

User says: "Is the orders.payment.completed topic configured correctly?"
Actions:
  1. Fetch detailed config for the specific topic using `get_topic`
  2. Check broker-level overrides with `get_topic_broker_configs`
  3. Verify metadata and consumer groups
Result: Focused report on a single topic with all findings

Troubleshooting

Lenses MCP connection failed

Cause: Environment name is incorrect or Lenses agent is offline.
Solution: Run `check_environment_health` first. Verify the environment name matches what `list_environments` returns.

No topics returned

Cause: Environment exists but has no topics, or permissions are restricted.
Solution: Confirm the cluster has topics via the Lenses UI. Check that the Lenses agent has read access.

Metadata endpoint returns empty

Cause: Schema Registry is not configured or topics have no registered schemas.
Solution: This is a valid finding - report it as missing metadata rather than treating it as an error.

Output Format

Topic Audit Report

Environment: {name}

  • Brokers: X | Topics: Y | Consumer groups: Z

Critical (must fix)

  • [topic-name] Description of the issue
    Current: {current value} | Recommended: {recommended value}

Warning (should fix)

  • [topic-name] Description of the issue
    Current: {current value} | Recommended: {recommended value}

Suggestion (consider improving)

  • [topic-name] Description of the issue
    Recommendation: How to fix it

Summary

  • X critical issues found
  • Y warnings found
  • Z suggestions found
  • Topics audited: N
  • Orphaned topics: M
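A report in this shape could be assembled from a flat list of findings, as in the sketch below. The `(severity, topic, message)` tuple shape and the `render_report` name are assumptions, and the broker and consumer-group counts line is left out for brevity.

```python
SECTIONS = [
    ("critical", "Critical (must fix)"),
    ("warning", "Warning (should fix)"),
    ("suggestion", "Suggestion (consider improving)"),
]


def render_report(environment, findings, topics_audited, orphaned):
    """Render findings as text. Each finding is a (severity, topic, message) tuple."""
    lines = ["Topic Audit Report", "", f"Environment: {environment}", ""]
    counts = {}
    for severity, heading in SECTIONS:
        matching = [f for f in findings if f[0] == severity]
        counts[severity] = len(matching)
        lines.append(heading)
        lines.extend(f"  - [{topic}] {message}" for _, topic, message in matching)
        lines.append("")
    lines += [
        "Summary",
        f"  - {counts['critical']} critical issues found",
        f"  - {counts['warning']} warnings found",
        f"  - {counts['suggestion']} suggestions found",
        f"  - Topics audited: {topics_audited}",
        f"  - Orphaned topics: {orphaned}",
    ]
    return "\n".join(lines)
```

Iterating over a fixed `SECTIONS` list keeps the severity order and headings identical across runs, which supports the consistent categorisation criterion.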