migrate-to-msk
Migrating to MSK Express
Overview
This skill helps customers migrate self-managed Apache Kafka workloads to Amazon MSK
Express. It provides two independent phases — Discovery and Assessment —
that can be run end-to-end or individually depending on the customer's needs.
Scope
This skill covers migrations from self-managed Apache Kafka (on-premises, EC2,
Docker, Kubernetes, or other non-MSK deployments) to MSK Express. Migrations from
MSK Standard (Provisioned) to MSK Express are out of scope.
Prerequisites
The AWS MCP server is recommended for documentation lookups and informational
questions, but is not required. The assessment scripts are pure file processors
with no AWS API calls.
Intent Routing
Route the customer's request based on their intent:
1. Open/exploratory question ("How do I migrate to MSK?")
Explain what this skill offers:
This skill helps you migrate to MSK Express in two phases:
Phase 1 (Discovery): Inventory your source Kafka cluster: brokers, topics, partition counts, configs, authentication, and workload metrics. I can discover this from IaC files (Terraform, CDK, Docker Compose, Kubernetes manifests), provide commands for you to run on your cluster, or you can provide the information manually. Output: migrate-to-msk-skill-artifacts/<cluster_name>/cluster-config.json.
Phase 2 (Assessment): Validate your cluster against MSK Express across 5 compatibility pillars (topology, Kafka version, configs, auth, quotas) and produce a target Express specification using the AWS-published MSK Sizing/Pricing workbook. I'll flag what Express will refuse versus what Express will silently convert. Outputs: compatibility.<cluster_name>.json, the filled MSK_Sizing_Pricing.<cluster_name>.xlsx, and msk-sizing-inputs.<cluster_name>.json.
Data replication: For migrating data to your Express cluster, you can use MSK Replicator. I can provide guidance on setup and configuration.
Where would you like to start? I can begin with discovery if you point me to your infrastructure code or describe your cluster, or jump to assessment if you already have a cluster-config.json file.
Guardrails for this overview response:
- This response is an overview and a routing question only. Do NOT begin, simulate, or pre-empt any phase.
- Do NOT produce or estimate assessment output here: no verdicts, pillar findings, compatibility conclusions, broker counts, instance recommendations, or cost figures. Those values exist only after you run the Phase 2 scripts against a real cluster-config.json.
- Do NOT open, read, or summarize the internals of compatibility.py, sizing.py, or the reference files to explain how a phase works. Describe the phases at the level shown above; do not walk the customer through the implementation.
- When the customer chooses a phase, run that phase's scripts or flow to produce real results. Always operate the skill to answer; never answer from having read its source. For the exact commands, see "Running the assessment" in references/assessment-compatibility.md for Phase 2.
2. Discovery intent (DEFAULT when IaC files are provided)
If the customer provides a directory path, IaC files, or says "here's our infra",
this is discovery intent. Run ONLY Phase 1 (Discovery). Do NOT run assessment,
do NOT suggest migration steps, do NOT mention blockers or compatibility.
Produce the migrate-to-msk-skill-artifacts/<cluster_name>/cluster-config.json file and stop.
3. Assessment intent
The customer explicitly asks to assess, or already has a
migrate-to-msk-skill-artifacts/<cluster_name>/cluster-config.json file.
Run Phase 2 (Assessment) only.
4. Informational questions
The customer asks about Express capabilities, constraints, configuration differences,
authentication support, pricing, or compaction behavior without providing
cluster-specific data. Use AWS documentation tools (aws___search_documentation,
aws___read_documentation) if available to look up the answer from MSK Express
documentation. If MCP tools are not available, reference the
MSK Express documentation
and answer based on knowledge of AWS MSK.
5. Migration strategy questions
The customer asks about MSK Replicator compatibility, version upgrade paths, MirrorMaker 2,
or migration strategies. MSK Replicator is the native AWS-supported solution for data
replication and works for both MSK-to-MSK and non-MSK-to-MSK migrations. Use AWS
documentation tools (aws___search_documentation, aws___read_documentation) if
available to retrieve current requirements and supported configurations. If MCP tools
are not available, reference the
MSK Replicator documentation
and answer based on knowledge of AWS MSK.
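The five routes above can be sketched as a small decision function. This is only an illustration: the `RequestSignals` flags and the precedence between routes are assumptions made for the sketch, not something the skill defines.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    # Values mirror the numbering of the routes in this section.
    OVERVIEW = 1
    DISCOVERY = 2
    ASSESSMENT = 3
    INFORMATIONAL = 4
    MIGRATION_STRATEGY = 5


@dataclass
class RequestSignals:
    """Hypothetical flags an agent might derive from the customer's message."""
    provides_iac_or_path: bool = False
    asks_to_assess: bool = False
    has_cluster_config: bool = False
    asks_about_express: bool = False
    asks_about_replication: bool = False


def route(signals: RequestSignals) -> Intent:
    # The order of these checks is an assumption; the text lists the
    # intents but does not rank them against each other.
    if signals.asks_to_assess or signals.has_cluster_config:
        return Intent.ASSESSMENT
    if signals.provides_iac_or_path:
        return Intent.DISCOVERY
    if signals.asks_about_replication:
        return Intent.MIGRATION_STRATEGY
    if signals.asks_about_express:
        return Intent.INFORMATIONAL
    # Nothing specific was asked: give the overview and a routing question.
    return Intent.OVERVIEW
```

For example, `route(RequestSignals(provides_iac_or_path=True))` selects discovery, while an empty `RequestSignals()` falls through to the overview.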
Phase 1: Discovery
Purpose: Inventory the source cluster to build a migration profile.
Input: One of:
- A directory path containing IaC files (CDK, CloudFormation, Docker Compose, Kubernetes manifests, Terraform)
- Output from Kafka CLI commands the customer runs on their cluster
- Manual information provided by the customer in conversation
Output: migrate-to-msk-skill-artifacts/<cluster_name>/cluster-config.json, saved to the working directory.
MANDATORY first step for discovery
Before doing ANYTHING else in discovery, you MUST read the reference file
references/discovery.md (located at the skill path shown above).
Use file_read to read the full content of references/discovery.md. This file
contains the REQUIRED response template and JSON schema. You MUST follow the
template exactly: your response format, forbidden content, and JSON structure
are all defined there. Do NOT respond until you have read this file.
Discovery methods
- IaC analysis: Read infrastructure files and extract cluster metadata.
- Kafka CLI commands: Display standard Kafka CLI commands for the customer to run on their cluster (kafka-topics.sh, kafka-configs.sh, kafka-broker-api-versions.sh). Do NOT generate or offer Python scripts.
- Runtime metrics intake: Ingest metrics provided by the customer.
- Manual conversation: Ask the customer for cluster details.
Discovery rules
- You MUST read references/discovery.md before responding.
- Follow the response template from that file EXACTLY.
- ALWAYS save migrate-to-msk-skill-artifacts/<cluster_name>/cluster-config.json in the working directory.
- Do NOT proceed to Phase 2 without explicit customer confirmation.
Phase 2: Assessment
Purpose: Assess the cluster against MSK Express requirements and produce a target
Express specification (instance type, broker count, monthly cost projection).
Input: migrate-to-msk-skill-artifacts/<cluster_name>/cluster-config.json from Phase 1.
Outputs:
- migrate-to-msk-skill-artifacts/<cluster_name>/compatibility.<cluster_name>.json: five-pillar verdict.
- migrate-to-msk-skill-artifacts/<cluster_name>/MSK_Sizing_Pricing.<cluster_name>.xlsx: the AWS-published MSK Sizing/Pricing workbook (downloaded by the agent) with the six workload inputs filled into the MSK Provisioned sheet. Open it to read the broker count and cost recommendations.
- migrate-to-msk-skill-artifacts/<cluster_name>/msk-sizing-inputs.<cluster_name>.json: a record of the six input values and the cell each maps to.
Assessment is implemented as two file processors (no live AWS API calls):
- scripts/compatibility.py: five-pillar compatibility assessment.
- scripts/sizing.py: computes the six workbook inputs from the discovery contract and fills them into the AWS-published workbook the agent downloads.
Both run via uv run with PEP 723 inline dependencies. For the exact
invocation commands, see "Running the assessment" in
references/assessment-compatibility.md.
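The artifact naming convention above can be captured in a small helper. This is a minimal sketch: the function name `artifact_paths` and the dictionary keys are illustrative choices, and only the directory and file name patterns come from this document.

```python
from pathlib import Path

# Directory name taken from the paths listed in this section.
ARTIFACT_ROOT = Path("migrate-to-msk-skill-artifacts")


def artifact_paths(cluster_name: str, root: Path = ARTIFACT_ROOT) -> dict[str, Path]:
    """Build the per-cluster input and output paths used by both phases."""
    base = root / cluster_name
    return {
        # Phase 1 output, Phase 2 input.
        "cluster_config": base / "cluster-config.json",
        # Phase 2 outputs.
        "compatibility": base / f"compatibility.{cluster_name}.json",
        "workbook": base / f"MSK_Sizing_Pricing.{cluster_name}.xlsx",
        "sizing_inputs": base / f"msk-sizing-inputs.{cluster_name}.json",
    }
```

Keeping every artifact for a cluster under one directory keyed by cluster name means assessment can locate the discovery output without asking the customer for a path again.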
Compatibility pillars
compatibility.py evaluates five pillars:
- Topology: AZ count, broker count, KRaft vs ZooKeeper, per-cluster broker quota.
- Kafka version: source version against the Express supported set (3.6, 3.8, 3.9).
- Configs: broker- and topic-level configs against Express's editable, read-only, range-restricted, and enforced-value sets (sourced from the Express broker configuration documentation on docs.aws.amazon.com/msk).
- Auth: checks the source's authentication mechanism against those MSK Express supports and surfaces any incompatibilities.
- Quotas: peak workload against absolute Express ceilings (per-broker ingress/egress, partition count, IAM connection cap, per-partition throughput).
See references/assessment-compatibility.md
for the full pseudocode, evidence codes, and verdict mapping.
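As an illustration of the Kafka version pillar, the check below compares a source version against the supported set listed above. It is a sketch, not the logic in compatibility.py: the function names are hypothetical, and comparing only major.minor (ignoring the patch component) is an assumption.

```python
# Supported set as listed in the Kafka version pillar above.
SUPPORTED_EXPRESS_VERSIONS = {(3, 6), (3, 8), (3, 9)}


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.6.0'."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def is_supported_on_express(version: str) -> bool:
    """True when the source major.minor is in the supported set.

    Assumption: the patch component does not affect the result.
    """
    return major_minor(version) in SUPPORTED_EXPRESS_VERSIONS
```

Under these assumptions, a source on 3.6.0 matches the supported set, while 3.7.1 or 2.8.2 does not.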
Verdict vocabulary
Each pillar emits one of three verdicts; the overall verdict is the worst across pillars.
| Verdict | Meaning |
|---|---|
| | Your source cluster already lines up with MSK Express here. Surfaced for informational purposes. No action needed. |
| ADVISORY | Your source cluster differs from MSK Express here, but Express handles this for you at the target by adjusting or replacing the setting. Migration can proceed; review it so the resulting behavior change is expected. |
| ACTION_REQUIRED | Identifies a configuration or condition that MSK Express is not expected to accept in its current form. Remediation on the source prior to migration is recommended. |
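The "worst across pillars" rule can be sketched as a maximum over a severity ranking. ADVISORY and ACTION_REQUIRED are the verdict strings used elsewhere in this document; the name of the lowest-severity verdict is not visible here, so `NO_ACTION` below is a placeholder assumption, as is the function name.

```python
# Severity order, lowest to highest. "NO_ACTION" is a placeholder name
# (assumption) for the lowest-severity verdict.
SEVERITY = {"NO_ACTION": 0, "ADVISORY": 1, "ACTION_REQUIRED": 2}


def overall_verdict(pillar_verdicts: dict[str, str]) -> str:
    """Return the most severe verdict among the pillars."""
    if not pillar_verdicts:
        raise ValueError("at least one pillar verdict is required")
    # An unknown verdict string raises KeyError instead of being ranked silently.
    return max(pillar_verdicts.values(), key=lambda verdict: SEVERITY[verdict])
```

With this ranking, a single ACTION_REQUIRED pillar decides the overall verdict no matter how the other four pillars come out.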
Sizing
sizing.py computes the six workload inputs from the discovery output. Run as sizing.py --workbook <downloaded.xlsx>, it fills them into the MSK Provisioned sheet and writes MSK_Sizing_Pricing.<cluster_name>.xlsx and msk-sizing-inputs.<cluster_name>.json.
Assessment rules
- Run compatibility.py and sizing.py independently; neither blocks the other.
- Surface any ACTION_REQUIRED evidence to the user for awareness, but do not gate further phases on it. Express may still accept the workload with mitigations.
- Do NOT pivot back into discovery. Assessment operates on the existing cluster-config.json as-is. Partial data is fine: the scripts emit ADVISORY evidence (METRICS_MISSING, AZ_COUNT_UNKNOWN, etc.) for missing fields; surface those findings and stop. Do not propose Kafka CLI commands, IaC walks, scripts, or questionnaires to fill the gaps. Full forbidden-behavior list in references/assessment-compatibility.md.
- Your response MUST follow the assessment response template in references/assessment-compatibility.md (section "Response Template"). One template covers both artifacts. Do not freestyle the post-script summary: the template defines required sections, mandatory vocabulary (use the verdict strings verbatim), and forbidden content (no scores, no narrative editorializing, no in-prose cost / instance recommendations; the user reads those from the filled workbook).
Execution model
Scripts run on the customer's local machine via uv run. They declare their own
dependencies (PEP 723) and are pure file processors: no AWS API calls, no
network access, and no third-party dependencies (standard library only).
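For readers unfamiliar with PEP 723, the sketch below shows the general shape of such a script: an inline metadata header with an empty dependency list, followed by a pure file-to-file transformation. The `process` function, its summary output, and the `requires-python` bound are hypothetical and do not reflect what compatibility.py or sizing.py contain.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Hypothetical pure file processor: reads one JSON file, writes another."""
import json
import sys
from pathlib import Path


def process(src: Path, dst: Path) -> None:
    """Read a JSON object from src and write a small summary to dst."""
    data = json.loads(src.read_text(encoding="utf-8"))
    # Assumes the input is a JSON object; sorted() then yields its keys.
    summary = {"top_level_keys": sorted(data)}
    dst.write_text(json.dumps(summary, indent=2), encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    process(Path(sys.argv[1]), Path(sys.argv[2]))
```

Saved under a hypothetical name such as process.py, it could be run with `uv run process.py in.json out.json`; uv reads the header comment to prepare the environment before executing the script.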
Security Considerations
Apply these controls at every phase. For additional detail, see
MSK Security best practices
and MSK IAM access control.
- Encryption in transit (mandatory). Enforce TLS for client-broker traffic on the MSK Express target (EncryptionInTransit.ClientBroker = TLS).
- Encryption at rest (mandatory). Provision the target cluster with a customer-managed KMS key (or AWS-managed if your compliance posture allows).
- Authentication: prefer IAM over long-lived credentials. Configure the MSK Express target with IAM authentication as the sole client auth method. This gives ephemeral, role-based credentials with full CloudTrail coverage.
- Credential storage: use AWS Secrets Manager. Store SASL/SCRAM and TLS credentials for source cluster access in Secrets Manager. Never pass passwords as CLI arguments.
- Network isolation. Deploy MSK clusters in private subnets. Use security groups scoped to specific CIDR ranges or security group references. Do NOT use 0.0.0.0/0 ingress rules.
- CloudTrail logging and CloudWatch alarms. Ensure CloudTrail is enabled in the target account and covers kafka.amazonaws.com API calls. Configure alarms:
  - ClientAuthenticationFailure: a surge indicates credential problems or an attack
  - ConnectionCloseCount: an abnormal spike may indicate connection flooding
  - CloudTrail metric filters for denied kafka-cluster:* actions
  - Connection-rate alarms approaching the 100 conn/sec/broker IAM limit
- Sensitive data handling. Discovery and assessment outputs contain broker addresses, auth hints, and broker config values. Treat these as sensitive: do not paste into public channels or ticketing systems without redaction.
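The network isolation rule can be checked mechanically. The sketch below flags any ingress CIDR that covers the whole address space; the function name and the plain list-of-strings input are assumptions, and a real check would first extract the CIDRs from security group definitions.

```python
import ipaddress


def open_ingress_rules(cidrs: list[str]) -> list[str]:
    """Return the CIDR strings that cover every address (prefix length 0)."""
    flagged = []
    for cidr in cidrs:
        # strict=False tolerates host bits being set, e.g. "10.0.0.1/16".
        network = ipaddress.ip_network(cidr, strict=False)
        if network.prefixlen == 0:
            flagged.append(cidr)
    return flagged
```

Checking the prefix length instead of comparing strings also catches the IPv6 equivalent, ::/0.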
Troubleshooting
Single-broker / single-AZ source. Topology pillar emits BROKER_COUNT_LT_3 /
AZ_COUNT_NOT_3 ADVISORY: Express auto-fixes at the target by deploying across 3
AZs with ≥3 brokers regardless of source.
Out-of-range topic configs. max.compaction.lag.ms < 1 day is the only
Express-rejected topic-config bound encoded in compatibility.py. Adjust on the
source before migration.
Workbook recommendations look blank or stale. The recommendation and cost
cells are workbook formulas; they populate once the filled workbook is opened
in Excel / LibreOffice / Sheets and its formulas recalculate. sizing.py sets
fullCalcOnLoad so this happens automatically on open; if your spreadsheet
app has automatic recalculation disabled, trigger a manual recalculation.
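The out-of-range topic config case from Troubleshooting can be sketched as a single comparison. The function name and the string-valued config mapping are assumptions, and the sketch is not the code in compatibility.py.

```python
# One day expressed in milliseconds, the unit used by max.compaction.lag.ms.
ONE_DAY_MS = 24 * 60 * 60 * 1000


def compaction_lag_too_low(topic_configs: dict[str, str]) -> bool:
    """True when max.compaction.lag.ms is set and is below one day."""
    raw = topic_configs.get("max.compaction.lag.ms")
    if raw is None:
        # Not overridden on the topic, so there is nothing to flag.
        return False
    return int(raw) < ONE_DAY_MS
```

A topic set to one hour (3600000 ms) would be flagged for adjustment on the source, while a value of exactly one day (86400000 ms) would not.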