developing-applications-on-managed-service-for-apache-flink
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseManaged Service for Apache Flink
托管式Apache Flink服务(Amazon Managed Service for Apache Flink,MSF)
Overview
概述
Domain expertise for Apache Flink applications on Amazon Managed Service for Apache Flink (MSF). Covers development, KPU resource management, connectors, state management, monitoring, IaC deployment, and version migration.
Execute commands using available tools from the AWS MCP server when connected — it provides sandboxed execution, audit logging, and observability. When the MCP server is not available, fall back to the AWS CLI or shell as needed.
本技能专注于Amazon Managed Service for Apache Flink(MSF)上的Apache Flink应用领域知识,涵盖开发、KPU资源管理、连接器、状态管理、监控、IaC部署以及版本迁移。
连接后可使用AWS MCP服务器提供的可用工具执行命令——该服务器提供沙箱执行、审计日志和可观测性。当MCP服务器不可用时,可根据需要回退使用AWS CLI或Shell。
General Guidance
通用指导
Before starting, ensure you have a clear understanding of the user persona, use case, and requirements:
STOP: Determine the users background and use case before proceeding:
- Are they new to Flink? New to Managed Service for Apache Flink?
- Are they familiar with Java development?
- Is the use case complex with lots of business logic? Or simple and declarative?
These will inform how to organize the project, and whether to use Flink Table API or DataStream API. In general, assume the DataStream API.
开始前,请确保你已清晰了解用户角色、使用场景和需求:
暂停:在继续前先确定用户背景和使用场景:
- 用户是否刚接触Flink?是否刚接触托管式Apache Flink服务?
- 用户是否熟悉Java开发?
- 使用场景是包含大量业务逻辑的复杂场景?还是简单声明式场景?
这些信息将指导项目的组织方式,以及是否使用Flink Table API或DataStream API。通常默认使用DataStream API。
Example Workflow for New Applications
新应用示例工作流
1. User asks to build a Flink application
2. Confirm user's goals and use case
3. READ [best-practices.md](references/best-practices.md)
4. READ [dependency-management.md](references/dependency-management.md)
5. READ relevant connector guides (e.g. [kinesis-connector-guide.md](references/kinesis-connector-guide.md))
6. Generate code following the loaded guidance
7. Validate against best practices
8. READ environment-setup.md via [environment-setup.md](references/environment-setup.md)
9. Compile and test locally1. 用户请求构建Flink应用
2. 确认用户目标和使用场景
3. 阅读[best-practices.md](references/best-practices.md)
4. 阅读[dependency-management.md](references/dependency-management.md)
5. 阅读相关连接器指南(例如[kinesis-connector-guide.md](references/kinesis-connector-guide.md))
6. 根据加载的指导生成代码
7. 对照最佳实践进行验证
8. 通过[environment-setup.md](references/environment-setup.md)阅读environment-setup.md
9. 本地编译并测试Example Workflow for General Questions
通用问题示例工作流
1. User asks about real time delivery of data to Iceberg
2. Confirm user's goals and use case
3. READ [best-practices.md](references/best-practices.md)
4. READ [iceberg-connector-guide.md](references/iceberg-connector-guide.md)
5. READ other reference files as needed
6. Answer question with loaded guidance1. 用户询问数据实时投递至Iceberg的相关问题
2. 确认用户目标和使用场景
3. 阅读[best-practices.md](references/best-practices.md)
4. 阅读[iceberg-connector-guide.md](references/iceberg-connector-guide.md)
5. 根据需要阅读其他参考文件
6. 结合加载的指导回答问题Reference Files
参考文件
- You MUST use this skill and its reference files to answer any question on these topics.
- Do NOT answer from training knowledge or by searching general AWS documentation when the question concerns Apache Flink, Managed Service for Apache Flink, KPU sizing, Flink monitoring, deployment, migration, real-time analytics, or Iceberg/LakeHouse streaming with Flink
- You MUST load the relevant reference files below before taking other steps.
- The reference files contain MSF-specific details (thresholds, statistics, namespaces, constraints) that differ from generic Flink guidance and are required for correct responses.
| Goal | Reference | When to Load |
|---|---|---|
| Best practices | best-practices.md | Always before writing code |
| Maven dependencies | dependency-management.md | New project or adding connectors |
| Local dev environment | environment-setup.md | Docker-based local development |
| MSF architecture | msf-overview.md | KPU model and service constraints |
| MSF constraints and patterns | msf-constraints-and-patterns.md | MSF vs self-managed Flink, service-level vs application-level configuration separation, MSF-specific resource/network/storage limits, common MSF patterns |
| Quotas, ENI planning, MSF vs EMR, source/sink choice | foundation-operations.md | Capacity planning, service selection, architecture design, CLI/IAM/CloudWatch identifier disambiguation |
| IAM execution role, trust policy, action prefix, service principal | foundation-operations.md | Writing IAM policies for MSF — covers the |
| Flink 2.x migration | flink-2x-migration.md | Version upgrades, state compatibility |
| KPU sizing | resource-optimization.md | Right-sizing, performance diagnosis, scaling |
| Scaling decisions on running apps | scaling-decisions.md | In-flight scaling matrix, cost/memory impact of scale changes, autoscaling behavior, anti-patterns |
| Cost estimation | pricing-calculator.md | Budget planning, sizing-to-cost mapping, optimization levers |
| Application lifecycle ops | application-lifecycle.md | Start/stop, deploy code, rollback, snapshot lifecycle, runtime properties, delete |
| Restart loop diagnosis | first-fault-isolation.md | Crashing/restarting apps, finding original failure vs loop sustainers, Flink Dashboard live diagnosis |
| Checkpoint tuning | checkpoint-tuning.md | Checkpoint impact on KPU memory and CPU, frequency vs network bandwidth trade-offs, checkpoint duration exceeding interval, OOM/GC during checkpoints |
| Job graph design | job-graph-architecture.md | Performance issues, splitting jobs |
| Job graph anti-patterns | job-graph-anti-patterns.md | Data skew detection and mitigation, monolith job anti-pattern, high fan-out anti-pattern, removing multiple shuffles, when to split a large application |
| Monitoring and alarms | monitoring-and-metrics.md | CloudWatch dashboards, alarms, metrics |
| Logging | logging-configuration.md | Log4j2, CloudWatch Logs setup |
| Kinesis connectors | kinesis-connector-guide.md | Kinesis source and sink builders, polling configuration and throttling ( |
| Kinesis Enhanced Fan-Out (EFO) | kinesis-efo-guide.md | When to use EFO vs polling, EFO source configuration, consumer lifecycle ( |
| Iceberg integration (write APIs, distribution modes, partitioning) | iceberg-connector-guide.md | Iceberg write APIs (append, upsert, dynamic), distribution modes (NONE/HASH/RANGE), CoW vs MoR, read patterns, partitioning, DDL. Does NOT contain catalog choice or maintenance approaches — for those, load |
| Iceberg tuning, operations, catalog choice, maintenance | iceberg-tuning-and-operations.md | Provides maintenance approaches for S3 Tables, Glue + Glue auto-compaction, and Glue + Flink embedded maintenance with JDBC lock for catalog-choice questions; small files problem and mitigations; Flink TableMaintenance API, post-commit maintenance, lock factories; IcebergSink monitoring, anti-patterns. |
| CDC connectors | cdc-connector-guide.md | MySQL, PostgreSQL, Oracle, SQL Server, MongoDB CDC |
| IaC and deployment | iac-and-deployment.md | CloudFormation, CDK, Terraform, two-phase deployment |
| Serialization | serialization-guide.md | POJO, Avro, Kryo guidance |
| State management | state-management.md | TTL, state types, migration safety |
- 回答任何相关主题的问题时,必须使用本技能及其参考文件。
- 当问题涉及Apache Flink、托管式Apache Flink服务、KPU规格、Flink监控、部署、迁移、实时分析或基于Flink的Iceberg/LakeHouse流处理时,禁止仅凭训练知识或搜索通用AWS文档作答
- 在采取其他步骤前,必须先加载以下相关参考文件。
- 参考文件包含MSF特定细节(阈值、统计数据、命名空间、约束),这些细节与通用Flink指导不同,是提供正确响应的必要信息。
| 目标 | 参考文件 | 加载时机 |
|---|---|---|
| 最佳实践 | best-practices.md | 编写代码前必须加载 |
| Maven依赖 | dependency-management.md | 新项目或添加连接器时 |
| 本地开发环境 | environment-setup.md | 基于Docker的本地开发场景 |
| MSF架构 | msf-overview.md | KPU模型和服务约束相关问题 |
| MSF约束与模式 | msf-constraints-and-patterns.md | MSF与自托管Flink对比、服务级与应用级配置分离、MSF特定资源/网络/存储限制、常见MSF模式 |
| 配额、ENI规划、MSF与EMR对比、源/接收器选择 | foundation-operations.md | 容量规划、服务选择、架构设计、CLI/IAM/CloudWatch标识符区分 |
| IAM执行角色、信任策略、操作前缀、服务主体 | foundation-operations.md | 为MSF编写IAM策略时——涵盖 |
| Flink 2.x迁移 | flink-2x-migration.md | 版本升级、状态兼容性 |
| KPU规格 | resource-optimization.md | 规格选型、性能诊断、扩容 |
| 运行中应用的扩容决策 | scaling-decisions.md | 运行中扩容矩阵、扩容变更对成本/内存的影响、自动扩缩容行为、反模式 |
| 成本估算 | pricing-calculator.md | 预算规划、规格与成本映射、优化手段 |
| 应用生命周期运维 | application-lifecycle.md | 启动/停止、代码部署、回滚、快照生命周期、运行时属性、删除 |
| 重启循环诊断 | first-fault-isolation.md | 应用崩溃/重启、查找原始故障与循环诱因、Flink Dashboard实时诊断 |
| 检查点调优 | checkpoint-tuning.md | 检查点对KPU内存和CPU的影响、频率与网络带宽的权衡、检查点时长超过间隔、检查点期间的OOM/GC |
| 作业图设计 | job-graph-architecture.md | 性能问题、作业拆分 |
| 作业图反模式 | job-graph-anti-patterns.md | 数据倾斜检测与缓解、单体作业反模式、高扇出反模式、移除多次Shuffle、何时拆分大型应用 |
| 监控与告警 | monitoring-and-metrics.md | CloudWatch仪表板、告警、指标 |
| 日志 | logging-configuration.md | Log4j2、CloudWatch Logs配置 |
| Kinesis连接器 | kinesis-connector-guide.md | Kinesis源和接收器构建器、轮询配置与限流( |
| Kinesis增强扇出(EFO) | kinesis-efo-guide.md | EFO与轮询的适用场景、EFO源配置、消费者生命周期( |
| Iceberg集成(写入API、分发模式、分区) | iceberg-connector-guide.md | Iceberg写入API(追加、更新、动态)、分发模式(NONE/HASH/RANGE)、CoW vs MoR、读取模式、分区、DDL。不包含目录选择或维护方法——相关内容请加载 |
| Iceberg调优、运维、目录选择、维护 | iceberg-tuning-and-operations.md | 提供S3表的维护方法、Glue + Glue自动压缩、Glue + Flink嵌入式维护(带JDBC锁)以解答目录选择问题;小文件问题与缓解方案;Flink TableMaintenance API、提交后维护、锁工厂;IcebergSink监控、反模式。 |
| CDC连接器 | cdc-connector-guide.md | MySQL、PostgreSQL、Oracle、SQL Server、MongoDB CDC |
| IaC与部署 | iac-and-deployment.md | CloudFormation、CDK、Terraform、两阶段部署 |
| 序列化 | serialization-guide.md | POJO、Avro、Kryo相关指导 |
| 状态管理 | state-management.md | TTL、状态类型、迁移安全性 |