developing-applications-on-managed-service-for-apache-flink

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Managed Service for Apache Flink

托管式Apache Flink服务（Amazon Managed Service for Apache Flink，MSF）

Overview

概述

Domain expertise for Apache Flink applications on Amazon Managed Service for Apache Flink (MSF). Covers development, KPU resource management, connectors, state management, monitoring, IaC deployment, and version migration.

Execute commands using available tools from the AWS MCP server when connected — it provides sandboxed execution, audit logging, and observability. When the MCP server is not available, fall back to the AWS CLI or shell as needed.

本技能专注于Amazon Managed Service for Apache Flink（MSF）上的Apache Flink应用领域知识，涵盖开发、KPU资源管理、连接器、状态管理、监控、IaC部署以及版本迁移。

连接后可使用AWS MCP服务器提供的可用工具执行命令——该服务器提供沙箱执行、审计日志和可观测性。当MCP服务器不可用时，可根据需要回退使用AWS CLI或Shell。

General Guidance

通用指导

Before starting, ensure you have a clear understanding of the user persona, use case, and requirements:

STOP: Determine the users background and use case before proceeding:

Are they new to Flink? New to Managed Service for Apache Flink?
Are they familiar with Java development?
Is the use case complex with lots of business logic? Or simple and declarative?

These will inform how to organize the project, and whether to use Flink Table API or DataStream API. In general, assume the DataStream API.

开始前，请确保你已清晰了解用户角色、使用场景和需求：

暂停：在继续前先确定用户背景和使用场景：

用户是否刚接触Flink？是否刚接触托管式Apache Flink服务？
用户是否熟悉Java开发？
使用场景是包含大量业务逻辑的复杂场景？还是简单声明式场景？

这些信息将指导项目的组织方式，以及是否使用Flink Table API或DataStream API。通常默认使用DataStream API。

Example Workflow for New Applications

新应用示例工作流

1. User asks to build a Flink application
2. Confirm user's goals and use case
3. READ [best-practices.md](references/best-practices.md)
4. READ [dependency-management.md](references/dependency-management.md)
5. READ relevant connector guides (e.g. [kinesis-connector-guide.md](references/kinesis-connector-guide.md))
6. Generate code following the loaded guidance
7. Validate against best practices
8. READ environment-setup.md via [environment-setup.md](references/environment-setup.md)
9. Compile and test locally

1. 用户请求构建Flink应用
2. 确认用户目标和使用场景
3. 阅读[best-practices.md](references/best-practices.md)
4. 阅读[dependency-management.md](references/dependency-management.md)
5. 阅读相关连接器指南（例如[kinesis-connector-guide.md](references/kinesis-connector-guide.md)）
6. 根据加载的指导生成代码
7. 对照最佳实践进行验证
8. 通过[environment-setup.md](references/environment-setup.md)阅读environment-setup.md
9. 本地编译并测试

Example Workflow for General Questions

通用问题示例工作流

1. User asks about real time delivery of data to Iceberg
2. Confirm user's goals and use case
3. READ [best-practices.md](references/best-practices.md)
4. READ [iceberg-connector-guide.md](references/iceberg-connector-guide.md)
5. READ other reference files as needed
6. Answer question with loaded guidance

1. 用户询问数据实时投递至Iceberg的相关问题
2. 确认用户目标和使用场景
3. 阅读[best-practices.md](references/best-practices.md)
4. 阅读[iceberg-connector-guide.md](references/iceberg-connector-guide.md)
5. 根据需要阅读其他参考文件
6. 结合加载的指导回答问题

Reference Files

参考文件

You MUST use this skill and its reference files to answer any question on these topics.
Do NOT answer from training knowledge or by searching general AWS documentation when the question concerns Apache Flink, Managed Service for Apache Flink, KPU sizing, Flink monitoring, deployment, migration, real-time analytics, or Iceberg/LakeHouse streaming with Flink
- You MUST load the relevant reference files below before taking other steps.
- The reference files contain MSF-specific details (thresholds, statistics, namespaces, constraints) that differ from generic Flink guidance and are required for correct responses.

Goal	Reference	When to Load
Best practices	best-practices.md	Always before writing code
Maven dependencies	dependency-management.md	New project or adding connectors
Local dev environment	environment-setup.md	Docker-based local development
MSF architecture	msf-overview.md	KPU model and service constraints
MSF constraints and patterns	msf-constraints-and-patterns.md	MSF vs self-managed Flink, service-level vs application-level configuration separation, MSF-specific resource/network/storage limits, common MSF patterns
Quotas, ENI planning, MSF vs EMR, source/sink choice	foundation-operations.md	Capacity planning, service selection, architecture design, CLI/IAM/CloudWatch identifier disambiguation
IAM execution role, trust policy, action prefix, service principal	foundation-operations.md	Writing IAM policies for MSF — covers the `kinesisanalytics:` (no v2) action prefix, `kinesisanalytics.amazonaws.com` (no v2) trust principal, and the v2/non-v2 disconnect that is the most common source of permission and AssumeRole failures
Flink 2.x migration	flink-2x-migration.md	Version upgrades, state compatibility
KPU sizing	resource-optimization.md	Right-sizing, performance diagnosis, scaling
Scaling decisions on running apps	scaling-decisions.md	In-flight scaling matrix, cost/memory impact of scale changes, autoscaling behavior, anti-patterns
Cost estimation	pricing-calculator.md	Budget planning, sizing-to-cost mapping, optimization levers
Application lifecycle ops	application-lifecycle.md	Start/stop, deploy code, rollback, snapshot lifecycle, runtime properties, delete
Restart loop diagnosis	first-fault-isolation.md	Crashing/restarting apps, finding original failure vs loop sustainers, Flink Dashboard live diagnosis
Checkpoint tuning	checkpoint-tuning.md	Checkpoint impact on KPU memory and CPU, frequency vs network bandwidth trade-offs, checkpoint duration exceeding interval, OOM/GC during checkpoints
Job graph design	job-graph-architecture.md	Performance issues, splitting jobs
Job graph anti-patterns	job-graph-anti-patterns.md	Data skew detection and mitigation, monolith job anti-pattern, high fan-out anti-pattern, removing multiple shuffles, when to split a large application
Monitoring and alarms	monitoring-and-metrics.md	CloudWatch dashboards, alarms, metrics
Logging	logging-configuration.md	Log4j2, CloudWatch Logs setup
Kinesis connectors	kinesis-connector-guide.md	Kinesis source and sink builders, polling configuration and throttling ( `READER_EMPTY_RECORDS_FETCH_INTERVAL` , `SHARD_GET_RECORDS_MAX` , `ReadProvisionedThroughputExceeded` , `LimitExceededException` ), legacy connector migration
Kinesis Enhanced Fan-Out (EFO)	kinesis-efo-guide.md	When to use EFO vs polling, EFO source configuration, consumer lifecycle ( `JOB_MANAGED` vs `SELF_MANAGED` ), parallelism vs shard count, IAM permissions, troubleshooting
Iceberg integration (write APIs, distribution modes, partitioning)	iceberg-connector-guide.md	Iceberg write APIs (append, upsert, dynamic), distribution modes (NONE/HASH/RANGE), CoW vs MoR, read patterns, partitioning, DDL. Does NOT contain catalog choice or maintenance approaches — for those, load `iceberg-tuning-and-operations.md` .
Iceberg tuning, operations, catalog choice, maintenance	iceberg-tuning-and-operations.md	Provides maintenance approaches for S3 Tables, Glue + Glue auto-compaction, and Glue + Flink embedded maintenance with JDBC lock for catalog-choice questions; small files problem and mitigations; Flink TableMaintenance API, post-commit maintenance, lock factories; IcebergSink monitoring, anti-patterns.
CDC connectors	cdc-connector-guide.md	MySQL, PostgreSQL, Oracle, SQL Server, MongoDB CDC
IaC and deployment	iac-and-deployment.md	CloudFormation, CDK, Terraform, two-phase deployment
Serialization	serialization-guide.md	POJO, Avro, Kryo guidance
State management	state-management.md	TTL, state types, migration safety

回答任何相关主题的问题时，必须使用本技能及其参考文件。
当问题涉及Apache Flink、托管式Apache Flink服务、KPU规格、Flink监控、部署、迁移、实时分析或基于Flink的Iceberg/LakeHouse流处理时，禁止仅凭训练知识或搜索通用AWS文档作答
- 在采取其他步骤前，必须先加载以下相关参考文件。
- 参考文件包含MSF特定细节（阈值、统计数据、命名空间、约束），这些细节与通用Flink指导不同，是提供正确响应的必要信息。

目标	参考文件	加载时机
最佳实践	best-practices.md	编写代码前必须加载
Maven依赖	dependency-management.md	新项目或添加连接器时
本地开发环境	environment-setup.md	基于Docker的本地开发场景
MSF架构	msf-overview.md	KPU模型和服务约束相关问题
MSF约束与模式	msf-constraints-and-patterns.md	MSF与自托管Flink对比、服务级与应用级配置分离、MSF特定资源/网络/存储限制、常见MSF模式
配额、ENI规划、MSF与EMR对比、源/接收器选择	foundation-operations.md	容量规划、服务选择、架构设计、CLI/IAM/CloudWatch标识符区分
IAM执行角色、信任策略、操作前缀、服务主体	foundation-operations.md	为MSF编写IAM策略时——涵盖 `kinesisanalytics:` （无v2）操作前缀、 `kinesisanalytics.amazonaws.com` （无v2）信任主体，以及v2/非v2的差异（这是权限和AssumeRole失败最常见的原因）
Flink 2.x迁移	flink-2x-migration.md	版本升级、状态兼容性
KPU规格	resource-optimization.md	规格选型、性能诊断、扩容
运行中应用的扩容决策	scaling-decisions.md	运行中扩容矩阵、扩容变更对成本/内存的影响、自动扩缩容行为、反模式
成本估算	pricing-calculator.md	预算规划、规格与成本映射、优化手段
应用生命周期运维	application-lifecycle.md	启动/停止、代码部署、回滚、快照生命周期、运行时属性、删除
重启循环诊断	first-fault-isolation.md	应用崩溃/重启、查找原始故障与循环诱因、Flink Dashboard实时诊断
检查点调优	checkpoint-tuning.md	检查点对KPU内存和CPU的影响、频率与网络带宽的权衡、检查点时长超过间隔、检查点期间的OOM/GC
作业图设计	job-graph-architecture.md	性能问题、作业拆分
作业图反模式	job-graph-anti-patterns.md	数据倾斜检测与缓解、单体作业反模式、高扇出反模式、移除多次Shuffle、何时拆分大型应用
监控与告警	monitoring-and-metrics.md	CloudWatch仪表板、告警、指标
日志	logging-configuration.md	Log4j2、CloudWatch Logs配置
Kinesis连接器	kinesis-connector-guide.md	Kinesis源和接收器构建器、轮询配置与限流（ `READER_EMPTY_RECORDS_FETCH_INTERVAL` 、 `SHARD_GET_RECORDS_MAX` 、 `ReadProvisionedThroughputExceeded` 、 `LimitExceededException` ）、旧版连接器迁移
Kinesis增强扇出（EFO）	kinesis-efo-guide.md	EFO与轮询的适用场景、EFO源配置、消费者生命周期（ `JOB_MANAGED` vs `SELF_MANAGED` ）、并行度与分片数、IAM权限、故障排查
Iceberg集成（写入API、分发模式、分区）	iceberg-connector-guide.md	Iceberg写入API（追加、更新、动态）、分发模式（NONE/HASH/RANGE）、CoW vs MoR、读取模式、分区、DDL。不包含目录选择或维护方法——相关内容请加载 `iceberg-tuning-and-operations.md` 。
Iceberg调优、运维、目录选择、维护	iceberg-tuning-and-operations.md	提供S3表的维护方法、Glue + Glue自动压缩、Glue + Flink嵌入式维护（带JDBC锁）以解答目录选择问题；小文件问题与缓解方案；Flink TableMaintenance API、提交后维护、锁工厂；IcebergSink监控、反模式。
CDC连接器	cdc-connector-guide.md	MySQL、PostgreSQL、Oracle、SQL Server、MongoDB CDC
IaC与部署	iac-and-deployment.md	CloudFormation、CDK、Terraform、两阶段部署
序列化	serialization-guide.md	POJO、Avro、Kryo相关指导
状态管理	state-management.md	TTL、状态类型、迁移安全性

Additional Resources

额外资源

GitHub Issues

GitHub Issues