developing-applications-on-managed-service-for-apache-flink

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Managed Service for Apache Flink

托管式Apache Flink服务(Amazon Managed Service for Apache Flink,MSF)

Overview

概述

Domain expertise for Apache Flink applications on Amazon Managed Service for Apache Flink (MSF). Covers development, KPU resource management, connectors, state management, monitoring, IaC deployment, and version migration.
Execute commands using available tools from the AWS MCP server when connected — it provides sandboxed execution, audit logging, and observability. When the MCP server is not available, fall back to the AWS CLI or shell as needed.
本技能专注于Amazon Managed Service for Apache Flink(MSF)上的Apache Flink应用领域知识,涵盖开发、KPU资源管理、连接器、状态管理、监控、IaC部署以及版本迁移。
连接后可使用AWS MCP服务器提供的可用工具执行命令——该服务器提供沙箱执行、审计日志和可观测性。当MCP服务器不可用时,可根据需要回退使用AWS CLI或Shell。

General Guidance

通用指导

Before starting, ensure you have a clear understanding of the user persona, use case, and requirements:
STOP: Determine the users background and use case before proceeding:
  • Are they new to Flink? New to Managed Service for Apache Flink?
  • Are they familiar with Java development?
  • Is the use case complex with lots of business logic? Or simple and declarative?
These will inform how to organize the project, and whether to use Flink Table API or DataStream API. In general, assume the DataStream API.
开始前,请确保你已清晰了解用户角色、使用场景和需求:
暂停:在继续前先确定用户背景和使用场景:
  • 用户是否刚接触Flink?是否刚接触托管式Apache Flink服务?
  • 用户是否熟悉Java开发?
  • 使用场景是包含大量业务逻辑的复杂场景?还是简单声明式场景?
这些信息将指导项目的组织方式,以及是否使用Flink Table API或DataStream API。通常默认使用DataStream API。

Example Workflow for New Applications

新应用示例工作流

1. User asks to build a Flink application
2. Confirm user's goals and use case
3. READ [best-practices.md](references/best-practices.md)
4. READ [dependency-management.md](references/dependency-management.md)
5. READ relevant connector guides (e.g. [kinesis-connector-guide.md](references/kinesis-connector-guide.md))
6. Generate code following the loaded guidance
7. Validate against best practices
8. READ environment-setup.md via [environment-setup.md](references/environment-setup.md)
9. Compile and test locally
1. 用户请求构建Flink应用
2. 确认用户目标和使用场景
3. 阅读[best-practices.md](references/best-practices.md)
4. 阅读[dependency-management.md](references/dependency-management.md)
5. 阅读相关连接器指南(例如[kinesis-connector-guide.md](references/kinesis-connector-guide.md))
6. 根据加载的指导生成代码
7. 对照最佳实践进行验证
8. 通过[environment-setup.md](references/environment-setup.md)阅读environment-setup.md
9. 本地编译并测试

Example Workflow for General Questions

通用问题示例工作流

1. User asks about real time delivery of data to Iceberg
2. Confirm user's goals and use case
3. READ [best-practices.md](references/best-practices.md)
4. READ [iceberg-connector-guide.md](references/iceberg-connector-guide.md)
5. READ other reference files as needed
6. Answer question with loaded guidance
1. 用户询问数据实时投递至Iceberg的相关问题
2. 确认用户目标和使用场景
3. 阅读[best-practices.md](references/best-practices.md)
4. 阅读[iceberg-connector-guide.md](references/iceberg-connector-guide.md)
5. 根据需要阅读其他参考文件
6. 结合加载的指导回答问题

Reference Files

参考文件

  • You MUST use this skill and its reference files to answer any question on these topics.
  • Do NOT answer from training knowledge or by searching general AWS documentation when the question concerns Apache Flink, Managed Service for Apache Flink, KPU sizing, Flink monitoring, deployment, migration, real-time analytics, or Iceberg/LakeHouse streaming with Flink
    • You MUST load the relevant reference files below before taking other steps.
    • The reference files contain MSF-specific details (thresholds, statistics, namespaces, constraints) that differ from generic Flink guidance and are required for correct responses.
GoalReferenceWhen to Load
Best practicesbest-practices.mdAlways before writing code
Maven dependenciesdependency-management.mdNew project or adding connectors
Local dev environmentenvironment-setup.mdDocker-based local development
MSF architecturemsf-overview.mdKPU model and service constraints
MSF constraints and patternsmsf-constraints-and-patterns.mdMSF vs self-managed Flink, service-level vs application-level configuration separation, MSF-specific resource/network/storage limits, common MSF patterns
Quotas, ENI planning, MSF vs EMR, source/sink choicefoundation-operations.mdCapacity planning, service selection, architecture design, CLI/IAM/CloudWatch identifier disambiguation
IAM execution role, trust policy, action prefix, service principalfoundation-operations.mdWriting IAM policies for MSF — covers the
kinesisanalytics:
(no v2) action prefix,
kinesisanalytics.amazonaws.com
(no v2) trust principal, and the v2/non-v2 disconnect that is the most common source of permission and AssumeRole failures
Flink 2.x migrationflink-2x-migration.mdVersion upgrades, state compatibility
KPU sizingresource-optimization.mdRight-sizing, performance diagnosis, scaling
Scaling decisions on running appsscaling-decisions.mdIn-flight scaling matrix, cost/memory impact of scale changes, autoscaling behavior, anti-patterns
Cost estimationpricing-calculator.mdBudget planning, sizing-to-cost mapping, optimization levers
Application lifecycle opsapplication-lifecycle.mdStart/stop, deploy code, rollback, snapshot lifecycle, runtime properties, delete
Restart loop diagnosisfirst-fault-isolation.mdCrashing/restarting apps, finding original failure vs loop sustainers, Flink Dashboard live diagnosis
Checkpoint tuningcheckpoint-tuning.mdCheckpoint impact on KPU memory and CPU, frequency vs network bandwidth trade-offs, checkpoint duration exceeding interval, OOM/GC during checkpoints
Job graph designjob-graph-architecture.mdPerformance issues, splitting jobs
Job graph anti-patternsjob-graph-anti-patterns.mdData skew detection and mitigation, monolith job anti-pattern, high fan-out anti-pattern, removing multiple shuffles, when to split a large application
Monitoring and alarmsmonitoring-and-metrics.mdCloudWatch dashboards, alarms, metrics
Logginglogging-configuration.mdLog4j2, CloudWatch Logs setup
Kinesis connectorskinesis-connector-guide.mdKinesis source and sink builders, polling configuration and throttling (
READER_EMPTY_RECORDS_FETCH_INTERVAL
,
SHARD_GET_RECORDS_MAX
,
ReadProvisionedThroughputExceeded
,
LimitExceededException
), legacy connector migration
Kinesis Enhanced Fan-Out (EFO)kinesis-efo-guide.mdWhen to use EFO vs polling, EFO source configuration, consumer lifecycle (
JOB_MANAGED
vs
SELF_MANAGED
), parallelism vs shard count, IAM permissions, troubleshooting
Iceberg integration (write APIs, distribution modes, partitioning)iceberg-connector-guide.mdIceberg write APIs (append, upsert, dynamic), distribution modes (NONE/HASH/RANGE), CoW vs MoR, read patterns, partitioning, DDL. Does NOT contain catalog choice or maintenance approaches — for those, load
iceberg-tuning-and-operations.md
.
Iceberg tuning, operations, catalog choice, maintenanceiceberg-tuning-and-operations.mdProvides maintenance approaches for S3 Tables, Glue + Glue auto-compaction, and Glue + Flink embedded maintenance with JDBC lock for catalog-choice questions; small files problem and mitigations; Flink TableMaintenance API, post-commit maintenance, lock factories; IcebergSink monitoring, anti-patterns.
CDC connectorscdc-connector-guide.mdMySQL, PostgreSQL, Oracle, SQL Server, MongoDB CDC
IaC and deploymentiac-and-deployment.mdCloudFormation, CDK, Terraform, two-phase deployment
Serializationserialization-guide.mdPOJO, Avro, Kryo guidance
State managementstate-management.mdTTL, state types, migration safety
  • 回答任何相关主题的问题时,必须使用本技能及其参考文件。
  • 当问题涉及Apache Flink、托管式Apache Flink服务、KPU规格、Flink监控、部署、迁移、实时分析或基于Flink的Iceberg/LakeHouse流处理时,禁止仅凭训练知识或搜索通用AWS文档作答
    • 在采取其他步骤前,必须先加载以下相关参考文件。
    • 参考文件包含MSF特定细节(阈值、统计数据、命名空间、约束),这些细节与通用Flink指导不同,是提供正确响应的必要信息。
目标参考文件加载时机
最佳实践best-practices.md编写代码前必须加载
Maven依赖dependency-management.md新项目或添加连接器时
本地开发环境environment-setup.md基于Docker的本地开发场景
MSF架构msf-overview.mdKPU模型和服务约束相关问题
MSF约束与模式msf-constraints-and-patterns.mdMSF与自托管Flink对比、服务级与应用级配置分离、MSF特定资源/网络/存储限制、常见MSF模式
配额、ENI规划、MSF与EMR对比、源/接收器选择foundation-operations.md容量规划、服务选择、架构设计、CLI/IAM/CloudWatch标识符区分
IAM执行角色、信任策略、操作前缀、服务主体foundation-operations.md为MSF编写IAM策略时——涵盖
kinesisanalytics:
(无v2)操作前缀、
kinesisanalytics.amazonaws.com
(无v2)信任主体,以及v2/非v2的差异(这是权限和AssumeRole失败最常见的原因)
Flink 2.x迁移flink-2x-migration.md版本升级、状态兼容性
KPU规格resource-optimization.md规格选型、性能诊断、扩容
运行中应用的扩容决策scaling-decisions.md运行中扩容矩阵、扩容变更对成本/内存的影响、自动扩缩容行为、反模式
成本估算pricing-calculator.md预算规划、规格与成本映射、优化手段
应用生命周期运维application-lifecycle.md启动/停止、代码部署、回滚、快照生命周期、运行时属性、删除
重启循环诊断first-fault-isolation.md应用崩溃/重启、查找原始故障与循环诱因、Flink Dashboard实时诊断
检查点调优checkpoint-tuning.md检查点对KPU内存和CPU的影响、频率与网络带宽的权衡、检查点时长超过间隔、检查点期间的OOM/GC
作业图设计job-graph-architecture.md性能问题、作业拆分
作业图反模式job-graph-anti-patterns.md数据倾斜检测与缓解、单体作业反模式、高扇出反模式、移除多次Shuffle、何时拆分大型应用
监控与告警monitoring-and-metrics.mdCloudWatch仪表板、告警、指标
日志logging-configuration.mdLog4j2、CloudWatch Logs配置
Kinesis连接器kinesis-connector-guide.mdKinesis源和接收器构建器、轮询配置与限流(
READER_EMPTY_RECORDS_FETCH_INTERVAL
SHARD_GET_RECORDS_MAX
ReadProvisionedThroughputExceeded
LimitExceededException
)、旧版连接器迁移
Kinesis增强扇出(EFO)kinesis-efo-guide.mdEFO与轮询的适用场景、EFO源配置、消费者生命周期(
JOB_MANAGED
vs
SELF_MANAGED
)、并行度与分片数、IAM权限、故障排查
Iceberg集成(写入API、分发模式、分区)iceberg-connector-guide.mdIceberg写入API(追加、更新、动态)、分发模式(NONE/HASH/RANGE)、CoW vs MoR、读取模式、分区、DDL。不包含目录选择或维护方法——相关内容请加载
iceberg-tuning-and-operations.md
Iceberg调优、运维、目录选择、维护iceberg-tuning-and-operations.md提供S3表的维护方法、Glue + Glue自动压缩、Glue + Flink嵌入式维护(带JDBC锁)以解答目录选择问题;小文件问题与缓解方案;Flink TableMaintenance API、提交后维护、锁工厂;IcebergSink监控、反模式。
CDC连接器cdc-connector-guide.mdMySQL、PostgreSQL、Oracle、SQL Server、MongoDB CDC
IaC与部署iac-and-deployment.mdCloudFormation、CDK、Terraform、两阶段部署
序列化serialization-guide.mdPOJO、Avro、Kryo相关指导
状态管理state-management.mdTTL、状态类型、迁移安全性

Additional Resources

额外资源