Guides understanding and working with Apache Beam runners (Direct, Dataflow, Flink, Spark, etc.). Use when configuring pipelines for different execution environments or debugging runner-specific issues.
Install:

```shell
npx skill4agent add apache/beam runners
```

| Runner | Location | Description |
|---|---|---|
| Direct | `runners/direct-java` | Local execution for testing |
| Prism | `sdks/go/pkg/beam/runners/prism` | Portable local runner |
| Dataflow | `runners/google-cloud-dataflow-java` | Google Cloud Dataflow |
| Flink | `runners/flink` | Apache Flink |
| Spark | `runners/spark` | Apache Spark |
| Samza | `runners/samza` | Apache Samza |
| Jet | `runners/jet` | Hazelcast Jet |
| Twister2 | `runners/twister2` | Twister2 |
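The runner is just another pipeline option, so a pipeline can target any runner in the table by swapping a single flag. A stdlib-only sketch of that mapping (the `runner_flag` helper and the environment names are illustrative, not part of the Beam API; the runner class names themselves are the real Beam identifiers):

```python
# Map a deployment target to the --runner flag Beam expects.
# The helper is a hypothetical convenience, not a Beam API.
RUNNERS = {
    "local": "DirectRunner",
    "portable-local": "PrismRunner",
    "gcp": "DataflowRunner",
    "flink": "FlinkRunner",
    "spark": "SparkRunner",
}

def runner_flag(env: str) -> str:
    """Return the --runner flag for a given target environment."""
    try:
        return f"--runner={RUNNERS[env]}"
    except KeyError:
        raise ValueError(f"unknown environment: {env}")

# Example: runner_flag("local") yields "--runner=DirectRunner"
```

The flag string produced here is exactly what the per-runner sections below pass to `PipelineOptions`.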
## Direct Runner

Java:

```java
PipelineOptions options = PipelineOptionsFactory.create();
options.setRunner(DirectRunner.class);
Pipeline p = Pipeline.create(options);
```

Python:

```python
options = PipelineOptions()
options.view_as(StandardOptions).runner = 'DirectRunner'
p = beam.Pipeline(options=options)
```

Or pass it on the command line: `--runner=DirectRunner`

## Dataflow Runner

Java:

```java
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
options.setRunner(DataflowRunner.class);
options.setProject("my-project");
options.setRegion("us-central1");
options.setTempLocation("gs://my-bucket/temp");
```

Python:

```python
options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-project',
    '--region=us-central1',
    '--temp_location=gs://my-bucket/temp'
])
```

Useful flags: `--experiments=use_runner_v2` (enable Runner v2) and `--sdkContainerImage=gcr.io/project/beam_java11_sdk:custom` (custom SDK container image).

## Flink Runner

Java:

```java
FlinkPipelineOptions options = PipelineOptionsFactory.as(FlinkPipelineOptions.class);
options.setRunner(FlinkRunner.class);
options.setFlinkMaster("[local]");       // embedded local cluster
// options.setFlinkMaster("host:port");  // or a remote cluster
```

Python:

```python
options = PipelineOptions([
    '--runner=FlinkRunner',
    '--flink_master=host:port',
    '--environment_type=LOOPBACK'  # or DOCKER, EXTERNAL
])
```

## Spark Runner

Java:

```java
SparkPipelineOptions options = PipelineOptionsFactory.as(SparkPipelineOptions.class);
options.setRunner(SparkRunner.class);
options.setSparkMaster("local[*]");  // or spark://host:port
```

Python:

```python
options = PipelineOptions([
    '--runner=SparkRunner',
    '--spark_master_url=local[*]'
])
```

## Running ValidatesRunner Tests

```shell
# Direct Runner
./gradlew :runners:direct-java:validatesRunner

# Flink Runner
./gradlew :runners:flink:1.18:validatesRunner

# Spark Runner
./gradlew :runners:spark:3:validatesRunner

# Dataflow Runner
./gradlew :runners:google-cloud-dataflow-java:validatesRunner
```

## Testing with TestPipeline

```java
@Rule public TestPipeline pipeline = TestPipeline.create();
```

Set the runner via a system property:

```shell
-DbeamTestPipelineOptions='["--runner=TestDataflowRunner"]'
```

## Portable SDK Environment Types

- `DOCKER`: run the SDK harness in a Docker container (the default)
- `LOOPBACK`: run the SDK harness inside the submitting process (useful for local debugging)
- `EXTERNAL`: connect to an externally managed SDK harness
- `PROCESS`: launch the SDK harness as a local subprocess

## Starting a Job Server

```shell
# Flink
./gradlew :runners:flink:1.18:job-server:runShadow

# Spark
./gradlew :runners:spark:3:job-server:runShadow
```

## Dataflow Options

| Option | Description |
|---|---|
| `project` | GCP project |
| `region` | GCP region |
| `tempLocation` | GCS temp location |
| `stagingLocation` | GCS staging location |
| `numWorkers` | Initial workers |
| `maxNumWorkers` | Max workers |
| `workerMachineType` | VM type |
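The same options appear as camelCase setters in Java and snake_case flags in the Python SDK (for example `tempLocation` versus `--temp_location`). A stdlib-only sketch of that naming conversion (the `to_python_flag` helper is hypothetical, not a Beam API, and not every option maps exactly one-to-one):

```python
import re

def to_python_flag(java_option: str, value: str) -> str:
    """Convert a Java-style option name like 'tempLocation' to the
    snake_case command-line flag form the Python SDK expects."""
    snake = re.sub(r'([A-Z])', lambda m: '_' + m.group(1).lower(), java_option)
    return f"--{snake}={value}"

flags = [
    to_python_flag("project", "my-project"),
    to_python_flag("tempLocation", "gs://my-bucket/temp"),
    to_python_flag("maxNumWorkers", "10"),
]
# flags == ['--project=my-project',
#           '--temp_location=gs://my-bucket/temp',
#           '--max_num_workers=10']
```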
## Flink Options

| Option | Description |
|---|---|
| `flinkMaster` | Flink master address |
| `parallelism` | Default parallelism |
| `checkpointingInterval` | Checkpoint interval (ms) |
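`flinkMaster` accepts either the special value `[local]` (embedded cluster) or a `host:port` address. A defensive parse of that value might look like the following stdlib-only sketch (the `parse_flink_master` helper is illustrative, not part of Beam):

```python
def parse_flink_master(value: str):
    """Classify a flinkMaster value.

    Returns ('local', None, None) for embedded execution, or
    ('remote', host, port) for a cluster address."""
    if value == "[local]":
        return ("local", None, None)
    host, sep, port = value.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"expected host:port or [local], got {value!r}")
    return ("remote", host, int(port))
```

Usage: `parse_flink_master("flink-host:8081")` returns `('remote', 'flink-host', 8081)`.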
## Spark Options

| Option | Description |
|---|---|
| `sparkMaster` | Spark master URL |
|  | Additional Spark config |
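Spark master URLs use `local[...]` for in-process execution and scheme-prefixed URLs such as `spark://host:port` for clusters, which matters when deciding whether a job server is needed. A minimal stdlib-only sketch of that distinction (the `is_local_master` helper is hypothetical, not a Beam or Spark API):

```python
def is_local_master(master_url: str) -> bool:
    """True for local masters like 'local' or 'local[*]',
    False for cluster URLs like 'spark://host:7077'."""
    return master_url == "local" or master_url.startswith("local[")
```

Usage: `is_local_master("local[*]")` is `True`; `is_local_master("spark://host:7077")` is `False`.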
## Building Runner Artifacts

```shell
# Dataflow worker jar
./gradlew :runners:google-cloud-dataflow-java:worker:shadowJar

# Flink job server jar
./gradlew :runners:flink:1.18:job-server:shadowJar

# Spark job server jar
./gradlew :runners:spark:3:job-server:shadowJar
```

## Debugging

- `-Dorg.slf4j.simpleLogger.defaultLogLevel=debug`: verbose SLF4J logging
- `--targetParallelism=1`: run the Direct Runner single-threaded for easier debugging
- `--experiments=upload_graph`: upload large Dataflow job graphs to GCS instead of embedding them in the submit request
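The `-DbeamTestPipelineOptions` system property shown earlier takes a JSON array of flag strings, which is easy to get wrong by hand. A stdlib-only sketch that assembles it (the `beam_test_pipeline_options` helper is hypothetical, not part of Beam; remember to single-quote the result in a shell):

```python
import json

def beam_test_pipeline_options(*flags: str) -> str:
    """Build the JVM argument that passes pipeline options to
    TestPipeline; the property value must be a JSON array."""
    return f"-DbeamTestPipelineOptions={json.dumps(list(flags))}"

arg = beam_test_pipeline_options("--runner=TestDataflowRunner")
# arg == '-DbeamTestPipelineOptions=["--runner=TestDataflowRunner"]'
```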