
Add a batch write flow control example for Bigtable #9314

Open

kongweihan wants to merge 1 commit into main from flow-control-example
Conversation

kongweihan (Contributor)

Description

Add a batch write flow control example for Bigtable
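Conceptually, batch write flow control throttles the sender so that only a bounded amount of work is in flight at once, and the writer blocks when that bound is reached. The sketch below illustrates that idea in plain, self-contained Java using a `Semaphore`; the class and method names (`FlowControlSketch`, `beforeSend`, `afterAck`) are hypothetical and this is not the Bigtable client's actual implementation.

```java
import java.util.concurrent.Semaphore;

/**
 * Illustrative sketch only: caps the number of concurrent "batch writes"
 * the way client-side flow control caps in-flight work. All names here
 * are hypothetical, not part of the Bigtable client API.
 */
public class FlowControlSketch {
  private final Semaphore inFlight;

  public FlowControlSketch(int maxInFlightBatches) {
    this.inFlight = new Semaphore(maxInFlightBatches);
  }

  /** Blocks until capacity is available, mimicking back-pressure on the writer. */
  public void beforeSend() throws InterruptedException {
    inFlight.acquire();
  }

  /** Releases capacity once the server acknowledges the batch. */
  public void afterAck() {
    inFlight.release();
  }

  public int availableSlots() {
    return inFlight.availablePermits();
  }

  public static void main(String[] args) throws InterruptedException {
    FlowControlSketch fc = new FlowControlSketch(2);
    fc.beforeSend(); // first batch in flight
    fc.beforeSend(); // second batch in flight
    System.out.println(fc.availableSlots()); // 0: a third send would block
    fc.afterAck();
    System.out.println(fc.availableSlots()); // 1: capacity freed
  }
}
```

In the real sample below, the equivalent switch is `BigtableIO.write().withFlowControl(true)` (or the `BIGTABLE_ENABLE_BULK_MUTATION_FLOW_CONTROL` setting for CloudBigtableIO); the blocking and release happen inside the client.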

Checklist

  • I have followed the Sample Format Guide
  • pom.xml parent set to latest shared-configuration
  • Appropriate changes to README are included in PR
  • [ ] These samples need a new API enabled in testing projects to pass (let us know which ones)
  • [ ] These samples need new/updated env vars set in testing projects to pass (let us know which ones)
  • Tests pass: mvn clean verify required
  • Lint passes: mvn -P lint checkstyle:check required
  • Static Analysis: mvn -P lint clean compile pmd:cpd-check spotbugs:check advisory only
  • [ ] This sample adds a new sample directory, and I updated the CODEOWNERS file with the codeowners for this sample
  • [ ] This sample adds a new Product API, and I updated the Blunderbuss issue/PR auto-assigner with the codeowners for this sample
  • Please merge this PR for me once it is approved

@kongweihan kongweihan requested review from yoshi-approver and a team as code owners May 8, 2024 16:20
@product-auto-label product-auto-label bot added samples Issues that are directly related to samples. api: bigtable Issues related to the Bigtable API. labels May 8, 2024
@kongweihan kongweihan force-pushed the flow-control-example branch 5 times, most recently from 411e662 to 6e4f7db on May 8, 2024 17:35
@@ -0,0 +1,127 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright 2021 Google LLC
Member
Can you change this to 2024?


PCollection<Long> numbers = p.apply(generateLabel, GenerateSequence.from(0).to(numRows));

if (options.getUseCloudBigtableIo()) {
Member

do we need the code showing both ways for this? I know we recommend customers use one of these over the other but can never remember which one. Ideally we just show the recommended way. Or if the goal is to just show how to activate the flow control on the client then we could keep both but maybe the data being written can be simplified to not have as much information for customers to take in.

For example, we could just write one row using each client and that can reduce the need to generate sequences and all the additional helper functions.

minherz (Contributor) left a comment:

Hello,
Please address the following questions:

  1. We ask to have a single code sample per file, and this code looks like it contains two code samples. What does this code sample demonstrate?
  2. This code sample does not have region tags. What documentation uses it?
  3. We do not host code samples without tests. What is the reason for the lack of tests?

Comment on lines +86 to +126
Pipeline p = Pipeline.create(options);

PCollection<Long> numbers = p.apply(generateLabel, GenerateSequence.from(0).to(numRows));

if (options.getUseCloudBigtableIo()) {
  System.out.println("Using CloudBigtableIO");
  PCollection<org.apache.hadoop.hbase.client.Mutation> mutations =
      numbers.apply(mutationLabel,
          ParDo.of(new CreateHbaseMutationFn(options.getBigtableColsPerRow(),
              options.getBigtableBytesPerCol())));

  mutations.apply(
      String.format("Write data to table %s via CloudBigtableIO", options.getBigtableTableId()),
      CloudBigtableIO.writeToTable(new CloudBigtableTableConfiguration.Builder()
          .withProjectId(options.getProject())
          .withInstanceId(options.getBigtableInstanceId())
          .withTableId(options.getBigtableTableId())
          .withConfiguration(BigtableOptionsFactory.BIGTABLE_ENABLE_BULK_MUTATION_FLOW_CONTROL,
              "true")
          .withConfiguration(BigtableOptionsFactory.BIGTABLE_BULK_MAX_REQUEST_SIZE_BYTES,
              "1048576")
          .build()));
} else {
  System.out.println("Using BigtableIO");
  PCollection<KV<ByteString, Iterable<Mutation>>> mutations =
      numbers.apply(mutationLabel,
          ParDo.of(new CreateMutationFn(options.getBigtableColsPerRow(),
              options.getBigtableBytesPerCol())));

  BigtableIO.Write write = BigtableIO.write()
      .withProjectId(options.getProject())
      .withInstanceId(options.getBigtableInstanceId())
      .withTableId(options.getBigtableTableId())
      .withFlowControl(true); // This enables batch write flow control

  mutations.apply(
      String.format("Write data to table %s via BigtableIO", options.getBigtableTableId()),
      write);
}

p.run();
Contributor
This block of code is hard to read. Since it is a code sample, it should be easy to understand. Please reformat it as a sequence of steps, with each step being a call to the pipeline's apply method. See the dataflow-bigquery-read-tablerows sample as a reference.
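The "one named step per apply call" shape the reviewer is asking for can be illustrated with plain Java function composition. The `Stage` class below is a hypothetical stand-in for a Beam pipeline, not the Beam API; it only shows how each labeled step chains into the next.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

/** Plain-Java illustration of the "one step per apply" shape; not the Beam API. */
public class ApplyStepsSketch {
  /** Minimal stand-in for a pipeline stage: holds a value, applies one labeled step at a time. */
  static final class Stage<T> {
    final T value;

    Stage(T value) {
      this.value = value;
    }

    <R> Stage<R> apply(String label, Function<T, R> step) {
      // In Beam, each apply(label, transform) is one named pipeline step.
      return new Stage<>(step.apply(value));
    }
  }

  public static void main(String[] args) {
    List<String> rows =
        new Stage<>(LongStream.range(0, 3).boxed().collect(Collectors.toList()))
            .apply("Generate sequence", nums -> nums)
            .apply("Create mutations", nums -> nums.stream()
                .map(n -> "row-" + n)
                .collect(Collectors.toList()))
            .value;
    System.out.println(rows); // [row-0, row-1, row-2]
  }
}
```

Structuring the real sample this way, with each transform as one labeled apply, is what makes the pipeline's steps readable at a glance.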

Comment on lines +86 to +126
Contributor
Is the pipeline intended to run asynchronously? Please append a waitUntilFinish() call to the result of run().
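The reviewer's point is that run() starts the pipeline and returns immediately, while waitUntilFinish() blocks until it completes. The same asynchronous-start / blocking-wait distinction can be shown with standard-library futures; this is a generic illustration, not the Beam API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Generic illustration of async start vs. blocking wait; not the Beam API. */
public class AsyncRunSketch {
  public static void main(String[] args) throws Exception {
    // Like Pipeline.run(): kicks off the work and returns immediately.
    CompletableFuture<String> result = CompletableFuture.supplyAsync(() -> "done");

    // Like PipelineResult.waitUntilFinish(): block until the work completes.
    String state = result.get(10, TimeUnit.SECONDS);
    System.out.println(state); // done
  }
}
```

Without the blocking wait, a main method can exit before the submitted work finishes, which is why the sample should end with p.run().waitUntilFinish().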
