Implement Tombstone Message Handling for JDBC Sink Connector #302

Open · wants to merge 1 commit into base: master

Conversation

Joel-hanson

Description:

This pull request addresses issue #165 by enhancing the JDBC sink connector to handle deletes on tombstone messages. With this feature enabled, a tombstone message deletes the corresponding row, which is particularly useful for Change Data Capture (CDC) pipelines fed from another database.

Changes:

  • Added support for handling tombstone messages in the JDBC sink connector.
  • Implemented the ability to delete rows based on tombstone messages.
  • Introduced a new parameter, delete.enabled, to control delete behavior (see the configuration sketch after this list).
  • Aligned functionality with the documented approach for processing tombstones, similar to Confluent JDBC driver behavior.
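
As a rough illustration, here is a minimal sink configuration sketch exercising the new parameter. The connector class, connection URL, and topic are placeholders for your environment; pk.mode must be record_key for deletes, as discussed in the review below.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative configuration only; adjust names and URLs to your setup.
    public final class TombstoneDeleteConfigExample {
        public static Map<String, String> sinkConfig() {
            final Map<String, String> props = new HashMap<>();
            props.put("connector.class", "io.aiven.connect.jdbc.JdbcSinkConnector");
            props.put("connection.url", "jdbc:postgresql://localhost:5432/mydb");
            props.put("topics", "cdc-source-topic");
            // Deletes rely on the record key to identify the target row.
            props.put("pk.mode", "record_key");
            // New in this PR: a tombstone (null-value) record becomes a DELETE.
            props.put("delete.enabled", "true");
            return props;
        }
    }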

Related Issue(s): #165

Signed-off-by: Joel Hanson joelhanson025@gmail.com

@Joel-hanson (Author)

@ivanyu, could you please take a look at this pull request when you get a chance? Your feedback would be really valuable. Thank you!

@matuagarwal

@jeqo @ivanyu, could you please review the PR?

@@ -109,25 +118,32 @@ public List<SinkRecord> add(final SinkRecord record) throws SQLException {
}

private void prepareStatement() throws SQLException {
final String sql;
log.debug("Generating query for insert mode {} and {} records", config.insertMode, records.size());

@davidradl · May 16, 2024

nit
Why are we deleting this debug line? It seems like it would be useful to keep the records.size().

  • Maybe move records.size() to be a third parameter of the other debug entry.
  • Also, after the change, if there are no records or no tombstones we do not get any debug output.

if (records.isEmpty()) {
log.debug("Records is empty");
if (records.isEmpty() && tombstoneRecords.isEmpty()) {
log.debug("Records are empty.");


nit
I am not sure about this debug.

  • The debug text should read "Records and tombstone records are empty."
  • Do we want to have a debug when the records are empty, as before, and there are tombstones?

@Joel-hanson (Author)


Do we want to have a debug when the records are empty, as before, and there are tombstones?

It seems redundant to include if statements solely for debugging empty records, as the prepare-statement debug (mentioned previously) already covers this case by not logging the statement when there are no records.

} else {
records.add(record);
}
if (records.size() >= config.batchSize || tombstoneRecords.size() >= config.batchSize) {
log.debug("Flushing buffered records after exceeding configured batch size {}.",


nit
Might be worth adding records.size() and tombstoneRecords.size() to the message so we know which one caused the flush.

@Joel-hanson (Author)


Thanks @davidradl for the review; I have addressed most of the comments.

- Added support for handling tombstone messages in the JDBC sink connector.
- Implemented the ability to delete rows based on tombstone messages.
- Introduced a new parameter, `delete.enabled`, to control delete behavior.
- Aligned functionality with the documented approach for processing tombstones,
  similar to Confluent JDBC driver behavior.

Signed-off-by: Joel Hanson <joelhanson025@gmail.com>
@aindriu-aiven

Hey @Joel-hanson, thanks a million for your contribution! I have this on my list for review on Monday. Sorry for the delay, I am just catching up on a few projects at the moment!

records.add(record);
}
if (records.size() >= config.batchSize || tombstoneRecords.size() >= config.batchSize) {
log.debug("Flushing buffered records {} and tombstone records {} after exceeding the configured batch size of {}.",


Hey,
Just started going through the PR, but there are a couple of checkstyle errors on lines 102 and 124, where the lines exceed 120 characters.

@C0urante (Contributor) left a comment


Thanks @Joel-hanson for the PR! I've left a few thoughts but overall it looks pretty good.

I'm unable to run the integration tests on my Apple Silicon MacBook, though. I've tried several things (including looking through Testcontainers issues and using an alternative Docker runtime) with no success.

Would it be possible to use Postgres either instead of or in addition to Oracle for the new integration tests? As Apple Silicon becomes more and more prevalent for devs the cost of having tests that can't be run locally increases.
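
For what it's worth, a Postgres-based setup could be as small as the following sketch (class name and image tag are illustrative); the postgres images publish multi-arch manifests, so the container starts natively on Apple Silicon:

    import org.testcontainers.containers.PostgreSQLContainer;

    // Hypothetical shared base for the new integration tests.
    public abstract class AbstractPostgresIT {
        // Multi-arch image: runs natively on both amd64 and arm64 hosts.
        protected static final PostgreSQLContainer<?> POSTGRES =
                new PostgreSQLContainer<>("postgres:15-alpine");

        static {
            POSTGRES.start();
        }

        protected static String jdbcUrl() {
            return POSTGRES.getJdbcUrl();
        }
    }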

@@ -164,6 +164,7 @@ dependencies {

runtimeOnly("org.xerial:sqlite-jdbc:3.45.2.0")
runtimeOnly("org.postgresql:postgresql:42.7.3")
runtimeOnly("com.oracle.database.jdbc:ojdbc8:19.3.0.0")
Contributor


This is a pretty old driver version; the latest available in Maven Central is 23.4.0.24.05. Was there a specific reason you chose this one?

default String buildDeleteStatement(TableId table,
int records,
Collection<ColumnId> keyColumns) {
return buildDeleteStatement(table, records, keyColumns);
Contributor


Won't this cause a stack overflow for custom DatabaseDialect implementations that don't override this method? We can throw an UnsupportedOperationException as an alternative.
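
As a sketch, the default implementation could look like this (TableId and ColumnId are the connector's existing types):

    // Fail fast instead of self-calling: dialects that don't override this
    // method get a clear error rather than a StackOverflowError.
    default String buildDeleteStatement(final TableId table,
                                        final int records,
                                        final Collection<ColumnId> keyColumns) {
        throw new UnsupportedOperationException(
                getClass().getName() + " does not support delete statements");
    }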

* @throws SQLException if there is a problem binding values into the statement
*/
default void bindTombstoneRecord(SinkRecord record) throws SQLException {
bindTombstoneRecord(record);
Contributor


Same thought RE potential stack overflows
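
A matching sketch for this default method:

    // Same fail-fast pattern as buildDeleteStatement above.
    default void bindTombstoneRecord(final SinkRecord record) throws SQLException {
        throw new UnsupportedOperationException(
                getClass().getName() + " does not support binding tombstone records");
    }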

@@ -86,12 +90,17 @@ public List<SinkRecord> add(final SinkRecord record) throws SQLException {
}

final List<SinkRecord> flushed;
if (currentSchemaPair.equals(schemaPair)) {
// Skip the schemaPair check for all tombstone records or the current schema pair matches
if (record.valueSchema() == null || currentSchemaPair.equals(schemaPair)) {
Contributor


It's possible that a Converter instance may deserialize a record with a non-null schema but a null value, which still qualifies as a tombstone record. Can we check if record.value() == null instead of record.valueSchema() == null?
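
That is, a sketch of the suggested condition:

    // A converter may return a non-null value schema with a null value,
    // which is still a tombstone, so test the value itself.
    if (record.value() == null || currentSchemaPair.equals(schemaPair)) {
        // same handling as in the current patch
    }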

} else {
records.add(record);
}
if (records.size() >= config.batchSize || tombstoneRecords.size() >= config.batchSize) {
Contributor


Should this be if (records.size() + tombstoneRecords.size() >= config.batchSize)? Otherwise, for a batch size B, we may end up buffering B * 2 - 1 records before flushing (B tombstone records and B - 1 regular records, or vice-versa).
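
e.g. a sketch of the combined threshold (flush() stands in for whatever the buffer's flush path is named):

    // Flush when the total buffered work reaches the batch size, so at most
    // batchSize records are buffered across the two lists, instead of up to
    // 2 * batchSize - 1 with two independent thresholds.
    if (records.size() + tombstoneRecords.size() >= config.batchSize) {
        log.debug("Flushing {} records and {} tombstone records (batch size {})",
                records.size(), tombstoneRecords.size(), config.batchSize);
        flushed = flush();
    }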

}
log.debug("Done executing batch.");
if (totalUpdateCount != records.size() && !successNoInfo) {
verifySuccessfulExecutions(totalSuccessfulExecutionCount, successNoInfo);
Contributor


Should we invoke this once per executed batch (i.e., once for regular records and once for tombstone records)? Otherwise, the successNoInfo field gets reused across batches and may produce inaccurate results.
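
Roughly this shape, as a sketch (executeBatchFor is a hypothetical helper; verifySuccessfulExecutions and successNoInfo come from the patch):

    // Verify each batch immediately after it executes so successNoInfo set
    // by the regular-record batch cannot bleed into the tombstone batch.
    final int recordUpdateCount = executeBatchFor(records);
    verifySuccessfulExecutions(recordUpdateCount, successNoInfo);
    successNoInfo = false; // reset between batches

    final int tombstoneUpdateCount = executeBatchFor(tombstoneRecords);
    verifySuccessfulExecutions(tombstoneUpdateCount, successNoInfo);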

Comment on lines +417 to +419
if (getBoolean(DELETE_ENABLED)) {
log.error("Delete mode will enabled only if pk mode set to record_key");
}
Contributor


IMO we should fail here instead of just logging an error. Bonus points if we implement this multi-property validation in the Connector::validate method, but I don't think it's necessary to go that far for this PR if you don't have the time.
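
e.g. a sketch of the fail-fast version (the pk.mode literals are illustrative; ConfigException is org.apache.kafka.common.config.ConfigException):

    // Reject the invalid combination instead of silently ignoring
    // delete.enabled after logging.
    if (getBoolean(DELETE_ENABLED) && !"record_key".equals(getString("pk.mode"))) {
        throw new ConfigException(
                "delete.enabled",
                true,
                "Deletes are only supported when pk.mode is set to record_key");
    }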
