Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBZ-3643 Fix to support NOBLOBs among binlog_row_image attributes. #5319

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

Bue-von-hon
Copy link
Contributor

@Bue-von-hon Bue-von-hon commented Feb 26, 2024

Copy link

Hi @Bue-von-hon. Thank you for your valuable contribution.
Please author your commit(s) using an email linked to your GitHub account.

Copy link

Welcome as a new contributor to Debezium, @Bue-von-hon. Reviewers, please add missing author name(s) and alias name(s) to the COPYRIGHT.txt and Aliases.txt respectively.

Copy link

Hi @Bue-von-hon. Thank you for your valuable contribution.
Please author your commit(s) using an email linked to your GitHub account.

@Naros
Copy link
Member

Naros commented Feb 26, 2024

Hi @Bue-von-hon, thanks. I commented on your Jira about the test case coverages, do you think you could add some test cases as I described in the Jira, thanks.

Comment on lines 272 to 293
/**
* Determines whether the binlog format used by the database server is {@code binlog_row_image='NOBLOB'}.
*
* @return {@code true} if the {@code binlog_row_image} is set to {@code NOBLOB}, {@code false} otherwise
*/
public boolean isBinlogRowImageNoblob() {
try {
final String rowImage = queryAndMap("SHOW GLOBAL VARIABLES LIKE 'binlog_row_image'", rs -> {
if (rs.next()) {
return rs.getString(2);
}
// This setting was introduced in MySQL 5.6+ with default of 'FULL'.
// For older versions, assume 'FULL'.
return "FULL";
});
LOGGER.debug("binlog_row_image={}", rowImage);
return "NOBLOB".equalsIgnoreCase(rowImage);
}
catch (SQLException e) {
throw new DebeziumException("Unexpected error while connecting to the database and looking at BINLOG_ROW_IMAGE mode: ", e);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care for the fact we're introducing a second method that does nearly the identical behavior as isBinlogRowImageFull. Could we introduce a protected/private method like the following and then delegate both isBinlogRowImageFull() and isBinlogRowImageNoblob() to this:

    public boolean isBinlogRowImageFull() {
        return isBinlogRowImage("FULL");
    }
    
    public boolean isBinlogRowImageNoblob() {
        return isBinlogRowImage("NOBLOB");
    }
    
    protected boolean isBinlogRowImage(String expectedType) {
        try {
            final String rowImage = queryAndMap("SHOW GLOBAL VARIABLES LIKE 'binlog_row_image'", rs -> {
                if (rs.next()) {
                    return rs.getString(2);
                }
                // This setting was introduced in MySQL 5.6+ with default of 'FULL'.
                // For older versions, assume 'FULL'.
                return "FULL";
            });
            LOGGER.debug("binlog_row_image={}", rowImage);
            return expectedType.equalsIgnoreCase(rowImage);
        }
        catch (SQLException e) {
            throw new DebeziumException("Unexpected error while connecting to the database and looking at BINLOG_ROW_IMAGE mode: ", e);
        }
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, do we need to call SHOW GLOBAL VARIABLES LIKE twice? IMHO only one method is enough, just the name should be updated like isBinlogRowImageFullOrNoBlob

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken your feedback on board by integrating it into an existing method.
PTAL @Naros @jpechane

@Bue-von-hon Bue-von-hon marked this pull request as draft March 4, 2024 08:10
@Bue-von-hon
Copy link
Contributor Author

Hi @Bue-von-hon, thanks. I commented on your Jira about the test case coverages, do you think you could add some test cases as I described in the Jira, thanks.

sure i will!

Copy link

Hi @Bue-von-hon. Thank you for your valuable contribution.
Please author your commit(s) using an email linked to your GitHub account.

2 similar comments
Copy link

Hi @Bue-von-hon. Thank you for your valuable contribution.
Please author your commit(s) using an email linked to your GitHub account.

Copy link

Hi @Bue-von-hon. Thank you for your valuable contribution.
Please author your commit(s) using an email linked to your GitHub account.

@Bue-von-hon Bue-von-hon marked this pull request as ready for review April 2, 2024 01:47
@Bue-von-hon
Copy link
Contributor Author

Thank @Naros and @jpechane for your patience and understanding.
I realized that the way to save the table schema is implemented by just saving the create query.
So I chose to modify the create query for the blob/text columns. 😅
If you have a better way, please let me know.

@Bue-von-hon
Copy link
Contributor Author

FYI
You need to modify line 192 of pom.xml inside debezium-connector-mysql like below to test.
<docker.dbs>debezium/mysql-server-test-binlog-row-image-database</docker.dbs>

p.s. I've added a new docker environment for testing, so please understand.

@Bue-von-hon
Copy link
Contributor Author

@Naros @jpechane gentle ping

@jpechane
Copy link
Contributor

jpechane commented Apr 5, 2024

@Bue-von-hon Could the whole testsuite pass when running against the new MySQL container image?
In general I am fine with adding a new one but it is important to have the test running as part of CI. In that case it is necessary to add it to mysql-ci profile so it is automatically executed on Github.

Copy link

Hi @Bue-von-hon, thanks for your contribution. Please prefix the commit message(s) with the DBZ-xxx JIRA issue key.

…nnector/mysql/MySqlConnectorNoblobT.java

Co-authored-by: Jiri Pechanec <jpechane@redhat.com>
@Bue-von-hon
Copy link
Contributor Author

@Naros @jpechane I've resolved all the conflicts.
However, the usage of the UniqueDatabase class has changed, and it's taking some time to fix it. (Impact of DBZ-7693)
Previously, I instantiated the UniqueDatabase class directly with the new keyword, but now I can't do that.
So I changed the implementation to the MySqlUniqueDatabase class, which inherits from it, and now I get the error "java.lang.IllegalStateException: java.sql.SQLException: Access denied for user 'mysqluser'@'localhost' (using password: YES)".

I probably need to change the account information, but the MySqlUniqueDatabase class was just added, so I don't have a reference for it.
I'm still working on it, so please be patient.
Also, any additional information needed to troubleshoot the issue is always welcome.

Modification:
- Fix(BinlogSnapshotChangeEventSource): Move existing work to a new package.
- Fix(BinlogSourceTask): Add noblob condition.
- Fix(MySqlConnectorNoBlobIT): Fix name and Add Implement abstract methods to use the UniqueDatabase class.
Copy link

github-actions bot commented May 3, 2024

Hi @Bue-von-hon, thanks for your contribution. Please prefix the commit message(s) with the DBZ-xxx JIRA issue key.

@Naros
Copy link
Member

Naros commented May 3, 2024

Hi @Bue-von-hon and @jpechane, I have a radical idea here, and I'm curious what you both think of this. So my understanding here is that when binlog_row_image=noblob, this effectively tells the database to skip recording the blob and text columns in binlog events.

Could we tackle this from the opposite side, rather than trying to manipulate the CREATE TABLE and attempting to omit the column at the outset, could we do this just before event emission using a PostProcessor? For example, the PostProcessor could examine the relational Column type and compare that with the binlog configs and if the noblob setting is enabled, any TEXT or BLOB column type would be dropped from the Struct just before we construct the SourceRecord.

To me this seems to accomplish the same outcome but without overcomplicating the column mapping process between the SQL statements we get and the in-memory relational models maintained by Debezium.

@jpechane
Copy link
Contributor

jpechane commented May 6, 2024

@Naros Clever idea! Yes, let's do it this way.

@Bue-von-hon
Copy link
Contributor Author

Hi @Naros @jpechane leave information.
I'm trying to implement a way that if noblob mode is on when creating a SourceRecord, and the struct has a blob/text column, it will just ignore it.

It looks like there's going to be a major modification to the EventDispatcher class. (io.debezium.pipeline.EventDispatcher)
Below are the items that will be modified in EventDispatcher class:

  • StreamingChangeRecordReceiver#changeRecord
  • BufferingSnapshotChangeRecordReceiver#changeRecord
  • IncrementalSnapshotChangeRecordReceiver#completeSnapshot

However, the event dispatcher class or method being changed must have the noblob configuration information.
I'm thinking of using sourceInfo inside the OffsetContext.

p.s. of course, inside sourceInfo there is no relevant information yet, I need to add it.

@jpechane
Copy link
Contributor

@Bue-von-hon Do you need to modify the EventDispatcher? Or would it be possible to make a subclass in MySQL/MariaDB and add the modifications there?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants