
DBZ-6964 Temporary fix schema for MySQL "Geometry" types conflicting with Avro schema registries (Confluent, Apicurio). #5411

Open
wants to merge 2 commits into base: main
Conversation

rk3rn3r
Member

rk3rn3r commented Mar 20, 2024

It seems that schema registries don't incorporate schema parameters correctly, so only the first occurrence of a Geometry type is used as the schema reference. If a single topic/table has multiple fields/rows of a geo type, then only one schema is referenced for all fields, even when the real schemas of those fields differ.
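The collision described above can be sketched with a plain map-based cache keyed only by the logical schema name, ignoring parameters (a hypothetical simplification of the registry's deduplication; the class and the `srid` parameter are illustrative, only the logical name `io.debezium.data.geometry.Geometry` comes from Debezium):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the registry's schema cache: keyed by logical name only,
// so schema parameters are dropped for every occurrence after the first.
public class SchemaCacheSketch {
    static final Map<String, Map<String, String>> cache = new HashMap<>();

    // Returns the cached parameters for a logical name, storing the first set seen.
    static Map<String, String> resolve(String logicalName, Map<String, String> parameters) {
        return cache.computeIfAbsent(logicalName, k -> parameters);
    }

    public static void main(String[] args) {
        // Two geometry columns with different parameters...
        Map<String, String> pointParams = Map.of("srid", "4326");
        Map<String, String> polygonParams = Map.of("srid", "3857");

        // ...but the same logical name, so the second lookup returns the first schema.
        Map<String, String> first = resolve("io.debezium.data.geometry.Geometry", pointParams);
        Map<String, String> second = resolve("io.debezium.data.geometry.Geometry", polygonParams);
        System.out.println(first == second); // prints true: both fields see srid=4326
    }
}
```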

closes https://issues.redhat.com/browse/DBZ-6964

…d deduplication with a Map with a key based on the SMT's fully-qualified class name and version.

+ minor cleanup

closes https://issues.redhat.com/browse/DBZ-7416
@rk3rn3r rk3rn3r requested a review from jpechane March 20, 2024 08:09
@rk3rn3r
Member Author

rk3rn3r commented Mar 20, 2024

@jpechane Looking at the Cassandra PR that you mentioned in the last triaging call, I felt it could make sense to do this not only for MySQL Geometry but also for other fields that potentially conflict.
There are some schema parameters that are handled specially by the different Avro registries (it feels like one copied the other), like scale and precision values. The parameters we use to distinguish the schemas of different fields are not incorporated into the cache key and storage, which leads to these issues.
Maybe we can identify those datatypes for all connectors?

In general I suggest this as a temporary fix, as the schema names for those fields look ugly.
Maybe it is even better to use the logical coordinate (server.schema.table.field) as the schema name until it is fixed?
Wdyt?

What also might be missing is an Apicurio client test for Postgres. Should I add one?

@@ -218,7 +231,7 @@ else if (i == 2) {
}
}

private DatabaseGeoDifferences databaseGeoDifferences(boolean mySql5) {
public static DatabaseGeoDifferences databaseGeoDifferences(boolean mySql5) {
Member Author

I actually did not change this. This interface is still private; GitHub shows outdated commit content here. 🤷

@mfvitale
Member

mfvitale commented Mar 20, 2024

There are some schema parameters that are handled by the different Avro registries (feels like some copied the other) like scale and precision values.

Yes, the issue was reported in https://issues.redhat.com/browse/DBZ-6836. We opened issues with Confluent and Apicurio but got no answer.

return SchemaBuilder.struct()
.name(LOGICAL_NAME)
.name(LOGICAL_NAME + "__" + columnName) // temporary fix for DBZ-6964
Member Author

This is the essence of the change. I don't feel super happy about it, and looking at the Cassandra PR debezium/debezium-connector-cassandra#121 there might be other fields affected in general.
Looking at the registry code, it seems that all types that can collide do collide whenever parameters other than things like precision and scale are used. For example, DECIMAL types are handled here:
https://github.com/Apicurio/apicurio-registry/blob/f2d1f06cae0b709acf3f6d4edb982f5775b50fa9/utils/converter/src/main/java/io/apicurio/registry/utils/converter/avro/AvroData.java#L886-L896
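The essence of the fix is the per-column schema name from the diff above. A minimal sketch of the naming scheme (the logical name is Debezium's real geometry type name; the helper class is illustrative):

```java
// Illustrative sketch of the patched naming: each geometry column gets a unique
// schema name so the registry cannot deduplicate differing schemas (DBZ-6964).
public class GeometryNameFix {
    static final String LOGICAL_NAME = "io.debezium.data.geometry.Geometry";

    // Appends the column name so two geo columns no longer share one cache entry.
    static String schemaNameFor(String columnName) {
        return LOGICAL_NAME + "__" + columnName; // temporary fix for DBZ-6964
    }

    public static void main(String[] args) {
        System.out.println(schemaNameFor("location"));
        // prints io.debezium.data.geometry.Geometry__location
    }
}
```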

Member

Making logical names dynamic will break the JDBC sink connector. If we go with this approach, we'll need to add a check for "prefixed geometry logical names" to sanitize the incoming data and correctly look up and resolve target columns accordingly.
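The sanitizing step mentioned here could simply strip the column suffix before the type lookup. A hypothetical helper (not part of the PR) along those lines:

```java
// Hypothetical helper: strip the "__column" suffix added by the DBZ-6964 fix so
// the JDBC sink can still resolve the base logical name from its type registry.
public class LogicalNameSanitizer {
    static String sanitize(String schemaName) {
        int idx = schemaName.indexOf("__");
        return idx >= 0 ? schemaName.substring(0, idx) : schemaName;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("io.debezium.data.geometry.Geometry__location"));
        // prints io.debezium.data.geometry.Geometry
    }
}
```

Such a check would run on every record's schema, which is exactly the performance concern raised below.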

Member Author

rk3rn3r Mar 21, 2024

@Naros I agree. But let me challenge this: is this the right way to do it? Couldn't we use the "__debezium.source.column.type" : "GEOMETRY" schema parameter instead? With this change we can guarantee that all parameters are stored in the registry. Then we can rely on our custom fields to decide on the Debezium datatype instead of relying on the default behavior of the registry. Reusing the name field from the schema is maybe not the best way to do it?
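The alternative suggested here would resolve the type from the schema parameters rather than the schema name. A sketch (the parameter key is the real one used by Debezium; the resolver class is illustrative):

```java
import java.util.Map;

// Sketch: resolve the source column type from schema parameters instead of
// relying on the registry-deduplicated schema name.
public class ParameterTypeResolver {
    static final String SOURCE_COLUMN_TYPE = "__debezium.source.column.type";

    // Returns the original column type carried in the schema parameters, or null.
    static String resolveType(Map<String, String> schemaParameters) {
        return schemaParameters.get(SOURCE_COLUMN_TYPE); // e.g. "GEOMETRY"
    }

    public static void main(String[] args) {
        System.out.println(resolveType(Map.of(SOURCE_COLUMN_TYPE, "GEOMETRY")));
        // prints GEOMETRY
    }
}
```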

Member

We could for Debezium events, but don't forget the JDBC sink can be used by non-Debezium producers.

We will need to look at the schema type no matter what for the latter use case; I just want to be mindful of any changes we make here and how that could impact performance, specifically given how hard Mario worked on improving the sink's performance metrics already; no need to walk that back.

Member Author

@Naros and resolve target columns accordingly

I think the JDBC sink resolves the column name from field.name(), not from the schema, doesn't it? The issue will be with identifying the correct data type:

Looking at the code in GeneralDatabaseDialect#getSchemaType():

if (!Objects.isNull(schema.name())) {
    final Type type = typeRegistry.get(schema.name());
    if (!Objects.isNull(type)) {
        LOGGER.trace("Schema '{}' resolved by name from registry to type '{}'", schema.name(), type);
        return type;
    }
}
if (!Objects.isNull(schema.parameters())) {
    final String columnType = schema.parameters().get("__debezium.source.column.type");
    if (!Objects.isNull(columnType)) {
        final Type type = typeRegistry.get(columnType);
The schema is loaded via the record's schema.name() (which will work because the schema comes from field.schema()).
But in that first case, typeRegistry.get(schema.name()), we would have to match only the first name or clean up the key in the cache. I don't know; it is an unfortunate fix.

@rk3rn3r
Member Author

rk3rn3r commented Mar 20, 2024

Thx @mfvitale! You can look at the linked code in my previous reply (the one after yours). It shows the code and which datatypes/fields get "special" handling. It should be possible to fix the issue somewhere around there too, incorporating the other parameters and not only those "magical" ones. Wdyt?

@mfvitale
Member

Thx @mfvitale! You can look at the linked code in my previous reply (the one after yours). It shows the code and which datatypes/fields get a "special" handling. It should be possible to fix the issue somewhere around there too to incorporate other parameters not only those "magical" ones. wdyt?

Honestly, I don't know. My first clue about the issue was these lines: https://github.com/confluentinc/schema-registry/blob/cede655d600161767be861f34c47516c76923594/avro-data/src/main/java/io/confluent/connect/avro/AvroData.java#L1172

@jpechane
Contributor

@rk3rn3r Thanks for the PR. It is definitely something that works, but I am not keen on having it in the core, given we are effectively patching a registry bug. Also, it is unnecessary for JSON. Is there a chance the implementation can be changed along the lines of https://debezium.zulipchat.com/#narrow/stream/348251-community-cassandra/topic/issues.20in.20pushing.20the.20row.20if.20we.20have.20Datatype.20.3A.20set.3Ctext.3E/near/421428439 so that instead of having it in the core it would be an SMT?

@rk3rn3r
Member Author

rk3rn3r commented Mar 21, 2024

@jpechane There is a MySQLSchemaFactory, similar to the fix done in the Cassandra PR, but imo it is not used at all at the moment. Should I move the fix there? Imo other data types will be affected too.
