Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBZ-7536 Add support for default value for text_ and varchar_ in post… #5308

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

akanimesh7
Copy link
Contributor

@akanimesh7 akanimesh7 commented Feb 21, 2024

Solves https://issues.redhat.com/browse/DBZ-7536

Before

Newly added column definition

col2 text[] DEFAULT '{}'::text[] NOT NULL

Representation of the field in the generated Avro Schema

{
  "name": "col2",
  "type": {
    "items": [
      "null",
      "string"
    ],
    "type": "array"
  }
}

After

Empty array default

Newly added column definition

col2 text[] DEFAULT '{}'::text[] NOT NULL

Representation of the field in the generated Avro Schema

{
    "default": [],
    "name": "col2",
    "type": {
        "connect.default": [],
        "items": "string",
        "type": "array"
    }
}

Non Empty array default

Newly added column definition

col2 text[] DEFAULT '{a,b}'::text[] NOT NULL

Representation of the field in the generated Avro Schema

{
    "default": [
        "a",
        "b"
    ],
    "name": "col2",
    "type": {
        "connect.default": [
            "a",
            "b"
        ],
        "items": "string",
        "type": "array"
    }
}

Copy link
Contributor

@jpechane jpechane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akanimesh7 This is good start, thanks! Could you please take a look at the comments I left?

@@ -253,7 +253,7 @@ public SchemaBuilder schemaBuilder(Column column) {
case PgOid.INT4RANGE_ARRAY:
case PgOid.NUM_RANGE_ARRAY:
case PgOid.INT8RANGE_ARRAY:
return SchemaBuilder.array(SchemaBuilder.OPTIONAL_STRING_SCHEMA);
return SchemaBuilder.array(SchemaBuilder.STRING_SCHEMA);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we keep the type optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say the DDL operation was
ALTER TABLE CUSTOMERS ADD COLUMN col2 text[] DEFAULT '{a,b}'::text[] NOT NULL;
So how it works is when the avro schema is created from the connect schema, The default value {"a","b"} for col2 is proposed to the avro schema. It verifies this default value. Refer this. Now since we have kept the type optional so this verification fails since the string "a" is not a null, which is the first type in the array of types in this union.

Having said that, i also see that there's a fix for this which has been merged 5 months before, but there has been no release after that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akanimesh7 OK, TBH I am really not fan of this change if it just by-pass a bug in Avro. To handle this properly we there should be an SMT developed that will flip the schema optionality if needed but I don't think this should go to the core.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpechane Understood. In that case rather than doing it for all array types, can we just do it for text[] and varchar[] ? As soon as the newer version of avro is released, these two also can be corrected. Even earlier, these two types were not supported, hence there is no harm, rather this is just an addition.

On the other hand creation of SMT will need some more time and will also be temporary code that will come in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the problem is that you focus on it from Avro PoV only. But there are also converters for JSON and Protobuf. It does not feel correct to me to emit incorrect schema type for everybody to fix issue in Avro only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense. Let's halt this one then until a new version of avro comes out with the bug fix.

@@ -173,6 +176,8 @@ private static Map<String, DefaultValueMapper> createDefaultValueMappers(Timesta
result.put("time", (c, v) -> timestampUtils.toLocalTime(extractDefault(v, "00:00")));
result.put("timestamp", (c, v) -> timestampUtils.toOffsetDateTime(extractDefault(v, "1970-01-01")));
result.put("timestamptz", (c, v) -> timestampUtils.toOffsetDateTime(extractDefault(v, "1970-01-01")).atZoneSameInstant(ZoneOffset.UTC));
result.put("_text", parseNullDefault((c, v) -> createStringArr(extractDefault(v, "{}"))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this issue for _text/_varchar only or are all array datatypes affected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that i think of it, looks like all array datatypes will be affected. Fixed only for these since this was impacting us. Will need a thorough review of other array datatypes.

On the same note the default mapper did not have any array data types to begin with. These are the first few additions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akanimesh7 Would it be possible to provide a sensible default - thus providing at least the empty List/Array for all array types and later on add support for the specific types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's possible. I'll work on that change.

@@ -188,6 +193,25 @@ private static Map<String, DefaultValueMapper> createDefaultValueMappers(Timesta
return result;
}

private static List<String> createStringArr(String arrayString) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private static List<String> createStringArr(String arrayString) {
private static List<String> parseArrayString(String arrayString) {

@akanimesh7
Copy link
Contributor Author

@jpechane Addressed comments. PTAL

@akanimesh7
Copy link
Contributor Author

@jpechane
Understood your concern and added a proposal here.
I can make the default value change for other array types once we agree on something for the above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants