[PostgreSQL] Improve parsing support for postgresql data types #73

ntma · 2021-10-18T10:05:09Z

This PR will add:

a new property default_value_raw to the types/column.ts that will contain the default raw value inspected from schema definitions;
the dependency pg-types, that contains a good collection of parsers for PostgreSQL data types.

New types parsed:

timestamps/dates;
arrays in the format '{}';

Caveat

The module pg-types uses the data type oid to find the appropriate parser for the column type. In contrast, knex-schema-inspector uses information_schema.data_type. To bridge the pg-types approach to the knex-schema-inspector, this PR uses this map that may require to be updated in the future.

Tasks

add new property default_raw_value
update all dialects to take into account default_value_raw;
integrate pg-types to support pg native data types;
additional logic to parse arrays;
update unit tests;

Closes #72

rijkvanzanten · 2021-10-18T15:36:00Z

This is great @ntma! Way stabler approach to casting those default values 👍🏻

ntma · 2021-10-19T12:18:27Z

Glad you liked it @rijkvanzanten, but there are some caveats unfortunately. I'll try to post them as soon as possible and perhaps we can discuss a solution for them.

rijkvanzanten · 2021-10-19T15:39:31Z

Sounds good 👍🏻

ntma · 2021-10-19T16:00:48Z

The biggest caveat is that this PR works for the postgres built-in types, BUT array's are not considered a built-in. Hence their default values will be parsed as plain string's...

Still, I believe we can make this PR work with a few adjustments. Hope I can be clear in the explanation.

Intro for the solution

When retrieving column information, knex-schema-inspector queries the view information_schema.columns and parses the column type's from the data_type (pointer to code). But the postgres documentation states that the data_type translates into:

Data type of the array elements, if it is a built-in type, else USER-DEFINED (in that case, the type is identified in udt_name and associated columns).

Which means that data_type is unable to accurately identify all the data types in a schema (built-in and user-defined types).

To test this, run the following query for a table with a column of type array:

SELECT 
   table_name, 
   column_name, 
   data_type, -- this will return ARRAY for array's
   udt_name  -- this will return the actual array type
FROM 
   information_schema.columns
WHERE 
   table_name = 'YOUR_TABLE_HERE';

Possible solution

The idea is to start using the udt_name to identify column type's (or keep both but use the udt_name for the parsing).

The first step would be adding the udt_name to the query on information_schema.columns and join it with the following query to retrieve the data type Id:

SELECT
   oid, -- data type id
   typname -- udt_name in the information_schema
FROM pg_type

Then change the signature of parseDefaultValue from (type: string) to (columnDefault: string | null, type: integer) where:

columnDefault - column default value (ie: "ARRAY[]::json[]");
type - column type oid (ie: 199);

And finally use the oid to retrieve the appropriate parser.

rijkvanzanten · 2021-10-20T13:58:25Z

Woa, yeah that got way more complicated than I thought haha. Time to call in the cavalry. @Oreilles any thoughts on the matter?

Oreilles · 2021-10-20T14:34:39Z

knex-schema-inspector/lib/dialects/postgres.ts

Line 69 in 0f5f983

let [value, cast] = type.split('::');

This is still fundamentally wrong anyway, because it cannot handle expression, eg. NOW() + '2 hours'::interval or anything involving function or operators. Since there is no error handling, I assume that return parser(value) will crash with an expression.

Another thing I think of is that since Postgres has built-in type casting, the actual column type might differ from the default value expression return type, which could there again crash the parser.

So I think you'd still need to add something like

try {
       return parser(value);
catch(e) {
    return type
}

But I think it is a dark pattern to make the defaultValue property hold either a proper Javascript object or the original SQL expression. You'd need to check everywhere in further code if that defaultValue is usable, and if the column is already of a string-like type, you might not even be able to make the difference.

ntma · 2021-10-20T14:57:21Z

This is still fundamentally wrong anyway, because it cannot handle expression, eg. NOW() + '2 hours'::interval or anything involving function or operators. Since there is no error handling, I assume that return parser(value) will crash with an expression.

I didn't test it, but it will probably not crash because interval belongs to the set of built-ins in pg-types:
https://github.com/brianc/node-pg-types/blob/8594bc6befca3523e265022f303f1376f679b5dc/lib/builtins.js#L49

Which means we could get the proper parser for it:
https://github.com/brianc/node-pg-types/blob/8594bc6befca3523e265022f303f1376f679b5dc/lib/textParsers.js#L174

But yes, definitely needs a try/catch for some edge cases.

Another thing I think of is that since Postgres has built-in type casting, the actual column type might differ from the default value expression return type, which could there again crash the parser.

That's true... but I think you will need the column type in some cases. If we define a column of type boolean and default it to true, the default value will be parsed as a string because there is no casting.

Regarding the defaultValue @Oreilles , the point is always holding a proper Javascript object right? If so, shouldn't we simply return null if the parsing goes wrong?

Oreilles · 2021-10-20T15:01:53Z

I didn't test it, but it will probably not crash because interval belongs to the set of built-ins in pg-types:

It would not crash because the value is an interval, but because NOW() + '2 hours' will be parsed as an interval when that's an expression, not a string representation of a postgres interval.

Regarding the defaultValue @Oreilles , the point is always holding a proper Javascript object right? If so, shouldn't we simply return null if the parsing goes wrong?

This is misleading, because you then couldn't make the difference between a field that actually doesn't have a default value, and a field of which you just couldn't resolve the default value expression to a javascript object.

rijkvanzanten · 2021-10-20T15:08:39Z

Regarding the defaultValue @Oreilles , the point is always holding a proper Javascript object right? If so, shouldn't we
simply return null if the parsing goes wrong?

This is misleading, because you then couldn't make the difference between a field that actually doesn't have a default value, and a field of which you just couldn't resolve the default value expression to a javascript object.

Wondering if we should return a "defaultValueRaw" that doesn't do any parsing magic next to the "defaultValue" that does a best effort to parse it into a workable JS value 🤔

Oreilles · 2021-10-20T15:10:14Z

Wondering if we should return a "defaultValueRaw" that doesn't do any parsing magic next to the "defaultValue" that does a best effort to parse it into a workable JS value 🤔

I think that's the best solution

ntma · 2021-10-20T15:22:29Z

Wondering if we should return a "defaultValueRaw" that doesn't do any parsing magic next to the "defaultValue" that does a best effort to parse it into a workable JS value thinking

I think that's the best solution

And should we only focus built-in types for now?

rijkvanzanten · 2021-10-20T20:45:03Z

And should we only focus built-in types for now?

I think so 🤔 I'm having some trouble imagining how we'd even be able to support non-built-ins, as there's a theoretical infinite amount of them

ntma · 2021-10-21T21:06:44Z

I think so 🤔 I'm having some trouble imagining how we'd even be able to support non-built-ins, as there's a theoretical infinite amount of them

Alright, I'll make it simple then 👍

… types map for information_schema.data_type | updated unit tests

ntma · 2021-11-02T11:51:16Z

Hello guys,

Sorry for the delay in the development but I've been extremely busy in the last weeks.

So, with this last commit I added:

the default_value_raw property to the Column interface, which will contain the raw default value when parsing the schema information (code here);
a map to complement the built-in postgres data types to increase the parsing coverage of knex (mainly because aliases) (code here);
the multiple_defaults table in the unit tests (code here);

If everyone agrees with these changes I will update the remaining database dialects/unit tests to include the default_value_raw.

rijkvanzanten · 2021-11-02T14:43:53Z

Sorry for the delay in the development but I've been extremely busy in the last weeks.

No worries, same here!

If everyone agrees with these changes I will update the remaining database dialects/unit tests to include the default_value_raw.

Looks good to me! @Oreilles thoughts?

Oreilles · 2021-11-03T11:01:49Z

Looks good!
Would be nice to have a test for #72 too since this was the original reason for this PR.

ntma · 2021-11-03T15:33:52Z

@Oreilles Agreed! But parsing arrays was still not working because postgres parses the default value if it is an array. A value declared like this:

ARRAY['{"text":"Lorem Ipsum"}', '{"text":"Lorem Ipsum"}']::json[]

will look like the following in the information_schema table:

ARRAY['{"text":"Lorem Ipsum"}'::json, '{"text":"Lorem Ipsum"}'::json]

And so, to account arrays I added an extra case to the parsing logic (code here). It feels a bit naive considering that any , that doesn't act as the array separator will split the parts incorrectly (in which case null is returned). But validates these unit tests.

…_raw

…ue_raw

ntma · 2021-12-22T16:30:13Z

Hi guys,

I hate leaving unfinished businesses behind, so I gave another round to this PR. With the latest changes:

all dialects are using the new default_value_raw, that contains the unparsed column default;
arrays in the format of '{el1,el2,el3}' are parsable;
arrays in the format of ARRAY[el1,el2,el3] are not parsable, as in some cases the information_schema.data_type is not enough to identify the array element types. Empty arrays will still return their Javascript version though.
any unparsed value will return null.

For now, I think this is as far as this PR will go. A more complete version will probably require considerate changes to the code.

Any questions or additional fixes that are vital to merge this PR, please let me know 👍

garrylachman · 2023-02-13T23:43:59Z

lib/dialects/postgres.ts


-  let [value, cast] = type.split('::');
+    let [value, cast] = column_default.split('::');


If the value contains "::", i think you must search the last "::"

[postgres.ts] added pg-types lib to parse postgresql data types

28d8e63

ntma changed the title ~~[PostgreSQL] Completed parsing support for postgresql data types~~ WIP [PostgreSQL] Completed parsing support for postgresql data types Oct 18, 2021

[postgres] added default_value_raw to Column interface | builtin data…

3c37128

… types map for information_schema.data_type | updated unit tests

ntma added 2 commits November 3, 2021 15:08

[postgres] added unit tests for default values as json[]/jsonb[]

c8d9b2c

[postgres] added parsing function for default values of type ARRAY

38fc7d2

ntma and others added 5 commits November 5, 2021 12:36

[postgres] improved array parser to account the {} format

9f27b8c

[mysql] updated mysql dialect and unit tests to account default_value…

c1e3fe6

…_raw

[mysql] updated mysql dialect and unit tests to account default_value…

b6a597d

…_raw

[sqlite] updated sqlite dialect and unit tests to account default_val…

59aaa7b

…ue_raw

[oracle] updated oracle dialect and unit tests to account default_val…

fd24e2b

…ue_raw

ntma changed the title ~~WIP [PostgreSQL] Completed parsing support for postgresql data types~~ [PostgreSQL] Improve parsing support for postgresql data types Dec 21, 2021

[postgres] less complete but more stable solution for parsing arrays

395a440

rijkvanzanten mentioned this pull request Jan 18, 2022

New Item has changes even without any modifications directus/directus#11088

Closed

3 tasks

azrikahar mentioned this pull request Jun 6, 2022

Search does not work on postgres citext fields directus/directus#13726

Closed

garrylachman reviewed Feb 13, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PostgreSQL] Improve parsing support for postgresql data types #73

[PostgreSQL] Improve parsing support for postgresql data types #73

ntma commented Oct 18, 2021 •

edited

rijkvanzanten commented Oct 18, 2021

ntma commented Oct 19, 2021

rijkvanzanten commented Oct 19, 2021

ntma commented Oct 19, 2021 •

edited

rijkvanzanten commented Oct 20, 2021

Oreilles commented Oct 20, 2021

ntma commented Oct 20, 2021 •

edited

Oreilles commented Oct 20, 2021 •

edited

rijkvanzanten commented Oct 20, 2021

Oreilles commented Oct 20, 2021

ntma commented Oct 20, 2021

rijkvanzanten commented Oct 20, 2021

ntma commented Oct 21, 2021

ntma commented Nov 2, 2021

rijkvanzanten commented Nov 2, 2021

Oreilles commented Nov 3, 2021

ntma commented Nov 3, 2021 •

edited

ntma commented Dec 22, 2021

garrylachman Feb 13, 2023


		let [value, cast] = type.split('::');
		let [value, cast] = column_default.split('::');

[PostgreSQL] Improve parsing support for postgresql data types #73

Are you sure you want to change the base?

[PostgreSQL] Improve parsing support for postgresql data types #73

Conversation

ntma commented Oct 18, 2021 • edited

Caveat

Tasks

rijkvanzanten commented Oct 18, 2021

ntma commented Oct 19, 2021

rijkvanzanten commented Oct 19, 2021

ntma commented Oct 19, 2021 • edited

Intro for the solution

Possible solution

rijkvanzanten commented Oct 20, 2021

Oreilles commented Oct 20, 2021

ntma commented Oct 20, 2021 • edited

Oreilles commented Oct 20, 2021 • edited

rijkvanzanten commented Oct 20, 2021

Oreilles commented Oct 20, 2021

ntma commented Oct 20, 2021

rijkvanzanten commented Oct 20, 2021

ntma commented Oct 21, 2021

ntma commented Nov 2, 2021

rijkvanzanten commented Nov 2, 2021

Oreilles commented Nov 3, 2021

ntma commented Nov 3, 2021 • edited

ntma commented Dec 22, 2021

garrylachman Feb 13, 2023

Choose a reason for hiding this comment

ntma commented Oct 18, 2021 •

edited

ntma commented Oct 19, 2021 •

edited

ntma commented Oct 20, 2021 •

edited

Oreilles commented Oct 20, 2021 •

edited

ntma commented Nov 3, 2021 •

edited