Remove docs about overriding current_timestamp type when Trino's iceberg implementation supports TIMESTAMP(3) #126

mdesmet · 2022-09-09T11:40:03Z

Describe the feature

The following issue in: Trino trinodb/trino#6658 requires us to override the current_timestap macro for the Iceberg catalog.

Trino uses 3 digits of subsecond precision in Date and time functions, but Iceberg expects timestamp with 3 digits of subsecond precision.

As a result we can't create Iceberg table from Trino

create table iceberg.examples.test
as select now() as n;
Exception:
Timestamp precision (3) not supported for Iceberg. Use "timestamp(6) with time zone" instead.

Once this issue is resolved, we can remove this documented limitation.

Describe alternatives you've considered

No response

Who will benefit?

No response

Are you willing to submit PR?

Yes I am willing to submit a PR!

RobbertDM · 2022-12-28T14:36:50Z

I just tried making a dbt snapshot with a dbt-trino connector that eventually end up in iceberg files.
Underlying, dbt snapshots use current_timestamp, so I also got the following error:

13:58:41    TrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="Timestamp precision (3) not supported for Iceberg. Use "timestamp(6) with time zone" instead.", query_id=20221228_135841_01246_szyby)

I then modified the current_timestamp macro in the dbt-trino adapter by casting to timestamp(6), as in RobbertDM@6e8d13d

Do I get correctly from this thread and the one linked above that this problem will soon be resolved, and this modification to the current_timestamp will not be necessary anymore?

RobbertDM · 2022-12-29T15:36:38Z

I dove a bit deeper and had some additional issues with current_timestamp. When taking a snapshot, it would always put a wrong updated_at field in the snapshot table, always off by one hour. That is because it appears to take the UTC date and then cast it to timestamp(6) (which is the same as timestamp(6) without time zone).

When using with time zone as in https://github.com/starburstdata/dbt-trino/pull/204/files , the timestamps are nicely stored as UTC. I opened a PR #204.

mdesmet · 2022-12-29T19:25:25Z

Note that current_timestamp already returns timestamp with time zone. See trino docs.

To make it compatible with Iceberg you should just override the macro as mentioned in the docs.

{% macro trino__current_timestamp() %}
    current_timestamp(6)
{% endmacro %}

RobbertDM · 2022-12-30T09:44:13Z

Thanks! Indeed, the cast is unnecessary. Changed it in the PR.

Do you think it makes sense to have current_timestamp(6) there by default, so that not every iceberg user has to manually modify a macro file right after running pip install dbt-trino? Would it break anything to have current_timestamp(6) in there?

Or does the above thread (linked in your original post) resolve the issue altogether?

mdesmet · 2023-01-02T12:10:08Z

iMHO we shouldn't switch the default behaviour to suit Iceberg. Changing it will break behaviour for Delta and Hive.

Ideally it is solved in the engine itself. I have filed a PR on Trino to fix this but it still under review.

RobbertDM · 2023-01-02T14:43:17Z

Ok, makes sense. Thanks a lot for following up so quickly! :)

RobbertDM · 2023-01-27T15:21:30Z

I found another case where this timestamp(3) vs. timestamp(6) with time zone thing causes trouble. If you run dbt snapshot on a source table, then add a timestamp(6) with time zone column, run dbt snapshot again, it will get into this do create_columns statement at https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/materializations/snapshots/snapshot.sql#L44-L58

I believe that either there, or in https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/materializations/snapshots/helpers.sql#L11 , a part of this data type gets lost, because the query that's eventually run on starburst is alter table "iceberg"."dbt_snapshots"."my_snapshot" add column "extract_ts" timestamp with time zone.

So somewhere, this (6) gets lost 🤔
Of course, that results in

TrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="Timestamp precision (3) not supported for Iceberg. Use "timestamp(6) with time zone" instead.", query_id=20230127_150706_02226_b7axv)

Edit:

I narrowed it down to:
adapter.get_missing_columns(staging_table, target_relation) is returning

columns[TrinoColumn(column='extract_ts', dtype='timestamp with time zone', char_size=6, numeric_precision=None, numeric_scale=None)]

While, when describe "iceberg"."dbt_robbert"."my_snapshot__dbt_tmp";, it says for that missing column timestamp(6) with time zone

Is this a bug in get_missing_columns? Should I make a new issue?

hovaesco · 2023-01-30T10:17:48Z

Good find. I don't think dbt-trino implements get_missing_columns it looks like it's being taken from dbt-core, so I assume that you should create an issue there.

RobbertDM · 2023-01-30T10:18:55Z

I will, thank you!

mdesmet · 2023-01-30T12:19:17Z

I don't think there is a bug in dbt-core. I think we now fill up char_size with the precision of time types while actually char_size is used for ensuring characters are not cut off and expanded. In the case of time types we should not alter the type.
@damian3031 : this is related to some code you changed recently

RobbertDM · 2023-01-30T13:22:56Z

Ah indeed, it seems like

dbt-trino/dbt/adapters/trino/column.py

Lines 32 to 36 in fa901fa

    
           data_type = match.group("type") 
        
           size_info = match.group("size") 
        
           data_type_suffix = match.group("type_suffix") 
        
           if data_type_suffix: 
        
               data_type += data_type_suffix

is the culprit.

I'm not sure why you're rebuilding the data type from the regex matches instead of passing on the raw one, @damian3031 ?

damian3031 · 2023-01-30T18:31:30Z

The reason is that from_description should return data_type and char_size (or scale and precision) as separate tuple elements. So, if raw_data_type argument is timestamp(6) with time zone, this method returns timestamp with time zone as data_type and 6 as char_size.
I think that it is correct implementation regarding purpose of this method.
I will take closer look at this specific use case, maybe we should somehow adjust this method or code which is invoking it.

mdesmet · 2023-01-30T23:40:01Z

I think it will be safer to not alter the type if not string or numerical. I think dbt only widens string types. The variable name charsize seems also to point to that.

damian3031 · 2023-01-31T09:10:54Z

So do you mean to extend this condition to sth like:

if raw_data_type.startswith(("array", "map", "row", "date", "time", "interval", "boolean")):
    return cls(name, raw_data_type)

@mdesmet?

mdesmet · 2023-02-07T12:49:07Z

Yes, i would also invert the logic to only do the size detection for numeric and string types.

RobbertDM · 2023-02-08T15:48:02Z

The reason is that from_description should return data_type and char_size (or scale and precision) as separate tuple elements. So, if raw_data_type argument is timestamp(6) with time zone, this method returns timestamp with time zone as data_type and 6 as char_size. I think that it is correct implementation regarding purpose of this method. I will take closer look at this specific use case, maybe we should somehow adjust this method or code which is invoking it.

Is there a spec somewhere that describes this? I find it a bit weird to chop the datatype in pieces 🤔

I'm using https://github.com/calogica/dbt-expectations/blob/main/macros/schema_tests/column_values_basic/expect_column_values_to_be_in_type_list.sql right now, and again failing because I have columns that are DECIMAL(20,4), but in there, they also just call adapter.get_columns_in_relation and access only the dtype fields.

Could it be an option to fill in char_size where needed, but always return raw_data_type as data_type?

RobbertDM · 2023-02-08T15:57:16Z

Btw I also have this problem with varchar(20).

RobbertDM · 2023-02-08T16:20:28Z

Hmmm, trino itself seems to indeed separate the datatype and precision:
https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/adapters/base/column.py#L124-L160

Then I guess dbt-expectations is wrong in only accessing dtype

RobbertDM · 2023-02-08T16:29:03Z

Apparently, the solution is to use column.data_type, not column.dtype.
See https://docs.getdbt.com/reference/dbt-classes#column and https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/adapters/base/column.py#L41

RobbertDM · 2023-02-08T16:45:46Z

^ this is me discovering the difference between data_type and dtype 🤦
In the snapshot logic, it already uses data_type so you can ignore my last 4 comments.
Sorry for the noise. Carry on :-)

mdesmet · 2023-07-29T10:06:47Z

Once this Trino pr is merged, we can remove those docs.

mdesmet added the enhancement New feature or request label Sep 9, 2022

hovaesco changed the title ~~Remove docs about overriding current_timestamp type when Trino's iceberg implementation supports TIMESTAMP93)~~ Remove docs about overriding current_timestamp type when Trino's iceberg implementation supports TIMESTAMP(3) Sep 9, 2022

RobbertDM mentioned this issue Dec 29, 2022

Update current_timestamp macro to precision 6 with time zone #204

Closed

4 tasks

RobbertDM mentioned this issue Feb 27, 2023

Adapter methods return column types as is, except for decimal, varchar and char. #242

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove docs about overriding current_timestamp type when Trino's iceberg implementation supports TIMESTAMP(3) #126

Remove docs about overriding current_timestamp type when Trino's iceberg implementation supports TIMESTAMP(3) #126

mdesmet commented Sep 9, 2022

RobbertDM commented Dec 28, 2022

RobbertDM commented Dec 29, 2022

mdesmet commented Dec 29, 2022

RobbertDM commented Dec 30, 2022

mdesmet commented Jan 2, 2023

RobbertDM commented Jan 2, 2023 •

edited

RobbertDM commented Jan 27, 2023 •

edited

hovaesco commented Jan 30, 2023

RobbertDM commented Jan 30, 2023

mdesmet commented Jan 30, 2023

RobbertDM commented Jan 30, 2023

damian3031 commented Jan 30, 2023

mdesmet commented Jan 30, 2023

damian3031 commented Jan 31, 2023

mdesmet commented Feb 7, 2023

RobbertDM commented Feb 8, 2023 •

edited

RobbertDM commented Feb 8, 2023

RobbertDM commented Feb 8, 2023 •

edited

RobbertDM commented Feb 8, 2023

RobbertDM commented Feb 8, 2023

mdesmet commented Jul 29, 2023

Remove docs about overriding current_timestamp type when Trino's iceberg implementation supports TIMESTAMP(3) #126

Remove docs about overriding current_timestamp type when Trino's iceberg implementation supports TIMESTAMP(3) #126

Comments

mdesmet commented Sep 9, 2022

Describe the feature

Describe alternatives you've considered

Who will benefit?

Are you willing to submit PR?

RobbertDM commented Dec 28, 2022

RobbertDM commented Dec 29, 2022

mdesmet commented Dec 29, 2022

RobbertDM commented Dec 30, 2022

mdesmet commented Jan 2, 2023

RobbertDM commented Jan 2, 2023 • edited

RobbertDM commented Jan 27, 2023 • edited

hovaesco commented Jan 30, 2023

RobbertDM commented Jan 30, 2023

mdesmet commented Jan 30, 2023

RobbertDM commented Jan 30, 2023

damian3031 commented Jan 30, 2023

mdesmet commented Jan 30, 2023

damian3031 commented Jan 31, 2023

mdesmet commented Feb 7, 2023

RobbertDM commented Feb 8, 2023 • edited

RobbertDM commented Feb 8, 2023

RobbertDM commented Feb 8, 2023 • edited

RobbertDM commented Feb 8, 2023

RobbertDM commented Feb 8, 2023

mdesmet commented Jul 29, 2023

RobbertDM commented Jan 2, 2023 •

edited

RobbertDM commented Jan 27, 2023 •

edited

RobbertDM commented Feb 8, 2023 •

edited

RobbertDM commented Feb 8, 2023 •

edited