
[Bug] dbt source freshness expected a timestamp but received a string #631

Open
guilhermenoronha opened this issue Apr 25, 2024 · 2 comments

Labels
bug Something isn't working

@guilhermenoronha

Is this a new bug in dbt-athena?

  • I believe this is a new bug in dbt-athena
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I'm trying to run a freshness check on some of my tables using a timestamp column. However, I'm getting an error saying the check found a str column instead of a timestamp one. I believe this is a bug, because Athena shows my column as a timestamp, as seen in the following image.

[image: Athena showing the column type as timestamp]

The error I'm getting is shown in the following image:

[image: dbt error output]

My dbt_athena package is on version 1.7.2. The query generated by the freshness code is below, along with the results it produced:

select
      max(loaded_at) as max_loaded_at,
      cast(now() as timestamp) as snapshotted_at
    from "awsdatacatalog"."bronze_dev"."callbacks"

[image: query results]

Expected Behavior

The dbt source freshness command should run successfully on my source tables. Otherwise, it should report an error related to the configured SLA, not throw an error about casting types.

Steps To Reproduce

  1. Have a table on Athena with a timestamp column (in my case the column was generated by a PySpark script executed on AWS Glue)
  2. Configure dbt freshness on the same table:

version: 2
sources:
  - name: "bronze_dev"
    tables:
      - name: my_table
        freshness:
          error_after:
            count: 2
            period: day
        loaded_at_field: loaded_at

  3. Run the command dbt source freshness

Environment

- OS: Windows
- Python: ^3.11
- dbt: 1.7.9
- dbt-athena-community: 1.7.2

Additional Context

No response

guilhermenoronha added the bug label on Apr 25, 2024
@nicor88
Member

nicor88 commented May 11, 2024

Leaving this for anyone picking this up.
As a first attempt, we could override the collect_freshness macro, e.g. here (a sketch follows below).

Alternatively, and a bit more involved, on the Python side we could override this method in the adapter:

    def calculate_freshness_from_metadata(
        self,
        source: BaseRelation,
        macro_resolver: Optional[MacroResolverProtocol] = None,
    ) -> Tuple[Optional[AdapterResponse], FreshnessResponse]:
        ...
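For the macro route, a minimal sketch of what an override could look like, assuming the dbt-core 1.7 signature collect_freshness(source, loaded_at_field, filter). The explicit cast around the aggregate is only one guess at how to hand back a proper timestamp instead of a string, not a verified fix:

{% macro athena__collect_freshness(source, loaded_at_field, filter) %}
  {#- Sketch only: wrap the aggregate in an explicit cast so the query
      returns a timestamp rather than a string. -#}
  {% call statement('collect_freshness', fetch_result=True, auto_begin=False) -%}
    select
      cast(max({{ loaded_at_field }}) as timestamp) as max_loaded_at,
      cast(now() as timestamp) as snapshotted_at
    from {{ source }}
    {% if filter %}
      where {{ filter }}
    {% endif %}
  {% endcall %}
  {{ return(load_result('collect_freshness')) }}
{% endmacro %}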

@sanromeo
Contributor

From my side I can add that max(<ts_column>) as max_loaded_at doesn't work well for the dbt source freshness query. For large tables it ends up scanning the full data set because there is no partition predicate; it seems the max/min functions on Iceberg tables don't use the Iceberg metadata. On the other hand, querying the metadata scans almost nothing (a few KB).

I propose overriding the collect_freshness macro and, depending on table_type and the partition column, using different queries for the dbt source freshness command (a sketch follows the two queries below).
For non-partitioned Iceberg tables and all Hive tables, use the default query:

select
      max(<ts_column>) as max_loaded_at,
      {{ current_timestamp() }} as snapshotted_at
    from <source_table>

For partitioned Iceberg tables, use the following query against the $partitions metadata table:

select
      max(data.<partitioned_by_ts_column>.max) as max_loaded_at
from "<source_schema>"."<source_table>$partitions"
