Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[12.x] fix: update postgres grammar date format #51363

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

calebdw
Copy link
Contributor

@calebdw calebdw commented May 9, 2024

Hello!

PostgreSQL supports time[stamp] with time zone fields with a resolution of 1 microsecond. However, the default grammar date format is Y-m-d H:i:s which truncates DateTimeInterface objects that are passed to queries.

This results in invalid queries and records being saved to / returned from the database. For example:

$datetime = Carbon::now('America/Chicago'); // 2024-05-09 15:35:55.123456-05:00
// The timestamp is shifted by 5 hours and loses the microsecond resolution!
Model::create(['datetimetz' => $datetime); // saved in DB: 2024-05-09 15:35:55+00

This PR updates the PostgresGrammar date format to prevent the lose of information:

$datetime = Carbon::now('America/Chicago'); // 2024-05-09 15:35:55.123456-05:00
// The timestamp has been properly converted to UTC for storage!
Model::create(['datetimetz' => $datetime); // saved in db: 2024-05-09 20:35:55.123456+00

Thanks!

@calebdw calebdw force-pushed the postgres_db_timezone branch 3 times, most recently from 72d3ec3 to 567bc3f Compare May 9, 2024 21:29
@taylorotwell
Copy link
Member

Can we make this change on a patch release? Seems breaking?

@taylorotwell taylorotwell marked this pull request as draft May 10, 2024 19:24
@calebdw
Copy link
Contributor Author

calebdw commented May 10, 2024

@taylorotwell,

Postgres only converts the datetime to UTC when the field contains a timezone (e.g., timestamp with time zone), otherwise it just uses the datetime as-is:

timestamp(0) with time zone = '2024-05-09 20:00:00.123456-05:00' // saved as 2024-05-10 01:00:00+00
timestamp(0)                = '2024-05-09 20:00:00.123456-05:00' // saved as 2024-05-09 20:00:00
date                        = '2024-05-09 20:00:00.123456-05:00' // saved as 2024-05-09

Thus, the conversion should only affect fields with timezones, which if people are using then they would already have to be manually converting the Carbon instances to UTC or their datetimes would be incorrect.

Also, if the timestamp precision is not 6, then Postgres will round up the microseconds (per typical rounding rules) to the appropriate precision:

timestamp(0) with time zone = '2024-05-09 20:00:00.923456-05:00' // saved as 2024-05-10 01:00:01+00
timestamp(0)                = '2024-05-09 20:00:00.923456-05:00' // saved as 2024-05-09 20:00:01

This should be fine in Production, however, this might cause some tests to fail by a second difference if they're, say, comparing a now() instance saved to the database with Carbon's ->format()/ to*String (which truncates instead of rounding). The solution is simply enough though, just call setMicroseconds(0) before persisting / converting to a string / freezing time.

I suppose to be on the safe side it would be better to merge to 12.x.

@driesvints
Copy link
Member

I also think the best is to target this at 12.x

@calebdw calebdw changed the base branch from 11.x to master May 13, 2024 14:50
@calebdw calebdw changed the title [11.x] fix: update postgres grammar date format [12.x] fix: update postgres grammar date format May 13, 2024
@calebdw
Copy link
Contributor Author

calebdw commented May 13, 2024

Updated target to 12.x

@driesvints
Copy link
Member

Make sure to mark this PR as ready if you need another review.

@calebdw calebdw marked this pull request as ready for review May 13, 2024 14:57
@taylorotwell taylorotwell marked this pull request as draft May 13, 2024 17:17
@tpetry
Copy link
Contributor

tpetry commented May 21, 2024

I encountered the issue before too but changing it by default resulted in bc breaks in edge cases. As microseconds and timezones are not used often, I am not sure whether we should complicate it. Especially as MySQL and PostgreSQL will then from now on (with default settings) behave differently.

I solved this in my extended PostgreSQL driver by using traits which is an easy thing to implement. So anyone departing from the default can change the model.

But I personally would like to not add more behavioural differences between using MySQL and PostgreSQL.

@calebdw
Copy link
Contributor Author

calebdw commented Jun 12, 2024

@tpetry,

I understand your concern, but I believe that the benefits far outweigh the cons, improve the developer experience, and reduce the potential for developer mistakes.

Current Problem

As it currently stands, passing a string to Eloquent / QueryBuilder works exactly as expected, but passing a Carbon instance truncates the microseconds and timezone information causing serious data integrity issues. The Carbon instance contains all the information needed to fully define that timestamp---I should not lose information or wind up with an incorrectly serialized string when I pass a Carbon instance to Eloquent / QueryBuilder. I also should not experience different behaviour between passing a Carbon instance and passing an equivalent timestamp string.

Postgres Rounding Issue

changing it by default resulted in bc breaks in edge cases

You don't elaborate on what these edge cases are, but I suspect that you are referring to Postgres rounding the microseconds when using a non-default precision.

I just updated our enterprise application to override the Postgres Grammar getDateFormat about a month ago, with no major issues / fallout. To avoid this rounding issue, I converted all timestamps to timestamp with timezone using the default precision of 6 (microseconds). All Postgres timestamps are stored using 8 bytes regardless of the precision or if it's with/without timezone, as detailed in the Postgres docs. This means that there is no storage benefit to using a lower precision, and using the default precision (6) completely negates the rounding issue.

Automatic Timezone Conversion

The main driver behind this push was to ensure that all timestamps are consistently handled and converted to UTC by the framework/database (and not by the developer) before being persisted or queried. Keep in mind that the Postgres timestamp with timezone does not actually store the timezone in the database, it just means that Postgres will automatically convert a timestamp to UTC before persisting / querying. When using without timezone the timestamp is used as-is without conversion, so if you pass a timestamp in a different timezone than your database (which should be in UTC), it will be stored as-is and not converted (which is very bad!).

Before this, we had simply defaulted to timestamp(0) without timezone, however, this meant that the developer had to manually convert every Carbon instance to UTC before passing it to Eloquent / QueryBuilder. This was a great source of bugs which resulted in incorrect timestamps being persisted to the database and incorrect results being retrieved from the database if the developer forgot to manually convert the Carbon instance to UTC.

Moving all timestamps to timestamp with timezone greatly improved the developer experience, cleaned up code, and eliminated an entire class of bugs because now timezone conversion is handled by the framework/database instead of the developer.

Additional Thoughts

As microseconds and timezones are not used often, I am not sure whether we should complicate it.

That may be so, but they are used and it is worth the effort to ensure that they are handled correctly and consistently. I firmly believe that when using Postgres all timestamps should be timestamp with timezone with the default precision (6), I don't see any valid reason to deviate from this. This solves a lot of timezone conversion headaches and negates any rounding issues. Perhaps the Laravel documentation should be updated to reflect this best practice when using Postgres?

I solved this in my extended PostgreSQL driver by using traits which is an easy thing to implement.

I took a look at your traits, and you only override the Model getDateFormat which just affects how Model timestamp properties are mutated before being persisted to the database. This is a good start, but it does not solve the underlying issue and is only half the solution. You do not override the getDateFormat on the Postgres Grammar which means Carbon instances that are passed to Eloquent / QueryBuilder queries are still being serialized incorrectly which can result in faulty / buggy queries. This also creates inconsistent behaviour in how Carbon instances are serialized depending on the context (being filled in a Model or passed to a query) which will only further be a point of confusion for the developer.

But I personally would like to not add more behavioural differences between using MySQL and PostgreSQL.

Postgres and MySQL are different, and we should leverage these differences to their strengths rather than trying to find a poor middle ground.

Conclusion

By unilaterally adopting timestamptz/timestamp with timezone/timestamp(6) with timezone (all the same) for all of our timestamps, we:

  • leverage Postgres' automatic timezone conversion to UTC
  • eliminate the need for developers to manually convert Carbon instances to UTC
  • eliminate the rounding issue by using the default precision

The proposed change:

  • ensures that Carbon instances are serialized correctly
  • ensures a consistent behaviour between strings and Carbon instances that are passed to Eloquent / QueryBuilder

Timestamp Conversion Query

For anyone who is interested, here is the query I used to convert all timestamps across the board:

Note that we only use the public schema but this can easily be modified to work with multiple schemas.

<?php

$columns = DB::select(<<<'SQL'
    SELECT table_name, column_name
    FROM information_schema.columns
    WHERE table_schema = 'public'
        AND data_type IN ('timestamp without time zone', 'timestamp with time zone');
    SQL);

collect($columns)
    ->groupBy('table_name')
    ->each(function ($columns, $table): void {
        $alterStatement = collect($columns)
            ->map(fn ($column) => <<<SQL
                ALTER COLUMN {$column->column_name} SET DATA TYPE timestamp(6) with time zone
                SQL)
            ->join(', ');

        DB::statement(<<<SQL
            ALTER TABLE public.{$table}
            {$alterStatement};
            SQL);
    });

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants