Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix to_utc_datetime (?) #3899

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Fix to_utc_datetime (?) #3899

wants to merge 9 commits into from

Conversation

ruslandoga
Copy link
Contributor

@ruslandoga ruslandoga commented Mar 15, 2024

Changes

I'm not sure if the tests use some special date but it seems like the conversion was wrong as Santiago is not the same as GMT.

I noticed this problem in https://github.com/plausible/analytics/actions/runs/8289969596/job/22687289421?pr=3898

Tests

  • Automated tests have been added

Changelog

  • Entry has been added to changelog

Documentation

  • This change does not need a documentation update

Dark mode

  • This PR does not change the UI

@ruslandoga ruslandoga requested a review from a team March 15, 2024 01:54

%Timex.AmbiguousDateTime{after: after_dt} ->
Timex.Timezone.convert(after_dt, "UTC")
{:gap, _before_dt, after_dt} ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some cases before dt might be preferred. Like maybe in where: ^to_utc_datetime(naive, site.timezone) <= timestamp so I've added a third argument :up | :down which defaults to :up (current behaviour)

@ruslandoga ruslandoga changed the title Fix timezones (?) Fix to_utc_datetime (?) Mar 15, 2024
@ruslandoga ruslandoga mentioned this pull request Mar 15, 2024
4 tasks
@@ -23,7 +23,7 @@ defmodule Plausible.TimezonesTest do
assert to_utc_datetime(~N[2022-09-11 00:00:00], "Etc/UTC") == ~U[2022-09-11 00:00:00Z]

assert to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago") ==
~U[2022-09-11 00:00:00Z]
~U[2022-09-11 04:00:00Z]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for spotting it @ruslandoga.

On CI the failure was different:

     left:  ~U[2022-09-11 03:00:00Z]
     right: ~U[2022-09-11 00:00:00Z]

As much as I'm in favor of using stdlib over Timex, I need better clarity on what happened and if we're doing the right thing here. Why has the test started failing? Was it updated TZ DB or DST change or something else entirely? Is stdlib using up to date information?

My confidence/knowledge about timezones is lacking unfortunately.

This was untested before 518cdb3.

America/Santiago was used here because it was problematic in a certain case - see 5c8f39a.

I expect more issues the coming days as lots of DSTs happen in March?

Paging @zoldar too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has the test started failing?

My guess at the time of this PR was that I started using a new actions@cache key. Now I'm not sure.

I'll need to read up on those PRs and check Timex logic to answer the rest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On CI the failure was different:

Yeah, on CI it keeps switching between 03:00 and 04:00

I don't know the reason yet. But I'd like to blame tzdata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was also #3873 done recently (and reverted via #3885)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, saw that one: #3811 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it updated TZ DB or DST change or something else entirely?

Seems like a bug in Timex.

master
assert to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago") ==
             ~U[2022-09-11 00:00:00Z]

Timex uses Tzdata in a way that doesn't resolve America/Santiago and Plausible falls back to UTC

Plausible.Timezones.to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago") # {:error, {:could_not_resolve_timezone, "America/Santiago", 63830073600, :wall}}
  Timex.Protocol.NaiveDateTime.to_datetime(~N[2022-09-11 00:00:00], "America/Santiago") # {:error, {:could_not_resolve_timezone, "America/Santiago", 63830073600, :wall}}
    Timex.Timezone.convert(~N[2022-09-11 00:00:00], "America/Santiago") # {:error, {:could_not_resolve_timezone, "America/Santiago", 63830073600, :wall}}
      Timex.Timezone.get("America/Santiago", ~N[2022-09-11 00:00:00]) # {:error, {:could_not_resolve_timezone, "America/Santiago", 63830073600, :wall}}
        Timex.Timezone.name_of("America/Santiago") # "America/Santiago"
          Tzdata.zone_exists?("America/Santiago") # true
            Tzdata.zone_list() # [...]
        Timex.Timezone.get_info("America/Santiago", ~N[2022-09-11 00:00:00], :wall) # {:error, {:could_not_resolve_timezone, "America/Santiago", 63830073600, :wall}}
          Timex.to_gregorian_seconds(~N[2022-09-11 00:00:00]) # 63830073600 
            Timex.Protocol.NaiveDateTime.to_gregorian_seconds(~N[2022-09-11 00:00:00]) # 63830073600
          Tzdata.zone_exists?("America/Santiago") # true
            Tzdata.zone_list() # [...]
          Timex.Timezone.resolve("America/Santiago", 63830073600, :wall) # {:error, {:could_not_resolve_timezone, "America/Santiago", 63830073600, :wall}}
            Tzdata.periods_for_time("America/Santiago", 63830073600, :wall) # []
              Tzdata.possible_periods_for_zone_and_time("America/Santiago", 63830073600, :wall) # {:ok, []}
              Tzdata.do_consecutive_matching([], match_fn, [], false) # []

Here nesting corresponds to position in the call stack. The calls go from top to bottom. After comment is the return value. The returned values (after #) go in the reverse direction (up).

The important function call that makes everything fail is Timex.Timezone.resolve. It doesn't check for gaps when Tzdata.periods_for_time("America/Santiago", 63830073600, :wall) returns []

fix-timezones
assert to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago") ==
             ~U[2022-09-11 04:00:00Z]

DateTime uses Tzdata (Timex just relays to it in https://github.com/bitwalker/timex/blob/7f584cef5794a5eac320ba957d17370849b0b212/lib/timezone/database.ex#L37) differently and resolves the timezone. It checks for gaps when Tzdata.periods_for_time("America/Santiago", 63830073600, :wall) returns []

Plausible.Timezones.to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago")
  DateTime.from_naive(~N[2022-09-11 00:00:00], "America/Santiago")
    Calendar.get_time_zone_database() # Timex.Timezone.Database
    DateTime.from_naive(~N[2022-09-11 00:00:00], "America/Santiago", Timex.Timezone.Database) # {:gap, #DateTime<2022-09-10 23:59:59.999999-04:00 -04 America/Santiago>, #DateTime<2022-09-11 01:00:00-03:00 -03 America/Santiago>}
      Tzdata.TimeZoneDatabase.time_zone_periods_from_wall_datetime(~N[2022-09-11 00:00:00], "America/Santiago") # {:gap, {%{zone_abbr: "-04", utc_offset: -14400, std_offset: 0, until_wall: ~N[2022-09-11 00:00:00], from_wall: ~N[2022-04-02 23:00:00]}, ~N[2022-09-11 00:00:00]}, {%{zone_abbr: "-03", utc_offset: -14400, std_offset: 3600, until_wall: ~N[2023-04-02 00:00:00], from_wall: ~N[2022-09-11 01:00:00]}, ~N[2022-09-11 01:00:00]}}
        Tzdata.periods_for_time("America/Santiago", 63830073600, :wall) # []
        Tzdata.TimeZoneDatabase.gap_for_time_zone("America/Santiago", 63830073600) # {:gap, {%{zone_abbr: "-04", utc_offset: -14400, std_offset: 0, until_wall: ~N[2022-09-11 00:00:00], from_wall: ~N[2022-04-02 23:00:00]}, ~N[2022-09-11 00:00:00]}, {%{zone_abbr: "-03", utc_offset: -14400, std_offset: 3600, until_wall: ~N[2023-04-02 00:00:00], from_wall: ~N[2022-09-11 01:00:00]}, ~N[2022-09-11 01:00:00]}}
          Tzdata.periods("America/Santiago") # {:ok, [...]}
      DateTime.from_naive_with_period(~N[2022-09-10 23:59:59.999999], "America/Santiago", %{zone_abbr: "-04", utc_offset: -14400, std_offset: 0, until_wall: ~N[2022-09-11 00:00:00], from_wall: ~N[2022-04-02 23:00:00]}) # #DateTime<2022-09-10 23:59:59.999999-04:00 -04 America/Santiago>
      DateTime.from_naive_with_period(~N[2022-09-11 01:00:00], "America/Santiago", %{zone_abbr: "-03", utc_offset: -14400, std_offset: 3600, until_wall: ~N[2023-04-02 00:00:00], from_wall: ~N[2022-09-11 01:00:00]}) #DateTime<2022-09-11 01:00:00-03:00 -03 America/Santiago>
  DateTime.shift_zone!(#DateTime<2022-09-11 01:00:00-03:00 -03 America/Santiago>, "Etc/UTC", Timex.Timezone.Database)
  # etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timex is not checking for gaps when doing timezone conversion. This means all "winter time to summer time" gaps would be unhandled. Timex does handle "summer to winter" (ambiguous) time however.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has the test started failing?

tzdata's fault ;)

master

iex> Tzdata.tzdata_version()
"2021e" # this version didn't include gap for Santiago on 2022-09-11 (probably cause it's in the future?)
iex> Plausible.Timezones.to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago")
~U[2022-09-11 03:00:00Z]
iex> Tzdata.DataLoader.download_new()
{:ok, 451270, "2024a",
 "/var/folders/cr/fjw4jzx53ybbrl325z6nwd6h0000gn/T/tzdata_data/tmp_downloads/451270_80870041/",
 "Thu, 01 Feb 2024 18:40:48 GMT"}
iex> Tzdata.DataBuilder.load_and_save_table()
{:ok, 451270, "2024a"}
iex> Tzdata.EtsHolder.new_release_has_been_downloaded()
:ok
iex> Tzdata.tzdata_version()
"2024a"
iex> Plausible.Timezones.to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago")
~U[2022-09-11 00:00:00Z]

So depending on when the tests for timezones run, tzdata might or might have not had enough time to update the database. If the version is 2021e, the tests fail with ~U[2022-09-11 03:00:00Z] != ~U[2022-09-11 00:00:00Z]. It showed up in my ci branch since I started using a new cache key and tzdata wasn't cached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3811 is also partly at fault since it started using System.tmp_dir! for storing tzdata stuff which is not cached in CI, that causes the flipping in #3899 (comment)

fix-timezones

iex> Tzdata.tzdata_version()
"2021e"
iex> Plausible.Timezones.to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago")
~U[2022-09-11 03:00:00Z]
iex> Tzdata.DataLoader.download_new()
{:ok, 451270, "2024a",
 "/var/folders/cr/fjw4jzx53ybbrl325z6nwd6h0000gn/T/tzdata_data/tmp_downloads/451270_80870041/",
 "Thu, 01 Feb 2024 18:40:48 GMT"}
iex> Tzdata.DataBuilder.load_and_save_table()
{:ok, 451270, "2024a"}
iex> Tzdata.EtsHolder.new_release_has_been_downloaded()
:ok
iex> Tzdata.tzdata_version()
"2024a"
iex> Plausible.Timezones.to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago")
~U[2022-09-11 04:00:00Z]

Fixed in #3898

Examples:

iex> to_datetime_in_timezone(~U[2024-03-16 01:50:45.180928Z], "Asia/Kuala_Lumpur")
#DateTime<2024-03-16 09:50:45.180928+08:00 +08 Asia/Kuala_Lumpur>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ date
Sat Mar 16 09:50:49 CST 2024
iex(1)> DateTime.utc_now
~U[2024-03-16 01:50:45.180928Z]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously:

iex> Tzdata.TimeZoneDatabase.time_zone_periods_from_wall_datetime(~U[2024-03-16 01:50:45.180928Z], "Asia/Kuala_Lumpur")
{:ok,
 %{
   zone_abbr: "+08",
   utc_offset: 28800,
   std_offset: 0,
   until_wall: :max,
   from_wall: ~N[1982-01-01 00:00:00]
 }}
iex> Tzdata.TimeZoneDatabase.time_zone_periods_from_wall_datetime(~U[2024-03-16 01:50:45.180928Z], "Etc/GMT-8")
{:ok,
 %{
   zone_abbr: "+08",
   utc_offset: 28800,
   std_offset: 0,
   until_wall: :max,
   from_wall: :min
 }}
iex> Plausible.Timezones.to_datetime_in_timezone(~N[2024-03-16 01:50:45.180928], "Etc/GMT-8")
#DateTime<2024-03-15 17:50:45.180928-08:00 -08 Etc/GMT+8>

I'm not sure what's happening here exactly. Why is the new timezone +8 and not -8? It seems to have gone in the other direction. I don't know if that's how GMT+X timezones are supposed to work though.

This PR:

iex> Plausible.Timezones.to_datetime_in_timezone(~N[2024-03-16 01:50:45.180928], "Etc/GMT-8")
#DateTime<2024-03-16 09:50:45.180928+08:00 +08 Etc/GMT-8>

I think this makes more sense since Tzdata shows that Etc/GMT-8 and Asia/Kuala_Lumpur timezones are equivalent for that point in time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, it seems to be wrong still... Or maybe Timex is right and everything else is wrong :)


## Examples

iex> Plausible.Site.local_start_date(%Plausible.Site{stats_start_date: nil})
nil

iex> utc_start = ~N[2022-09-28 00:00:00]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

site.starts_start_date is a date

iex> Plausible.Site.local_start_date(site)
~D[2022-09-27]

"""
@spec local_start_date(Site.t()) :: Date.t() | nil
def local_start_date(site) do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this function is necessary. stats_start_date is already local thanks to

Timezones.to_date_in_timezone(datetime, site.timezone)
and imports are supposed to be UTC #3895 (comment) local time https://3.basecamp.com/5308029/buckets/35871979/card_tables/cards/7202028179#__recording_7227026473 as well. Either way, shifting timezones on a date feels wrong somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I guess it doesn't hurt since there is no data a day before the start date.

@ruslandoga ruslandoga mentioned this pull request Mar 19, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants