
[Bug]: time_bucket_gapfill using timezone returns incorrect data for daylight saving #6788

Closed
DiAifU opened this issue Mar 26, 2024 · 16 comments · Fixed by #6908

Comments

@DiAifU

DiAifU commented Mar 26, 2024

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Gapfill

What happened?

Hi everyone,

First of all, thanks for the amazing work on TimescaleDB. I have an issue with the time_bucket_gapfill function using the timezone parameter when testing across the daylight saving transition in the Europe/Paris timezone.

The following query:

Parameters=[p0='PT1H' (DbType = Object), p1='Europe/Paris', p2='2024-02-29T23:00:00.0000000Z' (DbType = DateTime), p3='2024-03-31T22:00:00.0000000Z' (DbType = DateTime), p4='d21d06dd-fcf9-4c4b-9ed7-e64c3385af30', p5='2024-02-29T23:00:00.0000000Z' (DbType = DateTime), p6='2024-03-31T22:00:00.0000000Z' (DbType = DateTime)], CommandType='Text', CommandTimeout='30']
      SELECT t."Bucket" AS "Timestamp", t."Value"
      FROM (
          SELECT time_bucket_gapfill(@p0, "Timestamp", @p1, @p2, @p3) AS "Bucket",
          sum("Value") as "Value"
          FROM "TimeSeriesPoints"
          WHERE "TimeSeriesId" = @p4 AND "Timestamp" >= @p5 AND "Timestamp" < @p6
          GROUP BY "Bucket"
          ORDER BY "Bucket"
      ) AS t

returns the following results:

...
    {
      "Timestamp": "2024-03-31T00:00:00Z",
      "Value": 10
    },
    {
      "Timestamp": "2024-03-31T01:00:00Z",
      "Value": 10
    },
    {
      "Timestamp": "2024-03-31T01:00:00Z",
      "Value": null
    },
    {
      "Timestamp": "2024-03-31T02:00:00Z",
      "Value": 10
    },
    {
      "Timestamp": "2024-03-31T03:00:00Z",
      "Value": 10
    }
...

(I skipped the results before and after the 31st)

when the database contains exactly 10 for every hour in UTC:
[screenshot]

I would not expect this extra bucket; let me know if I'm doing something wrong. I also did not change the default timezone of the database, so it is UTC. I have already read the closed issues on this subject, and they all seem to say these bugs should be fully resolved in the version I'm using, but that doesn't seem to be the case for me.
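For context, the transition itself is easy to sanity-check outside the database. This is a stdlib-only Python sketch (zoneinfo, Python 3.9+), just an illustration of the window involved: at 01:00 UTC the Europe/Paris offset jumps from +01 to +02, so local 02:00 never exists on that day.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

paris = ZoneInfo("Europe/Paris")

# Hourly UTC instants around the 2024-03-31 spring-forward transition.
t = datetime(2024, 3, 31, 0, 0, tzinfo=timezone.utc)
for _ in range(4):
    # 01:00Z maps to 03:00+02:00; local 02:00 never exists on this day.
    print(t.isoformat(), "->", t.astimezone(paris).isoformat())
    t += timedelta(hours=1)
```

Each UTC instant still maps to exactly one local hour, so an hourly gapfill over this window should still produce one row per UTC hour.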

Thanks in advance for your help, let me know if you need more info.

EDIT: I did a few more tests, in particular setting the session timezone to Europe/Paris first; here is the result, if that helps:
[screenshot]

Nicolas

TimescaleDB version affected

2.14.2

PostgreSQL version used

16.2

What operating system did you use?

Docker image pg16 (latest Ubuntu image)

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

- Pull docker image
- Create the hypertable
- Add data as in the screenshot
- Execute the given query
- See the results
@DiAifU DiAifU added the bug label Mar 26, 2024
@konskov
Contributor

konskov commented Mar 27, 2024

Hello @DiAifU, thank you for reaching out. It would help a lot if you could share reproduction steps that can be executed via psql, if you have been able to reproduce this via psql.
If not, could you share your table definition? And more specifically, what Postgres type is your "Timestamp" field?

@DiAifU
Author

DiAifU commented Apr 1, 2024

Hi @konskov,

Here is what I believe to be the minimum repro in pure SQL statements.

CREATE TABLE "TimeSeries" (
  "Id" uuid NOT NULL,
  "Name" character varying(250) NOT NULL,
  CONSTRAINT "PK_TimeSeries" PRIMARY KEY ("Id")
);


CREATE TABLE "TimeSeriesPoints" (
    "Timestamp" timestamp with time zone NOT NULL,
    "TimeSeriesId" uuid NOT NULL,
    "Value" double precision NOT NULL,
    CONSTRAINT "PK_TimeSeriesPoints" PRIMARY KEY ("TimeSeriesId", "Timestamp"),
    CONSTRAINT "FK_TimeSeriesPoints_TimeSeries_TimeSeriesId" FOREIGN KEY ("TimeSeriesId") REFERENCES "TimeSeries" ("Id") ON DELETE CASCADE
);

SELECT create_hypertable('"TimeSeriesPoints"', by_range('Timestamp'));

INSERT INTO "TimeSeries" ("Id", "Name")
VALUES ('97622f56-fded-435f-a914-ebf0d0b98c82', 'New TS');


INSERT INTO "TimeSeriesPoints" ("TimeSeriesId", "Timestamp", "Value")
VALUES ('97622f56-fded-435f-a914-ebf0d0b98c82', '2024-03-30T23:00:00Z', 10);
INSERT INTO "TimeSeriesPoints" ("TimeSeriesId", "Timestamp", "Value")
VALUES ('97622f56-fded-435f-a914-ebf0d0b98c82', '2024-03-31T00:00:00Z', 10);
INSERT INTO "TimeSeriesPoints" ("TimeSeriesId", "Timestamp", "Value")
VALUES ('97622f56-fded-435f-a914-ebf0d0b98c82', '2024-03-31T01:00:00Z', 10);
INSERT INTO "TimeSeriesPoints" ("TimeSeriesId", "Timestamp", "Value")
VALUES ('97622f56-fded-435f-a914-ebf0d0b98c82', '2024-03-31T02:00:00Z', 10);
INSERT INTO "TimeSeriesPoints" ("TimeSeriesId", "Timestamp", "Value")
VALUES ('97622f56-fded-435f-a914-ebf0d0b98c82', '2024-03-31T03:00:00Z', 10);


SELECT time_bucket_gapfill('PT1H', "Timestamp", 'Europe/Paris') AS "Bucket",
    sum("Value") as "Value"
FROM "TimeSeriesPoints"
WHERE "TimeSeriesId" = '97622f56-fded-435f-a914-ebf0d0b98c82' AND "Timestamp" >= '2024-03-30T23:00:00.0000000Z' AND "Timestamp" < '2024-03-31T04:00:00.0000000Z'
GROUP BY "Bucket"
ORDER BY "Bucket";

The output from that last SELECT statement contains this, which seems incorrect to me.

[screenshot]

Thanks for your help.

Nicolas

@kiddhombre

Hi everybody, I noticed the same behaviour. I later hit the same bug using generate_series together with a timezone conversion; maybe that helps you find the bug.

Wrong DST: [screenshots]

Correct DST: [screenshots]

@DiAifU
Author

DiAifU commented Apr 19, 2024

Hi everyone,

I'm still having the issue. Could anyone take a look at it? It would be a blocker for me if the gapfill function isn't reliable across DST changes, so I'll have to find another solution in that case, unless I'm missing something that explains the behavior I'm seeing.

Thank you for your help!

Nicolas

@intermittentnrg

Are bug #6344 and PR #6507 relevant?

@intermittentnrg

I also reported an issue; it seems bad data appears after 2024-03-31, when DST started in Europe.

#6844

@DiAifU
Author

DiAifU commented Apr 23, 2024

@intermittentnrg I've seen that you've closed your issue, but I don't understand how it got resolved for you. I'm still having it on my side, whether I try PG 16 or 14 (like you), and with an updated extension (2.14.2).

Any idea what I could be doing wrong?

Nicolas

@intermittentnrg

My issue was resolved by upgrading to 2.14.

I originally ran 2.12 and upgraded one minor version at a time. I forgot to ALTER EXTENSION timescaledb UPDATE after the 2nd upgrade to 2.14.

Perhaps it's a different issue.

@DiAifU
Author

DiAifU commented Apr 23, 2024

It seems to be different yes.

After another test on the docker image pg15.5-ts2.12.2-all (PostgreSQL 15, TimescaleDB 2.12.2), things work as expected and I don't get a duplicated timestamp with the above repro. I believe the fix provided in 2.14 may have broken something else.

Can anyone help?

Thanks in advance.

Nicolas

@intermittentnrg

OK, I can also reproduce this.

I originally hit a different issue, where a row is doubled and there is no null value.

When attempting to reproduce that for dbfiddle I instead got the same error as you: https://dbfiddle.uk/nEUn9PF-

It does not reproduce on dbfiddle, though, as it's running Timescale 2.11.

@intermittentnrg

Ah, the reason for the doubled value is that I use interpolate(), which fills in the null value.

Here is a dbfiddle with interpolate; again, it doesn't reproduce there since dbfiddle uses 2.11: https://dbfiddle.uk/zd3LKk4f

@intermittentnrg

dbfiddle now has Timescale 2.14.
Here are new links that reproduce it correctly:

Without interpolate: https://dbfiddle.uk/V9LUICgU
2024-03-31 01:00:00+00 is returned twice, once with null

With interpolate: https://dbfiddle.uk/n3xdDowM
2024-03-31 01:00:00+00 is returned twice with same value.

@svenklemm
Member

Hmm, there is definitely something off. I boiled the example down a bit further:

sven@dev[82029]=# WITH data (time) AS (VALUES
('2024-01-31 02:00:00+00'::timestamptz),
('2024-01-31 02:00:00+00'::timestamptz)
)
SELECT
  time_bucket_gapfill('1h'::interval, time, 'Europe/Stockholm')
FROM data
WHERE
  time BETWEEN '2024-03-31T0:00Z' AND '2024-03-31T02:00Z'
GROUP BY 1;
  time_bucket_gapfill   
------------------------
 2024-03-31 00:00:00+00
 2024-03-31 01:00:00+00
 2024-03-31 01:00:00+00
 2024-03-31 02:00:00+00
(4 rows)
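For comparison, a naive local-hour bucketing can be sketched in stdlib Python (a hypothetical reference, not TimescaleDB's implementation): convert each UTC instant to Stockholm local time, floor to the hour, and convert back. The three UTC hours in this window map to three distinct bucket starts, so the duplicated 01:00 row should not appear.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def bucket_1h_local(ts_utc, tz):
    """Floor a UTC instant to the start of its local hour, returned in UTC."""
    local = ts_utc.astimezone(tz)
    floored = local.replace(minute=0, second=0, microsecond=0)
    return floored.astimezone(timezone.utc)

stockholm = ZoneInfo("Europe/Stockholm")
start = datetime(2024, 3, 31, 0, tzinfo=timezone.utc)
buckets = [bucket_1h_local(start + timedelta(hours=h), stockholm) for h in range(3)]
print(buckets)  # three distinct hourly bucket starts, no duplicate
```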

@svenklemm
Member

generate_series does produce the expected result:

sven@dev[82029]=# SELECT generate_series('2024-03-31T0:00Z','2024-03-31T02:00Z','1h'::interval) at time zone 'Europe/Stockholm';
      timezone       
---------------------
 2024-03-31 01:00:00
 2024-03-31 03:00:00
 2024-03-31 04:00:00
(3 rows)
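The same series can be mirrored in stdlib Python (an illustration, not the server code): stepping hourly in UTC and taking the Stockholm wall-clock time skips local 02:00 entirely, matching the generate_series output above.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

stockholm = ZoneInfo("Europe/Stockholm")
t = datetime(2024, 3, 31, 0, tzinfo=timezone.utc)
end = datetime(2024, 3, 31, 2, tzinfo=timezone.utc)
local = []
while t <= end:
    # Wall-clock time in Stockholm, mirroring `at time zone 'Europe/Stockholm'`.
    local.append(t.astimezone(stockholm).replace(tzinfo=None))
    t += timedelta(hours=1)
for ts in local:
    print(ts)  # 01:00, 03:00, 04:00 -- local 02:00 is skipped
```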

@intermittentnrg

intermittentnrg commented May 10, 2024

Slightly shorter:

WITH data (time) AS (VALUES('2024-01-31 02:00:00+00'::timestamptz))
SELECT
  time_bucket_gapfill('1h', time, 'Europe/Stockholm', '2024-03-31T0:00Z', '2024-03-31T02:00Z')
FROM data
GROUP BY 1;
  time_bucket_gapfill   
------------------------
 2024-01-31 02:00:00+00
 2024-03-31 00:00:00+00
 2024-03-31 01:00:00+00
 2024-03-31 01:00:00+00
(4 rows)

or with zero rows:

SELECT       
  time_bucket_gapfill('1h', time, 'Europe/Stockholm', '2024-03-31T0:00Z', '2024-03-31T02:00Z')
FROM (SELECT NULL::timestamptz AS time LIMIT 0) s
GROUP BY 1;

@DiAifU
Author

DiAifU commented May 14, 2024

@svenklemm Are the Docker images automatically updated after the merge?

6 participants