Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Counter_agg encounters counter reset when 2 hours merge because of Daylight Saving time change #769

Open
emizero opened this issue Aug 31, 2023 · 0 comments
Labels
feature-request And area of analysis that could be made easier

Comments

@emizero
Copy link

emizero commented Aug 31, 2023

Is your feature request related to a problem? Please describe.
With daylight savings in some time zone, time will shift in the fall as we fall back 1 hour (25 hour day), and in the spring as we spring forward 1 hour (23 hour day). Given a 25 hour day, 2 hours must be merged. On the US East, 1-2am is good and then 2-3am becomes 1-2am. Note that other time zones may have a different date and different time of the early morning.
Unfortunately, this shifting causes problems to counter_agg. Records from the merged 2 hours are no longer in the order in which they were received. This lead to counter resets throughout the 1-2am hour, and subsequently incorrect results for any functions that relies on the counter_agg results for any bucket that includes data in the 1-2am hour, and where applicable buckets around the latter.

Describe the solution you'd like
Thus the request to have counter_agg maintain the order in which data was received even when 2 hours are merged; or any reasonable approach that does not lead to counter resets.

Describe alternatives you've considered

  • Counter_agg on UTC time values, and ultimately in the last step shift to desired timezone. Works for hour buckets or smaller. Not application to week, month or year buckets.
  • Filter out an hour's worth of data (as in one of the hours that would be merged). Unknown how this impacts the result but there is no reset.

Additional context
Here is an example along with explored alternatives:

  • we use interpolated_delta and delta in combination with counter_agg.
  • PostgreSQL 15.4
  • TimescaleDB 1.17.0
create table test_table (
	device_id bigint not null,
	reading bigint not null,
	time_ts timestamp with time zone not null
);

create index test_table_time_ts_idx on test_table(time_ts desc);

--> single node hypertable
select create_hypertable('test_table', 'time_ts', chunk_time_interval => INTERVAL '1 day');

--> data

--delete from test_table; 
INSERT INTO test_table (device_id, reading, time_ts)	  
	SELECT 
                cast(random() * 4 as int) + 1, 
                generate_series(1,2678401),
                generate_series(
                    '2022-10-15 00:00:00',
	    	    '2022-11-15 00:00:00',
                    INTERVAL '1 second'
                ); 

[1] For hour buckets or smaller, we can run counter_agg on UTC data and then shift to the desired time zone later.

(1a) Interpolated_delta

-- Shift to desired time zone right away => bad result for Nov 6th at Midnight and 1am

WITH t as (
    SELECT
        device_id,
        time_bucket('1 hour'::interval, time_ts at time zone 'US/Eastern') AS ts,
        counter_agg(time_ts at time zone 'US/Eastern', reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1
    GROUP BY 1,2
)
SELECT
    device_id,
    ts,
    interpolated_delta(
            counter_agg_results, ts, '1 hour'::interval,
            lag(counter_agg_results) OVER device_id_by_time,
            lead(counter_agg_results) OVER device_id_by_time
        )
FROM t
WINDOW device_id_by_time AS ( PARTITION BY device_id order by ts);
 device_id |         ts          | interpolated_delta 
-----------+---------------------+--------------------
...
         1 | 2022-11-05 22:00:00 |               3600
         1 | 2022-11-05 23:00:00 |               3600
         1 | 2022-11-06 00:00:00 |               4050
         1 | 2022-11-06 01:00:00 |          484920016
         1 | 2022-11-06 02:00:00 |               3600
         1 | 2022-11-06 03:00:00 |               3600
         1 | 2022-11-06 04:00:00 |               3600
         1 | 2022-11-06 05:00:00 |               3600
...

-- Delayed shift to desired time zone => Yields 2 separate records for the combined hours on Nov 6 at 1am

WITH t as (
    SELECT
        device_id,
        time_bucket('1 hour'::interval, time_ts) AS ts,
        counter_agg(time_ts, reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1
    GROUP BY 1,2
)
SELECT
    device_id,
    ts at time zone 'US/Eastern',
    interpolated_delta(
            counter_agg_results, ts, '1 hour'::interval,
            lag(counter_agg_results) OVER device_id_by_time,
            lead(counter_agg_results) OVER device_id_by_time
        )
FROM t
WINDOW device_id_by_time AS ( PARTITION BY device_id order by ts);
 device_id |      timezone       | interpolated_delta 
-----------+---------------------+--------------------
...
         1 | 2022-11-05 23:00:00 |               3600
         1 | 2022-11-06 00:00:00 |               3600
         1 | 2022-11-06 01:00:00 |               3600
         1 | 2022-11-06 01:00:00 |               3600
         1 | 2022-11-06 02:00:00 |               3600
         1 | 2022-11-06 03:00:00 |               3600
         1 | 2022-11-06 04:00:00 |               3600
...

(1b) delta

-- Shift to desired time zone right away => Nov 6th at 1am has the wrong result

WITH t as (
    SELECT
        device_id,
        time_bucket('1 hour'::interval, time_ts at time zone 'US/Eastern') AS ts,
        counter_agg(time_ts at time zone 'US/Eastern', reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1
    GROUP BY 1,2
)
SELECT
    device_id,
    ts,
    delta(counter_agg_results)
FROM t;
 device_id |         ts          |   delta   
-----------+---------------------+-----------
...
         1 | 2022-11-05 22:00:00 |      3574
         1 | 2022-11-05 23:00:00 |      3576
         1 | 2022-11-06 00:00:00 |      3596
         1 | 2022-11-06 01:00:00 | 484916858
         1 | 2022-11-06 02:00:00 |      3588
         1 | 2022-11-06 03:00:00 |      3595
         1 | 2022-11-06 04:00:00 |      3594
         1 | 2022-11-06 05:00:00 |      3574
...

-- Delayed shift to desired time zone => Yields 2 separate records for the combined hours.

WITH t as (
    SELECT
        device_id,
        time_bucket('1 hour'::interval, time_ts) AS ts,
        counter_agg(time_ts, reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1
    GROUP BY 1,2
)
SELECT
    device_id,
    ts at time zone 'US/Eastern',
    delta(counter_agg_results)
FROM t;
 device_id |      timezone       | delta 
-----------+---------------------+-------
...
         1 | 2022-11-05 22:00:00 |  3574
         1 | 2022-11-05 23:00:00 |  3576
         1 | 2022-11-06 00:00:00 |  3596
         1 | 2022-11-06 01:00:00 |  3576
         1 | 2022-11-06 01:00:00 |  3592
         1 | 2022-11-06 02:00:00 |  3588
         1 | 2022-11-06 03:00:00 |  3595
         1 | 2022-11-06 04:00:00 |  3594
         1 | 2022-11-06 05:00:00 |  3574
...

[2] For 'week', 'month' or 'year', it would not make sense to bucket at UTC time zone and then shift to the desired time zone. The example in this case will stick to delta and a month bucket

-- Shift to desired time zone right away => Bad result for November

WITH t as (
    SELECT
        device_id,
        time_bucket('1 month'::interval, time_ts at time zone 'US/Eastern') AS ts,
        counter_agg(time_ts at time zone 'US/Eastern', reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1
    GROUP BY 1,2
)
SELECT
    device_id,
    ts,
    delta(counter_agg_results)
FROM t;
 device_id |         ts          |   delta   
-----------+---------------------+-----------
         1 | 2022-10-01 00:00:00 |   1483152
         1 | 2022-11-01 00:00:00 | 484182395
(2 rows)

-- Delayed shift to desired time zone => Who would want this?! Buckets remain aligned to UTC and not US Eastern!!!

WITH t as (
    SELECT
        device_id,
        time_bucket('1 month'::interval, time_ts) AS ts,
        counter_agg(time_ts, reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1
    GROUP BY 1,2
)
SELECT
    device_id,
    ts at time zone 'US/Eastern',
    delta(counter_agg_results)
FROM t;
 device_id |      timezone       |  delta  
-----------+---------------------+---------
         1 | 2022-09-30 20:00:00 | 1468776
         1 | 2022-10-31 20:00:00 | 1209589
(2 rows)

-- Shift to desired time zone right away,but eliminate data for one of the duplicate hours => What are repercussions for the omitted data?! And we expect degraded performance because of the added filter.

WITH t as (
    SELECT
        device_id,
        time_bucket('1 month'::interval, time_ts at time zone 'US/Eastern') AS ts,
        counter_agg(time_ts at time zone 'US/Eastern', reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1 and 
          time_ts at time zone 'US/Eastern' <> (time_ts + interval '1 hour') at time zone 'US/Eastern'
    GROUP BY 1,2
)
SELECT
    device_id,
    ts,
    delta(counter_agg_results)
FROM t;
 device_id |         ts          |  delta  
-----------+---------------------+---------
         1 | 2022-10-01 00:00:00 | 1483152
         1 | 2022-11-01 00:00:00 | 1195184
(2 rows)
@emizero emizero added the feature-request And area of analysis that could be made easier label Aug 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request And area of analysis that could be made easier
Projects
None yet
Development

No branches or pull requests

1 participant