Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constify cagg_watermark not working using window functions when querying a CAgg #6722

Closed
rovo89 opened this issue Mar 1, 2024 · 6 comments · Fixed by #6726
Closed

Constify cagg_watermark not working using window functions when querying a CAgg #6722

rovo89 opened this issue Mar 1, 2024 · 6 comments · Fixed by #6726

Comments

@rovo89
Copy link

rovo89 commented Mar 1, 2024

What type of enhancement is this?

Performance

What subsystems and features will be improved?

Continuous aggregate

What does the enhancement do?

The changes done in #6325 have greatly improved performance for various of my use-cases. However, at least one of them seems to have been missed as of 2.14.2. I hope it can be added in the same way.

Suppose there's a continuous real-time aggregate test. This trivial query shows that the watermark is replaced by a constant:

EXPLAIN SELECT bucket, avg FROM test

The output contains:

Filter: (bucket < '2024-03-02 00:00:00+00'::timestamp with time zone)

But if a window function is used:

EXPLAIN SELECT bucket, lead(avg) OVER (ORDER BY bucket) FROM strom_test;

The output says that the function is called:

Filter: (bucket < COALESCE(_timescaledb_functions.to_timestamp(_timescaledb_functions.cagg_watermark(27)), '-infinity'::timestamp with time zone))

Implementation challenges

No response

@rovo89 rovo89 added the enhancement An enhancement to an existing feature for functionality label Mar 1, 2024
@rovo89
Copy link
Author

rovo89 commented Mar 2, 2024

Just a thought, maybe it's too naiive: Instead of manipulating the query by replacing the function call with a constant value, would it be feasible introduce another function cagg_watermark_const marked as IMMUTABLE?

This would keep the possibility to query the latest value via cagg_watermark. The view sources would be changed to use the new function instead, ensuring that no matter how the view is used, it would always use the constant value, all heuristics could be removed.

Query results should be always correct because the watermark is just an optimization - if an older one is used, more real-time values have to be calculated. The only risk would be that some values have already been materialized but wouldn't be used. The impact of this depends on how long the value is cached for IMMUTABLE. No idea about that.

@rovo89
Copy link
Author

rovo89 commented Mar 2, 2024

CREATE OR REPLACE FUNCTION cagg_watermark_const(int4)
  RETURNS int8
  IMMUTABLE
  AS $$ SELECT _timescaledb_functions.cagg_watermark($1) $$
  LANGUAGE SQL;

Then take the generated view definition for a real-time continuous aggregate and create a new view from it which uses the immutable function. Queries using that view with window functions are much faster. The question is just for how long and for which scope the result of the function will be cached. If it's within the same session or even only for the same prepared statement, then the potential performance hit of using a too-low watermark should be neglectable.

@fabriziomello
Copy link
Contributor

Looks like this is a bug. We're preventing constify the cagg_watemark function call if there are Window Functions in the query tree: https://github.com/timescale/timescaledb/blob/main/tsl/src/continuous_aggs/planner.c#L196

The assumption for doing that is because nowadays we don't allow window functions in the cagg definition but in your example is when querying a cagg using window functions that should be fine to constify it.

Here a reproducible test case:

CREATE TABLE metrics(time timestamptz NOT NULL, device_id int, value float);
SELECT create_hypertable('metrics', 'time');

-- 10 chunks
INSERT INTO metrics
SELECT generate_series('2024-02-01 00:00:00-03'::timestamptz, '2024-02-01 00:00:00-03'::timestamptz + interval '9 months', interval '1 month'), 1, 1;

CREATE MATERIALIZED VIEW metrics_by_day WITH (timescaledb.continuous, timescaledb.materialized_only=false) AS
SELECT time_bucket('1 day', time) AS bucket, device_id, count(*), avg(value) FROM metrics GROUP BY 1, 2;

EXPLAIN SELECT avg(value) FROM metrics_by_day;

-- Constify is not working here
EXPLAIN SELECT bucket, lead(avg) OVER (ORDER BY bucket) FROM metrics_by_day;

@fabriziomello fabriziomello added bug and removed enhancement An enhancement to an existing feature for functionality labels Mar 3, 2024
@fabriziomello fabriziomello changed the title [Enhancement]: Constify cagg_watermark when using window functions Constify cagg_watermark not working using window functions when querying a CAgg Mar 3, 2024
fabriziomello added a commit to fabriziomello/timescaledb that referenced this issue Mar 3, 2024
When querying a realtime Continuous Aggregate using window functions the
new planner optimization to constify the `cagg_watermark` function call
is not working because we're checking for window function usage on the
query tree but actually this is a current limitation when creating a new
Continuous Aggregate, but users can use window functions when querying
it.

Fixed it by removing the check for `query->hasWindowFuncs` to prevent
the process of constification of the `cagg_watermak` function call.

Fixes timescale#6722
fabriziomello added a commit to fabriziomello/timescaledb that referenced this issue Mar 3, 2024
When querying a realtime Continuous Aggregate using window functions the
new planner optimization to constify the `cagg_watermark` function call
is not working because we're checking for window function usage on the
query tree but actually this is a current limitation when creating a new
Continuous Aggregate, but users can use window functions when querying
it.

Fixed it by removing the check for `query->hasWindowFuncs` to prevent
the process of constification of the `cagg_watermak` function call.

Fixes timescale#6722
fabriziomello added a commit to fabriziomello/timescaledb that referenced this issue Mar 3, 2024
When querying a realtime Continuous Aggregate using window functions the
new planner optimization to constify the `cagg_watermark` function call
is not working because we're checking for window function usage on the
query tree but actually this is a current limitation when creating a new
Continuous Aggregate, but users can use window functions when querying
it.

Fixed it by removing the check for `query->hasWindowFuncs` to prevent
the process of constification of the `cagg_watermak` function call.

Fixes timescale#6722
@rovo89
Copy link
Author

rovo89 commented Mar 4, 2024

Thanks for the fast fix! Looking forward to using it in one of the next releases.

nowadays we don't allow window functions in the cagg definition

Do you know if there are any plans to support that in the future? My use-case is that I store a counter values (energy meter reading) and would like to calculate the usage per hour, defined as value(t) - value(t - 1h). I read this blog post and thought it might be a comparable case. No idea how window functions are actually implemented, but I assume they also carry some state from previous rows. If it was feasible to store the latest state with the continuous aggregate, it could be used as a starting point for materializing newly inserted rows.

fabriziomello added a commit that referenced this issue Mar 4, 2024
When querying a realtime Continuous Aggregate using window functions the
new planner optimization to constify the `cagg_watermark` function call
is not working because we're checking for window function usage on the
query tree but actually this is a current limitation when creating a new
Continuous Aggregate, but users can use window functions when querying
it.

Fixed it by removing the check for `query->hasWindowFuncs` to prevent
the process of constification of the `cagg_watermak` function call.

Fixes #6722
github-actions bot pushed a commit that referenced this issue Mar 4, 2024
When querying a realtime Continuous Aggregate using window functions the
new planner optimization to constify the `cagg_watermark` function call
is not working because we're checking for window function usage on the
query tree but actually this is a current limitation when creating a new
Continuous Aggregate, but users can use window functions when querying
it.

Fixed it by removing the check for `query->hasWindowFuncs` to prevent
the process of constification of the `cagg_watermak` function call.

Fixes #6722

(cherry picked from commit a1cd7c4)
@fabriziomello
Copy link
Contributor

Thanks for the fast fix! Looking forward to using it in one of the next releases.

You're welcome!

Do you know if there are any plans to support that in the future?

Supporting window functions are in our plans but I don't have an ETA yet.

timescale-automation pushed a commit that referenced this issue Mar 4, 2024
When querying a realtime Continuous Aggregate using window functions the
new planner optimization to constify the `cagg_watermark` function call
is not working because we're checking for window function usage on the
query tree but actually this is a current limitation when creating a new
Continuous Aggregate, but users can use window functions when querying
it.

Fixed it by removing the check for `query->hasWindowFuncs` to prevent
the process of constification of the `cagg_watermak` function call.

Fixes #6722

(cherry picked from commit a1cd7c4)
@martinhale
Copy link

martinhale commented Apr 12, 2024

Hey @fabriziomello!

I'm still seeing issues where this is really fast:

EXPLAIN ANALYSE SELECT
	rollup(heartbeat_agg)
FROM vp_heartbeats_daily_agg_2m
WHERE b >= '2023-05-01 00:00:00'
AND b < '2024-04-07 23:59:59'
AND vision_program_id = 6149;

but this is really slow:

EXPLAIN ANALYSE SELECT
	live_ranges(rollup(heartbeat_agg))
FROM vp_heartbeats_daily_agg_2m
WHERE b >= '2023-05-01 00:00:00'
AND b < '2024-04-07 23:59:59'
AND vision_program_id = 6149;

i.e. the addition of the live_ranges call leads to the bloated planning time.

I'm wondering if the above fix would address this or if I should open a new issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants