[Bug]: Dependency tracking of hierarchical caggs #6829

Open
leppaott opened this issue Apr 15, 2024 · 5 comments

@leppaott

What type of bug is this?

Locking issue

What subsystems and features are affected?

Continuous aggregate

What happened?

Apparently creating, dropping, and then re-creating materialized views with retention + compression policies deadlocks:

    CREATE MATERIALIZED VIEW "X"....
     
    CREATE INDEX "X_some_key" ....

    SELECT add_continuous_aggregate_policy('"X"', ....

    SELECT add_retention_policy('"X"', INTERVAL '2 weeks');

    ALTER MATERIALIZED VIEW "X"
    SET (timescaledb.compress = true, timescaledb.compress_segmentby = 'uuid');
    SELECT add_compression_policy('"X"', INTERVAL '4 days');

    DROP MATERIALIZED VIEW IF EXISTS "X" CASCADE;
DETAIL:  Process 313 waits for AccessExclusiveLock on advisory lock [23194,1046,0,29749]; blocked by process 360.
        Process 360 waits for ShareLock on relation 17402 of database 23194; blocked by process 313.

i.e. we're trying to migrate a cagg to a new version, but while testing we still have the old cagg migration in place.
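
For reference, a minimal sketch (assuming only the standard pg_locks and pg_stat_activity catalogs) of how the advisory lock from the DETAIL above can be inspected while the two processes are blocked:

-- Show advisory locks together with the sessions holding or waiting on
-- them; classid/objid match the key fields in the deadlock DETAIL line.
SELECT l.classid, l.objid, l.pid, l.mode, l.granted, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
 WHERE l.locktype = 'advisory';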

TimescaleDB version affected

2.13

PostgreSQL version used

16

What operating system did you use?

Docker

What installation method did you use?

Docker

What platform did you run on?

Other

Relevant log output and stack trace

No response

How can we reproduce the bug?

----
@jnidzwetzki
Contributor

Hello @leppaott,

Thanks for reaching out. I tried to reproduce the behavior in my local environment using TimescaleDB 2.13.0. However, it seems I am missing something, since I have not been able to reproduce the locking issue so far. Could you check the following steps and see whether something is missing?

CREATE TABLE temperature (
  time timestamptz NOT NULL,
  sensor int,
  value float
);

SELECT create_hypertable('temperature', 'time');

INSERT INTO temperature
  SELECT time, ceil(random() * 100)::int, ceil(random() * 100)::int
    FROM generate_series('2000-01-01 0:00:00+0'::timestamptz,
                         '2000-01-01 23:59:59+0','1m') time;

CREATE MATERIALIZED VIEW cagg_1
  WITH  (timescaledb.continuous) AS
  SELECT time_bucket('5 minutes', time), sensor, avg(value) AS avg
    FROM temperature
    GROUP BY 1,2 ORDER BY 1,2;

CREATE INDEX "cagg_1_value" ON cagg_1(sensor);

SELECT add_continuous_aggregate_policy('cagg_1', 
  start_offset => INTERVAL '1 day',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 minute');

SELECT add_retention_policy('cagg_1', INTERVAL '2 weeks');

ALTER MATERIALIZED VIEW cagg_1
SET (timescaledb.compress = true, timescaledb.compress_segmentby = 'sensor');

SELECT add_compression_policy('cagg_1', INTERVAL '1 week');

DROP MATERIALIZED VIEW IF EXISTS cagg_1 CASCADE;

@leppaott
Author

A quick modification to your steps; I forgot that we're using hierarchical caggs:

CREATE TABLE temperature (
  time timestamptz NOT NULL,
  sensor int,
  value float
);

SELECT create_hypertable('temperature', 'time');

INSERT INTO temperature
  SELECT time, ceil(random() * 100)::int, ceil(random() * 100)::int
    FROM generate_series('2000-01-01 0:00:00+0'::timestamptz,
                         '2000-01-01 23:59:59+0','1m') time;

CREATE MATERIALIZED VIEW cagg_1
  WITH  (timescaledb.continuous) AS
  SELECT time_bucket('5 minutes', time) as time, sensor, avg(value) AS avg
    FROM temperature
    GROUP BY 1,2
    WITH NO DATA;

CREATE MATERIALIZED VIEW cagg_2
  WITH  (timescaledb.continuous) AS
  SELECT time_bucket('15 minutes', time) as time, sensor, avg(avg) AS avg
    FROM cagg_1
    GROUP BY 1,2
    WITH NO DATA;

CREATE MATERIALIZED VIEW cagg_3
  WITH  (timescaledb.continuous) AS
  SELECT time_bucket('30 minutes', time) as time, sensor, avg(avg) AS avg
    FROM cagg_2
    GROUP BY 1,2
    WITH NO DATA;

CREATE INDEX "cagg_1_value" ON cagg_1(sensor);

SELECT add_continuous_aggregate_policy('cagg_1', 
  start_offset => INTERVAL '1 day',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 minute');

SELECT add_continuous_aggregate_policy('cagg_2', 
  start_offset => INTERVAL '1 day',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '15 minute');

SELECT add_continuous_aggregate_policy('cagg_3', 
  start_offset => INTERVAL '1 day',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '30 minute');

SELECT add_retention_policy('cagg_1', INTERVAL '2 weeks');
SELECT add_retention_policy('cagg_2', INTERVAL '2 weeks');
SELECT add_retention_policy('cagg_3', INTERVAL '2 weeks');

ALTER MATERIALIZED VIEW cagg_1
SET (timescaledb.compress = true, timescaledb.compress_segmentby = 'sensor');

ALTER MATERIALIZED VIEW cagg_2
SET (timescaledb.compress = true, timescaledb.compress_segmentby = 'sensor');

ALTER MATERIALIZED VIEW cagg_3
SET (timescaledb.compress = true, timescaledb.compress_segmentby = 'sensor');

SELECT add_compression_policy('cagg_1', INTERVAL '1 week');
SELECT add_compression_policy('cagg_2', INTERVAL '1 week');
SELECT add_compression_policy('cagg_3', INTERVAL '1 week');

BEGIN;
DROP MATERIALIZED VIEW IF EXISTS cagg_1 CASCADE;
DROP MATERIALIZED VIEW IF EXISTS cagg_2 CASCADE;
DROP MATERIALIZED VIEW IF EXISTS cagg_3 CASCADE;
COMMIT;

But now I see that this BEGIN-COMMIT block seems to be hiding an issue here.

If I just run:

DROP MATERIALIZED VIEW IF EXISTS cagg_1 CASCADE;
DROP MATERIALIZED VIEW IF EXISTS cagg_2 CASCADE;
DROP MATERIALIZED VIEW IF EXISTS cagg_3 CASCADE;

I get:

ERROR: cannot drop view cagg_2 because other objects depend on it
DETAIL: view _timescaledb_internal._partial_view_16 depends on view cagg_2
view _timescaledb_internal._direct_view_16 depends on view cagg_2
HINT: Use DROP ... CASCADE to drop the dependent objects too.

Which is quite weird, as when I ran the drops one by one with our actual caggs, the first one seemed to basically CASCADE-delete all the views, leaving the subsequent drops as no-ops thanks to the IF EXISTS clause.

This doesn't seem to be the same random issue we faced originally, but we did drop these hierarchical caggs from bottom to top in the same order, and we didn't get this "depends on it" error...

@jnidzwetzki
Contributor

Hi @leppaott,

Thanks for getting back. The created HCaggs have the following dependencies: cagg_3 -> cagg_2 -> cagg_1. If you drop them within a transaction, PostgreSQL keeps track of the dependencies, knows that all dependent objects should be removed, and drops them in the right order. If you drop them individually, you have to keep track of these dependencies yourself. So, the following order works:

tsdb2130=# DROP MATERIALIZED VIEW IF EXISTS cagg_3 CASCADE;
DROP MATERIALIZED VIEW
tsdb2130=# DROP MATERIALIZED VIEW IF EXISTS cagg_2 CASCADE;
DROP MATERIALIZED VIEW
tsdb2130=# DROP MATERIALIZED VIEW IF EXISTS cagg_1 CASCADE;
DROP MATERIALIZED VIEW
tsdb2130=#
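
For reference, a minimal sketch (assuming only the standard PostgreSQL catalogs) that lists which views depend on a given cagg, so the drop order can be determined up front. View-on-view dependencies are recorded in pg_depend through the dependent view's rewrite rule:

-- List the views that depend on cagg_2: pg_depend links the dependent
-- view's rewrite rule (pg_rewrite) to the referenced relation, so we
-- join through pg_rewrite.ev_class to reach the dependent view itself.
SELECT DISTINCT dep.relname AS dependent_view
  FROM pg_depend d
  JOIN pg_rewrite r ON r.oid = d.objid
  JOIN pg_class dep ON dep.oid = r.ev_class
 WHERE d.classid = 'pg_rewrite'::regclass
   AND d.refclassid = 'pg_class'::regclass
   AND d.refobjid = 'cagg_2'::regclass
   AND r.ev_class <> 'cagg_2'::regclass;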

If you drop cagg_1 before cagg_2, PostgreSQL seems not to follow the CASCADE correctly:

DROP TABLE temperature;

CREATE TABLE temperature (
  time timestamptz NOT NULL,
  sensor int,
  value float
);

SELECT create_hypertable('temperature', 'time');

INSERT INTO temperature
  SELECT time, ceil(random() * 100)::int, ceil(random() * 100)::int
    FROM generate_series('2000-01-01 0:00:00+0'::timestamptz,
                         '2000-01-01 23:59:59+0','1m') time;

CREATE MATERIALIZED VIEW cagg_1
  WITH  (timescaledb.continuous) AS
  SELECT time_bucket('5 minutes', time) as time, sensor, avg(value) AS avg
    FROM temperature
    GROUP BY 1,2
    WITH NO DATA;

CREATE MATERIALIZED VIEW cagg_2
  WITH  (timescaledb.continuous) AS
  SELECT time_bucket('15 minutes', time) as time, sensor, avg(avg) AS avg
    FROM cagg_1
    GROUP BY 1,2
    WITH NO DATA;

CREATE MATERIALIZED VIEW cagg_3
  WITH  (timescaledb.continuous) AS
  SELECT time_bucket('30 minutes', time) as time, sensor, avg(avg) AS avg
    FROM cagg_2
    GROUP BY 1,2
    WITH NO DATA;

DROP MATERIALIZED VIEW IF EXISTS cagg_1 CASCADE;

NOTICE:  drop cascades to 2 other objects
DETAIL:  drop cascades to view _timescaledb_internal._partial_view_23
drop cascades to view _timescaledb_internal._direct_view_23
ERROR:  cannot drop view cagg_2 because other objects depend on it
DETAIL:  view _timescaledb_internal._partial_view_24 depends on view cagg_2
view _timescaledb_internal._direct_view_24 depends on view cagg_2
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

However, I compared this behavior with PostgreSQL's behavior when dependent regular materialized views are dropped. In this case, the drop of view_1 also removes view_2 and view_3.

CREATE MATERIALIZED VIEW view_1 AS
  SELECT time_bucket('5 minutes', time) as time, sensor, avg(value) AS avg
    FROM temperature
    GROUP BY 1,2
    WITH NO DATA;

CREATE MATERIALIZED VIEW view_2 AS
  SELECT time_bucket('15 minutes', time) as time, sensor, avg(avg) AS avg
    FROM view_1
    GROUP BY 1,2
    WITH NO DATA;

CREATE MATERIALIZED VIEW view_3 AS
  SELECT time_bucket('30 minutes', time) as time, sensor, avg(avg) AS avg
    FROM view_2
    GROUP BY 1,2
    WITH NO DATA;

tsdb2130=# DROP MATERIALIZED VIEW view_1 CASCADE;
NOTICE:  drop cascades to 2 other objects
DETAIL:  drop cascades to materialized view view_2
drop cascades to materialized view view_3
DROP MATERIALIZED VIEW

It seems the dependency tracking of HCaggs is different from the dependency tracking of regular materialized views. So, I will move this issue to the backlog.

@jnidzwetzki changed the title from "[Bug]: Deadlock detected when migrating continuous aggregates" to "[Bug]: Dependency tracking of hierarchical caggs" on Apr 15, 2024
@leppaott
Author

leppaott commented Apr 16, 2024

Yes, that seemed to be a bug within a transaction as well.

But now I've managed to reproduce the original issue, which is what I was mainly having problems with:

2024-04-16 05:46:32.263 UTC [252] ERROR:  deadlock detected
2024-04-16 05:46:32.263 UTC [252] DETAIL:  Process 252 waits for AccessExclusiveLock on advisory lock [23194,1014,0,29749]; blocked by process 270.
        Process 270 waits for ShareLock on relation 17402 of database 23194; blocked by process 252.
        Process 252: -- our migration....
        BEGIN;
        DROP MATERIALIZED VIEW IF EXISTS "X1" CASCADE;
        DROP MATERIALIZED VIEW IF EXISTS "X2" CASCADE;
        DROP MATERIALIZED VIEW IF EXISTS "X3" CASCADE;
        DROP MATERIALIZED VIEW IF EXISTS "Y1" CASCADE;
        DROP MATERIALIZED VIEW IF EXISTS "Y2" CASCADE;
        COMMIT;

        Process 270: CALL _timescaledb_functions.policy_compression()

I'll try to reproduce this with that example too, but can this happen when the cascade delete is trying to remove compression jobs while a compression job is running at the same time? I was thinking that in this whole cagg removal I shouldn't need remove_compression_policy or remove_retention_policy, as the policies seem to be deleted this way too.
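
For what it's worth, a minimal sketch (assuming the standard timescaledb_information views) of how policy jobs that are still executing could be checked before starting the migration transaction:

-- List background jobs currently executing; proc_name tells whether a
-- compression, retention, or refresh policy is the one holding locks.
SELECT j.job_id, j.proc_name, j.hypertable_name, s.job_status
  FROM timescaledb_information.jobs j
  JOIN timescaledb_information.job_stats s USING (job_id)
 WHERE s.job_status = 'Running';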

@jnidzwetzki
Contributor

Hi @leppaott,

Thanks for getting back to us and for the reproduction case. To ensure that both issues are addressed properly, could you please create a new issue for the locking problem? By doing so, we can use this issue to track the dependency problem exclusively.
