GUACAMOLE-1573: Slow selection of users on scale on Postgres #711

Hepri · 2022-03-31T11:29:34Z

Replaced left join and MAX(start_date) with selecting last record in history by start_date

mike-jumper

Please format your commit message to include the JIRA issue (see established git history).

Also, from the corresponding issue in JIRA:

This solution work faster in all of the real scenarios. It can be slower only in case of selecting all of users.

This concerns me, as retrieving all users is definitely a real scenario. What happens when you do attempt to retrieve all users? If this results in N unique subqueries, rather than being turned into something efficient by the PostgreSQL server, we shouldn't do things this way.

What does PostgreSQL say for this query vs. the previous for EXPLAIN?

Hepri · 2022-04-04T11:00:04Z

Fixed commit message.

Currently requests accepts list of users, which usually means that number of passed user names is considered as small.
As for N unique subqueries, it's not true actually - postgres using Index Only Scan at final step of selection and it works as usual join.
I did some tests on real database, 963k users, 17.5m user_history records.

Here is the plan for selecting all users with left join + group by + max

Finalize GroupAggregate  (cost=14831818.70..16921374.20 rows=17301844 width=1373) (actual time=42046.398..45327.396 rows=963527 loops=1)
"  Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
"  Buffers: shared hit=7827722 read=133911 dirtied=5910, temp read=1025423 written=1026599"
  I/O Timings: read=32732.059
  ->  Gather Merge  (cost=14831818.70..16640219.23 rows=14418204 width=1373) (actual time=42046.380..44885.919 rows=963527 loops=1)
        Workers Planned: 2
        Workers Launched: 2
"        Buffers: shared hit=7827722 read=133911 dirtied=5910, temp read=1025423 written=1026599"
        I/O Timings: read=32732.059
        ->  Partial GroupAggregate  (cost=14830818.68..14975000.72 rows=7209102 width=1373) (actual time=39179.294..40950.738 rows=321176 loops=3)
"              Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
"              Buffers: shared hit=7827722 read=133911 dirtied=5910, temp read=1025423 written=1026599"
              I/O Timings: read=32732.059
              ->  Sort  (cost=14830818.68..14848841.43 rows=7209102 width=1373) (actual time=38626.534..39682.340 rows=5905855 loops=3)
"                    Sort Key: guacamole_user.user_id, guacamole_entity.entity_id"
                    Sort Method: external merge  Disk: 1145272kB
                    Worker 0:  Sort Method: external merge  Disk: 639824kB
                    Worker 1:  Sort Method: external merge  Disk: 642448kB
"                    Buffers: shared hit=7827722 read=133911 dirtied=5910, temp read=1025423 written=1026599"
                    I/O Timings: read=32732.059
                    ->  Merge Left Join  (cost=209055.40..1073389.42 rows=7209102 width=1373) (actual time=2177.709..32539.696 rows=5905855 loops=3)
                          Merge Cond: (guacamole_user.user_id = guacamole_user_history.user_id)
                          Buffers: shared hit=7827714 read=133911 dirtied=5910
                          I/O Timings: read=32732.059
                          ->  Nested Loop  (cost=0.85..245764.48 rows=398933 width=1365) (actual time=0.521..1734.830 rows=321176 loops=3)
                                Buffers: shared hit=4422640 read=19582 dirtied=1
                                I/O Timings: read=2302.272
                                ->  Parallel Index Scan using guacamole_user_pkey on guacamole_user  (cost=0.42..48764.29 rows=398933 width=1341) (actual time=0.233..608.004 rows=321176 loops=3)
                                      Buffers: shared hit=568501 read=15572 dirtied=1
                                      I/O Timings: read=1280.976
                                ->  Index Scan using guacamole_entity_pkey on guacamole_entity  (cost=0.42..0.49 rows=1 width=28) (actual time=0.003..0.003 rows=1 loops=963527)
                                      Index Cond: (entity_id = guacamole_user.entity_id)
                                      Filter: (type = 'USER'::guacamole_entity_type)
                                      Buffers: shared hit=3854139 read=4010
                                      I/O Timings: read=1021.296
                          ->  Index Only Scan using guacamole_user_history_user_id_start_date on guacamole_user_history  (cost=0.56..720887.62 rows=17301844 width=12) (actual time=0.023..28557.541 rows=17506069 loops=3)
                                Heap Fetches: 3571677
                                Buffers: shared hit=3405074 read=114329 dirtied=5909
                                I/O Timings: read=30429.786
Planning Time: 0.319 ms
Execution Time: 45976.878 ms

And here is the plan for nested query

Group  (cost=1911971.99..2510148.36 rows=957439 width=1373) (actual time=2859.696..5722.555 rows=963531 loops=1)
"  Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
"  Buffers: shared hit=4973236 read=4785 dirtied=18, temp read=33234 written=33302"
  I/O Timings: read=537.094
  ->  Sort  (cost=1911971.99..1914365.59 rows=957439 width=1365) (actual time=2859.650..3068.616 rows=963531 loops=1)
"        Sort Key: guacamole_user.user_id, guacamole_entity.entity_id"
        Sort Method: external merge  Disk: 129712kB
"        Buffers: shared hit=918611 read=4472 dirtied=18, temp read=33234 written=33302"
        I/O Timings: read=532.826
        ->  Merge Join  (cost=0.99..108611.07 rows=957439 width=1365) (actual time=0.013..1899.534 rows=963531 loops=1)
              Merge Cond: (guacamole_user.entity_id = guacamole_entity.entity_id)
              Buffers: shared hit=918611 read=4472 dirtied=18
              I/O Timings: read=532.826
              ->  Index Scan using guacamole_user_single_entity on guacamole_user  (cost=0.42..53808.28 rows=957439 width=1341) (actual time=0.005..943.326 rows=963531 loops=1)
                    Buffers: shared hit=471018 read=4472 dirtied=1
                    I/O Timings: read=532.826
              ->  Index Scan using guacamole_entity_pkey on guacamole_entity  (cost=0.42..40865.32 rows=959338 width=28) (actual time=0.005..352.519 rows=963531 loops=1)
                    Filter: (type = 'USER'::guacamole_entity_type)
                    Buffers: shared hit=447593 dirtied=17
  SubPlan 1
    ->  Limit  (cost=0.56..0.62 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=963531)
          Buffers: shared hit=4054625 read=313
          I/O Timings: read=4.268
          ->  Index Only Scan Backward using guacamole_user_history_user_id_start_date on guacamole_user_history  (cost=0.56..35.83 rows=644 width=8) (actual time=0.002..0.002 rows=1 loops=963531)
                Index Cond: (user_id = guacamole_user.user_id)
                Heap Fetches: 30627
                Buffers: shared hit=4054625 read=313
                I/O Timings: read=4.268
Planning Time: 0.288 ms
Execution Time: 5796.778 ms

And there is the example on selecting 10 users:
Left join

HashAggregate  (cost=337.53..339.34 rows=181 width=1373) (actual time=0.263..0.267 rows=10 loops=1)
"  Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
  Buffers: shared hit=126
  ->  Nested Loop Left Join  (cost=1.41..336.17 rows=181 width=1373) (actual time=0.033..0.210 rows=258 loops=1)
        Buffers: shared hit=126
        ->  Nested Loop  (cost=0.85..148.63 rows=10 width=1365) (actual time=0.024..0.114 rows=10 loops=1)
              Buffers: shared hit=80
              ->  Index Scan using guacamole_entity_name_scope on guacamole_entity  (cost=0.42..64.20 rows=10 width=28) (actual time=0.017..0.081 rows=10 loops=1)
"                    Index Cond: ((type = 'USER'::guacamole_entity_type) AND ((name)::text = ANY ('{user1,user2,user3,user4,user5,user6,user7,user8,user9,user10}'::text[])))"
                    Buffers: shared hit=39
              ->  Index Scan using guacamole_user_single_entity on guacamole_user  (cost=0.42..8.44 rows=1 width=1341) (actual time=0.002..0.002 rows=1 loops=10)
                    Index Cond: (entity_id = guacamole_entity.entity_id)
                    Buffers: shared hit=41
        ->  Index Only Scan using guacamole_user_history_user_id_start_date on guacamole_user_history  (cost=0.56..12.31 rows=644 width=12) (actual time=0.003..0.006 rows=26 loops=10)
              Index Cond: (user_id = guacamole_user.user_id)
              Heap Fetches: 0
              Buffers: shared hit=46
Planning Time: 0.329 ms
Execution Time: 0.302 ms

Nested join

Group  (cost=148.79..155.04 rows=10 width=1373) (actual time=0.176..0.213 rows=10 loops=1)
"  Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
  Buffers: shared hit=124
  ->  Sort  (cost=148.79..148.82 rows=10 width=1365) (actual time=0.163..0.165 rows=10 loops=1)
"        Sort Key: guacamole_user.user_id, guacamole_entity.entity_id"
        Sort Method: quicksort  Memory: 27kB
        Buffers: shared hit=80
        ->  Nested Loop  (cost=0.85..148.63 rows=10 width=1365) (actual time=0.034..0.137 rows=10 loops=1)
              Buffers: shared hit=80
              ->  Index Scan using guacamole_entity_name_scope on guacamole_entity  (cost=0.42..64.20 rows=10 width=28) (actual time=0.025..0.097 rows=10 loops=1)
"                    Index Cond: ((type = 'USER'::guacamole_entity_type) AND ((name)::text = ANY ('{user1,user2,user3,user4,user5,user6,user7,user8,user9,user10}'::text[])))"
                    Buffers: shared hit=39
              ->  Index Scan using guacamole_user_single_entity on guacamole_user  (cost=0.42..8.44 rows=1 width=1341) (actual time=0.003..0.003 rows=1 loops=10)
                    Index Cond: (entity_id = guacamole_entity.entity_id)
                    Buffers: shared hit=41
  SubPlan 1
    ->  Limit  (cost=0.56..0.62 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=10)
          Buffers: shared hit=44
          ->  Index Only Scan Backward using guacamole_user_history_user_id_start_date on guacamole_user_history  (cost=0.56..35.83 rows=644 width=8) (actual time=0.004..0.004 rows=1 loops=10)
                Index Cond: (user_id = guacamole_user.user_id)
                Heap Fetches: 0
                Buffers: shared hit=44
Planning Time: 0.244 ms
Execution Time: 0.253 ms

As you can see cost is bigger in case of left join in both of cases

necouchman · 2022-08-06T12:47:38Z

While I generally lean toward using JOIN instead of nested queries in SQL, and I'm skeptical of situations like this, I think this actually makes sense - it is pre-sorting and eliminating potentially a bunch of extra data from the guacamole_user_history table that then doesn't have to be JOINed to the data pulled out of the guacamole_user table.

The only situation where I think this may end up being sub-optimal is if, for some reason, the data in the guacamole_user_history table is more fragmented or unsorted, and the ORDER BY start_date DESC and then LIMIT 1 cost ends up high, but even in those situations it still may be lower cost than having to pull out all of the guacamole_user_history records and JOINing them.

Hepri changed the base branch from master to staging/1.5.0 March 31, 2022 11:30

mike-jumper changed the base branch from staging/1.5.0 to master March 31, 2022 21:04

mike-jumper requested changes Mar 31, 2022

View reviewed changes

Hepri force-pushed the jira/1573 branch from 421eee8 to 88a96be Compare April 4, 2022 10:33

Hepri changed the title ~~GUACAMOLE-1488: Slow selection of users on scale on Postgres~~ GUACAMOLE-1573: Slow selection of users on scale on Postgres Apr 4, 2022

GUACAMOLE-1573: Slow selection of users on scale on Postgres

800e111

Hepri force-pushed the jira/1573 branch from 88a96be to 800e111 Compare April 4, 2022 21:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GUACAMOLE-1573: Slow selection of users on scale on Postgres #711

GUACAMOLE-1573: Slow selection of users on scale on Postgres #711

Hepri commented Mar 31, 2022

mike-jumper left a comment

Hepri commented Apr 4, 2022 •

edited

necouchman commented Aug 6, 2022

GUACAMOLE-1573: Slow selection of users on scale on Postgres #711

Are you sure you want to change the base?

GUACAMOLE-1573: Slow selection of users on scale on Postgres #711

Conversation

Hepri commented Mar 31, 2022

mike-jumper left a comment

Choose a reason for hiding this comment

Hepri commented Apr 4, 2022 • edited

necouchman commented Aug 6, 2022

Hepri commented Apr 4, 2022 •

edited