Likely inefficient query #189

tarkhil · 2023-05-21T13:03:07Z

Hello

I'm running a dbmail site with a moderate load, and have some large mailboxes. I've found that some requests (roundcube-originated) are inefficient

SELECT seen_flag, answered_flag, deleted_flag, flagged_flag, draft_flag, recent_flag, 
  TO_CHAR(internal_date, 'YYYY-MM-DD HH24:MI:SS' ), rfcsize, seq, m.message_idnr, 
  status, m.physmessage_id FROM dbmail_messages m 
  LEFT JOIN dbmail_physmessage p ON p.id = m.physmessage_id 
  WHERE m.mailbox_idnr = 15895 /*SEQ New*/ AND m.status < 2  ORDER BY m.seq DESC;

returns 20k+ messages which is apparently slow anyway. Can it be an ill-formed IMAP request forcing server to return all messages or is a limit clause omitted by dbmail?

The text was updated successfully, but these errors were encountered:

alan-hicks · 2023-05-21T16:35:27Z

Roundcube has good caching and 10k messages are returned in under a second.
You might want to profile the query to see why it's slow. The database structure is optimised for most queries so it scales well.

tarkhil · 2023-05-21T16:39:31Z

At least this one

SELECT max(message_idnr)+1 FROM dbmail_messages WHERE mailbox_idnr=71;

did not perform well until I've added index (mailbox_idnr, message_udnr), speeding it up about 100 times

Under a load, requests like

SELECT l.part_key,l.part_depth,l.part_order,l.is_header,TO_CHAR(ph.internal_date, 'YYYY-MM-DD HH24:MI:SS' ),
ENCODE(data::bytea,'escape') FROM dbmail_mimeparts p JOIN dbmail_partlists l ON p.id = l.part_id 
JOIN dbmail_physmessage ph ON ph.id = l.physmessage_id WHERE l.physmessage_id = $1 
ORDER BY l.part_key, l.part_order ASC, l.part_depth DESC

starts to perform in 16+ seconds. I'm still investigating. No disk overload, no memory shortage, plenty of idle CPU. Just 40 imapsyncs in parallel on a reasonably fast box

alan-hicks · 2023-05-21T16:48:57Z

There are indexes on both mailbox_idnr and mailbox_idnr (with status) so I'm not sure why you need to create a new index, it would be useful to get performance feedback.

You can check the tables and indexes that are created here:
https://github.com/dbmail/dbmail/tree/master/sql

tarkhil · 2023-05-21T16:55:45Z

dbmail=# explain analyze  SELECT max(message_idnr) FROM dbmail_messages WHERE mailbox_idnr=71;
                                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=186.68..186.69 rows=1 width=8) (actual time=3351.883..3351.892 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..186.68 rows=1 width=8) (actual time=3351.867..3351.873 rows=1 loops=1)
           ->  Index Scan Backward using dbmail_messages_pkey on dbmail_messages  (cost=0.43..246037.49 rows=1321 width=8) (actual time=3351.861..3351.863 rows=1 loops=1)
                 Index Cond: (message_idnr IS NOT NULL)
                 Filter: (mailbox_idnr = 71)
                 Rows Removed by Filter: 6391717
 Planning Time: 0.338 ms
 Execution Time: 3351.940 ms
(9 rows)

until I've added an index. Sorry, did not keep explain analyze under the same load. but it has speed up to about 10 ms instantly.

According to Boguk, it happened because query optimizer had no idea on correlation between message_idnr and mailbox_idnr.

alan-hicks · 2023-05-21T16:59:16Z

Perhaps the imapsyncs are causing sequential reads rather than using indexes due to the query plan returning a large volume. Also 40 parallel syncs would put a high load on the connection pool, also disks can be slow with multiple random reads.

tarkhil · 2023-05-21T17:03:52Z

They are fast SSD. Should not be THAT slow and disks are not overloaded.

alan-hicks · 2023-05-21T17:10:28Z

Your query is taking a very long time. My database is small (80Gb) but this comparison is what I would have expected:

dbmail=# explain (analyze, buffers) SELECT max(message_idnr) FROM dbmail_messages WHERE mailbox_idnr=71;

                                                                         QUERY PLAN

Result (cost=1.26..1.27 rows=1 width=8) (actual time=0.103..0.105 rows=1 loops=1)
Buffers: shared hit=4
InitPlan 1 (returns $0)
-> Limit (cost=0.42..1.26 rows=1 width=8) (actual time=0.097..0.098 rows=1 loops=1)
Buffers: shared hit=4
-> Index Scan Backward using dbmail_messages_pkey on dbmail_messages (cost=0.42..22146.64 rows=26329 width=8) (actual time=0.096..0.096 rows=1 loops=1)
Index Cond: (message_idnr IS NOT NULL)
Filter: (mailbox_idnr = 1)
Buffers: shared hit=4
Planning Time: 0.373 ms
Execution Time: 0.154 ms
(11 rows)

tarkhil · 2023-05-22T04:16:27Z

I have slightly larger base, about a terabyte.

CozC · 2023-08-31T19:10:50Z

The dbmail-imap daemon is creating on internal structure per folder that is used in all kind of actions, so it cannot, at this point, to be ommited or avoided. There is an option to alleviate the execution of that query by using ‘mailbox_update_strategy=2’.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Likely inefficient query #189

Likely inefficient query #189

tarkhil commented May 21, 2023

alan-hicks commented May 21, 2023

tarkhil commented May 21, 2023 •

edited

alan-hicks commented May 21, 2023 •

edited

tarkhil commented May 21, 2023

alan-hicks commented May 21, 2023

tarkhil commented May 21, 2023

alan-hicks commented May 21, 2023 •

edited

tarkhil commented May 22, 2023

CozC commented Aug 31, 2023

Likely inefficient query #189

Likely inefficient query #189

Comments

tarkhil commented May 21, 2023

alan-hicks commented May 21, 2023

tarkhil commented May 21, 2023 • edited

alan-hicks commented May 21, 2023 • edited

tarkhil commented May 21, 2023

alan-hicks commented May 21, 2023

tarkhil commented May 21, 2023

alan-hicks commented May 21, 2023 • edited

tarkhil commented May 22, 2023

CozC commented Aug 31, 2023

tarkhil commented May 21, 2023 •

edited

alan-hicks commented May 21, 2023 •

edited

alan-hicks commented May 21, 2023 •

edited