Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Likely inefficient query #189

Open
tarkhil opened this issue May 21, 2023 · 9 comments
Open

Likely inefficient query #189

tarkhil opened this issue May 21, 2023 · 9 comments

Comments

@tarkhil
Copy link

tarkhil commented May 21, 2023

Hello

I'm running a dbmail site with a moderate load, and have some large mailboxes. I've found that some requests (roundcube-originated) are inefficient

SELECT seen_flag, answered_flag, deleted_flag, flagged_flag, draft_flag, recent_flag, 
  TO_CHAR(internal_date, 'YYYY-MM-DD HH24:MI:SS' ), rfcsize, seq, m.message_idnr, 
  status, m.physmessage_id FROM dbmail_messages m 
  LEFT JOIN dbmail_physmessage p ON p.id = m.physmessage_id 
  WHERE m.mailbox_idnr = 15895 /*SEQ New*/ AND m.status < 2  ORDER BY m.seq DESC;

returns 20k+ messages which is apparently slow anyway. Can it be an ill-formed IMAP request forcing server to return all messages or is a limit clause omitted by dbmail?

@alan-hicks
Copy link
Member

Roundcube has good caching and 10k messages are returned in under a second.
You might want to profile the query to see why it's slow. The database structure is optimised for most queries so it scales well.

@tarkhil
Copy link
Author

tarkhil commented May 21, 2023

At least this one

SELECT max(message_idnr)+1 FROM dbmail_messages WHERE mailbox_idnr=71;

did not perform well until I've added index (mailbox_idnr, message_udnr), speeding it up about 100 times

Under a load, requests like

SELECT l.part_key,l.part_depth,l.part_order,l.is_header,TO_CHAR(ph.internal_date, 'YYYY-MM-DD HH24:MI:SS' ),
ENCODE(data::bytea,'escape') FROM dbmail_mimeparts p JOIN dbmail_partlists l ON p.id = l.part_id 
JOIN dbmail_physmessage ph ON ph.id = l.physmessage_id WHERE l.physmessage_id = $1 
ORDER BY l.part_key, l.part_order ASC, l.part_depth DESC

starts to perform in 16+ seconds. I'm still investigating. No disk overload, no memory shortage, plenty of idle CPU. Just 40 imapsyncs in parallel on a reasonably fast box

@alan-hicks
Copy link
Member

alan-hicks commented May 21, 2023

There are indexes on both mailbox_idnr and mailbox_idnr (with status) so I'm not sure why you need to create a new index, it would be useful to get performance feedback.

You can check the tables and indexes that are created here:
https://github.com/dbmail/dbmail/tree/master/sql

@tarkhil
Copy link
Author

tarkhil commented May 21, 2023

dbmail=# explain analyze  SELECT max(message_idnr) FROM dbmail_messages WHERE mailbox_idnr=71;
                                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=186.68..186.69 rows=1 width=8) (actual time=3351.883..3351.892 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..186.68 rows=1 width=8) (actual time=3351.867..3351.873 rows=1 loops=1)
           ->  Index Scan Backward using dbmail_messages_pkey on dbmail_messages  (cost=0.43..246037.49 rows=1321 width=8) (actual time=3351.861..3351.863 rows=1 loops=1)
                 Index Cond: (message_idnr IS NOT NULL)
                 Filter: (mailbox_idnr = 71)
                 Rows Removed by Filter: 6391717
 Planning Time: 0.338 ms
 Execution Time: 3351.940 ms
(9 rows)

until I've added an index. Sorry, did not keep explain analyze under the same load. but it has speed up to about 10 ms instantly.

According to Boguk, it happened because query optimizer had no idea on correlation between message_idnr and mailbox_idnr.

@alan-hicks
Copy link
Member

Perhaps the imapsyncs are causing sequential reads rather than using indexes due to the query plan returning a large volume. Also 40 parallel syncs would put a high load on the connection pool, also disks can be slow with multiple random reads.

@tarkhil
Copy link
Author

tarkhil commented May 21, 2023

They are fast SSD. Should not be THAT slow and disks are not overloaded.

@alan-hicks
Copy link
Member

alan-hicks commented May 21, 2023

Your query is taking a very long time. My database is small (80Gb) but this comparison is what I would have expected:

dbmail=# explain (analyze, buffers) SELECT max(message_idnr) FROM dbmail_messages WHERE mailbox_idnr=71;

                                                                         QUERY PLAN                                                                              

Result (cost=1.26..1.27 rows=1 width=8) (actual time=0.103..0.105 rows=1 loops=1)
Buffers: shared hit=4
InitPlan 1 (returns $0)
-> Limit (cost=0.42..1.26 rows=1 width=8) (actual time=0.097..0.098 rows=1 loops=1)
Buffers: shared hit=4
-> Index Scan Backward using dbmail_messages_pkey on dbmail_messages (cost=0.42..22146.64 rows=26329 width=8) (actual time=0.096..0.096 rows=1 loops=1)
Index Cond: (message_idnr IS NOT NULL)
Filter: (mailbox_idnr = 1)
Buffers: shared hit=4
Planning Time: 0.373 ms
Execution Time: 0.154 ms
(11 rows)

@tarkhil
Copy link
Author

tarkhil commented May 22, 2023

I have slightly larger base, about a terabyte.

@CozC
Copy link
Member

CozC commented Aug 31, 2023

The dbmail-imap daemon is creating on internal structure per folder that is used in all kind of actions, so it cannot, at this point, to be ommited or avoided. There is an option to alleviate the execution of that query by using ‘mailbox_update_strategy=2’.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants