Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new index for the unique constraint on imapuid #326

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

drewler
Copy link
Contributor

@drewler drewler commented Mar 4, 2022

Queries such as this one and this one have a high impact on IO, related to the elevated number of messages returned plus the data per row (as all the columns are fetched).

This situation could be mitigated by using a covering index on ("account_id", "folder_id", "msg_uid"). Thing is, there's an index there already because of the UniqueContraint which creates an index behind the scenes... just in the wrong order. By reordering it and leaving msg_uid as the last column on the index, the unique constraint should double as a covering index resulting in a significative reduction in IO (around 75%).

This first migrations creates a new unique constraint (plus related index) using the new ordering. After that we'll use a second one to drop the old constraint.

@drewler drewler requested a review from squeaky-pl March 4, 2022 10:54
@squeaky-pl
Copy link
Contributor

This is on hold for now, commenting so the bot does not remind me every day.

@wojcikstefan
Copy link
Member

Is there a particular action item or some other thing that needs to happen before we can revisit this PR @drewler @squeaky-pl?

@drewler
Copy link
Contributor Author

drewler commented Jun 14, 2022

I don't have the screenshot showing the difference for the cluster where these changes had been applied, but comparing that cluster with another one with a similar load today, the number of rows read dropped by ~33%. One MAX query (before, the third worst performing query) is now instantaneous thanks to the index. As a result, IO usage drops and query latency improves. We didn't complete the upgrade because:

  • There was another query that should have seen its performance improved because of this index, but we couldn't notice any difference and didn't understand why.
  • Either for the query that did improve or for the one that didn't improve, we couldn't show any meaningful impact. In the end, we had started investigating this because of an alert related to CPU usage. While looking into that, we found about this IO performance issue. While performance improved, we couldn't demonstrate a meaningful impact so this has been kept in the freezer until we could investigate some more.

Guess we should either revert to the old index (we put the new one in place in one of the clusters) or update it for all clusters.

Query that improved:

SELECT MAX(imapuid.msg_uid)
FROM imapuid
WHERE imapuid.account_id =...
    AND imapuid.folder_id =...

Query that didn't improve:

-- we also tried FORCE INDEX on this one
SELECT imapuid.msg_uid
FROM imapuid
WHERE imapuid.account_id =...
    AND imapuid.folder_id =...

@squeaky-pl feel free to expand my takes (or correct them if there's something wrong).

@squeaky-pl squeaky-pl removed their request for review August 11, 2022 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants