Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default secondary sort may cause performance issues #3197

Open
sethboyles opened this issue Feb 21, 2023 · 2 comments
Open

Default secondary sort may cause performance issues #3197

sethboyles opened this issue Feb 21, 2023 · 2 comments

Comments

@sethboyles
Copy link
Member

sethboyles commented Feb 21, 2023

Someone discovered that a workaround for this slowdown was including a two column index:

alter table ccdb.app_usage_events add index ix_workaround (created_at desc, guid asc);

When app_usage_events has 5.7 million rows:

Before index:

SELECT count(*) AS `count` FROM `app_usage_events` LIMIT 1;
1 row in set (1.32 sec)

SELECT * FROM `app_usage_events` ORDER BY `app_usage_events`.`created_at` DESC, `app_usage_events`.`guid` ASC LIMIT 1 OFFSET 0;
1 row in set (13.58 sec)

After index:

SELECT count(*) AS `count` FROM `app_usage_events` LIMIT 1;
1 row in set (3.89 sec)

SELECT * FROM `app_usage_events` ORDER BY `app_usage_events`.`created_at` DESC, `app_usage_events`.`guid` ASC LIMIT 1 OFFSET 0;
1 row in set (0.00 sec)

We haven't personally done any testing on this performance improvement but it suggests we may want to revisit this line:

https://github.com/cloudfoundry/cloud_controller_ng/blob/main/lib/cloud_controller/paging/sequel_paginator.rb#L16

in which we automatically add a secondary sort to all queries to preserve a consistent order when returning results.

Here is the original commit: fedda54

We may not have considered the performance implications of always including a secondary sort on GUID without adding a two column index for each sortable field in each table. Perhaps we want to find another way to guarantee consistent results order, or only add this index for created_at and other common sorts.

@sethboyles sethboyles changed the title Revisit use of default secondary sort Default secondary sort may cause performance issues Feb 21, 2023
@philippthun
Copy link
Member

In the v2 endpoint for events, a combined ORDER BY clause with timestamp and id was used and there was a combined index for those columns...

I'm not aware of another method to achieve consistent ordering instead of using an ORDER BY clause... So I think we need to add combined indices for all valid "order_by" values plus "guid". We could also use "id" instead of "guid", not sure how this would effect performance.

@sethboyles - In your measurement shown above the SELECT count became slower. Is this query now consistently slower?

@sethboyles
Copy link
Member Author

sethboyles commented Feb 24, 2023

I didn't personally run these tests, so I'd have to set up a new db with test data to check about SELECT count. I concur that I don't know another way to guarantee consistent ordering. Looking at using ID as a secondary sort is something worth looking into.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants