Searching for a hashtag is slow #70

Open
eggpi opened this issue Jul 4, 2021 · 8 comments

Comments

@eggpi
Contributor

eggpi commented Jul 4, 2021

As far as I can tell, searching for any hashtag seems to take multiple seconds and even times out sometimes. We should look into optimizing this.

As a simple first step, we could log the output of explain on the queryset used in searches to see if we're doing something funny like a full-table scan. For my own future reference, here's the documentation for interpreting that output.
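For illustration, here is a minimal sketch of what "checking for a full-table scan" looks like. In Django, the plan would come from `QuerySet.explain()`. To keep the sketch self-contained it uses the standard-library `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` instead. The `hashtag` table, its columns, and the index name are hypothetical, and other databases word their plans differently.

```python
import sqlite3

# In-memory database with a hypothetical table standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hashtag (id INTEGER PRIMARY KEY, tag TEXT, page_title TEXT)"
)
conn.executemany(
    "INSERT INTO hashtag (tag, page_title) VALUES (?, ?)",
    [(f"tag{i % 50}", f"Page {i}") for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


search_sql = "SELECT * FROM hashtag WHERE tag = ?"

# Without an index on `tag`, the plan reports a scan of the whole table.
plan_before = query_plan(conn, search_sql, ("tag7",))

# After adding an index, the plan reports an index search instead.
conn.execute("CREATE INDEX hashtag_tag_idx ON hashtag (tag)")
plan_after = query_plan(conn, search_sql, ("tag7",))

print("before:", plan_before)
print("after: ", plan_after)
```

A plan that begins with a scan of the table being filtered is the "something funny" to look for. A plan that names an index suggests the filter is being served from it.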

It would also be good to experiment with a profiler to get a breakdown of the wall-clock time of a query. I think it's likely that the bottleneck is the database query (because the rest of the code is pretty standard use of Django), but we shouldn't take that for granted and this would allow us to identify other bad spots.
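As a rough sketch of the profiling idea, the standard-library `cProfile` and `pstats` modules can break down where time goes in a request. The three functions below are hypothetical stand-ins (a sleep imitates the database round trip) and are not the tool's actual code.

```python
import cProfile
import io
import pstats
import time


def run_database_query():
    # Hypothetical stand-in for the database round trip.
    time.sleep(0.05)
    return list(range(1000))


def render_results(rows):
    # Hypothetical stand-in for everything that happens after the query.
    return ", ".join(str(row) for row in rows)


def handle_search():
    rows = run_database_query()
    return render_results(rows)


# Profile a single call to the search handler.
profiler = cProfile.Profile()
profiler.enable()
handle_search()
profiler.disable()

# Sort by cumulative time so the most expensive call paths come first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

If the database really is the bottleneck, the query function should dominate the cumulative-time column. If something else shows up near the top, that is one of the "other bad spots".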

@Samwalton9 would you be able to try the above in the production environment please?

@Samwalton9
Member

I might not find time for this at the moment, but I'd be happy to get you a copy of the production database for local testing, if that would be helpful.

@eggpi
Contributor Author

eggpi commented Jul 12, 2021

Sure, let's try that. If I can reproduce it like that, it's even better.

@Samwalton9
Member

https://drive.google.com/file/d/1iyX0EEmeJhqS_6jImG1Q4gu1_xLnAbmU/view?usp=sharing

Let me know when you've downloaded the file so I can reclaim some Google Drive storage space :)

@eggpi
Contributor Author

eggpi commented Jul 12, 2021

Downloaded, thanks!

@eggpi
Contributor Author

eggpi commented Jul 23, 2021

I think there are two main sources of slowness here:

  • Missing database indexes for some important queries.
  • Accidental evaluation of Django QuerySets. We attempt to use pagination for search results, but we end up running both a paginated query (SQL with OFFSET / LIMIT clauses) and its non-paginated, slow version.
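To make the second point concrete, here is a small sketch of the difference between paginating in SQL and accidentally pulling every row first. In Django, slicing a QuerySet adds `LIMIT` / `OFFSET` to the SQL, while calling `len()` or `list()` on it, or testing it in an `if`, evaluates the whole thing. The sketch imitates both paths with the standard-library `sqlite3` module. The table, the page size, and the function names are hypothetical.

```python
import sqlite3

PAGE_SIZE = 20

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashtag (id INTEGER PRIMARY KEY, tag TEXT)")
conn.executemany("INSERT INTO hashtag (tag) VALUES (?)", [("wikipedia",)] * 500)


def page_with_accidental_evaluation(tag, page):
    """Fetch every matching row, then slice in Python (the slow path)."""
    all_rows = conn.execute(
        "SELECT id FROM hashtag WHERE tag = ? ORDER BY id", (tag,)
    ).fetchall()
    start = (page - 1) * PAGE_SIZE
    return all_rows[start:start + PAGE_SIZE], len(all_rows)


def page_in_sql(tag, page):
    """Let the database return only the requested page."""
    rows = conn.execute(
        "SELECT id FROM hashtag WHERE tag = ? ORDER BY id LIMIT ? OFFSET ?",
        (tag, PAGE_SIZE, (page - 1) * PAGE_SIZE),
    ).fetchall()
    return rows, len(rows)


# Each call returns the page plus the number of rows pulled from the database.
slow_page, slow_fetched = page_with_accidental_evaluation("wikipedia", 3)
fast_page, fast_fetched = page_in_sql("wikipedia", 3)

print("rows fetched without LIMIT:", slow_fetched)
print("rows fetched with LIMIT:   ", fast_fetched)
```

Both paths produce the same page, but the first one transfers every matching row to get there. Running both for a single request, as described above, pays the slow cost regardless of pagination.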

I have a couple of fairly simple patches to address those issues and, in local experiments, they make a big difference: we essentially go from always timing out when searching for the "Trending tags" to serving the search within about 3 seconds.

The one thing I'm not proposing we try yet is an optimization within Django's pagination (that's my f277fe8), as that one changes the tool's behavior in ways other than speed.

I'll open a pull request with the more conservative patches, and we can see how they behave in production.

@Samwalton9
Member

Per #71 (comment) we've just made huge gains in tool speed!

@eggpi
Contributor Author

eggpi commented Jul 28, 2021

It looks like there's still some bad performance when querying for many hashtags at once (example query).

I need to check whether this is some other unoptimized code path, but at this point we might need to consider patches that change the UI (e.g. break the pagination of results).

eggpi added a commit to eggpi/hashtags that referenced this issue Jul 28, 2021
We just want to see if we have results or not.

This helps with WikipediaLibrary#70 for the statistics page.
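The commit itself isn't shown in this thread, so the following is only a sketch of the general idea in its message: when the code only needs to know whether any result exists, it can ask the database exactly that instead of fetching rows. In Django this is what `QuerySet.exists()` is for. The sketch uses the standard-library `sqlite3` module with a hypothetical `hashtag` table and hypothetical function names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashtag (id INTEGER PRIMARY KEY, tag TEXT)")
conn.executemany("INSERT INTO hashtag (tag) VALUES (?)", [("wikipedia",)] * 500)


def has_results_by_fetching(tag):
    """Load every matching row just to test for emptiness (wasteful)."""
    rows = conn.execute("SELECT * FROM hashtag WHERE tag = ?", (tag,)).fetchall()
    return len(rows) > 0


def has_results_by_exists(tag):
    """Ask the database for a single yes/no answer."""
    (found,) = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM hashtag WHERE tag = ?)", (tag,)
    ).fetchone()
    return bool(found)


print(has_results_by_fetching("wikipedia"), has_results_by_exists("wikipedia"))
print(has_results_by_fetching("missing"), has_results_by_exists("missing"))
```

Both functions give the same answer. The second lets the database stop at the first matching row and returns one value instead of the full result set.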
@eggpi
Contributor Author

eggpi commented Jul 29, 2021

Things are looking a little better now, I think. I'm seeing the test query load in around 20s, which is not great, but better than consistent 502s.
