Searching for a hashtag is slow #70
Comments
I might not find time for this at the moment, but I'd be happy to get you a copy of the production database for local testing, if that would be helpful.
Sure, let's try that. If I can reproduce it like that, it's even better.
https://drive.google.com/file/d/1iyX0EEmeJhqS_6jImG1Q4gu1_xLnAbmU/view?usp=sharing Let me know when you've downloaded the file so I can reclaim some Google Drive storage space :)
Downloaded, thanks!
I think there are two main sources of slowness here:
I have a couple of fairly simple patches to address those issues and, in local experiments, they make a big difference: we essentially go from always timing out when searching for the "Trending tags" to serving the search within 3 seconds or so. The one thing I'm not proposing we try yet is an optimization within Django's pagination (that's my f277fe8), as that one changes the tool's behavior, not just its speed. I'll open a pull request with the more conservative patches; let's see how they behave in production.
Per #71 (comment) we've just made huge gains in tool speed!
It looks like there's still some bad performance when querying for many hashtags at once (example query). I need to check whether this is some other unoptimized code path, but at this point we might need to consider patches that change the UI (e.g. break the pagination of results).
We just want to see if we have results or not. This helps with WikipediaLibrary#70 for the statistics page.
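For illustration, the idea of only checking whether results exist, as opposed to counting or fetching them, could look roughly like the sketch below. It uses the standard-library `sqlite3` module as a stand-in for the real database; the `edits` table, its columns, and the helper names are hypothetical and not taken from the project. In Django, the corresponding call would be `QuerySet.exists()`.

```python
import sqlite3

# Hypothetical schema and data; the real tool's tables are not shown in this thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edits (id INTEGER PRIMARY KEY, hashtag TEXT)")
conn.executemany("INSERT INTO edits (hashtag) VALUES (?)", [("example",)] * 1000)


def has_results(conn, hashtag):
    """Ask only whether at least one row matches; the database can stop at the first hit."""
    row = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM edits WHERE hashtag = ?)", (hashtag,)
    ).fetchone()
    return bool(row[0])


def count_results(conn, hashtag):
    """Count every matching row; this has to visit all matches."""
    row = conn.execute(
        "SELECT COUNT(*) FROM edits WHERE hashtag = ?", (hashtag,)
    ).fetchone()
    return row[0]


print(has_results(conn, "example"), has_results(conn, "missing"))
print(count_results(conn, "example"))
```

The existence check can return as soon as one matching row is found, which is why it tends to be cheaper than a count when the answer only needs to be yes or no.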
Things are looking a little better now, I think. I'm seeing the test query load in around 20s, which is not great, but better than consistent 502s.
As far as I can tell, searching for any hashtag seems to take multiple seconds and even times out sometimes. We should look into optimizing this.
As a simple first step, we could log the output of explain on the queryset used in searches to see if we're doing something funny like a full-table scan. For my own future reference, here's the documentation for interpreting that output.
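As a rough sketch of what that check looks like, the snippet below asks the database for its query plan before and after adding an index. It uses the standard-library `sqlite3` module purely as a stand-in, so the plan wording will differ from what the production database prints; the `edits` table, the index name, and the `query_plan` helper are hypothetical. In Django, the equivalent is calling `.explain()` on the queryset and logging the returned string.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the plan description


# Hypothetical schema; the real tool's tables are not shown in this thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edits (id INTEGER PRIMARY KEY, hashtag TEXT, timestamp TEXT)"
)

search = "SELECT id FROM edits WHERE hashtag = ?"

# Without an index, the plan reports a scan over the whole table.
plan_without_index = query_plan(conn, search, ("example",))

# With an index on the searched column, the plan reports an index search.
conn.execute("CREATE INDEX edits_hashtag_idx ON edits (hashtag)")
plan_with_index = query_plan(conn, search, ("example",))

print(plan_without_index)
print(plan_with_index)
```

A plan that mentions a scan of the table rather than an index search is the "something funny" to look for in the logged output.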
It would also be good to experiment with a profiler to get a breakdown of the wall-clock time of a query. I think it's likely that the bottleneck is the database query (because the rest of the code is pretty standard use of Django), but we shouldn't take that for granted and this would allow us to identify other bad spots.
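A minimal sketch of such a breakdown with the standard-library `cProfile` module is shown below. The three functions are hypothetical stand-ins (a sleep in place of the database round trip, a list comprehension in place of the Python-side work), not the tool's actual view code.

```python
import cProfile
import io
import pstats
import time


def run_database_query():
    """Hypothetical stand-in for the database round trip."""
    time.sleep(0.05)
    return list(range(1000))


def render_results(rows):
    """Hypothetical stand-in for Python-side work such as pagination or templating."""
    return [str(row) for row in rows]


def search_view():
    """Hypothetical stand-in for the search request handler."""
    return render_results(run_database_query())


profiler = cProfile.Profile()
profiler.enable()
result = search_view()
profiler.disable()

# Sort by cumulative time so the slowest call paths appear first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time makes it easy to see whether the database call or the surrounding Python code dominates the request.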
@Samwalton9 would you be able to try the above in the production environment please?