Large Data Indexing time #372

Open
abdulkadernsu opened this issue Sep 23, 2023 · 9 comments

@abdulkadernsu

I have 250k+ products on my WooCommerce site, and I have implemented WP Search with Algolia for search. Indexing is taking too long: I counted only about 100 records indexed every 30 seconds. How can I speed up indexing? Also, if I update my products in the future, how will they sync with Algolia without reindexing everything again?
Thanks

@tw2113
Member

tw2113 commented Sep 23, 2023

Are you indexing through the browser, or through something like the command line, which will probably process faster? Either way, you're still beholden to the server and the resources available to it at any given time. I also know we have some filters available for things like batch sizes that could be tweaked.

Simply put, a huge amount of content is going to take a while to get indexed, especially initially with an empty index.

Adding or updating individual products updates the indexes as you go. The main time you'd need a bulk re-index is when you change what data gets indexed. At that point, you'd need to re-index to retroactively push that data in for all your already indexed items.
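
For a rough sense of scale, the rate reported in the original post can be extrapolated to the whole catalog. This is only a back-of-the-envelope sketch: it assumes exactly 250,000 products, the default batch size of 100, and a constant 30 seconds per batch, none of which will hold precisely on a real server.

```php
<?php
// Figures taken from the original post; treated here as constants (an assumption).
$total_products    = 250000; // "250k+" products
$batch_size        = 100;    // records indexed per batch
$seconds_per_batch = 30;     // observed time per batch

// Number of batches needed to cover the whole catalog.
$batches = (int) ceil( $total_products / $batch_size );

// Total time if every batch takes the same amount of time.
$total_seconds = $batches * $seconds_per_batch;

printf( "%d batches, about %.1f hours\n", $batches, $total_seconds / 3600 );
// prints "2500 batches, about 20.8 hours"
```

At that rate an initial full index is a long-running job by nature, so anything that times out after a minute or so will interrupt it.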

@abdulkadernsu
Author

I'm using the browser, but I'd love to use the command line if it's faster. Is there documentation for indexing via the command line? And how can I increase the batch size?

@samfrank

+1 for linking to the documentation on how to index via the command line - We are having the same issue

@samfrank

@abdulkadernsu Hey! Just had a look at the wiki and I found some documentation on it https://github.com/WebDevStudios/wp-search-with-algolia/wiki/WP-CLI

@samfrank

I don't know if it was faster, but I had an issue with WP Engine killing long processes after 60s, so I couldn't index everything. Running the indexing through WP-CLI bypasses this issue.

@tw2113
Member

tw2113 commented Sep 25, 2023

Correct, that's the wiki URL for our WP-CLI documentation.

WP Engine's advanced panel definitely does visually "time out" after that amount of time, but I believe the actual commands keep running in the background. Ideally, with WP Engine you'd SSH in and use a proper terminal outside of the browser.

Filters: https://github.com/WebDevStudios/wp-search-with-algolia/wiki/Filter-Hooks#filters-reference

Specifically algolia_indexing_batch_size, which defaults to the integer 100.
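
A minimal sketch of how that filter might be used, for example from a small custom plugin or mu-plugin. The function name my_algolia_batch_size and the value 250 are hypothetical, chosen only for illustration; the filter name and its default of 100 come from the comment above. Larger batches mean more memory and time per request, so the right value depends on the server.

```php
<?php
/**
 * Hypothetical callback: return a larger indexing batch size.
 *
 * @param int $batch_size The current batch size (100 by default, per the thread).
 * @return int The batch size to use instead.
 */
function my_algolia_batch_size( $batch_size ) {
	// 250 is an arbitrary illustrative value, not a recommendation.
	return 250;
}

// Register the callback only when WordPress is loaded, so the file
// can also be included or linted on its own without a fatal error.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'algolia_indexing_batch_size', 'my_algolia_batch_size' );
}
```

If indexing requests start timing out or running out of memory after raising the value, lowering it again is the first thing to try.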

@samfrank

Great to know, thanks @tw2113

@tw2113
Member

tw2113 commented Oct 24, 2023

Based on a lot of my recent work and findings, I want to clarify that WP-CLI isn't necessarily going to be faster. It is very useful for automation, though, since you can run commands in cron jobs, and something like wp algolia re-index would be an easy one to set up, among many other combinations.

@tw2113
Member

tw2113 commented Mar 12, 2024

@abdulkadernsu I wonder if some of the filters available in https://github.com/WebDevStudios/wp-search-with-algolia/wiki/Timeouts may help, especially on the cURL side of things.

I also have a new filter in the works that will help with configuring the Algolia client timeouts described at https://www.algolia.com/doc/api-reference/api-methods/configuring-timeouts/, which may be a secondary thing to try out.
