When a "reindex all" command is issued to Metacat in k8s and too many index workers are deployed, they collectively overwhelm the single Metacat instance with requests, leading to errors that include database connection pool exhaustion and other, as-yet-unexplained, failures.
We need to reproduce and examine these overload mechanisms, then make changes to ensure Metacat can tolerate the load from reindexing.
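For context, a minimal sketch of one possible client-side mitigation: capping in-flight Metacat calls per worker with a semaphore, so N workers × M threads can't pile up more concurrent requests than the database connection pool can serve. The permit count, base URL, and endpoint path are illustrative assumptions (the path follows the DataONE MN API shape), not actual Metacat or indexer configuration.

```java
// Hypothetical sketch: throttle /meta requests per worker with a semaphore.
// The cap of 5 and the URL layout are assumptions for illustration only.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Semaphore;

public class ThrottledMetacatClient {
    private static final Semaphore PERMITS = new Semaphore(5); // assumed cap
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    static String fetchSystemMetadata(String baseUrl, String pid) throws Exception {
        PERMITS.acquire();   // block here rather than piling onto Metacat
        try {
            HttpRequest req = HttpRequest.newBuilder(
                    URI.create(baseUrl + "/d1/mn/v2/meta/" + pid)).GET().build();
            return HTTP.send(req, HttpResponse.BodyHandlers.ofString()).body();
        } finally {
            PERMITS.release();
        }
    }
}
```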
Good observation. A lot of this overload comes from the many /meta and /object API calls (and associated access control checks) needed to handle indexing. This is a well-known problem for us, and the point of our hashstore storage refactor is that dataone-indexer workers can get the files they need for indexing without making any API calls. In our new design, a call to reindex all will generate a lot of RabbitMQ tasks that contain the job info needed for each indexing job, and the indexing workers can do their thing in parallel without hitting Metacat with REST calls. I suspect that once we get there, our limiting bottlenecks will shift to 1) I/O limits from Ceph, and 2) Solr write limits (although in theory we can shard this and provide horizontal scaling in Solr too).
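To make the design concrete, here is a hedged sketch of such a worker: it consumes reindex tasks from RabbitMQ and reads objects straight from a shared hashstore mount instead of calling Metacat, with a prefetch limit so a huge "reindex all" backlog drains at a controlled rate. The queue name, message format, mount path, and `indexIntoSolr` helper are all assumptions for illustration, not the actual dataone-indexer implementation.

```java
// Hypothetical sketch of a task-driven index worker. Queue name, message
// format, and storage layout are assumptions, not the real design.
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class IndexWorker {
    private static final String QUEUE = "index_tasks";                    // assumed queue name
    private static final Path STORE = Path.of("/var/metacat/hashstore");  // assumed mount

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq");   // assumed k8s service name
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();
        channel.queueDeclare(QUEUE, true, false, false, null);

        // Bound unacked deliveries per worker so the backlog from a
        // "reindex all" is consumed at a controlled rate.
        channel.basicQos(10);

        DeliverCallback onTask = (consumerTag, delivery) -> {
            // Assume the task body is just the object's content identifier,
            // which doubles as its relative path inside the hashstore.
            String cid = new String(delivery.getBody(), StandardCharsets.UTF_8);
            long tag = delivery.getEnvelope().getDeliveryTag();
            try {
                byte[] object = Files.readAllBytes(STORE.resolve(cid));
                indexIntoSolr(cid, object);            // hypothetical helper
                channel.basicAck(tag, false);
            } catch (Exception e) {
                channel.basicNack(tag, false, true);   // requeue for retry
            }
        };
        channel.basicConsume(QUEUE, false, onTask, consumerTag -> { });
    }

    private static void indexIntoSolr(String cid, byte[] object) {
        // Placeholder for the Solr write; under this design the bottleneck
        // moves to Ceph I/O and Solr write throughput, not Metacat's API.
    }
}
```

Note that no call in this loop touches Metacat: the worker's only dependencies are the queue, the shared filesystem, and Solr, which is what lets workers scale out without exhausting Metacat's connection pool.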