-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove IndexMap document list #3606
Remove IndexMap document list #3606
Conversation
PBENCH-1315 The production server, with "only" 108,728 indexed datasets (many more still haven't been migrated from the passthrough server), currently claims 84.1Gb of PostgreSQL storage just for the `IndexMap` table. Most of this consists of a list of each Opensearch document ID in order to allow using bulk update and delete operations to manage the index. This is straining the capacity of our RDU2 PostgreSQL server. As an alternative, this PR removes the document list and instead of the bulk update and delete operations uses `_delete_by_query` and `_update_by_query` searching for documents in the appropriate indices (which we still store in the `IndexMap`) by parent dataset resource ID. Along the way, I noticed that (oops) we were missing the `"authorization"` subdocument in some of our Elasticsearch documents, which would impact the authenticated search API behaviors. And I acted on a deprecation warning for a camelCase template keyword by replacing it with a snake_case alternative. __DRAFT__: I've completed the basic implementation, so that can be reviewed, along with the database model unit test, but the API unit test is still pending. (I'll continue working on that as I have time, but thought I might as well post what I've got so far.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far, but I'm only a quarter of the way through.
1. Remove the nascent `test_datasets_delete.py` unit test; we have a priority to quickly defer this SQL fix to reduce our load on the RDU2 PostgreSQL. 2. I'd successfully run functional tests, but as I'd wanted to debug within the container I'd been running `runlocal --keep`, which skips the dataset `DELETE` and avoided exposing the fundamental bug that `ElasticBase` doesn't expose `_delete`. (Oops.) 3. Some general cleanup. There are probably some `INFO` log messages that could be removed, but for now I'd prefer that this subsystem remain a bit chatty in case we encounter any problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues with the code that you've changed. (I didn't really look at the code that you didn't change. 🙁) I've got a couple of quick, entirely ignorable suggestions (and another which I'm going to pre-resolve).
PBENCH-1315
The production server, with "only" 108,728 indexed datasets (many more still haven't been migrated from the passthrough server), currently claims 84.1Gb of PostgreSQL storage just for the
IndexMap
table. Most of this consists of a list of each Opensearch document ID in order to allow using bulk update and delete operations to manage the index. This is straining the capacity of our RDU2 PostgreSQL server.As an alternative, this PR removes the document list and instead of the bulk update and delete operations uses
_delete_by_query
and_update_by_query
searching for documents in the appropriate indices (which we still store in theIndexMap
) by parent dataset resource ID.Along the way, I noticed that (oops) we were missing the
"authorization"
subdocument in some of our Elasticsearch documents, which would impact the authenticated search API behaviors. And I acted on a deprecation warning for a camelCase template keyword by replacing it with a snake_case alternative.NOTE: In the interest of expediently deploying a fix for our SQL bloat in RDU2, this is missing unit testing for
update
anddelete
, both of which are tested (in indexed and non-indexed cases) by functional tests.