Add MongoDB indices for analysis fields #2079

Open
wants to merge 1 commit into
base: master

JSCU-CNI
Contributor

This PR adds keyword indices for certain fields in the analysis collection. This massively improves load time when accessing an individual analysis result in CAPE on large MongoDB instances.

CAPE uses one large OR-query when loading an analysis result, which is why we opted for a separate index for every field on which the aggregation is run. The query also looks for exact matches, so the fields are indexed as keyword instead of text.
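To illustrate the reasoning above: MongoDB can only answer an `$or` filter from indexes when every clause is supported by one, so a single-field index per queried field is a natural fit. The following is a minimal sketch of that pairing, not the code in this PR. Only `procmemory.file_ref` comes from the diff; the second field name, the helper names, and the sample value are hypothetical.

```python
# Illustrative field list. Only "procmemory.file_ref" appears in this PR's diff;
# "hypothetical.file_ref" is a made-up placeholder for the other analysis fields.
FIELDS = ["procmemory.file_ref", "hypothetical.file_ref"]


def build_or_query(fields, value):
    """Build an exact-match $or filter with one clause per field."""
    return {"$or": [{field: value} for field in fields]}


def build_index_specs(fields):
    """Build one single-field ascending index spec per field (1 means ascending).

    Each spec is a list of (field, direction) pairs, the form that
    pymongo's Collection.create_index() accepts.
    """
    return [[(field, 1)] for field in fields]


query = build_or_query(FIELDS, "abc123")
specs = build_index_specs(FIELDS)
```

Each clause in `query` has exactly one matching entry in `specs`, which is the property that lets the whole `$or` avoid a collection scan.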

    "procmemory.file_ref",
]
for item in items:
    mongo_create_index(
Contributor

This will take a very long time for a lot of CAPE installations - terabyte scale is common. Probably better to do this in the utils somewhere and emit a warning if the indices aren't found.

Contributor Author

Where exactly do you suggest we should put this?

Contributor

@nbargnesi nbargnesi May 14, 2024

I'd suggest as a new module in utils, mongodb_indices or something appropriately named.

In the startup module you can check the indices are there and emit a warning if they're missing.

Note too, the difference between doing it in startup and utils. Putting it in utils means we can add indices out-of-band, while CAPE continues to run. Great way of making things faster incrementally without bringing CAPE down by touching startup.
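The split suggested here could look roughly like the sketch below: a warn-only check that startup can call, and a separate helper that a utils script can run out-of-band to build whatever is missing. This is an assumption-laden illustration, not CAPE's actual code. The function names and `REQUIRED_FIELDS` are hypothetical, and the only field taken from the PR diff is `procmemory.file_ref`. The `index_info` argument is assumed to have the shape returned by pymongo's `Collection.index_information()`.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical constant. Only "procmemory.file_ref" is taken from the PR diff.
REQUIRED_FIELDS = ("procmemory.file_ref",)


def missing_indices(index_info, required=REQUIRED_FIELDS):
    """Return the required fields that no existing index leads with.

    index_info maps index names to dicts whose "key" entry is a list of
    (field, direction) pairs, as returned by Collection.index_information().
    """
    present = {spec["key"][0][0] for spec in index_info.values() if spec.get("key")}
    return [field for field in required if field not in present]


def warn_if_missing(index_info, required=REQUIRED_FIELDS):
    """Startup-side check: log a warning, never build an index."""
    missing = missing_indices(index_info, required)
    if missing:
        log.warning("Missing MongoDB indices for: %s", ", ".join(missing))
    return missing


def ensure_indices(collection, required=REQUIRED_FIELDS):
    """Utils-side helper: create the missing single-field indices.

    collection is expected to be a pymongo Collection, or any object that
    offers index_information() and create_index().
    """
    created = []
    for field in missing_indices(collection.index_information(), required):
        collection.create_index([(field, 1)])  # 1 means ascending
        created.append(field)
    return created
```

Keeping `warn_if_missing` free of any index creation means startup stays fast even on terabyte-scale collections, while `ensure_indices` can be run whenever an operator chooses.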
