Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new index for BHA #34602

Closed
wants to merge 23 commits into from
Closed

Add a new index for BHA #34602

wants to merge 23 commits into from

Conversation

AmitPhulera
Copy link
Contributor

Product Description

Initial work to create a new index for BHA based of case search index.
This might not be functional, have put the code out there to get initial feedback on the approach.

Technical Summary

Feature Flag

Safety Assurance

Safety story

Automated test coverage

QA Plan

Migrations

  • The migrations in this code can be safely applied first independently of the code

Rollback instructions

  • This PR can be reverted after deploy with no further considerations

Labels & Review

  • Risk label is set correctly
  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@dimagimon dimagimon added the reindex/migration Reindex or migration will be required during or before deploy label May 10, 2024
@esoergel
Copy link
Contributor

esoergel commented May 13, 2024

I'm particularly interested to see how this integrates with pillows and other places that write. We spoke offline about this a bit already, but I was imagining this to live within the standard case search adapter, so nothing else needs to know about writing to multiple places.

The other thing I was thinking about while reviewing this is how this would work with more than one dedicated index. Like, in my mind it'd be ideal if there'd be a config option in localsettings like

DEDICATED_CASE_SEARCH_INDICES = {
    # domain_name: index_name
    'mydomain': 'case-search-mydomain',
    'anotherdomain': 'case-search-anotherdomain',
}

And then this could be used to direct writes as appropriate in the adapter. Otherwise it feels awfully boilerplate-y to have to make a replica adapter for each. We could also do likewise for reads (rather than the db based approach I floated in #34601), though I think we'll want the two independently configurable in any case.

It would also be nice if there's a way to do likewise for the migrations framework. This index applies only to one environment, so we shouldn't build and write to it in other settings. I'm not certain how best to manage that, but one thing that sounds appealing is treating this as a second case search index, in the sense that all migrations applied to the main case search index are also applied to this one by default. Then you'd want an initial action (in commcare-cloud?) to create the index when the settings file is changed, but in steady-state, it'd just use the same migrations as the main one.

@snopoke
Copy link
Contributor

snopoke commented May 15, 2024

Like @esoergel I was also expecting more of a settings approach rather than a new adapter but its possible that we would need a new adapter anyway because of how they are tied to migrations etc. Maybe we could create them dynamically though?

@AmitPhulera
Copy link
Contributor Author

I'm particularly interested to see how this integrates with pillows and other places that write. We spoke offline about this a bit already, but I was imagining this to live within the standard case search adapter, so nothing else needs to know about writing to multiple places.

Yeah that is correct atleast for the writes. ab2ba07 7013b42 7cb3b24 subsequently ovverrides index and bulk operations

For deletes I ended up modifying pillows 813a664 5cb8706 as we already have domain information here.

The other thing I was thinking about while reviewing this is how this would work with more than one dedicated index. Like, in my mind it'd be ideal if there'd be a config option in localsettings.

I have kept a plug ab2ba07#diff-7ba2ac2cd9c42581b497ab0c7cb5ea77288d3a8d7e8583cfc2c35a4da9e063b6R196-R199 which can essentially return an adapter based on any condition that seem fit.

I think we should use settings to figure out the adapter and CaseSearchConfig to determine if we have to multiplex writes. Two new attributes multiplex_writes, read_from_new_index can be added to CaseSearchConfig which can be modified at runtime as per requirement.

And regarding having it for multiple dedicated indices, it should be quite possible by the approach but yes it will add the overhead of the boilerplate and I think it should be okay. My idea of these indices is as of SQL tables having their own separate model classes. As per @snopoke's point we can definitely add more tooling to auto-generate the adapter classes automatically but the question again is how often will it be used.

This index applies only to one environment, so we shouldn't build and write to it in other settings

I have this on my radar and will implement it part of sepearte PR where our migration framework will support the option of having environment specific indices.

I'm not certain how best to manage that, but one thing that sounds appealing is treating this as a second case search index, in the sense that all migrations applied to the main case search index are also applied to this one by default.

This is a great point. I have a vague idea of how it could be done without changing things in commcare-cloud. Will have to think more about it.

@@ -70,20 +70,21 @@ def process_change(self, change):

if change.deleted and change.id:
doc = change.get_document()
domain = doc.get('domain')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How safe is this? Will domain be None sometimes? I thought we didn't get the docs on deletions and would have to pull from change meta.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very sure about this change and I was also not very sure how to add tests for it. Any inputs here would be super helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the document that is to be indexed in ES, it should always have a domain. But based on your suggestion I have added 3417fef#diff-308591ed80f437c63d421731473742a6d8fec03d88c9410b46d1d39704dcc66dR73-R75

@@ -179,6 +179,8 @@ def _get_domain_from_doc(self, doc):
`doc` can be CommcCareCase instance or dict. This util method extracts domain from doc.
This will fail hard if domain is not present in doc.
"""
if not doc:
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to explicitly whitelist the the paths we know to be safe? This will result in skipping the multiplexing, so we should only do it if we know we can. Like if we know we're deleting the whole domain, then maybe we could skip the multiplexing logic (and delete the dedicated index through another means). That way we wouldn't accidentally skip writes if someone adds another code path that triggers this condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not actually sure if I can confidently say I am aware of all the paths. But if we delete a domain which is on sub-index then we will have a couple of manual steps to do first i.e stop writes on that index by updating the config and index deletion can be taken care of then rather than automating it?

@AmitPhulera AmitPhulera mentioned this pull request May 22, 2024
3 tasks
@AmitPhulera
Copy link
Contributor Author

I am closing out this PR as I have split it up in #34663 and #34673

@mkangia mkangia deleted the ap/new-index-for-bha branch May 27, 2024 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reindex/migration Reindex or migration will be required during or before deploy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants