Add a new index for BHA #34602

AmitPhulera · 2024-05-10T15:37:13Z

Product Description

Initial work to create a new index for BHA based on the case search index.
This might not be functional yet; I have put the code up to get initial feedback on the approach.

Technical Summary

Feature Flag

Safety Assurance

Safety story

Automated test coverage

QA Plan

Migrations

The migrations in this code can be safely applied first independently of the code

Rollback instructions

This PR can be reverted after deploy with no further considerations

Labels & Review

Risk label is set correctly
The set of people pinged as reviewers is appropriate for the level of risk of the change

esoergel · 2024-05-13T21:07:17Z

I'm particularly interested to see how this integrates with pillows and other places that write. We spoke offline about this a bit already, but I was imagining this to live within the standard case search adapter, so nothing else needs to know about writing to multiple places.

The other thing I was thinking about while reviewing this is how this would work with more than one dedicated index. Like, in my mind it'd be ideal if there'd be a config option in localsettings like

DEDICATED_CASE_SEARCH_INDICES = {
    # domain_name: index_name
    'mydomain': 'case-search-mydomain',
    'anotherdomain': 'case-search-anotherdomain',
}

And then this could be used to direct writes as appropriate in the adapter. Otherwise it feels awfully boilerplate-y to have to make a replica adapter for each. We could also do likewise for reads (rather than the db based approach I floated in #34601), though I think we'll want the two independently configurable in any case.
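A minimal sketch of how such a setting could direct writes, assuming a shared index named `case-search` and an in-memory stand-in for the adapter. `MAIN_INDEX`, `target_indices`, and `InMemoryCaseSearchAdapter` are hypothetical names for illustration, not code from this PR:

```python
# Hypothetical sketch: only the settings dict mirrors the proposal above.
DEDICATED_CASE_SEARCH_INDICES = {
    # domain_name: index_name
    "mydomain": "case-search-mydomain",
    "anotherdomain": "case-search-anotherdomain",
}

MAIN_INDEX = "case-search"  # assumed name for the shared index


def target_indices(domain, dedicated=DEDICATED_CASE_SEARCH_INDICES):
    """Return every index that a write for `domain` should go to."""
    indices = [MAIN_INDEX]
    dedicated_index = dedicated.get(domain)
    if dedicated_index is not None:
        indices.append(dedicated_index)
    return indices


class InMemoryCaseSearchAdapter:
    """Stand-in for the real adapter; keeps docs in dicts instead of ES."""

    def __init__(self):
        self.storage = {}  # index_name -> {doc_id: doc}

    def index(self, doc):
        # Callers only ever call index(); the fan-out stays inside the adapter.
        for index_name in target_indices(doc["domain"]):
            self.storage.setdefault(index_name, {})[doc["_id"]] = doc
```

With this shape, pillows and other writers would keep calling a single `index` method, and adding another dedicated index would only require a settings change.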

It would also be nice if there's a way to do likewise for the migrations framework. This index applies only to one environment, so we shouldn't build and write to it in other settings. I'm not certain how best to manage that, but one thing that sounds appealing is treating this as a second case search index, in the sense that all migrations applied to the main case search index are also applied to this one by default. Then you'd want an initial action (in commcare-cloud?) to create the index when the settings file is changed, but in steady-state, it'd just use the same migrations as the main one.

snopoke · 2024-05-15T14:39:32Z

Like @esoergel, I was also expecting more of a settings approach rather than a new adapter, but it's possible that we would need a new adapter anyway because of how they are tied to migrations etc. Maybe we could create them dynamically though?
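One way dynamic creation could look, as a rough sketch: build one adapter subclass per configured domain with `type()`. `BaseCaseSearchAdapter` and `build_dedicated_adapters` are hypothetical stand-ins, and the real adapters' ties to the migration framework are not modelled here:

```python
class BaseCaseSearchAdapter:
    """Hypothetical stand-in for the real case search adapter class."""

    index_name = "case-search"

    def describe(self):
        return f"{type(self).__name__} -> {self.index_name}"


def build_dedicated_adapters(config, base=BaseCaseSearchAdapter):
    """Create one adapter instance per {domain: index_name} entry."""
    adapters = {}
    for domain, index_name in config.items():
        # Class names must be identifiers, so replace other characters.
        class_name = "CaseSearch_" + "".join(
            ch if ch.isalnum() else "_" for ch in domain
        )
        adapter_cls = type(class_name, (base,), {"index_name": index_name})
        adapters[domain] = adapter_cls()
    return adapters
```

This avoids writing a near-identical class by hand for each index, at the cost of classes that cannot be found by searching the codebase for their names.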

AmitPhulera · 2024-05-17T12:05:17Z

> I'm particularly interested to see how this integrates with pillows and other places that write. We spoke offline about this a bit already, but I was imagining this to live within the standard case search adapter, so nothing else needs to know about writing to multiple places.

Yeah, that is correct, at least for the writes. Commits ab2ba07, 7013b42, and 7cb3b24 override the index and bulk operations.

For deletes I ended up modifying the pillows (813a664, 5cb8706), as we already have the domain information there.

> The other thing I was thinking about while reviewing this is how this would work with more than one dedicated index. Like, in my mind it'd be ideal if there'd be a config option in localsettings.

I have kept a hook at ab2ba07#diff-7ba2ac2cd9c42581b497ab0c7cb5ea77288d3a8d7e8583cfc2c35a4da9e063b6R196-R199 which can return an adapter based on any condition that seems fit.

I think we should use settings to figure out the adapter and CaseSearchConfig to determine whether we have to multiplex writes. Two new attributes, multiplex_writes and read_from_new_index, can be added to CaseSearchConfig and modified at runtime as required.
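A small sketch of how the two proposed flags could be kept independent, so that writes can be multiplexed before reads are switched over. `CaseSearchConfigStub`, `write_indices`, and `read_index` are hypothetical; the stub only stands in for CaseSearchConfig and carries just the two attributes named above:

```python
from dataclasses import dataclass


@dataclass
class CaseSearchConfigStub:
    """Hypothetical stand-in for CaseSearchConfig with the proposed flags."""

    domain: str
    multiplex_writes: bool = False
    read_from_new_index: bool = False


def write_indices(config, main_index, dedicated_index):
    """Always write to the main index; add the dedicated one when enabled."""
    if config.multiplex_writes and dedicated_index:
        return [main_index, dedicated_index]
    return [main_index]


def read_index(config, main_index, dedicated_index):
    """Read from the dedicated index only when explicitly switched over."""
    if config.read_from_new_index and dedicated_index:
        return dedicated_index
    return main_index
```

Under these assumptions a domain could run with `multiplex_writes=True` while the dedicated index is populated, and flip `read_from_new_index` later without touching the write path.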

Regarding multiple dedicated indices, this approach should be able to support them. It does add the overhead of boilerplate, but I think that should be okay. I think of these indices like SQL tables, each having its own model class. As per @snopoke's point, we can definitely add more tooling to auto-generate the adapter classes, but the question again is how often it will be used.

> This index applies only to one environment, so we shouldn't build and write to it in other settings

I have this on my radar and will implement it as part of a separate PR, where our migration framework will support environment-specific indices.

> I'm not certain how best to manage that, but one thing that sounds appealing is treating this as a second case search index, in the sense that all migrations applied to the main case search index are also applied to this one by default.

This is a great point. I have a vague idea of how it could be done without changing things in commcare-cloud. Will have to think more about it.

esoergel · 2024-05-21T21:15:18Z

corehq/ex-submodules/pillowtop/processors/elastic.py

@@ -70,20 +70,21 @@ def process_change(self, change):

        if change.deleted and change.id:
            doc = change.get_document()
+            domain = doc.get('domain')


How safe is this? Will domain be None sometimes? I thought we didn't get the docs on deletions and would have to pull from change meta.

AmitPhulera · 2024-05-22

I am not very sure about this change, and I was also not sure how to add tests for it. Any input here would be very helpful.

AmitPhulera · 2024-05-23

This is the document that is to be indexed in ES, so it should always have a domain. But based on your suggestion I have added 3417fef#diff-308591ed80f437c63d421731473742a6d8fec03d88c9410b46d1d39704dcc66dR73-R75
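The concern in this thread can be sketched as a lookup that prefers the document and falls back to the change metadata. Only `change.get_document()` appears in the diff above; the `metadata.domain` attribute, `domain_for_deleted_change`, and `FakeChange` are assumptions made for illustration:

```python
from types import SimpleNamespace


def domain_for_deleted_change(change):
    """Best-effort domain lookup for a deletion; may return None."""
    doc = change.get_document()
    if doc and doc.get("domain"):
        return doc["domain"]
    # Assumed fallback: the change carries metadata with a `domain` attribute.
    metadata = getattr(change, "metadata", None)
    return getattr(metadata, "domain", None)


class FakeChange:
    """Minimal test double exposing only what the helper touches."""

    def __init__(self, doc=None, metadata=None):
        self._doc = doc
        self.metadata = metadata

    def get_document(self):
        return self._doc


# A deletion without a document still resolves through the metadata.
change = FakeChange(doc=None, metadata=SimpleNamespace(domain="mydomain"))
resolved = domain_for_deleted_change(change)
```

A double like `FakeChange` is also one possible way to unit test the deletion branch without a running pillow.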

esoergel · 2024-05-21T21:23:00Z

corehq/apps/es/case_search.py

@@ -179,6 +179,8 @@ def _get_domain_from_doc(self, doc):
        `doc` can be CommCareCase instance or dict. This util method extracts domain from doc.
        This will fail hard if domain is not present in doc.
        """
+        if not doc:
+            return None


Is there a way to explicitly whitelist the paths we know to be safe? This will result in skipping the multiplexing, so we should only do it if we know we can. Like if we know we're deleting the whole domain, then maybe we could skip the multiplexing logic (and delete the dedicated index through another means). That way we wouldn't accidentally skip writes if someone adds another code path that triggers this condition.

AmitPhulera · 2024-05-22

I can't confidently say that I am aware of all the paths. But if we delete a domain that is on a sub-index, we will have a couple of manual steps to do first anyway, i.e., stop writes to that index by updating the config. Could index deletion be taken care of at that point rather than automating it?
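One possible shape for an explicit allowlist, sketched under the assumption that callers opt in by keyword: the helper fails hard on a missing domain unless the caller states that skipping is safe. `MissingDomainError`, `resolve_domain`, and the `allow_missing` flag are hypothetical and do not reflect the PR's `_get_domain_from_doc`:

```python
class MissingDomainError(Exception):
    """Raised when a write cannot be routed because no domain is known."""


def resolve_domain(doc, *, allow_missing=False):
    """Return the doc's domain, or None only for callers that opted in."""
    domain = None
    if doc is not None:
        if isinstance(doc, dict):
            domain = doc.get("domain")
        else:
            domain = getattr(doc, "domain", None)
    if domain is None and not allow_missing:
        # New code paths fail loudly instead of silently skipping multiplexing.
        raise MissingDomainError("cannot determine domain for document")
    return domain
```

Only a known-safe path, such as whole-domain deletion, would pass `allow_missing=True`; any other caller that reaches the no-domain case gets an error rather than a silently skipped write.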

AmitPhulera · 2024-05-23T11:26:45Z

I am closing this PR as I have split it up into #34663 and #34673.

AmitPhulera added 8 commits May 1, 2024 12:52

add new index name in es consts file

63e9abf

add case_search_bha adapter

6aea662

Merge remote-tracking branch 'origin/master' into ap/new-index-for-bha

2d0d53d

add bha adapter config to __init__.py

602c344

update index name in const.py

a05a83f

run ./manage.py make_elastic_migration --name add_new_index_for_bha -…

a9e71e4

…t=5 -c case_search_bha:case-search-bha-2024-05-10

update setting values for bha index

d59ecaa

override index method for ElasticCaseSearch

ab2ba07

AmitPhulera requested review from snopoke, millerdev, esoergel, calellowitz and gherceg May 10, 2024 15:37

dimagimon added the reindex/migration label (Reindex or migration will be required during or before deploy) May 10, 2024

remove extra words that were somehow added

9979e80

AmitPhulera added 9 commits May 14, 2024 14:31

add util to get domain from object which can be dict or object

7013b42

add tests for index operation on case-search adapter

1449d03

add bulk method in case_search adapter to multiplex writes to bha ind…

7cb3b24

…ex when required

add docs to the newly added functions

9be99d5

support domain in delete doc method in pillow and add ability to mult…

813a664

…iplex deletes if required

pass domain in all the places where deletions are happening

5cb8706

add index_runtime name in migration file

1b53141

if doc is None return None

bc07ecd

This will be the case when we have called bulk_delete and passed it ids, since bulk_delete calls bulk. We only use it in one place in HQ, which is to delete case search data for an entire domain, which I don't think we will do.

fix failing tests

1676a6e

AmitPhulera added 3 commits May 15, 2024 20:55

fix migrations.lock file

71b736e

add check for doc is None

fb77aac

failing test fix

59f2b6e

esoergel reviewed May 22, 2024

add fn to decide if to multiplex or not

072bbbd

AmitPhulera mentioned this pull request May 22, 2024

Create New Index For BHA #34663

Merged

3 tasks

Merge remote-tracking branch 'origin/master' into ap/new-index-for-bha

0385b3f

AmitPhulera mentioned this pull request May 23, 2024

Sync writes/deletes on sub case search indices. #34673

Merged

3 tasks

AmitPhulera closed this May 23, 2024

mkangia deleted the ap/new-index-for-bha branch May 27, 2024 08:26
