Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CEP] System Forms Metadata Standardization #31873

Open
esoergel opened this issue Jul 6, 2022 · 6 comments
Open

[CEP] System Forms Metadata Standardization #31873

esoergel opened this issue Jul 6, 2022 · 6 comments
Assignees
Labels
CEP CommCare Enhancement Proposal

Comments

@esoergel
Copy link
Contributor

esoergel commented Jul 6, 2022

Abstract
A number of different parts of HQ create forms (pretty much all are case updates) for various purposes. As contrasted with forms submitted by CommCare mobile or webapps, these are generally thought of as "system forms". A recurring challenge is tagging these with useful information, and various conflicting mechanisms are currently in use.

References:
System forms in Reports
improve report filtering related to system forms #31654

Motivation
An ideal system would allow us to:

  • Programmatically identify all system forms
  • Identify system forms from a particular feature
  • Further identify which sub-feature submitted the form. This is usually dynamic, such as a specific auto update rule ID or DHIS2 integration
  • See what user triggered the action, where applicable
  • Store arbitrary unstructured information as needed

Specification
At the moment, 4 fields are commonly in use: xmlns, device_id, user_id, and username
The submit_case_blocks docstring has some current guidance around how these fields are used. Here are some features that submit system forms, and how they use each of these fields:

System feature User ID Username XMLNS Device ID
Case Importer Web user’s ID Web user username http://commcarehq.org/case corehq.apps.case_importer.do_import.do_import
Case Claim Mobile worker ID Mobile worker username http://commcarehq.org/case corehq.apps.ota.views.claim
Case Cleaning Web user ID Web user username http://commcarehq.org/case/edit corehq.apps.reports.views.edit_case
Update Rule system system http://commcarehq.org/hq_case_update_rule ""
COVID custom update rules system system http://commcarehq.org/hq_case_update_rule custom.covid.rules.custom_actions.{rule_name}
Deduplication system system http://commcarehq.org/hq_case_deduplication_rule__{name_slug}-{rule.case_type} CaseDeduplicationActionDefinition-update-cases
FHIR "" system http://commcarehq.org/x/fhir/engine-read FHIRImportConfig-{importer.pk} (but it varies)
DHIS2 "" system http://commcarehq.org/dhis2-integration ""
OpenMRS "" system http://commcarehq.org/openmrs-integration openmrs-atomfeed-{repeater.get_id} (and sometimes blank)
usercase "" system http://commcarehq.org/case corehq.apps.callcenter.sync_usercase._UserCaseHelper.
commtrack sms User ID "" Possibly http://openrosa.org/jr/xforms, unsure sms:{phone_number} or ""

Aside: I do see that anything that uses “submit_case_blocks” (which I’m pretty sure includes everything save commtrack and smsforms) uses the same template, which stores the form instance in a <system /> node, rather than <data /> as is normally used. This is a pretty reliable signal, and it is already queryable in elasticsearch. Still, I don't think it's necessary to build off of this in our desired end state.

The proposal is that we adopt the following conventions:

  • user_id - reserved for user where applicable, otherwise "system"
  • username - reserved for user where applicable, otherwise "system"
  • XMLNS - Identifies the feature. This must not be dynamic, and it must be registered with a human-readable name in SYSTEM_FORM_XMLNS_MAP
  • deviceID - Can be dynamic, used for storing further granularity about what portion of the feature it came from. This is also queryable

In this world, system forms could be identified conclusively with something like form.xmlns in SYSTEM_FORM_XMLNS_MAP.keys() in elasticsearch, postgres, or python. We would also be able to associate a human-readable name of the feature with the form based on its XMLNS (useful for reporting). Further granularity can be achieved by feature-specific filtering with deviceID as well. For example:

(FormES()
 .domain(domain)
 .xmlns(CASE_UPDATE_RULE_XMLNS)
 .device_id(rule.pk))

An option for further extension if needed is expanding beyond these four fields. CommCare uses appVersion, app_build_version, and commcare_version. I wouldn't want to abuse these beyond their stated meanings, though. We can also store essentially anything we want in the meta block, it just wouldn't be easily searchable in elasticsearch without switching to a subdocument model, though. Possible things we might want to store include:

  • Feature version (could use appVersion for this)
  • arbitrary logging/debugging information

Perhaps the most straightforward approach to storing arbitrary info is as normal form data, as is done here, which we could trivially enable in submit_case_blocks. An alternative would be adding an optional info arg to submit_case_blocks which accepts a dict and turns that into an info node inside the meta block. I'm interested in opinions there.

I imagine there's room for improving tooling to support this paradigm, plus certainly a docs change. One model that might be a good hook for standardization is the SystemFormMeta dataclass

Impact on users
I don't propose a large scale migration at this time, just alignment on a desired future state to guide feature development

Impact on hosting
no change

Backwards compatibility
None at this time, but backwards incompatibility is likely as features are migrated to the new standard.

Release Timeline
N/A

Open questions and issues
I'm particularly interested in hearing thoughts about how we can make this easier to stick to going forward

@esoergel esoergel added the CEP CommCare Enhancement Proposal label Jul 6, 2022
@esoergel esoergel assigned snopoke and kaapstorm and unassigned snopoke Jul 6, 2022
@dannyroberts
Copy link
Member

Just the existence of that table is pretty useful! Thanks for compiling this. My first thoughts:

  • Can you say a little more about the motivation? Not just the sort of abstract benefits, but like the future features you're hoping to enable through this and/or brief context of how it came up.
  • I see two aspects to the proposal: (1) is to simply organize, in code, more systematically what we already have by e.g. registering all system forms in SYSTEM_FORM_XMLNS_MAP. (2) is to make backwards-incompatible changes to how we store data. The tricky part about (2) of course will be that there will be a break in the data where older data won't respect the newer standards, and so queries can't assume the new standard; but in terms of each individual new form being able to show richer information to the user about its context, where it's okay if that doesn't apply to older forms, I could still imagine being useful. In contrast I see (1) as just simply useful in an uncomplicated way. If that distinction seems useful to you, perhaps you could organize the different parts of the proposal as falling into phase 1 and phase 2 or something like that to distinguish the uncomplicated clear improvements we can make to organization that would be immediately useful, versus the potentially backwards incompatible parts where there may be more tricky decisions to make. (If that doesn't seem like a useful way to organize it, that's fine, just a thought!)

@esoergel
Copy link
Contributor Author

esoergel commented Jul 7, 2022

Sure thing, thanks for the clarifying questions, @dannyroberts First off, motivations:

  • I've been in a number of conversations around how to do this properly, so the easiest outcome from this CEP is a documented decision on what the correct approach is, so we needn't deliberate this next time
  • I did some of the research for this doc a little while back in fielding a request to show system forms in standard reports. That's where I discovered how disjointed this is, and how hard it is to conclusively identify "system forms". Simon reworked our filters a bit to support that, though there's room to go further, such as by presenting known system forms as filter options (we don't have any plans to implement this at this time, though).
  • We're also deliberating ways to improve the UX of auto update rules, and these changes could support pre-filtering the submit history and manage forms reports to forms submitted by a particular rule. That'd be an easy way to get better user-facing logging, and possibly an "undo" functionality by bulk archival of those forms.
  • The most recent impetus is this ticket, where USH would like to be able to see which case update rule caused a particular case change. The proposal there is to use dynamic XMLNSs, but I think a static "case update rule" xmlns and a dynamic device ID would be better. I could also see us adding further context to the form body if we adopt that as a "normal" thing to do.

I do think your breakdown is a logical one. I'm actually hoping to avoid advocating here for any general breaking changes as you describe in point 2, as I think that's more of a feature-by-feature question. Most of the USH features that submit system forms would be fine with a break in continuity, but it'd be disruptive to make such a change for GA features. Everything except deduplication uses a static XMLNS, even if it doesn't uniquely identify those features. I'd love to see us move say, case claim in this direction, but that'd be a whole thing trying to figure that out.

I think the specific action items I'd like to see come out of this are the following non-breaking changes:

  • Document what we determine to be The Right Way
  • Organize XMLNSs into SYSTEM_FORM_XMLNS_MAP (or a successor enum type)
  • Use the readable feature names in some key places in reports

And then perhaps the following breaking change:

  • Add case update rule IDs to those forms as device ID.

On that point, the XMLNS is already feature-specific, and the device ID is unused, so this wouldn't be too scary a change. However, any new features built around that device ID would only support future data. That still seems like a net win, but I think that's a separate discussion - we'd have to get Product's 👍 before making such things GA.

@millerdev
Copy link
Contributor

millerdev commented Jul 8, 2022

This is a great outline of the different ways we use system forms. I like the idea of building conventions and tooling to make future system forms more consistent.

A couple of other examples of system forms:

One place where we created system forms was in the Forms & Cases Couch to SQL migration. System forms were used to patch cases to resolve otherwise irreconcilable case differences between Couch and SQL. Diff data and case updates were encoded in the body of the form (example). The code that reads and writes those forms has been deleted since the migration is complete, but the forms still exist as a permanent record of a few minor differences between Couch and SQL form processing.

Anther use of system forms introduced during the Forms & Cases Couch to SQL migration was "system actions," which were used to record and replay (on the SQL side) form archive and deletion events that occurred while the migration was in progress. The code supporting the creation of system actions still exists, although I'm not sure anything is using it.

@esoergel
Copy link
Contributor Author

esoergel commented Jul 8, 2022

Thanks for sharing those examples, Daniel - I didn't know that about the couch-to-sql migration forms. The more I think about it, the more I like using the form body for mostly unstructured logging, given that that's literally what xforms are built for. Thinking about how we can facilitate that, I guess a little validation is maybe all that's needed. We can add a form_data dict arg to submit_case_blocks and validate that the keys are valid XML identifiers and then escape the values like you did in that diff code.

@snopoke
Copy link
Contributor

snopoke commented Jul 20, 2022

To summarise the conversation thus far it seems the current suggestion is to:

  1. Use static XMLNS that is feature specific
  2. Use device ID to record a machine readable sub-feature ID e.g. update rule ID
  3. Use form data for unstructured data logging (feature specific)

I'm in agreement with all of that but would like to add a further suggestion.

Storing human readable name

The @name attribute of the root form element can be used for storing a human readable name for the form. This should correspond to the device ID and would allow quick display for the name without requiring DB lookup. It also preserves the value even if the corresponding 'device' is deleted. For example:

XMLNS Device ID @name UI display
http://commcarehq.org/hq_case_update_rule 123abc Close claim cases Case Rule > Close claim cases
http://commcarehq.org/hq_case_update_rule 456def Deactivate Checkin Case Rule > Deactivate Checkin cases

In future if more information is needed beyond just the name we could develop a "display block" that allowed adding more content to show to the user. I can see this being useful as a hover widget or similar:

image

@esoergel
Copy link
Contributor Author

@snopoke confirming that that summary is correct.

I also like your suggestion for including a name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CEP CommCare Enhancement Proposal
Projects
None yet
Development

No branches or pull requests

5 participants