adtl like complex logic in spreadsheets #28

Open
abhidg opened this issue May 10, 2024 · 6 comments

@abhidg
Contributor

abhidg commented May 10, 2024

Think about how to encode adtl-like logic (https://github.com/globaldothealth/adtl) into spreadsheets.

adtl has the following options:

  • applying transformations (converting values to floats, returning booleans, word conversion, durations, start/end dates...)
  • Unit/date format conversion
  • combined types:
    • all
    • any
    • min
    • max
    • list/set
    • firstNonNull
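For discussion, the combined types above correspond roughly to Python built-ins applied to the values of the source fields. The sketch below is an approximation, not adtl's implementation. It assumes that missing values arrive as None and that fields are passed in priority order. COMBINED_TYPES and combine() are hypothetical names.

```python
from typing import Any, Callable, Optional


def present(values: list[Any]) -> list[Any]:
    """Drop missing values (represented here as None)."""
    return [v for v in values if v is not None]


def first_non_null(values: list[Any]) -> Optional[Any]:
    """Return the first value that is not None, or None if every value is missing."""
    return next((v for v in values if v is not None), None)


# Hypothetical table: each reducer takes the source-field values in priority
# order and combines them into a single output value.
COMBINED_TYPES: dict[str, Callable[[list[Any]], Any]] = {
    "all": lambda vs: all(present(vs)),
    "any": lambda vs: any(present(vs)),
    "min": lambda vs: min(present(vs), default=None),
    "max": lambda vs: max(present(vs), default=None),
    "list": present,
    # Deduplicate, then sort so the output order is deterministic.
    "set": lambda vs: sorted(set(present(vs))),
    "firstNonNull": first_non_null,
}


def combine(kind: str, values: list[Any]) -> Any:
    """Apply the named combined type to a list of field values."""
    return COMBINED_TYPES[kind](values)


print(combine("firstNonNull", [None, 3, 1, 3]))  # 3
print(combine("set", [None, 3, 1, 3]))  # [1, 3]
```

Python's all() returns True for an empty list. A real encoding would therefore need to decide what "all" means when every field is missing.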
abhidg self-assigned this May 10, 2024
abhidg added this to the dengue-pipeline milestone May 10, 2024
@pipliggins
Collaborator

Currently:
  • "+" to concatenate fields
  • "if not" for if/else logic (e.g. enrolment_date if not admission_date fills in the enrolment date if admission_date is empty)

@abhidg
Contributor Author

abhidg commented May 16, 2024

Could we do admission_date or enrolment_date instead? That way the highest-priority field comes first, and we can chain lower-priority fields with or. It also uses fewer characters, and maps 1:1 to Python's or chaining.
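Python's or returns the first truthy operand. A cell like admission_date or enrolment_date could therefore be evaluated by splitting on " or " and returning the first field with a non-empty value. This sketch assumes cells contain bare column names and that empty spreadsheet cells arrive as empty strings. eval_or is a hypothetical name.

```python
from typing import Optional


def eval_or(expr: str, row: dict[str, str]) -> Optional[str]:
    """Evaluate a 'field_a or field_b or ...' cell against one data row.

    Fields are tried left to right (highest priority first); the first
    non-empty value wins, mirroring Python's `a or b or c`.
    """
    for name in expr.split(" or "):
        value = row.get(name.strip())
        if value:
            return value
    return None


row = {"admission_date": "", "enrolment_date": "2024-05-15"}
print(eval_or("admission_date or enrolment_date", row))  # 2024-05-15
# The same result with plain Python `or` chaining:
print(row["admission_date"] or row["enrolment_date"])  # 2024-05-15
```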

@pipliggins
Collaborator

I like the "or" syntax more generally, and think it would work well in the 1:M tables, but I'm not sure it would work well in this specific example.

A simplified mapping file would look like this:

variable ... actualPeriod.start actualPeriod.end
date_enrol <FIELD> if not <date_adm_date>
date_adm_date <FIELD>+<date_adm_time>
date_adm_time <date_adm_date>+<FIELD>
outcome_date <FIELD>

and a data file like this:

id  admitted  date_enrol  date_adm_date  date_adm_time
1   N         2024-05-15
2   Y         2024-05-16  2024-05-15     13:00

For each row, the data ingestion function iterates over the column headers and looks each one up in the mapping['variable'] column to find which key(s) (the mapping file column headers) the data should be mapped to. In this case, if a subject is enrolled but not admitted, actualPeriod.start should be date_enrol. If they have been admitted, the enrolment date should be skipped in favour of using date_adm_date and date_adm_time to create a start datetime.
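The process described above, applied to the first simplified mapping, could be sketched as follows. This is an illustration only, and it rests on several assumptions. First, date_enrol, date_adm_date and date_adm_time map to actualPeriod.start and outcome_date maps to actualPeriod.end, because the flattened mapping table doesn't show the columns unambiguously. Second, "+" joins parts with a space. Third, a second identical value for the same key is a harmless duplicate. MAPPING, evaluate() and ingest() are hypothetical names.

```python
import re
from typing import Optional

Row = dict[str, str]

# Assumed reading of the mapping file: variable -> {target key: cell template}.
MAPPING: dict[str, dict[str, str]] = {
    "date_enrol": {"actualPeriod.start": "<FIELD> if not <date_adm_date>"},
    "date_adm_date": {"actualPeriod.start": "<FIELD>+<date_adm_time>"},
    "date_adm_time": {"actualPeriod.start": "<date_adm_date>+<FIELD>"},
    "outcome_date": {"actualPeriod.end": "<FIELD>"},
}
PLACEHOLDER = re.compile(r"<([^>]+)>")


def lookup(token: str, row: Row, field: str) -> str:
    """Resolve one '<name>' placeholder; <FIELD> means the current column."""
    match = PLACEHOLDER.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"expected a <placeholder>, got {token!r}")
    name = match.group(1)
    return row.get(field if name == "FIELD" else name, "")


def evaluate(template: str, row: Row, field: str) -> Optional[str]:
    """Evaluate '<a>+<b>' (space-joined) and '<a> if not <b>' templates."""
    if " if not " in template:
        value, blocker = template.split(" if not ")
        if lookup(blocker, row, field):
            return None  # blocker present: this mapping row is skipped
        template = value
    parts = [lookup(t, row, field) for t in template.split("+")]
    return " ".join(p for p in parts if p) or None


def ingest(row: Row) -> dict[str, str]:
    out: dict[str, str] = {}
    for field, cell in row.items():  # iterate over the column headers
        if not cell or field not in MAPPING:
            continue  # empty cell or unmapped column
        for key, template in MAPPING[field].items():
            value = evaluate(template, row, field)
            if value is None or out.get(key) == value:
                continue  # nothing to add, or an identical duplicate
            if key in out:
                raise ValueError(f"conflicting values for {key}")
            out[key] = value
    return out


rows = [
    {"id": "1", "admitted": "N", "date_enrol": "2024-05-15",
     "date_adm_date": "", "date_adm_time": ""},
    {"id": "2", "admitted": "Y", "date_enrol": "2024-05-16",
     "date_adm_date": "2024-05-15", "date_adm_time": "13:00"},
]
for row in rows:
    print(ingest(row))
# {'actualPeriod.start': '2024-05-15'}
# {'actualPeriod.start': '2024-05-15 13:00'}
```

If the "if not" condition were dropped from the date_enrol template, row 2 would raise the conflict error. In that case, date_enrol and date_adm_date would both try to set actualPeriod.start to different values.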

Using the "or" notation, the mapping would need to look like this to get the correct output:

variable ... actualPeriod.start actualPeriod.end
date_enrol <date_adm_date>+<date_adm_time> or <FIELD>
date_adm_date <FIELD>+<date_adm_time>
date_adm_time <date_adm_date>+<FIELD>
outcome_date <FIELD>

This means you end up with a lot of duplication, and the mapping looks more complex.
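To make the duplication concrete, here is a sketch of an evaluator for this "or" form. It uses the hypothetical name evaluate_or and assumes that "+" joins parts with a space. The date_enrol cell has to restate the date_adm_date/date_adm_time concatenation so that it produces the same value as the date_adm_date row.

```python
import re
from typing import Optional

PLACEHOLDER = re.compile(r"<([^>]+)>")


def evaluate_or(template: str, row: dict[str, str], field: str) -> Optional[str]:
    """Return the first ' or '-separated alternative with a non-empty value.

    Within an alternative, every <placeholder> is treated as joined by '+'
    (space-separated); <FIELD> refers to the column currently being mapped.
    """
    for alternative in template.split(" or "):
        names = PLACEHOLDER.findall(alternative)
        parts = [row.get(field if name == "FIELD" else name, "") for name in names]
        value = " ".join(p for p in parts if p)
        if value:
            return value
    return None


enrol_cell = "<date_adm_date>+<date_adm_time> or <FIELD>"
adm_date_cell = "<FIELD>+<date_adm_time>"

admitted = {"date_enrol": "2024-05-16", "date_adm_date": "2024-05-15",
            "date_adm_time": "13:00"}
not_admitted = {"date_enrol": "2024-05-15", "date_adm_date": "",
                "date_adm_time": ""}

print(evaluate_or(enrol_cell, not_admitted, "date_enrol"))  # 2024-05-15
print(evaluate_or(enrol_cell, admitted, "date_enrol"))  # 2024-05-15 13:00
print(evaluate_or(adm_date_cell, admitted, "date_adm_date"))  # 2024-05-15 13:00
```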

@abhidg
Contributor Author

abhidg commented May 16, 2024

Perhaps another option would be to have an index showing the priority. Do we have other structures that we would have to port from adtl? any/all would be tricky to express in this scheme. There is also repetition, since the date_adm_date and date_adm_time entries are complementary.
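One way to read the priority-index idea: each mapping row carries a number, and for each target key the non-empty value with the lowest number wins. Below is a sketch with a hypothetical (variable, key, priority) layout. It still doesn't combine date_adm_date with date_adm_time, which is the complementary-entry repetition noted above.

```python
# Hypothetical mapping rows: (variable, target key, priority); lower wins.
MAPPING = [
    ("date_adm_date", "actualPeriod.start", 1),
    ("date_enrol", "actualPeriod.start", 2),
    ("outcome_date", "actualPeriod.end", 1),
]


def resolve(row: dict[str, str]) -> dict[str, str]:
    """For each target key, keep the non-empty value with the lowest priority number."""
    best: dict[str, tuple[int, str]] = {}
    for variable, key, priority in MAPPING:
        value = row.get(variable)
        if not value:
            continue  # missing or empty: this candidate can't win
        if key not in best or priority < best[key][0]:
            best[key] = (priority, value)
    return {key: value for key, (_, value) in best.items()}


print(resolve({"date_enrol": "2024-05-15", "date_adm_date": ""}))
print(resolve({"date_enrol": "2024-05-16", "date_adm_date": "2024-05-15"}))
```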

@pipliggins
Collaborator

pipliggins commented May 17, 2024

Looking at the old parsers, 'all' was generally used to classify the timing of observations (admission/study/followup) in non-redcap datasets. I think for now we don't need to worry about it.

"Any" is already semi-implicitly provided - each row of the mapping file indicates a single response type for a single variable. If it's a case of
if.any = ["outcome"=1, "outcome"=2, "outcome"=3]
you put the same mapping on each row, the pipeline will detect a duplicate data entry once it's filled in once. This is harder if 'any' is dependent on fields other than the current one being mapped though.
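Here is a sketch of that row-wise approach. It assumes responses are stored as strings. target_key and the value True are placeholders, not real FHIR attributes. Each mapping row is checked independently, so any of the three outcome codes fills the same target, which matches an adtl-style any(). Because each check looks only at the row's own variable, the sketch doesn't cover the harder case where 'any' depends on other fields.

```python
from typing import Any

# Hypothetical response-level mapping: (variable, response) -> (target key, value).
# Repeating the same target on the outcome=1, 2 and 3 rows reproduces
# if.any = ["outcome"=1, "outcome"=2, "outcome"=3].
MAPPING: dict[tuple[str, str], tuple[str, Any]] = {
    ("outcome", "1"): ("target_key", True),
    ("outcome", "2"): ("target_key", True),
    ("outcome", "3"): ("target_key", True),
}


def apply_rows(row: dict[str, str]) -> dict[str, Any]:
    out: dict[str, Any] = {}
    for (variable, response), (key, value) in MAPPING.items():
        if row.get(variable) != response:
            continue  # this mapping row's response wasn't recorded
        if key in out and out[key] != value:
            raise ValueError(f"conflicting values for {key}")
        out[key] = value  # re-setting an identical value is a no-op duplicate
    return out


def adtl_style_any(row: dict[str, str]) -> bool:
    """The equivalent single any() condition, for comparison."""
    return any(row.get("outcome") == v for v in ("1", "2", "3"))


for outcome in ("1", "2", "3", "4"):
    row = {"outcome": outcome}
    print(outcome, apply_rows(row), adtl_style_any(row))
```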

Lists we can again do implicitly: if more than one response that isn't a duplicate is provided to a FHIR attribute, we can assume a list is being built. The trick will be making sure this doesn't happen when we don't want it.
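Here is a sketch of implicit list-building. It assumes that a hypothetical allow-list (LIST_KEYS) is how we'd stop lists forming where they shouldn't. condition_codes is an illustrative key name, not a FHIR attribute. Identical repeat values are ignored as duplicates, and a second distinct value for a key outside the allow-list raises an error instead of silently creating a list.

```python
from typing import Any

# Hypothetical allow-list of keys that may legitimately hold several values.
LIST_KEYS = {"condition_codes"}


def add_value(resource: dict[str, Any], key: str, value: Any) -> None:
    """Add `value` under `key`, turning the entry into a list only when allowed."""
    current = resource.get(key)
    if current is None:
        resource[key] = value
    elif current == value or (isinstance(current, list) and value in current):
        return  # duplicate entry: nothing to do
    elif key in LIST_KEYS:
        existing = current if isinstance(current, list) else [current]
        resource[key] = existing + [value]
    else:
        raise ValueError(f"{key} is already {current!r}; not building a list")


resource: dict[str, Any] = {}
for code in ("A", "B", "A"):
    add_value(resource, "condition_codes", code)
add_value(resource, "start", "2024-05-15")
add_value(resource, "start", "2024-05-15")  # duplicate, ignored
print(resource)  # {'condition_codes': ['A', 'B'], 'start': '2024-05-15'}
```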

"firstNonNull" was most commonly used for date/age hierarchies. In FHIR we won't be doing this except for cases as above - In general there will be a single place for enrolment date, admission date etc and we shouldn't be filling in dates with substitutes if a specific date hasn't been provided as FHIR doesn't require them. For other scenarios we could do as you suggested above and use "or" with position enforced e.g. "type-1" or "type-2" or "gestational".

"min" and "max" were never used, so I don't think we'll need those.

@pipliggins
Collaborator

Unit conversion shouldn't be needed, as we'll just store whatever was recorded. We might need a way to indicate the date format though, as FHIR expects YYYY-MM-DD. This is the default REDCap output, so no worries for now, but it might be required later.
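If a date-format hint is needed later, converting to YYYY-MM-DD could be a thin wrapper around datetime.strptime. The per-column source_format parameter is a hypothetical addition to the mapping file. Its default assumes the REDCap year-month-day output.

```python
from datetime import datetime


def to_fhir_date(value: str, source_format: str = "%Y-%m-%d") -> str:
    """Parse `value` with `source_format` and return it as YYYY-MM-DD."""
    return datetime.strptime(value, source_format).date().isoformat()


print(to_fhir_date("2024-05-15"))  # 2024-05-15
print(to_fhir_date("15/05/2024", source_format="%d/%m/%Y"))  # 2024-05-15
```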
