adtl like complex logic in spreadsheets #28

abhidg · 2024-05-10T15:08:43Z

Think of how to encode https://github.com/globaldothealth/adtl like logic into spreadsheets.

adtl has the following options:

applying transformations (turns values to floats, boolean returns, word conversion, durations, start/end dates...)
Unit/date format conversion
combined types:
- all
- any
- min
- max
- list/set
- firstNonNull

pipliggins · 2024-05-16T10:01:43Z

Currently:
"+" to concatenate fields
"if not" for if/else logic (e.g. "enrolment_date if not admission_date" fills in the enrolment date if admission_date is empty)

abhidg · 2024-05-16T10:24:30Z

Could we do admission_date or enrolment_date? That way the highest-priority field comes first, and we can chain lower-priority items with or. It is also fewer characters, and it maps 1:1 to Python's or chaining.
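For reference, a minimal sketch of the Python behaviour this would mirror. The record dict is hypothetical, and it assumes an empty spreadsheet cell arrives as None or an empty string.

```python
# Hypothetical record: an empty spreadsheet cell is assumed to be None or "".
record = {"admission_date": None, "enrolment_date": "2024-05-15"}

# `or` returns its first truthy operand, so the highest-priority field goes first.
start = record["admission_date"] or record["enrolment_date"]
print(start)  # 2024-05-15

# With both fields filled, the first one wins and the second is never used.
record["admission_date"] = "2024-05-14"
start_admitted = record["admission_date"] or record["enrolment_date"]
print(start_admitted)  # 2024-05-14
```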

pipliggins · 2024-05-16T11:03:49Z

I like the "or" syntax option more generally, and think it would work well in the 1:M tables, but not sure it would work well in this specific example.

A simplified mapping file would look like this

variable	actualPeriod.start	actualPeriod.end
date_enrol	<FIELD> if not <date_adm_date>
date_adm_date	<FIELD>+<date_adm_time>
date_adm_time	<date_adm_date>+<FIELD>
outcome_date		<FIELD>

and a data file like this

id	admitted	date_enrol	date_adm_date	date_adm_time
1	N	2024-05-15
2	Y	2024-05-16	2024-05-15	13:00

The data ingestion function iterates over the column headers for each row and looks up each header in the mapping['variable'] column to find which key(s) (the mapping file column headers) the data should be mapped to. In this case, if a subject is enrolled but not admitted, actualPeriod.start should be date_enrol. If they have been admitted, the enrolment date should be skipped in favour of using date_adm_date and date_adm_time to create a start datetime.
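A rough sketch of how that iteration could work for the "+" and "if not" rules. It is not the pipeline's actual code: the function names (resolve, apply_rule, ingest), the dict layout of the mapping, the space used to join concatenated fields, and the choice to join only the non-empty parts are all assumptions made for illustration.

```python
import re

FIELD_REF = re.compile(r"<([^<>]+)>")


def resolve(token, variable, row):
    """Return the cell named by a <name> token; <FIELD> means the current variable."""
    name = FIELD_REF.fullmatch(token.strip()).group(1)
    if name == "FIELD":
        name = variable
    return row.get(name) or None  # assumption: "" and missing both count as empty


def apply_rule(rule, variable, row, sep=" "):
    """Evaluate one mapping cell for one data row ('+' and 'if not' only)."""
    if " if not " in rule:
        value_expr, guard = rule.split(" if not ", 1)
        if resolve(guard, variable, row) is not None:
            return None  # the guard field is filled, so this variable is skipped
        rule = value_expr
    parts = [resolve(token, variable, row) for token in rule.split("+")]
    filled = [p for p in parts if p is not None]
    return sep.join(filled) if filled else None


def ingest(row, mapping):
    """Collect the distinct values each target key receives from one data row."""
    out = {}
    for variable in row:
        for target, rule in mapping.get(variable, {}).items():
            value = apply_rule(rule, variable, row)
            if value is not None and value not in out.setdefault(target, []):
                out[target].append(value)
    return out


# The simplified mapping file and data file from this comment, as dicts.
mapping = {
    "date_enrol": {"actualPeriod.start": "<FIELD> if not <date_adm_date>"},
    "date_adm_date": {"actualPeriod.start": "<FIELD>+<date_adm_time>"},
    "date_adm_time": {"actualPeriod.start": "<date_adm_date>+<FIELD>"},
    "outcome_date": {"actualPeriod.end": "<FIELD>"},
}
rows = [
    {"id": "1", "admitted": "N", "date_enrol": "2024-05-15",
     "date_adm_date": "", "date_adm_time": ""},
    {"id": "2", "admitted": "Y", "date_enrol": "2024-05-16",
     "date_adm_date": "2024-05-15", "date_adm_time": "13:00"},
]
results = [ingest(row, mapping) for row in rows]
for result in results:
    print(result)
```

In this sketch the date_adm_date and date_adm_time rows produce the same string for subject 2, so the second one is dropped as a duplicate and actualPeriod.start ends up with a single value.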

Using the "or" notation would look like this to get the correct output

variable	actualPeriod.start	actualPeriod.end
date_enrol	<date_adm_date>+<date_adm_time> or <FIELD>
date_adm_date	<FIELD>+<date_adm_time>
date_adm_time	<date_adm_date>+<FIELD>
outcome_date		<FIELD>

This means you end up with a lot of duplication, and the mapping looks more complex.
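For comparison, a small sketch of how the "or" form of the date_enrol rule might be evaluated. The function name evaluate_or_rule, the space separator, and the treatment of an empty string as "not filled" are assumptions, not existing pipeline behaviour.

```python
def evaluate_or_rule(rule, variable, row, sep=" "):
    """Return the first alternative (split on ' or ') that has any filled field."""
    for alternative in rule.split(" or "):
        names = [token.strip().strip("<>") for token in alternative.split("+")]
        values = [row.get(variable if name == "FIELD" else name) for name in names]
        filled = [value for value in values if value]  # drop None and ""
        if filled:
            return sep.join(filled)
    return None


rule = "<date_adm_date>+<date_adm_time> or <FIELD>"
not_admitted = {"date_enrol": "2024-05-15", "date_adm_date": "", "date_adm_time": ""}
admitted = {"date_enrol": "2024-05-16", "date_adm_date": "2024-05-15",
            "date_adm_time": "13:00"}

print(evaluate_or_rule(rule, "date_enrol", not_admitted))  # 2024-05-15
print(evaluate_or_rule(rule, "date_enrol", admitted))      # 2024-05-15 13:00
```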

abhidg · 2024-05-16T12:08:52Z

Perhaps another option would be to have an index showing the priority. Do we have other structures that we would have to port from adtl? any/all would be tricky to express in this scheme. There is also repetition, with the date_adm_date and date_adm_time entries being complementary.
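One way the priority index idea could look, purely as a sketch: the "priority" key, the candidates list, and the pick_by_priority helper are hypothetical and not part of adtl or the current mapping files.

```python
# Hypothetical: each candidate source for a target carries a priority (1 = highest).
candidates = [
    {"variable": "date_enrol", "priority": 2},
    {"variable": "date_adm_date", "priority": 1},
]


def pick_by_priority(row, candidates):
    """Return the filled value with the lowest priority number, or None."""
    for candidate in sorted(candidates, key=lambda c: c["priority"]):
        value = row.get(candidate["variable"])
        if value:  # assumption: None and "" both mean the cell is empty
            return value
    return None


print(pick_by_priority({"date_enrol": "2024-05-15", "date_adm_date": ""}, candidates))
print(pick_by_priority({"date_enrol": "2024-05-16", "date_adm_date": "2024-05-15"}, candidates))
```

The order of rows in the mapping file would then not matter, since the index alone decides which field wins.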

pipliggins · 2024-05-17T09:49:33Z

Looking at the old parsers, 'all' was generally used to classify the timing of observations (admission/study/followup) in non-redcap datasets. I think for now we don't need to worry about it.

"Any" is already semi-implicitly provided: each row of the mapping file indicates a single response type for a single variable. If it's a case of
if.any = ["outcome"=1, "outcome"=2, "outcome"=3]
you put the same mapping on each row, and the pipeline will detect a duplicate data entry once the value has been filled in. This is harder if 'any' depends on fields other than the one currently being mapped, though.
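A sketch of that implicit "any", assuming one mapping row per (variable, response) pair. The row layout, the map_record helper, and the mapped value "finished" are placeholders chosen for illustration.

```python
# Hypothetical mapping rows: one per response, all carrying the same mapped value.
mapping_rows = [
    {"variable": "outcome", "response": "1", "value": "finished"},
    {"variable": "outcome", "response": "2", "value": "finished"},
    {"variable": "outcome", "response": "3", "value": "finished"},
]


def map_record(record, mapping_rows):
    """Return the distinct mapped values for one record."""
    found = []
    for mapping_row in mapping_rows:
        matches = record.get(mapping_row["variable"]) == mapping_row["response"]
        # A value that is already present is treated as a duplicate and skipped.
        if matches and mapping_row["value"] not in found:
            found.append(mapping_row["value"])
    return found


print(map_record({"outcome": "2"}, mapping_rows))  # ['finished']
print(map_record({"outcome": "4"}, mapping_rows))  # []
```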

Lists we can again do implicitly: if more than one non-duplicate response is provided to a FHIR attribute, we can assume a list is being built. The trick will be to make sure this doesn't happen when we don't want it.
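A minimal sketch of that implicit list building. The add_value helper and the "code" attribute name are hypothetical; the point is only that a second, different value promotes the attribute to a list while repeats are ignored.

```python
def add_value(resource, attribute, value):
    """Store value under attribute, promoting to a list on a second distinct value."""
    if attribute not in resource:
        resource[attribute] = value
    elif isinstance(resource[attribute], list):
        if value not in resource[attribute]:
            resource[attribute].append(value)
    elif resource[attribute] != value:
        resource[attribute] = [resource[attribute], value]
    return resource


resource = {}
add_value(resource, "code", "a")
add_value(resource, "code", "a")  # duplicate, ignored
print(resource)  # {'code': 'a'}
add_value(resource, "code", "b")  # second distinct value, becomes a list
print(resource)  # {'code': ['a', 'b']}
```

Guarding against unwanted lists would need extra information, for example a per-attribute flag saying whether multiple values are allowed, which this sketch does not attempt.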

"firstNonNull" was most commonly used for date/age hierarchies. In FHIR we won't be doing this except for cases like the one above. In general there will be a single place for enrolment date, admission date etc., and we shouldn't be filling in dates with substitutes if a specific date hasn't been provided, as FHIR doesn't require them. For other scenarios we could do as you suggested above and use "or" with position enforced, e.g. "type-1" or "type-2" or "gestational".

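One caveat if "or" is implemented directly with Python's or: it skips every falsy value, not just missing ones, so it is not quite the same as firstNonNull. A short sketch, with a hypothetical first_non_null helper and made-up age values:

```python
def first_non_null(*values):
    """Return the first value that is not None (0 and "" are kept)."""
    return next((value for value in values if value is not None), None)


# Hypothetical age hierarchy: the preferred field is missing, the next holds 0.
age_years, age_estimate, age_fallback = None, 0, 5

print(first_non_null(age_years, age_estimate, age_fallback))  # 0
print(age_years or age_estimate or age_fallback)              # 5
```

For date strings the two behave the same as long as empty cells are the only falsy values that occur.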
"min" and "max" were never used, so I don't think we'll need those.

pipliggins · 2024-05-17T09:53:16Z

Unit conversion shouldn't be needed as we'll just store whatever was recorded. We might need a way to indicate the date format though, as FHIR expects Y-M-D. This is the default redcap output so no worries for now, but might be required later.
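If a date format indicator is needed later, the conversion itself is small. A sketch using the standard library, where the to_fhir_date name and the idea of passing the source format per field are assumptions:

```python
from datetime import datetime


def to_fhir_date(value, source_format="%Y-%m-%d"):
    """Parse value with the given strptime format and return it as YYYY-MM-DD."""
    return datetime.strptime(value, source_format).date().isoformat()


print(to_fhir_date("2024-05-15"))              # 2024-05-15
print(to_fhir_date("15/05/2024", "%d/%m/%Y"))  # 2024-05-15
```

A value that does not match the stated format raises ValueError, which may be preferable to silently storing a mis-parsed date.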

abhidg self-assigned this May 10, 2024

abhidg added this to the dengue-pipeline milestone May 10, 2024
