Support parallel processing using multiprocessing #17

abhidg · 2023-02-12T20:57:52Z

adtl should be able to use more than one core. Since it has to support groupBy operations and output multiple tables simultaneously, it can emit an intermediate internal output format, to be consumed directly by downstream code or by writers in various formats:

{ "table": "subject", "group": "some-id", "data": {}}
{ "table": "observation", "data": {}}
{ "table": "cases", "data": {}}

In all cases data should be a JSON object with field names as keys. It is up to groupBy consumers how to aggregate data from a table. By default adtl will provide a groupBy consumer that keeps the last non-null value for a field in a group.
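A minimal sketch of what such a default groupBy consumer could look like, assuming records arrive as Python dicts in the format above. The name `last_not_null` and the choice to skip records without a "group" key are illustrative assumptions, not part of adtl:

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple

Record = Dict[str, Any]


def last_not_null(records: Iterable[Record]) -> Dict[Tuple[str, str], Record]:
    """Merge records per (table, group), keeping the last non-null value per field.

    Hypothetical helper: records without a "group" key are skipped here,
    on the assumption that ungrouped tables are passed through elsewhere.
    """
    merged: Dict[Tuple[str, str], Record] = defaultdict(dict)
    for rec in records:
        if "group" not in rec:
            continue
        row = merged[(rec["table"], rec["group"])]
        for field, value in rec["data"].items():
            # A later null never overwrites an earlier non-null value.
            if value is not None:
                row[field] = value
    return dict(merged)


records = [
    {"table": "subject", "group": "s1", "data": {"age": 30, "sex": None}},
    {"table": "subject", "group": "s1", "data": {"age": None, "sex": "F"}},
    {"table": "subject", "group": "s1", "data": {"age": 31}},
    {"table": "cases", "data": {"n": 1}},
]
result = last_not_null(records)
# result == {("subject", "s1"): {"age": 31, "sex": "F"}}
```

Keying on (table, group) keeps groups from different tables apart even if they share an id.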

Once validation (#4) is added, the format will have two additional keys, valid (boolean) and errors (list), which report whether the row is valid and list any validation errors. It will be up to downstream consumers whether to keep these rows or filter them out.
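As a rough illustration of the downstream choice, a consumer could separate rows on the valid key. This is only a sketch: `split_by_validity` is a hypothetical name, and treating a record with no valid key as valid is an assumption made here for records emitted before validation exists:

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def split_by_validity(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (valid, invalid) lists of records based on the "valid" key."""
    valid: List[Record] = []
    invalid: List[Record] = []
    for rec in records:
        # Assumption: a missing "valid" key means validation was not run,
        # so the record is kept with the valid ones.
        if rec.get("valid", True):
            valid.append(rec)
        else:
            invalid.append(rec)
    return valid, invalid


rows = [
    {"table": "cases", "data": {"n": 1}, "valid": True, "errors": []},
    {"table": "cases", "data": {"n": None}, "valid": False, "errors": ["n is required"]},
    {"table": "cases", "data": {"n": 2}},
]
kept, rejected = split_by_validity(rows)
```

A consumer that wants to keep everything can simply ignore the second list, while one that needs to report problems can read each rejected record's errors list.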


abhidg added the enhancement New feature or request label Feb 12, 2023
