
Support parallel processing using multiprocessing #17

Open
abhidg opened this issue Feb 12, 2023 · 0 comments
Labels
enhancement New feature or request


abhidg (Contributor) commented Feb 12, 2023

adtl should be able to utilise more than one CPU core. Because it has to support groupBy operations and output multiple tables simultaneously, it could emit an intermediate internal output format that is consumed either directly by downstream code or by writers for various output formats:

{ "table": "subject", "group": "some-id", "data": {}}
{ "table": "observation", "data": {}}
{ "table": "cases", "data": {}}
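A minimal sketch of how the parallel part could look, using Python's standard multiprocessing.Pool. It assumes a hypothetical per-row function transform_row that turns one source row into records in the format above; the field names (id, sex, temp) are made up for illustration and the real adtl mapping logic is not shown. Pool.imap keeps results in input order, so a "last value wins" groupBy consumer downstream behaves the same regardless of how many workers are used; the records are then written as newline-delimited JSON.

```python
import json
import multiprocessing
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def transform_row(row: Dict[str, Any]) -> List[Record]:
    """Hypothetical per-row transform: one source row -> records for several tables.

    A real adtl parser would apply its mapping spec here; this stand-in just
    copies fields so the record shapes from the issue are visible.
    """
    subject_id = row["id"]
    return [
        # grouped table: records sharing a "group" are later merged by a groupBy consumer
        {"table": "subject", "group": subject_id, "data": {"sex": row.get("sex")}},
        # ungrouped table: every record is emitted as-is
        {"table": "observation", "data": {"subject_id": subject_id, "temp": row.get("temp")}},
    ]


def emit_records(
    rows: Iterable[Dict[str, Any]], processes: int = 1, chunksize: int = 100
) -> Iterator[Record]:
    """Yield intermediate records in input order, optionally using a process pool."""
    if processes <= 1:
        # serial fallback: useful for debugging and avoids process start-up cost
        for row in rows:
            yield from transform_row(row)
        return
    with multiprocessing.Pool(processes) as pool:
        # imap preserves input order even though rows are processed in parallel
        for records in pool.imap(transform_row, rows, chunksize=chunksize):
            yield from records


def to_json_lines(records: Iterable[Record]) -> str:
    """Serialise records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(r) + "\n" for r in records)
```

On platforms whose default start method is spawn (Windows, macOS), calls with processes > 1 should sit under an `if __name__ == "__main__":` guard, and transform_row must be a module-level function so that it can be pickled and sent to the workers.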

In all cases, data should be a JSON object whose keys are field names. It is up to groupBy consumers to decide how to aggregate the rows of a table within a group. By default, adtl will provide a groupBy consumer that keeps the last non-null value of each field within a group.
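The default "last non-null value" consumer could be sketched roughly as follows. This is an illustration under stated assumptions, not adtl's implementation: "null" is taken to mean Python None (the JSON null), the function name last_non_null_groupby is hypothetical, and records without a "group" key are passed through unchanged.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def last_non_null_groupby(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Merge records per (table, group), keeping the last non-null value of each field.

    Records without a "group" key are passed through unchanged. Returns a mapping
    from table name to that table's output rows.
    """
    grouped: Dict[Tuple[str, Any], Dict[str, Any]] = {}  # dicts keep insertion order
    output: Dict[str, List[Record]] = {}
    for rec in records:
        table = rec["table"]
        if "group" not in rec:
            output.setdefault(table, []).append(rec["data"])
            continue
        merged = grouped.setdefault((table, rec["group"]), {})
        for field, value in rec["data"].items():
            if value is not None:
                merged[field] = value  # a later non-null value overwrites an earlier one
            else:
                merged.setdefault(field, None)  # keep the field even if it is never filled
    for (table, _group), data in grouped.items():
        output.setdefault(table, []).append(data)
    return output
```

Because "last" depends on record order, this only gives stable results if records reach the consumer in input order, which is one reason to prefer an order-preserving parallel map.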

Once validation (#4) is added, the format will gain two additional keys: valid (boolean), reporting whether the row is valid, and errors (list), holding the validation errors. It will be up to downstream consumers whether to keep or filter out invalid rows.
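As a rough illustration of the proposed valid and errors keys, the sketch below attaches them to a record and shows one possible downstream policy (dropping invalid rows). The validation rule itself, "every field in required must be present and non-null", is a hypothetical stand-in; the actual rules from #4 are not specified here, and the helper names are made up.

```python
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def annotate_validity(record: Record, required: List[str]) -> Record:
    """Return a copy of record with "valid" and "errors" keys added.

    Hypothetical rule for illustration: every field in `required` must be
    present in record["data"] and not None.
    """
    errors = [f"missing required field: {f}" for f in required if record["data"].get(f) is None]
    return {**record, "valid": not errors, "errors": errors}


def keep_valid(records: Iterable[Record]) -> Iterator[Record]:
    """One possible downstream policy: drop invalid rows.

    Records without a "valid" key (e.g. produced before validation existed)
    are kept.
    """
    return (r for r in records if r.get("valid", True))
```

Keeping invalid rows in the stream and letting consumers decide, rather than dropping them inside adtl, leaves room for writers that want to report errors instead of discarding data.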

abhidg added the enhancement (New feature or request) label Feb 12, 2023