Generate and add equipment availability data #1329
Hi @tbhallett. I'll add the equipment availability resource file to this PR. There are a couple of data-cleaning steps left, and it should be ready by Friday. The format will be as follows:
Hi @sakshimohan and @nchagoma503 -- this is going to be brilliant. Thank you so much. Just one comment.... for the column {'Chitipa', 'Karonga', 'Nkhata Bay', 'Rumphi', 'Mzimba', 'Likoma',
'Mzuzu City', 'Kasungu', 'Nkhotakota', 'Ntchisi', 'Dowa', 'Salima',
'Lilongwe', 'Mchinji', 'Dedza', 'Ntcheu', 'Lilongwe City',
'Mangochi', 'Machinga', 'Zomba', 'Chiradzulu', 'Blantyre',
'Mwanza', 'Thyolo', 'Mulanje', 'Phalombe', 'Chikwawa', 'Nsanje',
'Balaka', 'Neno', 'Zomba City', 'Blantyre City'}
Also, please could I suggest that the destination for the ResourceFiles that will be created to be
Hi @tbhallett, @nchagoma503. This PR is now ready for review. The resourcefile has two columns -
There are some rows with missing data. For example, data on several items of equipment was not recorded at level 0. I've left these as they are. I'm not sure whether we can assume that missing data implies that the equipment is missing. For instance, item 126 (Biochemistry analyser) is recorded as available in some districts, while for others there is no data. I'm not sure how to deal with this. Here is a summary of the data. The trend is as we would expect.
Thanks so much for this @sakshimohan. I've just made a quick change to update (ever so slightly) the name of the resulting file. I've tried running the script. I think this comes from the aggregation using max on the groupby object. It could be that the columns that are strings need to be coerced into floats. Does it run cleanly from top to bottom on your machine?
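A minimal sketch of the coercion suggested above, not the script's actual code. The column names (`Facility_ID`, `Item_Code`, `available`) and the sample values are hypothetical. It uses `pd.to_numeric` with `errors="coerce"`, so any string that cannot be parsed becomes NaN before the `max` aggregation runs.

```python
import pandas as pd

# Hypothetical survey extract: availability was read in as strings.
df = pd.DataFrame({
    "Facility_ID": [1, 1, 2, 2],
    "Item_Code": [126, 126, 126, 126],
    "available": ["1", "0", "not recorded", "1"],
})

# Coerce the string column to floats; unparseable entries become NaN.
df["available"] = pd.to_numeric(df["available"], errors="coerce")

# max() now compares numbers rather than strings, and skips NaN by default.
result = df.groupby(["Facility_ID", "Item_Code"])["available"].max()
print(result)
```

Coercing before aggregating keeps the dtype consistent across columns, which should make the behaviour of `max` independent of how a particular pandas or Python version handles mixed string and numeric data.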
Hi @tbhallett. Thanks for looking at this. The script did run cleanly for me, but you're right that there were two string-valued columns ( I should note that I'm still using Python 3.10, not 3.11. Will update this soon!
Thanks @sakshimohan. Now runs perfectly on my machine too.
…comes in the correct format (row for every item and facility_id)
…/estimate_equipment_availability
…it produces an estimate for every item/facility using its own interpolation method. Update ResourceFile that results
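One way to get a row for every item and facility_id, sketched under assumptions: the column names (`Facility_ID`, `Item_Code`, `Pr_Available`) and the lists of facilities and items are hypothetical, and this is not necessarily how the script does it. The observed data are reindexed onto the full cross-product, so combinations with no data appear explicitly as NaN, ready for a later estimation step.

```python
import pandas as pd

# Hypothetical observed availability: one combination is missing.
observed = pd.DataFrame({
    "Facility_ID": [0, 0, 1],
    "Item_Code": [10, 11, 10],
    "Pr_Available": [0.5, 1.0, 0.25],
})

# Hypothetical complete lists of facilities and items used by the model.
all_facilities = [0, 1]
all_items = [10, 11]

# Build every (facility, item) pair and reindex the observed data onto it.
full_index = pd.MultiIndex.from_product(
    [all_facilities, all_items], names=["Facility_ID", "Item_Code"]
)
full = observed.set_index(["Facility_ID", "Item_Code"]).reindex(full_index)
print(full)
```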
Thanks again for this @sakshimohan. I've added some stuff onto the end of your script that
I'll add some comments onto the edits about both these points.
tests/test_equipment.py
Remove checks on handling of item_codes/facility_id that are not in the availability data (we now insist that they all are).
recreated file from update script
src/scripts/data_file_processing/healthsystem/equipment/equipment_availability_estimation.py
- interpolate within facility ID and equipment category (cost <=$1000, cost > $1000)
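A rough sketch of filling gaps within facility ID and equipment cost category. It assumes the fill is the group mean, which the commit message does not confirm, and the column names (`Facility_ID`, `cost_usd`, `Pr_Available`) and values are hypothetical. The cost category is held in a separate Series rather than added to the dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical availability data for a single facility.
df = pd.DataFrame({
    "Facility_ID": [0, 0, 0, 0],
    "Item_Code": [1, 2, 3, 4],
    "cost_usd": [200, 800, 5000, 12000],
    "Pr_Available": [0.8, np.nan, 0.2, np.nan],
})

# Cost category as its own Series: "low" is cost <= $1000, "high" is cost > $1000.
category = pd.Series(
    np.where(df["cost_usd"] <= 1000, "low", "high"),
    index=df.index,
    name="cost_category",
)

# Mean availability within each (facility, cost category) group, aligned to rows.
group_mean = df["Pr_Available"].groupby([df["Facility_ID"], category]).transform("mean")

# Fill only the missing values; observed values are left unchanged.
filled = df["Pr_Available"].fillna(group_mean)
print(filled.tolist())
```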
@tbhallett This should be ready to merge now.
# Remaining missing data are for items_codes in facility_ids for which there is no information at all in the facility_level
# Impute the availability for these items in facilities as ... the average of other items available at other facilities
final_equipment_availability_export_full = final_equipment_availability_export_full.groupby("Item_Code").transform(lambda x: x.fillna(x.mean()))
I think this average should be computed from across facility levels 0-3. (Otherwise it gets dragged down by the zeros at level 5).
I'll make that last change
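A sketch of how the suggested average could be computed, under assumptions: the `Facility_Level` column, its string labels, and the sample values are hypothetical. The per-item mean is taken only over rows at facility levels 0-3 and then used to fill missing values, so zeros at level 5 do not pull it down.

```python
import numpy as np
import pandas as pd

# Hypothetical data for one item across facility levels.
df = pd.DataFrame({
    "Item_Code": [126, 126, 126, 126],
    "Facility_Level": ["0", "1a", "2", "5"],
    "Pr_Available": [0.5, np.nan, 1.0, 0.0],
})

# Assumed labels for facility levels 0-3.
levels_0_to_3 = {"0", "1a", "1b", "2", "3"}
in_scope = df["Facility_Level"].isin(levels_0_to_3)

# Per-item mean computed only from levels 0-3 (NaN is skipped by mean()).
item_mean = df.loc[in_scope].groupby("Item_Code")["Pr_Available"].mean()

# Fill missing values with the restricted per-item mean.
df["Pr_Available"] = df["Pr_Available"].fillna(df["Item_Code"].map(item_mean))
print(df)
```

With these sample values the restricted mean is 0.75, whereas averaging over all levels, including the zero at level 5, would give 0.5.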
That's a good point. Thanks, Tim. Just as an FYI, there isn't any data being imputed by this line of code as the series is complete at this stage. But I have deleted the imputation in case any new source of data requires this step.
* refactor the merging-in of equipment cost category, to avoid changing the dataframe (keeping the categories in separate series)
* remove the unnecessary final check of interpolating just on item_code
Fixes #1305
This PR generates availability data on equipment using HHFA 2018-19 and maps it to TLO model equipment codes.