Generate and add equipment availability data #1329

sakshimohan · 2024-05-07T18:20:46Z

This PR generates availability data on equipment using HHFA 2018-19 and maps it to TLO model equipment codes.

sakshimohan · 2024-05-07T18:31:44Z

Hi @tbhallett. I'll add the equipment availability resource file to this PR. There are a couple of steps of cleaning the data left. It should be ready by Friday. But the format will be as follows:

Equipment_code	Facility_Level	District	Proportion of facilities which reported equipment as available	Proportion of facilities which reported equipment as available & functional	Proportion of facilities which reported equipment as available, functional, and calibrated/prepared
1	1a	Lilongwe	0.9	0.85	0.8
1	1b	Lilongwe	0.8	0.8	0.8
2	1a	Blantyre	0.9	0.9	0.8
2	1b	Blantyre	0.7	0.6	0.5
...	...	...	...	...	...

tbhallett · 2024-05-08T09:06:28Z

Hi @sakshimohan and @nchagoma503 -- this is going to be brilliant. Thank you so much.

Just one comment.... for the column District, it would be really handy if this could match as well as possible with the District names defined in resources/demography/ResourceFile_PopulationSize_2018Census.csv. These are:

{'Chitipa', 'Karonga', 'Nkhata Bay', 'Rumphi', 'Mzimba', 'Likoma',
'Mzuzu City', 'Kasungu', 'Nkhotakota', 'Ntchisi', 'Dowa', 'Salima',
'Lilongwe', 'Mchinji', 'Dedza', 'Ntcheu', 'Lilongwe City',
'Mangochi', 'Machinga', 'Zomba', 'Chiradzulu', 'Blantyre',
'Mwanza', 'Thyolo', 'Mulanje', 'Phalombe', 'Chikwawa', 'Nsanje',
'Balaka', 'Neno', 'Zomba City', 'Blantyre City'}
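
A check along these lines could flag district names that do not match. This is only a sketch: the `unmatched_districts` helper and the toy data (including the misspelling 'Nkhatabay') are hypothetical, and the set is copied from the list above rather than read from the census resource file.

```python
import pandas as pd

# District names as listed above (from the 2018 census resource file).
CENSUS_DISTRICTS = {
    'Chitipa', 'Karonga', 'Nkhata Bay', 'Rumphi', 'Mzimba', 'Likoma',
    'Mzuzu City', 'Kasungu', 'Nkhotakota', 'Ntchisi', 'Dowa', 'Salima',
    'Lilongwe', 'Mchinji', 'Dedza', 'Ntcheu', 'Lilongwe City',
    'Mangochi', 'Machinga', 'Zomba', 'Chiradzulu', 'Blantyre',
    'Mwanza', 'Thyolo', 'Mulanje', 'Phalombe', 'Chikwawa', 'Nsanje',
    'Balaka', 'Neno', 'Zomba City', 'Blantyre City',
}


def unmatched_districts(df: pd.DataFrame, column: str = "District") -> set:
    """Return the names in df[column] that are not census district names."""
    return set(df[column].dropna().unique()) - CENSUS_DISTRICTS


# Toy data: 'Nkhatabay' is a hypothetical spelling variant of 'Nkhata Bay'.
example = pd.DataFrame({"District": ["Lilongwe", "Blantyre", "Nkhatabay"]})
print(unmatched_districts(example))
```

An empty set would mean every district in the availability data can be joined directly to the demography resource file.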

tbhallett · 2024-05-08T09:08:00Z

Also, please could I suggest that the destination for the ResourceFiles that will be created be resources/healthsystem/infrastructure_and_equipment

sakshimohan · 2024-05-16T13:24:14Z

Hi @tbhallett, @nchagoma503. This PR is now ready for review. The resource file has two columns:

- 'available': whether the equipment was recorded as available in the HHFA
- 'functional': whether the equipment was recorded as functional in the HHFA

Data is available for 62 unique Item_code values (items of equipment).

There are some rows with missing data. For example, data on several items of equipment was not recorded at level 0. I've left these as missing. I'm not sure whether we can assume that missing data implies the equipment is missing. For instance, 126 (Biochemistry analyser) is recorded as available in some districts, while for others there is no data. I'm not sure how to deal with this.

Here is a summary of the data. The trend is as we would expect.

Overall summary ('available', 'functional')

Facility_level	Average of 'available'	Average of 'functional'
0	19.4%	16.4%
1a	58.2%	52.9%
1b	81.7%	78.1%
2	88.5%	85.2%
3	96.2%	94.8%
4	30.6%	27.7%
Grand Total	69.9%	65.9%

Summary of data by district and facility level ('available')

tbhallett · 2024-05-16T15:01:54Z

Thanks so much for this @sakshimohan

I've just done a quick change to update (ever so slightly) the name of the resulting file.

I've tried running the script, equipment_availability_estimation.py but get an error on line 99: TypeError: '>=' not supported between instances of 'float' and 'str'.

I think this comes from the aggregation using max on the groupby object. It could be the columns that are strings need to be coerced into floats.

Does it run cleanly from top-to-bottom on your machine?
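
For illustration, a minimal sketch of the coercion suggested above, on toy data. The values are invented and the column names are only stand-ins; whether the original error reproduces may also depend on the pandas version. The idea is that `pd.to_numeric(..., errors="coerce")` converts every value column to float (unparseable strings become NaN) before the `max` aggregation.

```python
import pandas as pd

# Toy data: 'calibrated' mixes floats and strings, as can happen when a
# survey column is read in as text.
df = pd.DataFrame({
    "Item_code": [1, 1, 2, 2],
    "available": [1.0, 0.0, 1.0, 1.0],
    "calibrated": ["1", 0.0, "n/a", 1.0],
})

value_cols = ["available", "calibrated"]

# Coerce each value column to float; values that cannot be parsed become NaN.
df[value_cols] = df[value_cols].apply(pd.to_numeric, errors="coerce")

# With purely numeric columns, the groupby max no longer compares str with float.
result = df.groupby("Item_code")[value_cols].max()
print(result)
```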

sakshimohan · 2024-05-16T15:13:20Z

Hi @tbhallett. Thanks for looking at this. The script did run cleanly for me, but you're right that there were two string-valued columns (calibrated and prepared) which were not really passing through the aggregate max command. I have removed these from the script altogether because there is too little data on them to be of value (I was already not extracting them into the resource file). Hopefully the script should now run on your system as well?

I should note that I'm still using python 3.10 not 3.11. Will update this soon!

tbhallett · 2024-05-17T07:56:07Z

Thanks @sakshimohan. Now runs perfectly on my machine too.

tbhallett · 2024-05-19T12:40:30Z

Thanks again for this @sakshimohan.

I've added some stuff onto the end of your script that:

- formats the exported dataset into what is required for reading into the Equipment class
- does the extrapolation to all item_codes and facilities (rather than having this hidden in the Equipment class).

I'll add some comments onto the edits about both these points.
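
As a rough illustration of the second point, the sketch below expands toy estimates onto the full grid of facilities and items, then fills gaps with the mean of observed items in the same facility and cost category (the grouping named in a later commit on this PR). It is not the code in the script: the data, the `cost_category` mapping, and the category labels are assumptions made for the example.

```python
import pandas as pd

# Toy survey-derived estimates: not every (facility, item) pair is present.
observed = pd.DataFrame({
    "Facility_ID": [0, 0, 1],
    "Item_Code": [10, 11, 10],
    "Pr_available": [0.9, 0.5, 0.7],
}).set_index(["Facility_ID", "Item_Code"])["Pr_available"]

all_facilities = [0, 1]
all_items = [10, 11, 12]
# Hypothetical cost category for each item (e.g. cost <= $1000 vs > $1000).
cost_category = pd.Series({10: "low", 11: "high", 12: "low"})

# One row for every facility/item pair; pairs without data become NaN.
full_index = pd.MultiIndex.from_product(
    [all_facilities, all_items], names=["Facility_ID", "Item_Code"]
)
full = observed.reindex(full_index).reset_index()

# Fill each gap with the mean of observed items in the same facility and category.
full["cost_category"] = full["Item_Code"].map(cost_category)
group_mean = full.groupby(["Facility_ID", "cost_category"])["Pr_available"].transform("mean")
full["Pr_available"] = full["Pr_available"].fillna(group_mean)
print(full)
```

In this toy case one pair (facility 1, item 11) stays missing because its group has no observed values at all, which is the kind of remainder a further fallback rule would need to handle.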

src/tlo/methods/equipment.py

tbhallett · 2024-05-19T12:42:25Z

tests/test_equipment.py

Remove checks on handling of item_codes/facility_id that are not in the availability data (we now insist that they all are).

tbhallett · 2024-05-19T12:42:39Z

.../healthsystem/infrastructure_and_equipment/ResourceFile_Equipment_Availability_Estimates.csv

recreated file from update script

sakshimohan · 2024-05-20T14:36:30Z

@tbhallett This should be ready to merge now.

tbhallett · 2024-05-21T07:35:12Z

src/scripts/data_file_processing/healthsystem/equipment/equipment_availability_estimation.py

+
+# Remaining missing data are for items_codes in facility_ids for which there is no information at all in the facility_level
+# Impute the availability for these items in facilities as ... the average of other items available at other facilities
+final_equipment_availability_export_full = final_equipment_availability_export_full.groupby("Item_Code").transform(lambda x: x.fillna(x.mean()))


I think this average should be computed across facility levels 0-3 only. (Otherwise it gets dragged down by the zeros at level 5.)

I'll make that last change

That's a good point. Thanks, Tim. Just as an FYI, there isn't any data being imputed by this line of code as the series is complete at this stage. But I have deleted the imputation in case any new source of data requires this step.
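
A minimal sketch of the restricted average discussed in this thread, assuming a long-format table with `Item_Code`, `Facility_Level` and `Pr_available` columns. The data are invented, and this is not the code that was merged (the imputation step was dropped from the script).

```python
import pandas as pd

# Toy data: level 5 is forced to zero and one level-1b value is missing.
df = pd.DataFrame({
    "Item_Code": [1, 1, 1, 1],
    "Facility_Level": ["1a", "2", "5", "1b"],
    "Pr_available": [0.6, 0.8, 0.0, None],
})

# Levels that contribute to the average (levels 0-3, including 1a and 1b).
levels_for_mean = {"0", "1a", "1b", "2", "3"}
in_scope = df["Facility_Level"].isin(levels_for_mean)

# Per-item mean using only in-scope rows; NaN values are skipped by mean().
item_mean = df.loc[in_scope].groupby("Item_Code")["Pr_available"].mean()

# Fill the remaining gaps with that restricted per-item mean.
df["Pr_available"] = df["Pr_available"].fillna(df["Item_Code"].map(item_mean))
print(df)
```

Here the gap is filled with the mean of 0.6 and 0.8 rather than a value pulled down by the level 5 zero.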

add script to generate equipment availability resource file

ba64949

sakshimohan marked this pull request as ready for review May 7, 2024 18:20

sakshimohan marked this pull request as draft May 7, 2024 18:24

sakshimohan requested a review from tbhallett May 7, 2024 18:24

sakshimohan assigned sakshimohan and nchagoma503 May 8, 2024

sm2511 added 3 commits May 14, 2024 15:34

reshape data

91e4d07

extrapolate values from other survey responses

c59fdea

change source from .csv to .xlsx

92ff2e2

tbhallett added this to In progress in PR priorities via automation May 16, 2024

sm2511 added 6 commits May 16, 2024 12:42

merge with Equipment codes from the TlO model

185e537

merge with facility data from HHFA

d1593bb

match district names to TLo model

7c61e7c

Generate Resource File

0f7c46c

add description to script

229054f

Drop index column

6a5f11e

tbhallett marked this pull request as ready for review May 16, 2024 13:41

tbhallett moved this from In progress to Ready for EM review in PR priorities May 16, 2024

tbhallett added 3 commits May 16, 2024 14:41

Merge branch 'master' into sakshi/estimate_equipment_availability

7a4fa3a

update name of target file

ea33199

update content of target file (and remove file generated with different name)

f8444a9

remove 'calibrated' and 'prepared' columns as this is not capture for most items

52f7d0f

do not do interpolation in the Equipment class: demand that the file comes in the correct format (row for every item and facility_id)

1768b0f

tbhallett added 3 commits May 17, 2024 15:57

Merge remote-tracking branch 'refs/remotes/origin/master' into sakshi/estimate_equipment_availability

5277d72

put interpolation of missing items/facility_ids into script, so that it produces estimate for every item/facility using its own interplation method. Update ResourceFile that results

2bba29b

Merge branch 'refs/heads/master' into sakshi/estimate_equipment_availability

78a5e7c

tbhallett reviewed May 19, 2024

tbhallett and others added 2 commits May 19, 2024 13:50

(whoops) now saving correct version of the dataframe!

85d58fc

Update interpolation method for item_codes not included in the HHFA

b32addf

- interpolate within facility ID and equipment category (cost <=$1000, cost > $1000)

sakshimohan commented May 20, 2024

src/scripts/data_file_processing/healthsystem/equipment/equipment_availability_estimation.py

sakshimohan commented May 20, 2024

src/tlo/methods/equipment.py

force Pr_available = 0 for level 5

26a231e

tbhallett reviewed May 21, 2024

tbhallett added 3 commits May 21, 2024 11:23

linting

11ab9dc

simplify final steps:

365c98e

* refactor the merging-in of equipment cost category, to avoid changing the dataframe (keeping the categories in separate series)
* remove the unnecessary final check of interpolating just on item_code

more linting

215045f

tbhallett approved these changes May 21, 2024

tbhallett merged commit 54c3eb1 into master May 22, 2024
57 checks passed

tbhallett deleted the sakshi/estimate_equipment_availability branch May 22, 2024 06:35

tbhallett moved this from Ready for EM review to Done in PR priorities May 23, 2024
