Generate and add equipment availability data #1329

Merged
tbhallett merged 24 commits into master from sakshi/estimate_equipment_availability
May 22, 2024

sakshimohan
Collaborator

@sakshimohan sakshimohan commented May 7, 2024

Fixes #1305

This PR generates availability data on equipment using HHFA 2018-19 and maps it to TLO model equipment codes.

@sakshimohan sakshimohan marked this pull request as ready for review May 7, 2024 18:20
@sakshimohan sakshimohan marked this pull request as draft May 7, 2024 18:24
@sakshimohan sakshimohan requested a review from tbhallett May 7, 2024 18:24
@sakshimohan
Collaborator Author

Hi @tbhallett. I'll add the equipment availability resource file to this PR. A couple of data-cleaning steps remain, so it should be ready by Friday. The format will be as follows:

| Equipment_code | Facility_Level | District | Proportion of facilities which reported equipment as available | Proportion of facilities which reported equipment as available & functional | Proportion of facilities which reported equipment as available, functional, and calibrated/prepared |
|---|---|---|---|---|---|
| 1 | 1a | Lilongwe | 0.9 | 0.85 | 0.8 |
| 1 | 1b | Lilongwe | 0.8 | 0.8 | 0.8 |
| 2 | 1a | Blantyre | 0.9 | 0.9 | 0.8 |
| 2 | 1b | Blantyre | 0.7 | 0.6 | 0.5 |
| ... | ... | ... | ... | ... | ... |
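
For illustration, a table in this shape could be produced by averaging 0/1 indicators across facilities. This is only a minimal sketch: the facility-level frame `records`, its `available` and `functional` columns, and all values are hypothetical and are not taken from HHFA or from the script in this PR.

```python
import pandas as pd

# Hypothetical facility-level records, one row per facility and item (illustrative values only).
records = pd.DataFrame({
    "Equipment_code": [1, 1, 1, 1],
    "Facility_Level": ["1a", "1a", "1b", "1b"],
    "District": ["Lilongwe"] * 4,
    "available": [1, 0, 1, 1],
    "functional": [1, 0, 1, 0],
})

# The mean of a 0/1 column is the proportion of facilities reporting "yes".
proportions = (
    records
    .groupby(["Equipment_code", "Facility_Level", "District"], as_index=False)[["available", "functional"]]
    .mean()
)
print(proportions)
```

Each output row then corresponds to one (equipment, facility level, district) combination, as in the proposed format.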

@tbhallett
Collaborator

Hi @sakshimohan and @nchagoma503 -- this is going to be brilliant. Thank you so much.

Just one comment.... for the column District, it would be really handy if this could match as well as possible with the District names defined in resources/demography/ResourceFile_PopulationSize_2018Census.csv. These are:

{'Chitipa', 'Karonga', 'Nkhata Bay', 'Rumphi', 'Mzimba', 'Likoma',
'Mzuzu City', 'Kasungu', 'Nkhotakota', 'Ntchisi', 'Dowa', 'Salima',
'Lilongwe', 'Mchinji', 'Dedza', 'Ntcheu', 'Lilongwe City',
'Mangochi', 'Machinga', 'Zomba', 'Chiradzulu', 'Blantyre',
'Mwanza', 'Thyolo', 'Mulanje', 'Phalombe', 'Chikwawa', 'Nsanje',
'Balaka', 'Neno', 'Zomba City', 'Blantyre City'}
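
A quick way to check the match might look like the sketch below. The helper name `unmatched_districts` and the sample input are hypothetical; only the set of district names comes from the list above, and the closest-name suggestion uses the standard-library `difflib`.

```python
import difflib

# District names as listed above (from the 2018 census resource file).
CENSUS_DISTRICTS = {
    'Chitipa', 'Karonga', 'Nkhata Bay', 'Rumphi', 'Mzimba', 'Likoma',
    'Mzuzu City', 'Kasungu', 'Nkhotakota', 'Ntchisi', 'Dowa', 'Salima',
    'Lilongwe', 'Mchinji', 'Dedza', 'Ntcheu', 'Lilongwe City',
    'Mangochi', 'Machinga', 'Zomba', 'Chiradzulu', 'Blantyre',
    'Mwanza', 'Thyolo', 'Mulanje', 'Phalombe', 'Chikwawa', 'Nsanje',
    'Balaka', 'Neno', 'Zomba City', 'Blantyre City',
}


def unmatched_districts(names, valid=CENSUS_DISTRICTS):
    """Map each name not found in `valid` to its closest valid name (or None)."""
    suggestions = {}
    for name in sorted(set(names) - set(valid)):
        close = difflib.get_close_matches(name, sorted(valid), n=1)
        suggestions[name] = close[0] if close else None
    return suggestions


# Hypothetical spellings as they might appear in a survey extract.
suggestions = unmatched_districts(["Lilongwe", "Nkhatabay", "Blantyre"])
print(suggestions)
```

An empty result would mean every district name in the availability data already matches the census file.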

@tbhallett
Collaborator

Also, please could I suggest that the destination for the ResourceFiles that will be created be resources/healthsystem/infrastructure_and_equipment

@tbhallett tbhallett added this to In progress in PR priorities via automation May 16, 2024
@sakshimohan
Collaborator Author

Hi @tbhallett, @nchagoma503. This PR is now ready for review. The resource file has two columns:

  1. 'available' - this represents whether equipment was recorded as available in HHFA
  2. 'functional' - this represents whether equipment was recorded as functional in HHFA

Data is available for 62 unique Item_code values (items of equipment).

There are some rows with missing data. For example, data on several items of equipment was not recorded at level 0. I've left these as missing. I'm not sure whether we can assume that missing data implies that the equipment is unavailable. For instance, 126 (Biochemistry analyser) is recorded as available in some districts, while for others there is no data. I'm not sure how to deal with this.
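
To see where the gaps sit, the share of unrecorded values could be tabulated by facility level. The sketch below is only illustrative: the frame, its column names, and its values are hypothetical. It keeps "not recorded" (NaN) distinct from "recorded as unavailable" (0.0).

```python
import pandas as pd

# Hypothetical extract: NaN means "not recorded", which is distinct from 0.0.
df = pd.DataFrame({
    "Item_code": [126, 126, 126, 126],
    "Facility_level": ["0", "0", "2", "2"],
    "available": [float("nan"), float("nan"), 1.0, float("nan")],
})

# Share of rows with no recorded value, per facility level.
share_missing = df["available"].isna().groupby(df["Facility_level"]).mean()
print(share_missing)
```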

Here is a summary of the data. The trend is as we would expect.

  1. Overall summary ('available', 'functional')

| Facility_level | Average of 'available' | Average of 'functional' |
|---|---|---|
| 0 | 19.4% | 16.4% |
| 1a | 58.2% | 52.9% |
| 1b | 81.7% | 78.1% |
| 2 | 88.5% | 85.2% |
| 3 | 96.2% | 94.8% |
| 4 | 30.6% | 27.7% |
| Grand Total | 69.9% | 65.9% |

  2. Summary of data by district and facility level ('available')
    [Image: Screenshot 2024-05-16 at 13 53 29]

@tbhallett tbhallett marked this pull request as ready for review May 16, 2024 13:41
@tbhallett tbhallett moved this from In progress to Ready for EM review in PR priorities May 16, 2024
@tbhallett
Collaborator

tbhallett commented May 16, 2024

Thanks so much for this @sakshimohan

I've just done a quick change to update (ever so slightly) the name of the resulting file.

I've tried running the script equipment_availability_estimation.py, but I get an error on line 99: TypeError: '>=' not supported between instances of 'float' and 'str'.

I think this comes from the aggregation using max on the groupby object. It could be the columns that are strings need to be coerced into floats.

Does it run cleanly from top-to-bottom on your machine?
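
The coercion suggested above might look something like this. It is a sketch under assumptions, not the script's actual code: the frame and the column names `facility_id`, `functional`, and `calibrated` are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical data: one numeric column and one column that was read in as strings.
df = pd.DataFrame({
    "facility_id": [1, 1, 2, 2],
    "functional": [1.0, 0.0, 0.0, 0.0],
    "calibrated": ["1", "0", "not recorded", "1"],
})

value_cols = ["functional", "calibrated"]

# Coerce to numeric; entries that cannot be parsed become NaN instead of raising.
df[value_cols] = df[value_cols].apply(pd.to_numeric, errors="coerce")

# The per-facility max now compares floats only (NaN is skipped by default).
facility_max = df.groupby("facility_id")[value_cols].max()
print(facility_max)
```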

@sakshimohan
Collaborator Author

sakshimohan commented May 16, 2024

Hi @tbhallett. Thanks for looking at this. The script did run cleanly for me, but you're right that there were two string-valued columns (calibrated and prepared) which were not really passing through the aggregate max command. I have removed these from the script altogether because there is too little data on them for them to be of value (I was already not extracting them into the resource file). Hopefully the script should now run on your system as well?

I should note that I'm still using python 3.10 not 3.11. Will update this soon!

@tbhallett
Collaborator

Thanks @sakshimohan. Now runs perfectly on my machine too.

…comes in the correct format (row for every item and facility_id)
…it produces an estimate for every item/facility using its own interpolation method. Update the ResourceFile that results
@tbhallett
Collaborator

Thanks again for this @sakshimohan.

I've added some code to the end of your script that:

  • formats the exported dataset into what is required for reading into the Equipment class
  • does the extrapolation to all item_codes and facilities (rather than having this hidden in the Equipment class).

I'll add some comments onto the edits about both these points.
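
As a rough illustration of the second point, extending sparse estimates to every item/facility combination can start from a reindex onto the full grid. The names `Facility_ID` and `Pr_Available`, and the id lists, are assumptions made for this sketch and may not match the script.

```python
import pandas as pd

# Hypothetical sparse estimates, indexed by (Facility_ID, Item_Code).
observed = pd.Series(
    [0.9, 0.5],
    index=pd.MultiIndex.from_tuples([(0, 1), (1, 2)], names=["Facility_ID", "Item_Code"]),
    name="Pr_Available",
)

# Hypothetical lists of every facility and every item that the model knows about.
all_facility_ids = [0, 1]
all_item_codes = [1, 2, 3]

full_index = pd.MultiIndex.from_product(
    [all_facility_ids, all_item_codes], names=["Facility_ID", "Item_Code"]
)

# Combinations without data appear as NaN, ready for a later fill step.
full = observed.reindex(full_index)
print(full)
```

With one row per item and facility, a consumer of the file would not need its own handling for missing combinations.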

src/tlo/methods/equipment.py
Collaborator

Remove checks on handling of item_codes/facility_id that are not in the availability data (we now insist that they all are).

Collaborator

recreated file from update script

tbhallett and others added 2 commits May 19, 2024 13:50
- interpolate within facility ID and equipment category (cost <=$1000, cost > $1000)
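
One way to read this commit message is a group-wise fill, sketched below. The use of a group mean, the `cost_usd` column, and the other names are assumptions for illustration; the actual interpolation in the script may differ. The cost category is kept in a separate series so that the dataframe gains no extra column.

```python
import pandas as pd

# Hypothetical data for a single facility (illustrative values only).
df = pd.DataFrame({
    "Facility_ID": [0, 0, 0, 0],
    "Item_Code": [1, 2, 3, 4],
    "cost_usd": [50.0, 200.0, 5000.0, 8000.0],
    "Pr_Available": [0.8, None, 0.2, None],
})

# Two categories, as in the commit message: cost <= $1000 and cost > $1000.
high_cost = df["cost_usd"] > 1000

# Fill gaps with the mean of items in the same facility and cost category.
group_mean = df.groupby([df["Facility_ID"], high_cost])["Pr_Available"].transform("mean")
df["Pr_Available"] = df["Pr_Available"].fillna(group_mean)
print(df)
```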
@sakshimohan
Copy link
Collaborator Author

@tbhallett This should be ready to merge now.


# Remaining missing data are for items_codes in facility_ids for which there is no information at all in the facility_level
# Impute the availability for these items in facilities as ... the average of other items available at other facilities
final_equipment_availability_export_full = final_equipment_availability_export_full.groupby("Item_Code").transform(lambda x: x.fillna(x.mean()))
Collaborator

I think this average should be computed from across facility levels 0-3. (Otherwise it gets dragged down by the zeros at level 5).

I'll make that last change
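
A sketch of that suggestion, assuming hypothetical column names `Facility_Level` and `Pr_Available` and made-up values: the per-item mean is taken only over rows at levels 0-3 and is then used to fill the remaining gaps.

```python
import pandas as pd

# Hypothetical data for one item (illustrative values only).
df = pd.DataFrame({
    "Item_Code": [1, 1, 1, 1],
    "Facility_Level": ["1a", "2", "5", "3"],
    "Pr_Available": [0.6, 0.8, 0.0, None],
})

levels_for_mean = ["0", "1a", "1b", "2", "3"]

# Per-item mean using only rows at levels 0-3, so zeros at level 5 do not pull it down.
item_mean = (
    df[df["Facility_Level"].isin(levels_for_mean)]
    .groupby("Item_Code")["Pr_Available"]
    .mean()
)

# Fill remaining gaps from that restricted mean, matching on Item_Code.
df["Pr_Available"] = df["Pr_Available"].fillna(df["Item_Code"].map(item_mean))
print(df)
```

In this toy example, averaging over all levels would give a lower fill value because the level 5 zero would be included.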

Collaborator Author

That's a good point. Thanks, Tim. Just as an FYI, there isn't any data being imputed by this line of code as the series is complete at this stage. But I have deleted the imputation in case any new source of data requires this step.

* refactor the merging-in of equipment cost category, to avoid changing the dataframe (keeping the categories in separate series)
* remove the unnecessary final check of interpolating just on item_code
@tbhallett tbhallett merged commit 54c3eb1 into master May 22, 2024
57 checks passed
@tbhallett tbhallett deleted the sakshi/estimate_equipment_availability branch May 22, 2024 06:35
@tbhallett tbhallett moved this from Ready for EM review to Done in PR priorities May 23, 2024
Successfully merging this pull request may close these issues.

Equipment Availability Estimates
3 participants