vsoch/hospital-chargemaster-analysis

Hospital Chargemaster Analysis

This is a small analysis using the hospital chargemaster Dinosaur Dataset. See the notebook for a quick example that shows there is interesting signal in the data for one hospital. We can build a linear model, perform feature selection with Lasso, and try to predict prices based on descriptor terms.

https://vsoch.github.io/datasets/assets/img/avocado.png

General Goals

For this analysis, we want to try predicting price for a given item (possibly for a given hospital) based on the chargemaster data. For example, I would expect items with the terms "brain" or "heart" to be more expensive than general medications like Advil (ibuprofen).

The approach we will take is to try a simple linear regression. I don't want to do the ultimate analysis, but rather to show you that the data is interesting.

  1. We start with data from one hospital. This keeps the data frame small enough to share on GitHub, and quick to run on my tiny local machine.
  2. We will then do stop word removal and make all terms lowercase.
  3. Then we will create a sparse data frame of words (columns) by the unique identifiers (rows). We can use scikit-learn to create this data frame.
  4. The first model we will train is a linear regression (possibly with lasso, which pushes more coefficients to exactly zero).
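
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the descriptions and prices are made-up toy values, the variable names are hypothetical, and the `alpha` setting is an arbitrary choice. It uses scikit-learn's `CountVectorizer` for lowercasing, stop word removal, and the sparse term matrix, and `Lasso` for the regression. The result is a SciPy sparse matrix rather than a pandas data frame.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Lasso

# Hypothetical toy data: NOT taken from the chargemaster dataset
descriptions = [
    "MRI BRAIN WITH CONTRAST",
    "CT BRAIN WITHOUT CONTRAST",
    "IBUPROFEN 200 MG TABLET",
    "ACETAMINOPHEN 500 MG TABLET",
    "HEART CATHETERIZATION",
    "ASPIRIN 81 MG TABLET",
]
prices = [2500.0, 1800.0, 5.0, 4.0, 9000.0, 3.0]

# Steps 2 and 3: lowercase, drop English stop words, and build a
# sparse matrix with one row per item and one column per term
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(descriptions)

# Step 4: linear regression with an L1 (lasso) penalty;
# alpha=1.0 is an arbitrary value chosen for this sketch
model = Lasso(alpha=1.0, max_iter=10000)
model.fit(X, prices)

# List the terms whose coefficients were not shrunk to zero
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
kept = [(t, c) for t, c in zip(terms, model.coef_) if c != 0]
for term, coef in sorted(kept, key=lambda tc: tc[1], reverse=True):
    print(f"{term:>16s} {coef:10.2f}")
```

With real data you would read the descriptions and prices from the prepared files instead, and pick `alpha` by cross-validation rather than by hand.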

Given over one hundred hospitals, there are definitely more interesting models to build and things to try! And you need validation. I leave this up to you, dear data scientist.

1. Data Preparation

The data required for the dummy demo is provided in the repository, and here is how I produced it:

git clone https://www.github.com/vsoch/hospital-chargemasters
cd hospital-chargemasters

Then use the script 1.prepare-data.py to read in the latest datasets.
