BoXHED2.0

Boosted eXact Hazard Estimator with Dynamic covariates v2.0 (BoXHED2.0, pronounced 'box-head') is a software package for nonparametrically estimating hazard functions via gradient boosted trees. BoXHED2.0 accommodates both time-static and time-dependent covariates.

BoXHED2.0 is a major extension of BoXHED1.0 (Wang et al. 2020); please refer to Pakbin et al. (2023) for details. The theoretical underpinnings of BoXHED are provided in Lee, Chen, and Ishwaran (2021).

What’s new (over BoXHED1.0):

  • Allows for survival data beyond right censoring, including recurrent events and cause-specific hazards in competing-risks settings
  • Significant speedup from data preprocessing and C++ codebase
  • Multicore CPU and GPU support
  • Integrated TreeSHAP support for interpretable explanations of estimated log-hazard values

Suggested citations

Prerequisites

The software was developed and tested in Linux, macOS, and Windows 10 environments. The requirements are the following:

Setting up BoXHED2.0

  1. [Windows users only] Install the Visual Studio 17 2022 toolset. During installation, under the "Workloads" tab, select "Desktop Development with C++" in the "Desktop and Mobile" section. Make the following selections in the menu on the right: [screenshot of the required selections]

  2. Set up a dedicated virtual environment for BoXHED2.0. This ensures that BoXHED2.0 will not interfere with any existing XGBoost packages. This implementation uses Python 3.8. In this example we use Anaconda Prompt to open a terminal. First, create a virtual environment called boxhed2:

conda create -n boxhed2 python=3.8

Then activate it:

conda activate boxhed2
  3. Install the pinned dependency versions by pasting the following lines into your terminal:
pip install matplotlib==3.7.1
pip install pillow==9.4.0
pip install numpy==1.24.3
pip install scikit-learn==1.2.2
pip install pytz==2023.3
pip install pandas==1.5.3
pip install cmake==3.26.3
pip install py3nvml==0.2.7
pip install tqdm==4.65.0
pip install threadpoolctl==3.1.0
pip install scipy==1.10.1
pip install joblib==1.2.0
pip install chardet==5.2.0
pip install slicer==0.0.7
pip install numba==0.57.1
pip install cloudpickle==2.2.1
pip install --force-reinstall --upgrade python-dateutil
pip install jupyter

If there are any issues with the pip installation for any of the packages above, you can use conda install to install them instead.
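After a mix of pip and conda installs it can be hard to tell whether the environment still matches the pins listed above. The following is a minimal sketch, not part of BoXHED2.0, that compares the installed versions against those pins using the standard-library `importlib.metadata` module. The names `PINNED` and `check_pins` are hypothetical helpers introduced here for illustration.

```python
from importlib import metadata

# Pins copied from the pip commands in this step.
PINNED = {
    "matplotlib": "3.7.1",
    "pillow": "9.4.0",
    "numpy": "1.24.3",
    "scikit-learn": "1.2.2",
    "pytz": "2023.3",
    "pandas": "1.5.3",
    "cmake": "3.26.3",
    "py3nvml": "0.2.7",
    "tqdm": "4.65.0",
    "threadpoolctl": "3.1.0",
    "scipy": "1.10.1",
    "joblib": "1.2.0",
    "chardet": "5.2.0",
    "slicer": "0.0.7",
    "numba": "0.57.1",
    "cloudpickle": "2.2.1",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatched or missing package."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    problems = check_pins(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Run it inside the activated boxhed2 environment. Any line it prints names a package to reinstall with pip or conda.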

  4. [Mac users only] Install OpenMP 11.1.0 to enable multithreaded CPU operation:
wget https://raw.githubusercontent.com/chenrui333/homebrew-core/0094d1513ce9e2e85e07443b8b5930ad298aad91/Formula/libomp.rb
brew unlink libomp
brew install --build-from-source ./libomp.rb

Without OpenMP, BoXHED2.0 will only use a single CPU core, which slows down training and fitting. Also, if OpenMP is not present, setting the variable nthread in the tutorial to a value other than 1 may result in a runtime error.
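One defensive way to handle the nthread caveat is to fall back to a single thread whenever an OpenMP runtime cannot be located. The sketch below is a heuristic only: it assumes that `ctypes.util.find_library` can see the OpenMP library on your system, which may not hold for non-default install prefixes, and it does not inspect how the BoXHED2.0 binaries were built. The helper names `openmp_seems_available` and `safe_nthread` are hypothetical.

```python
import ctypes.util
import os


def openmp_seems_available():
    """Heuristic: True if the system library search finds a common OpenMP runtime."""
    # Candidate library names are an assumption: libomp (LLVM), libgomp (GNU), libiomp5 (Intel).
    return any(ctypes.util.find_library(name) for name in ("omp", "gomp", "iomp5"))


def safe_nthread(requested, openmp_available):
    """Return 1 when OpenMP was not detected, else the request clamped to the core count."""
    if not openmp_available:
        return 1
    return max(1, min(requested, os.cpu_count() or 1))


# Value to assign to the tutorial's nthread variable.
nthread = safe_nthread(requested=4, openmp_available=openmp_seems_available())
print("nthread =", nthread)
```

A false negative from the detection step only costs speed, since the fallback is the single-threaded setting that the text above describes as safe.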

  5. Download one of the following pre-built zipped packages for your operating system:

and place the unzipped contents into the directory returned by the following command:

python -c "import sys; site_packages = next(p for p in sys.path if all([k in p for k in ['boxhed2', 'site-packages']])); print('\n'*2); print(site_packages); print('\n'*2)"

For example, the command line above may return the following directory:

/home/grads/d/j.doe/anaconda3/envs/boxhed2/lib/python3.8/site-packages/

After placing the unzipped contents into this directory, the following folders should exist:

/home/grads/d/j.doe/anaconda3/envs/boxhed2/lib/python3.8/site-packages/boxhed/
/home/grads/d/j.doe/anaconda3/envs/boxhed2/lib/python3.8/site-packages/boxhed_kernel/
/home/grads/d/j.doe/anaconda3/envs/boxhed2/lib/python3.8/site-packages/boxhed_prep/
/home/grads/d/j.doe/anaconda3/envs/boxhed2/lib/python3.8/site-packages/boxhed_shap/
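To confirm that the unzipped contents landed in the right place, the check can be scripted. The sketch below is an illustration rather than part of BoXHED2.0: it locates site-packages with the same filter as the one-liner above and reports which of the four expected folders are missing. The function names `find_site_packages` and `missing_packages` are hypothetical.

```python
import sys
from pathlib import Path

# Folder names taken from the listing above.
EXPECTED = ("boxhed", "boxhed_kernel", "boxhed_prep", "boxhed_shap")


def find_site_packages(env_name="boxhed2"):
    """Return the first sys.path entry containing both the env name and 'site-packages'."""
    return next((p for p in sys.path if env_name in p and "site-packages" in p), None)


def missing_packages(site_packages, expected=EXPECTED):
    """Return the expected folder names that are not directories under site_packages."""
    root = Path(site_packages)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    site_packages = find_site_packages()
    if site_packages is None:
        print("No boxhed2 site-packages found; is the environment activated?")
    else:
        missing = missing_packages(site_packages)
        print("Missing folders:", ", ".join(missing) if missing else "none")
```

If any folder is reported missing, re-check that the archive was unzipped directly into the site-packages directory and not into a nested subfolder.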
  6. Download the files in this repository and put them in a directory called BoXHED2.0. Then go to the directory:
cd BoXHED2.0
  7. Run BoXHED2_tutorial.ipynb for a demonstration of how to fit a BoXHED hazard estimator:
jupyter notebook BoXHED2_tutorial.ipynb

For Mac users, Apple's security system may complain about the precompiled components of BoXHED2.0. In that case, the instructions on this page will be helpful.