Try flow hmc in covtype dataset #277

Open · fehiepsi wants to merge 51 commits into master

Conversation

@fehiepsi (Member) commented Aug 8, 2019

Resolves #417. This PR tracks the progress of using flow HMC on the covtype dataset.

Problem setting

  • Data is randomly split into a train set of 400,000 data points and a test set of 181,012 data points (about 31%). Each data point has 55 features, which are normalized.
  • The 400,000 training points are divided into 40 shards of 10,000 data points each.
  • The model is logistic regression (a minimal sketch follows this list).
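
The model code itself isn't shown in this thread; as a hedged sketch, a covtype logistic regression in NumPyro (the names `model`, `features`, and `labels` are mine) could look like:

```python
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist

def model(features, labels=None):
    # One coefficient per (normalized) feature, with a standard normal prior.
    dim = features.shape[-1]
    coefs = numpyro.sample("coefs", dist.Normal(jnp.zeros(dim), jnp.ones(dim)))
    logits = features @ coefs
    # Binary labels, as in the usual binarized covtype task.
    numpyro.sample("obs", dist.Bernoulli(logits=logits), obs=labels)
```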

Some observations

  • With the full training data, NUTS takes 5s per sample on GPU and 30s on CPU (on 1 device). Hence it is infeasible to run NUTS on CPU. On GPU it would take about a day to get 10,000 samples, so I will defer that run until later. Even so, this is about 200x faster than the embarrassingly parallel paper (which used Stan in 2013 and took 15 minutes per sample on CPU; today Stan might take 2-3 minutes per sample, estimated from the Edward paper). To my knowledge, our JAX-based NUTS implementation is the fastest one for this dataset.
  • Using the subposterior method, I can get all subposteriors (4 chains of 2,500 samples for each of the 40 shards) in just over an hour on CPU (with 4 cores). This shows a huge benefit of subposterior methods. With 40 CPU cores, getting all subposteriors would take just 10 minutes. :D (See the sketch after this list.)
  • Both the consensus and parametric methods give 77.1% accuracy on the test set. This is better than the result in the embarrassingly parallel paper (about 75.5% accuracy). The only other benchmark I could find is libsvm's, which also reports 77.1% accuracy. To be fair, it is probably better to just compare HMC / NeutraHMC / ParallelHMC / FlowHMC.
  • The code is very simple to write, and predicting with vmap is fast and convenient.
  • I don't expect flow HMC to give a better mixing rate for this dataset, but it might help with merging subposteriors (via consensus/parametric). Although our caching mechanism will help a lot, I hope the IAF transform will not add much overhead when running MCMC.
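
For reference, here is a sketch (not this PR's actual code) of the per-shard NUTS runs, the consensus/parametric merging, and the vmap-based prediction mentioned above, written against the current NumPyro API. `train_feats`, `train_labels`, `test_feats`, and `test_labels` are hypothetical names; `consensus` and `parametric_draws` are the helpers in `numpyro.infer.hmc_util`:

```python
import jax.numpy as jnp
from jax import random, vmap
from numpyro.infer import MCMC, NUTS
from numpyro.infer.hmc_util import consensus, parametric_draws

# Split the 400,000 training points into 40 shards of 10,000 each.
num_shards = 40
shard_feats = jnp.split(train_feats, num_shards)
shard_labels = jnp.split(train_labels, num_shards)

# Run NUTS independently on each shard (4 chains x 2,500 samples). In a full
# embarrassingly parallel setup the per-shard prior would also be tempered
# (raised to the power 1/num_shards); that detail is omitted here.
subposteriors = []
for i in range(num_shards):
    mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=2500, num_chains=4)
    mcmc.run(random.PRNGKey(i), shard_feats[i], shard_labels[i])
    subposteriors.append(mcmc.get_samples())

# Merge the subposteriors with either method.
merged = consensus(subposteriors, num_draws=10000, rng_key=random.PRNGKey(1))
# merged = parametric_draws(subposteriors, 10000, rng_key=random.PRNGKey(1))

# Predict with every posterior draw in parallel via vmap, then majority-vote.
preds = vmap(lambda c: test_feats @ c > 0.0)(merged["coefs"])
accuracy = jnp.mean((preds.mean(axis=0) > 0.5) == test_labels)
```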

Tasks

  • Get subposteriors and report the result: 77.1% accuracy
  • Train AutoIAFNormal
  • Get NeuTra samples and compare the result. FIXME: using the IAF transform in HMC makes NUTS pretty slow; it took 10 minutes to get samples from one shard, which is very slow compared to vanilla HMC. My last bet is to implement BNAF to see if it helps. (A NeuTra sketch follows this list.)
  • Run NUTS (on GPU) with the full training dataset and compare the result
  • Organize the results in a single notebook, remove summary tables, and add additional metrics such as cross-entropy loss.
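
A hedged sketch of the AutoIAFNormal/NeuTra tasks above, following the pattern in current NumPyro's `NeuTraReparam` docs (the 2019 API differed); the shard variables are reused from the earlier sketch:

```python
from jax import random
import numpyro
from numpyro.infer import MCMC, NUTS, SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoIAFNormal
from numpyro.infer.reparam import NeuTraReparam

# Fit an IAF guide to one shard with SVI...
guide = AutoIAFNormal(model)
svi = SVI(model, guide, numpyro.optim.Adam(1e-3), Trace_ELBO())
svi_result = svi.run(random.PRNGKey(0), 10000, shard_feats[0], shard_labels[0])

# ...then run NUTS on the model reparameterized through the learned flow.
neutra = NeuTraReparam(guide, svi_result.params)
mcmc = MCMC(NUTS(neutra.reparam(model)), num_warmup=500, num_samples=2500)
mcmc.run(random.PRNGKey(1), shard_feats[0], shard_labels[0])

# Map the warped draws back to the original coefficient space.
zs = mcmc.get_samples()["auto_shared_latent"]
samples = neutra.transform_sample(zs)
```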

@fehiepsi fehiepsi added the WIP label Aug 8, 2019
@martinjankowiak (Collaborator) commented

@fehiepsi this is awesome, such compact elegant code!

some suggestions for future iterations:

  • also compute test LLs
  • it might be nice to plot histograms of some of the subposterior sufficient statistics. e.g. coefs[21] seems to hit two different modes in the different subposteriors; similarly with coefs[28], which seems to revert to the prior in some subposteriors but not others
  • it'd be interesting to see what happens when you make the logistic regression a small bayesian neural network, e.g. 55 -> 5 -> 1 instead of 55 -> 1
  • it'd be nice to see how results vary with the number of shards
  • when you do subposteriors + flows it'd be interesting to compare doing the merging in the warped space (which we expect to be more normal) versus doing the merging in the unwarped space (a rough sketch follows this list)
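
Picking up that last suggestion: a rough sketch of merging in the warped space, assuming per-shard `AutoIAFNormal` guides fitted as in the sketches above and that `guide.get_transform(params)` returns the learned flow (`shard_guides`, `shard_params`, and `shard_samples` are hypothetical names):

```python
from jax import random
from numpyro.infer.hmc_util import consensus

# Pull each shard's coefficient draws back into the flow's base space,
# where the subposteriors should look more Gaussian.
warped = []
for guide, params, samples in zip(shard_guides, shard_params, shard_samples):
    flow = guide.get_transform(params)  # base space -> latent space
    warped.append({"coefs": flow.inv(samples["coefs"])})

merged_warped = consensus(warped, num_draws=10000, rng_key=random.PRNGKey(2))

# Push the merged draws forward through one shard's flow; which flow to use
# here is itself a design choice worth comparing.
flow0 = shard_guides[0].get_transform(shard_params[0])
merged = {"coefs": flow0(merged_warped["coefs"])}
```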

@fehiepsi (Member, Author) commented Aug 8, 2019

Thanks a lot @martinjankowiak! All your suggestions are reasonable and wouldn't take much effort to incorporate. I'll address them after finishing the tasks. :)

@fehiepsi (Member, Author) commented

Although this experiment shows that FlowHMC significantly improves the ESS/s of parallel methods, we will aim for a small example instead. This is more or less research work, so I would like to close it for now.

@fehiepsi fehiepsi closed this Jan 14, 2020
@fehiepsi fehiepsi reopened this Sep 1, 2020
@fehiepsi fehiepsi added this to the 0.5.1 milestone Jan 16, 2021
@fehiepsi fehiepsi mentioned this pull request Mar 1, 2021
@fehiepsi fehiepsi modified the milestones: 0.5.1, 0.7 Mar 7, 2021
@fehiepsi fehiepsi removed this from the 0.7 milestone Jul 8, 2021
Linked issue: Tutorial or example on embarrassingly parallel/consensus MCMC