Create a set of benchmark datasets #127

Open
Optimox opened this issue Jun 4, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

@Optimox
Collaborator

Optimox commented Jun 4, 2020

Feature request

I created some Research Issues that would be interesting to work on, but it's hard to tell whether an idea is a good one without a clear benchmark on different datasets.

So it would be great to have a few notebooks that can run on different datasets in order to monitor the performance uplift of a new implementation.

What is the expected behavior?
The idea would be to run these benchmarks for each improvement proposal and see whether the proposal helped or not.

How should this be implemented in your opinion?
This issue could be closed little by little by adding new notebooks that each perform a benchmark on one well-known dataset (a rough sketch of what such a notebook could look like is at the end of this description).

Or maybe it would be a better idea to incorporate TabNet into existing benchmarks like the CatBoost benchmarks: https://github.com/catboost/benchmarks

Are you willing to work on this yourself?
Yes, of course, but any help would be appreciated!
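As a starting point, here is a minimal sketch of what one such benchmark notebook could look like, using Forest Cover Type via scikit-learn's fetch_covtype and an 80/10/10 split. It assumes the eval_set-style fit signature of recent pytorch-tabnet releases; the exact arguments (and the patience behaviour) may differ between versions.

```python
# Rough benchmark sketch (not a definitive setup): train TabNet on Forest
# Cover Type with a fixed split and seed, then report test accuracy.
import numpy as np
from sklearn.datasets import fetch_covtype
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from pytorch_tabnet.tab_model import TabNetClassifier

# Forest Cover Type: ~581k rows, 54 numeric features, 7 classes
data = fetch_covtype()
X, y = data.data.astype(np.float32), data.target

# 80/10/10 train/validation/test split with a fixed seed for comparability
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

clf = TabNetClassifier(seed=0)
clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],  # older releases take X_valid/y_valid positionally
    max_epochs=200,
    patience=0,                     # assumption: 0 disables early stopping
)

test_acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {test_acc:.3%}")
```

Each dataset would get its own notebook with the same structure, so a single headline number (test accuracy here) can be compared before and after a proposed change.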

@Optimox Optimox added the enhancement label Jun 4, 2020
@ddofer

ddofer commented Jul 25, 2020

I can help with this. It'd be best to run this within a testing framework, so the tests or CI can check whether changes to the models/code (e.g. defaults or improvements) break or reduce performance.
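For illustration, such a check could be as simple as a pytest test that pins an accuracy floor from a previous trusted run; the threshold, marker, and helper names below are placeholders, not existing project code.

```python
# Sketch of a CI regression check: fail if benchmark accuracy drops below a
# pinned floor. train_and_score() is a placeholder for the benchmark run.
import pytest

ACCURACY_FLOOR = 0.94  # hypothetical floor taken from a previous trusted run


def train_and_score():
    """Placeholder: train TabNet on the benchmark split and return test accuracy."""
    raise NotImplementedError


@pytest.mark.slow  # benchmark runs are long; register this marker so regular CI can skip it
def test_forest_cover_accuracy_regression():
    accuracy = train_and_score()
    assert accuracy >= ACCURACY_FLOOR, (
        f"Benchmark accuracy {accuracy:.3%} fell below the pinned floor "
        f"{ACCURACY_FLOOR:.3%}; check recent changes for a regression."
    )
```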

@athewsey
Contributor

Anecdotally, I recently noticed a drop in accuracy (or maybe convergence speed) on Forest Cover Type when upgrading the PyTorch version... Would be interested to see whether others experience the same, and to understand whether there's some issue that needs addressing or it's just statistical variation.

Stopping at 200 epochs, I observed test accuracies of:

  • 95.584% on PyTorch v1.4.0
  • 88.128% on PyTorch v1.5.1
  • 92.840% on PyTorch v1.6.0
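One thing that might help separate a real regression from run-to-run noise is pinning every RNG and forcing deterministic cuDNN kernels before training. A quick sketch (note that the determinism flags can slow training and don't cover every op):

```python
# Make version-to-version comparisons less sensitive to RNG differences by
# seeding Python, NumPy and PyTorch and forcing deterministic cuDNN kernels.
import random

import numpy as np
import torch


def set_all_seeds(seed: int = 0) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```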

@Optimox
Collaborator Author

Optimox commented Aug 12, 2020

@athewsey thanks for reporting that; it seems like a lot for just changing the torch version. Have you been experimenting on the latest release, changing only the PyTorch version? I understand that random seeds could behave differently from one version to another, but after 200 epochs there should not be such a gap.

@Hartorn @eduardocarvp did you notice such strong changes when monitoring tabnet scores?

@athewsey
Contributor

@Optimox those figures were, I believe, all using develop code as of my recent PR #164. I took a random 80/10/10 training/validation/test split of Forest Cover Type and just tried the different PyTorch framework versions via AWS's provided deep learning container images on SageMaker - so all with Python 3.6, Ubuntu 16.04, and (if I interpret the container versioning correctly) CUDA 10.1... but there's a chance there are some small, relevant library differences between them. All the training was run on an ml.p3.2xlarge instance, so backed by 1x V100 GPU.

I appreciate the library versions aren't as controlled as they could be between tests, and I will try to re-run on a fully controlled/local environment with only PyTorch differing if possible - but it's tricky, as my current workflow is mostly set up to use those pre-built images. I just thought it was worth mentioning for consideration in this ticket's priority, and because I hadn't seen discussion of cross-version benchmarking/accuracy checks elsewhere in the project.
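For what it's worth, it might also help to record an environment fingerprint alongside each benchmark number, so runs on different container images stay comparable. A small sketch:

```python
# Record the library and hardware versions next to each benchmark result.
import platform

import torch


def environment_fingerprint() -> dict:
    return {
        "python": platform.python_version(),
        "torch": torch.__version__,
        "cuda": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),
        "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
    }


print(environment_fingerprint())
```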
