
running your benchmarks from beginning to end #35

Open
vinhdizzo opened this issue Feb 29, 2016 · 1 comment

Comments

@vinhdizzo

Hey Szilard,

I'd like to replicate your code from beginning to end, perhaps on Google Compute Engine (GCE), mainly to test out GCE with Vagrant. Do you have a sense of how long the entire process would take, assuming a similar server size to what you used on EC2?

Is there a convenient way to run all your scripts from folder 0 to 4? That is, is there a master script that executes them all?

I notice that the results are written out to the console. Do you have a script that scrapes all the AUCs for your comparison analysis?

Thanks!

@szilard
Owner

szilard commented Feb 29, 2016

Hi Vinh:

That would be great. I'm a big fan of reproducible data analysis/research, and it would be nice to have this project in a fully automated form (installation, runs, presentation of results, etc.). The project grew very organically, and I spent a lot of time on experimentation and many iterations, so I did not want to invest the time to make it fully automated/reproducible. But if you want to take on the task, I'll be happy to help a bit.

To answer your questions:

  1. Do you have a sense of how long the entire process would take, assuming a similar server size to what you used on EC2?

I don't know exactly; the runtimes depend on the tool/algo, but based on my results maybe you can now take a step back and prioritize/simplify, etc.

  2. Is there a convenient way to run all your scripts from folder 0 to 4? That is, is there a master script that executes them all?

No, though the scripts run out of the box, with no weird configs etc. (a minimal runner sketch is included after this list).

  3. I notice that the results are written out to the console. Do you have a script that scrapes all the AUCs for your comparison analysis?

No, but it would probably not be difficult for you to log the console output to a file and scrape the AUC lines from it (see the sketch below).
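Since neither a master script nor an AUC scraper exists in the repo, here is a minimal sketch (in Python, purely illustrative) of what one could look like. The folder pattern `[0-4]-*`, the assumption that each folder holds `.R`/`.py` scripts runnable via `Rscript`/`python`, and the assumption that the AUC shows up on console lines containing the string "AUC" are all hypothetical, so adjust them to the actual layout of the repo:

```python
#!/usr/bin/env python
# Hypothetical master runner, just a sketch (not part of this repo).
# Assumes the numbered benchmark folders match "[0-4]-*" and contain
# .R / .py scripts that print an AUC somewhere in their console output.
import glob
import os
import re
import subprocess

LOG_DIR = "logs"
os.makedirs(LOG_DIR, exist_ok=True)

auc_lines = []
for folder in sorted(glob.glob("[0-4]-*")):
    for script in sorted(glob.glob(os.path.join(folder, "*"))):
        ext = os.path.splitext(script)[1]
        if ext == ".R":
            cmd = ["Rscript", script]
        elif ext == ".py":
            cmd = ["python", script]
        else:
            continue  # skip data files, READMEs, etc.
        log_path = os.path.join(
            LOG_DIR, folder + "_" + os.path.basename(script) + ".log")
        with open(log_path, "w") as log:
            # redirect the console output into a per-script log file
            subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
        # scrape any line mentioning AUC for the comparison table
        with open(log_path) as log:
            for line in log:
                if re.search(r"\bAUC\b", line, re.IGNORECASE):
                    auc_lines.append(script + "\t" + line.strip())

with open("auc_summary.tsv", "w") as out:
    out.write("\n".join(auc_lines) + "\n")
```

Wrapping the `subprocess.run` call with timestamps would also record per-script runtimes, which would give an empirical answer to question 1.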

On the other hand, the repo contains all the code needed to get the results, and the code base is relatively small (since it mostly uses high-level APIs).

I've seen several projects that automated some simple benchmarks of their own ML tool, but unfortunately almost everyone focuses on their own tool only. A fully automated benchmark covering various tools (maybe similar to the famous TPC benchmarks in the SQL world) would be great.
