Add starter Data Science Project #133

Open
wants to merge 33 commits into
base: master

Conversation

dmitrypolo
Contributor

One of the other things @isms mentioned was the ability to include a starter project for users who are just getting started. There is now an option for that in cookiecutter.json, and you are prompted for it when creating a new project. If you include the starter project, code is placed in the pre-defined locations, specifically the src/data and src/models directories. To make the workflow more robust, the Makefile also includes an option to run the whole data pipeline end to end: it downloads data from a URL, splits it into training and test sets, fits a model, and pickles the model. Lastly, it makes predictions and displays them to the user. The starter project also includes unit tests that exercise the actual logic of the functions, to give a beginner an idea of how to write unit tests involving patching and fixtures. If a user opts out of the starter project, all of the files containing code are emptied and left blank; code in the hooks directory does this after the project is generated. Please let me know if you have any questions or feedback, thanks!
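
For anyone skimming, here is a minimal sketch of how the opt-out step could work. It is not the code from this PR: the function name `blank_starter_files`, the `STARTER_DIRS` constant, and the choice to blank only `.py` files are assumptions made for illustration. Only the two directory names come from the description above.

```python
from pathlib import Path

# Directories named in the PR description; the constant itself is hypothetical.
STARTER_DIRS = ("src/data", "src/models")


def blank_starter_files(project_root, include_starter):
    """Empty every .py file under STARTER_DIRS when the starter project is declined.

    Returns the list of files that were emptied. Files are kept on disk so the
    generated project layout stays the same either way.
    """
    if include_starter:
        return []

    emptied = []
    root = Path(project_root)
    for rel_dir in STARTER_DIRS:
        base = root / rel_dir
        if not base.is_dir():
            continue  # nothing to blank if the template did not create this directory
        for path in sorted(base.rglob("*.py")):
            path.write_text("")  # keep the file, drop its contents
            emptied.append(path)
    return emptied


# In a real hooks/post_gen_project.py, cookiecutter runs the script from the root
# of the generated project, and the flag would come from the rendered template
# context (the exact variable name is an assumption here), for example:
#     blank_starter_files(Path.cwd(), include_starter=False)
```

Blanking the files rather than deleting them means the generated directory tree is identical with or without the starter code, which matches the "emptied out and returned blank" behaviour described above.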

@isms
Contributor

isms commented Jul 31, 2018

@dmitrypolo Any objection to swapping out the data set for a different one?

@isms isms closed this Jul 31, 2018
@isms isms reopened this Jul 31, 2018
@dmitrypolo
Contributor Author

@isms no objections, do you have anything specific in mind?

@isms
Contributor

isms commented Jul 31, 2018

Since iris is pretty played out, how about the UC Irvine ML blood donations? It mirrors our blood donations competition.

@dmitrypolo
Contributor Author

I will reconvene with the team and get back to you shortly, thanks!

@isms
Contributor

isms commented Aug 6, 2018

@dmitrypolo @johnkarlen Is it this one or #135 that we should be looking at? Would love to get this merged this week!

@dmitrypolo
Contributor Author

@isms this one. I will adjust a few things based on your other comments and finish modifying the tests for the new dataset, then get back to you shortly.

@dmitrypolo
Contributor Author

@isms please review and let me know

@mattarderne

This is really useful and should be included, very useful to see how the template is intended to be used

@joel-aws

> This is really useful and should be included, very useful to see how the template is intended to be used

Agreed. I heavily referenced this PR ~2 years ago to figure out best practices for using the CC template... It would still be super useful to have it merged.

6 participants