Skip to content

learn-co-students/dsc-3-final-project-online-ds-sp-000

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Module 3 Final Project

Introduction

In this lesson, we'll review all the guidelines and specifications for the final project for Module 3.

Objectives

  • Understand all required aspects of the Final Project for Module 3
  • Understand all required deliverables
  • Understand what constitutes a successful project

Final Project Summary

Congratulations! You've made it through another intense module, and now you're ready to show off your newfound Machine Learning skills!

All that remains for Module 3 is to complete the final project!

The Project

For this project, you're going to select a dataset of your choosing and create a classification model. You'll start by identifying a problem you can solve with classification, and then identify a dataset. You'll then use everything you've learned about Data Science and Machine Learning thus far to source a dataset, preprocess and explore it, and then build and interpret a classification model that answers your chosen question.

Selecting a Data Set

We encourage you to be very thoughtful when identifying your problem and selecting your data set--an overscoped project goal or a poor data set can quickly bring an otherwise promising project to a grinding halt.

To help you select an appropriate data set for this project, we've set some guidelines:

  1. Your dataset should work for classification. The classification task can be either binary or multi-categorical, as long as it's a classification model.

  2. Your dataset needs to be of sufficient complexity. Try to avoid picking an overly simple dataset. We want to see all the steps of the Data Science Process in this project--it's okay if the dataset is mostly clean, but we expect to see some preprocessing and exploration. See the following section, Data Set Constraints, for more information on this.

  3. On the other end of the spectrum, don't pick a problem that's too complex, either. Stick to problems that you have a clear idea of how you can use machine learning to solve it. For now, we recommend you stay away from overly complex problems in the domains of Natural Language Processing or Computer Vision--although those domains make use of Supervised Learning, they come with a lot of other special requirements and techniques that you don't know yet (but you'll learn soon!). If you're chosen problem feels like you've overscoped, then it probably is. If you aren't sure if your problem scope is appropriate, double check with your instructor!

  4. Serious Bonus Points if some or all of the data is data you have to source yourself through web scraping or interacting with a 3rd party API! Having projects that show off your ability to source data effectively make you look that much more impressive when showing your work off to potential employers!

Data Set Constraints

When selecting a data set, be sure to take into consideration the following constraints:

  1. Your data set can't be one we've already worked with in any labs.
  2. Your data set should contain a minimum of 1000 rows.
  3. Your data set should contain a minimum of 10 predictor columns, before any one-hot encoding is performed.
  4. Your instructor must provide final approval on your data set.

Problem First, or Data First?

There are two ways that you can about getting started: Problem-First or Data-First.

Problem-First: Start with a problem that you want to solve with classification, and then try to find the data you need to solve it. If you can't find any data to solve your problem, then you should pick another problem.

Data-First: Take a look at some of the most popular internet repositories of cool data sets we've listed below. If you find a data set that's particularly interesting for you, then it's totally okay to build your problem around that data set.

There are plenty of amazing places that you can get your data from. We recommend you start looking at data sets in some of these resources first:

The Deliverables

There will be four deliverables for this project:

  1. A well documented Jupyter Notebook containing any code you've written for this project and comments explaining it. This work will need to be pushed to your GitHub repository in order to submit your project.

  2. A short Keynote/PowerPoint/Google Slides presentation (delivered as a PDF export) that gives a brief overview of your problem/dataset, and each step of the OSEMN process. Make sure to also add and commit this pdf of your non-technical presentation to your repository with a file name of presentation.pdf.

  3. A blog post (800-1500 words) about one element of the project - it could be the EDA, the feature selection, the choice of visualizations or anything else technical relating to the project. It should be targeted at your peers - aspiring data scientists.

  4. A Video Walkthrough of your non-technical presentation. Some common video recording tools used are Zoom, Quicktime, and Nimbus. After you record your presentation, publish it on a service like YouTube or Google Drive, you will need a link to the video to submit your project.

Jupyter Notebook Must-Haves

For this project, your jupyter notebook should meet the following specifications:

Organization/Code Cleanliness

  • The notebook should be well organized, easy to follow, and code is commented where appropriate.
    • Level Up: The notebook contains well-formatted, professional looking markdown cells explaining any substantial code. All functions have docstrings that act as professional-quality documentation.
  • The notebook is written to technical audiences with a way to both understand your approach and reproduce your results. The target audience for this deliverable is other data scientists looking to validate your findings.

Process, Methodology, and Findings

  • Your notebook should contain a clear record of your process and methodology for exploring and preprocessing your data, building and tuning a model, and interpreting your results.
  • We recommend you use the OSEMN process to help organize your thoughts and stay on track.

Blog Post Must-Haves

Your blog post should clearly explain your process and results, including:

  • An explanation of the problem you're trying to solve and the dataset you choose for it
  • Well documented examples of code and visualizations (when appropriate)

NOTE: This blog post is your way of showcasing the work you've done on this project--chances are it will soon be read by a recruiter or hiring manager! Take the time to make sure that you craft your story well, and clearly explain your process and findings in a way that clearly shows both your technical expertise and your ability to communicate your results!

Submitting your Project

You’re almost done! In order to submit your project for review, include the following links to your work in the corresponding fields on the right-hand side of Learn.

  1. GitHub Repo: Now that you’ve completed your project in Jupyter Notebooks, push your work to GitHub and paste that link to the right. (If you need help doing so, review the resources here.) Reminder: Make sure to also add and commit a pdf of your non-technical presentation to the repository with a file name of presentation.pdf.
  2. Blog Post: Include a link to your blog post.
  3. Record Walkthrough: Include a link to your video walkthrough.

Hit "I'm done" to wrap it up. You will receive an email in order to schedule your review with your instructor.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •