Data Science

Motivation

Data science is an in-demand career. Salaries are high, the work is interesting, and the title carries significant prestige.

A data scientist will:

  • Analyze data
  • Clean data using pandas and NumPy to gain insights
  • Build models on data

Data analysis in practice:

  • Bill James applied data analysis to baseball:

    • Who are the top performers?
    • How can you best predict future performance?
  • Netflix uses data analysis to recommend movies.

[Figure: data lifecycle]

Objectives

Participants will be able to:

  • Create a Jupyter Notebook to begin data analysis
  • Perform exploratory data analysis (EDA)
  • Understand the purpose and methods of cleaning data
  • Understand the methods of analyzing a dataset

Specific Things to Learn

  • Accessing Jupyter Notebooks
  • Importing libraries such as pandas and NumPy into Jupyter Notebooks
  • Techniques for exploratory data analysis (EDA)
  • Identifying missing or erroneous data for possible cleaning
  • Using pandas and NumPy to analyze a dataset
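The items above can be sketched in a single notebook cell. This is a minimal illustration, not part of the lesson's official materials: the dataset, the column names (`player`, `games`, `batting_avg`), and the validity rule (a batting average must lie between 0 and 1) are all made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; names and values are invented for illustration.
df = pd.DataFrame(
    {
        "player": ["A", "B", "C", "D"],
        "games": [150, 142, np.nan, 160],            # one missing value
        "batting_avg": [0.301, 0.275, 0.288, 1.5],   # 1.5 is not a valid average
    }
)

# Exploratory data analysis: size, column types, and summary statistics.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Identify missing data: count of NaN values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Identify erroneous data: rows whose batting average falls outside [0, 1].
invalid_avg = df[(df["batting_avg"] < 0) | (df["batting_avg"] > 1)]
print(invalid_avg)

# One possible cleaning choice (an assumption, not the only option):
# drop rows with missing games and rows with an invalid average.
clean = df.dropna(subset=["games"])
clean = clean[clean["batting_avg"].between(0, 1)]

# Analyze the cleaned data.
mean_avg = clean["batting_avg"].mean()
print(mean_avg)
```

Dropping rows is only one way to handle problem values. Depending on the dataset, filling missing entries or correcting them at the source may be more appropriate.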

Materials

Lesson

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms, both structured and unstructured. In this respect it is similar to data mining.

Lifecycle of Data Science

[Figure: data lifecycle]

  • Tools such as pandas, NumPy, Hadoop, and Spark make up an important part of the data science toolbox. It is up to the data scientist to decide which tool fits a given situation, and how to use it correctly, in order to solve open-ended analytical problems.
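As a small illustration of choosing a tool and using it correctly, the sketch below performs the same calculation with a plain Python list and with a NumPy array. The `scores` values are invented for the example.

```python
import numpy as np

# Hypothetical values, invented for illustration.
scores = [82.0, 91.5, 77.0, 68.5]

# Plain Python list: each element is handled explicitly.
scaled_list = [s / 100 for s in scores]

# NumPy array: the operation applies to every element at once (vectorization).
arr = np.array(scores)
scaled_arr = arr / 100

# Boolean indexing: keep only the scores above the mean.
above_avg = arr[arr > arr.mean()]

print(scaled_list)
print(scaled_arr)
print(above_avg)
```

For a handful of values either approach works. The array form tends to pay off on large numeric datasets, where element-wise operations and filtering run without an explicit Python loop.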

Common Mistakes / Misconceptions

  • Access to More Data Translates to Higher Accuracy
  • Data Science and Business Intelligence Are the Same
  • You Must Have Access to Lots of Data

Guided Practice

Independent Practice

Check for Understanding

  • What are the advantages of using a NumPy array?

  • What differentiates data science from other analytical fields (business intelligence, etc.)?

  • Assignments:

Supplemental Materials