CORE Skills Data Science Springboard - Day 4 - Getting to Know the Tools

The aim of today's session is to introduce methods for making sure that you're starting with quality data. Because all data science methods are garbage in, garbage out, you need to be able to explore new datasets quickly to assess whether your approach is viable. We will work towards building a basic exploratory data analysis framework, with a checklist of things you should be looking out for.

You should aim to get familiar with the pandas interface for manipulating (munging) tabular data, learn how to create and interpret basic summary statistics, learn how to identify appropriate QA/QC checks, and gain a basic understanding of 'tidy data' and data formats.
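As a rough illustration of what summary statistics and simple QA/QC checks can look like in pandas, here is a minimal sketch. The table, its column names, and the 0-100 m valid depth range are all made up for this example and are not taken from the session material.

```python
import numpy as np
import pandas as pd

# Hypothetical sample table; all values are invented for illustration.
samples = pd.DataFrame(
    {
        "sample_id": ["A1", "A2", "A3", "A3", "A4", "A5"],
        "depth_m": [10.0, 12.5, 8.0, 8.0, np.nan, 300.0],
        "rock_type": ["basalt", "basalt", "granite", "granite", "shale", None],
    }
)

# Basic summary statistics (count, mean, std, quartiles) for numeric columns.
summary = samples.describe()

# Simple QA/QC checks.
missing_per_column = samples.isna().sum()         # missing values per column
duplicate_rows = int(samples.duplicated().sum())  # fully repeated rows

# Assumed valid range of 0-100 m; rows with a missing depth are not flagged here.
out_of_range = samples[(samples["depth_m"] < 0) | (samples["depth_m"] > 100)]

print(summary)
print(missing_per_column)
print(f"duplicate rows: {duplicate_rows}")
print(out_of_range)
```

Checks like these are cheap to run on any new dataset, so they make a reasonable starting point for an exploratory checklist.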

Pre-session Reading & Resources

This week we're going to be looking at exploratory data analysis with new datasets. You'll find that this process takes around 60-90% of the time in any data science project, so it's worth (a) getting good at it and (b) looking at ways to make it easier. One approach is to put data in a tidy form as soon as possible.
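To make "tidy form" a little more concrete, the sketch below reshapes a small wide table (one column per year) into a long table with one observation per row. The site names, years, and grade values are hypothetical, and `melt` is just one of several pandas methods you could use for this.

```python
import pandas as pd

# Hypothetical "wide" table: one column per year (values are invented).
wide = pd.DataFrame(
    {
        "site": ["north", "south"],
        "2019": [1.2, 0.8],
        "2020": [1.5, 0.9],
    }
)

# Reshape so that each row holds a single observation: (site, year, grade).
tidy = wide.melt(id_vars="site", var_name="year", value_name="grade")

# The year came from column labels, so it is a string; convert it to an integer.
tidy["year"] = tidy["year"].astype(int)

print(tidy)

# With tidy data, grouped summaries become one-liners.
print(tidy.groupby("year")["grade"].mean())
```

Once each variable is a column and each observation is a row, filtering, grouping, and plotting tend to need far less bespoke reshaping code.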

We're also going to be using some of the more advanced methods that pandas offers. You should aim to get as familiar with these as you can, as pandas really is the Swiss Army knife of data munging in Python. If you've used R before, a number of things will feel very familiar. There are a number of good technical resources online for getting your head around pandas, which you might like to stash away for reference.
