Data Science Wrangling

HarvardX: PH125.6x | Data Science: Wrangling

Abstract

This is the sixth course in the HarvardX Professional Certificate in Data Science, a series of courses that prepare you to do data analysis in R, from simple computations to machine learning. We assume that you have completed the first five courses in the series (Data Science: R Basics, Data Science: Visualization, Data Science: Probability, Data Science: Inference and Modeling, and Data Science: Productivity Tools) or have equivalent knowledge of R programming.

Using a combination of guided introduction through short video lectures and more independent in-depth exploration, you will get to practice your new R skills on real-life applications.

In this course, we cover several standard steps of the data wrangling process like importing data into R, tidying data, string processing, HTML parsing, working with dates and times, and text mining. Rarely are all these wrangling steps necessary in a single analysis, but a data scientist will likely face them all at some point.

In a data science project, data are often not easily accessible. They are more likely to sit in a file or a database, or to need extracting from documents such as web pages, tweets, or PDFs. In these cases, the first step is to import the data into R and tidy them using the packages of the tidyverse. The steps that convert data from their raw form to the tidy form are called data wrangling.
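As a rough illustration of the import-then-tidy workflow described above, here is a minimal sketch. It assumes the readr and tidyr packages (both part of the tidyverse) are installed; the file contents, column names, and numbers are made up for the example and do not come from the course materials.

```r
library(readr)  # read_csv(), cols(), col_character(), col_double()
library(tidyr)  # pivot_longer()

# Hypothetical raw data in "wide" form: one column per year.
# Written to a temporary file to mimic data that arrives as a CSV.
raw_lines <- c(
  "country,2010,2011",
  "Germany,1.39,1.36",
  "South Korea,1.23,1.24"
)
csv_path <- tempfile(fileext = ".csv")
writeLines(raw_lines, csv_path)

# Import: read the file into a tibble, stating the column types explicitly.
wide <- read_csv(
  csv_path,
  col_types = cols(country = col_character(), .default = col_double())
)

# Tidy: reshape so that each row is one country-year observation.
tidy <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  values_to = "fertility"
)
tidy$year <- as.integer(tidy$year)  # year labels arrive as text

print(tidy)
```

In the tidy result each variable (country, year, fertility) has its own column, which is the shape most tidyverse functions expect. Real projects usually need further steps, such as string processing or date parsing, before the data reach this form.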

The class notes for this course series can be found in Professor Irizarry's freely available Introduction to Data Science book. The textbook is also freely available in PDF format on Leanpub. This course corresponds to textbook Chapter 20 through Chapter 26.

The bookdown version of this course is available on this GitHub page.

