Data Wrangling in SQL & Other Tools

Scripting Reproducible and Understandable Data Wrangling and Analysis Pipelines with Tabular and Relational Data

This repository contains materials for my talk at the Data Wranglers DC meetup on June 4, 2014:

A slide deck (./slides) in Apple Keynote, PDF and HTML formats
Sample data in CSV format (./csv), courtesy of tilling
A set of SQL scripts (./sql) that create the local PostgreSQL database used for the examples and run the simple linear model analysis
An RMarkdown document (./R), published on RPubs, that uses the data from the database to perform the analysis in R and compare it with the SQL results
An IPython notebook (./python) that uses the data from the database to perform the example analysis, compare the results across SQL and R, and plot the resulting linear models
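As a rough illustration of what fitting a simple linear model inside the database and then cross-checking it from Python can look like, here is a minimal sketch. It is not the talk's code: it assumes Python's built-in sqlite3 module in place of PostgreSQL, a hypothetical table named obs, and made-up toy points rather than the data in ./csv. PostgreSQL provides the regr_slope(y, x) and regr_intercept(y, x) aggregates for this; SQLite does not, so the query spells out the closed-form ordinary least squares sums instead.

```python
import math
import sqlite3

# Hypothetical toy data (x, y); the talk's real data lives in ./csv and PostgreSQL.
POINTS = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]


def fit_in_sql(points):
    """Fit y = intercept + slope * x by ordinary least squares using SQL aggregates."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table name "obs"; stands in for the talk's PostgreSQL tables.
        conn.execute("CREATE TABLE obs (x REAL NOT NULL, y REAL NOT NULL)")
        conn.executemany("INSERT INTO obs (x, y) VALUES (?, ?)", points)
        # Closed-form OLS from the sums n, Sx, Sy, Sxx, Sxy, computed in one pass.
        slope, intercept = conn.execute(
            """
            WITH s AS (
                SELECT COUNT(*) AS n, SUM(x) AS sx, SUM(y) AS sy,
                       SUM(x * x) AS sxx, SUM(x * y) AS sxy
                FROM obs
            ),
            b AS (
                SELECT n, sx, sy,
                       (n * sxy - sx * sy) / (n * sxx - sx * sx) AS slope
                FROM s
            )
            SELECT slope, (sy - slope * sx) / n AS intercept FROM b
            """
        ).fetchone()
        return slope, intercept
    finally:
        conn.close()


def fit_in_python(points):
    """Reference fit in plain Python using centered sums, for cross-checking."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


if __name__ == "__main__":
    sql_fit = fit_in_sql(POINTS)
    py_fit = fit_in_python(POINTS)
    print("SQL:   ", sql_fit)
    print("Python:", py_fit)
    # The two fits should agree up to floating-point rounding.
    assert all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(sql_fit, py_fit))
```

The two functions deliberately use different but algebraically equivalent formulas (raw sums in SQL, centered sums in Python), so agreement between them is a meaningful check rather than a repeat of the same arithmetic.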

Where do I start?

If you want to understand what I've done, I recommend working through these pieces in order, starting with the slide deck.

Future Work

Time permitting, and once Julia's database libraries have matured, I hope to add a parallel example in Julia.

Disclaimer

This work and the opinions expressed here are my own, and do not purport to represent the views of my current or former employers.
