nihonjinrxs/dwdc-june2014

Data Wrangling in SQL & Other Tools

Scripting Reproducible and Understandable Data Wrangling and Analysis Pipelines with Tabular and Relational Data

This repository contains materials for my talk at the Data Wranglers DC meetup on June 4, 2014.

Contents

The materials for the talk come in several parts:

  • A slide deck (./slides) in Apple Keynote, PDF and HTML formats
  • Sample data in CSV format (./csv), courtesy of tilling
  • A set of SQL scripts (./sql) that create the local PostgreSQL database used for the examples and perform the simple linear model analysis example
  • An RMarkdown document (./R), published on RPubs, that uses the data from the database to perform the analysis in R and compare with the SQL results
  • An IPython notebook (./python) that uses the data from the database to perform the example analysis, compare the results across SQL and R, and plot the resulting linear models
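
To give a flavor of the "same analysis in SQL and in a scripting language" idea behind these materials, here is a minimal sketch of a simple linear fit computed twice: once with plain SQL aggregates and once in Python. It is not taken from the repository. The table name `obs`, the helper `fit_line`, and the data points are all made up for illustration, and it uses an in-memory SQLite database in place of the PostgreSQL database the talk uses, so that it runs without any setup. (PostgreSQL also provides `regr_slope` and `regr_intercept` aggregates, which SQLite lacks.)

```python
import sqlite3

# Hypothetical sample data (x, y); NOT taken from the repository's CSV files.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]

# Assumption: SQLite stands in for the talk's PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x REAL, y REAL)")
conn.executemany("INSERT INTO obs (x, y) VALUES (?, ?)", points)

# Ordinary least-squares slope and intercept from plain SQL aggregates.
sql = """
WITH s AS (
    SELECT COUNT(*) AS n,
           SUM(x)   AS sx,
           SUM(y)   AS sy,
           SUM(x*y) AS sxy,
           SUM(x*x) AS sxx
    FROM obs
)
SELECT (n*sxy - sx*sy) / (n*sxx - sx*sx) AS slope,
       (sy - ((n*sxy - sx*sy) / (n*sxx - sx*sx)) * sx) / n AS intercept
FROM s
"""
sql_slope, sql_intercept = conn.execute(sql).fetchone()


def fit_line(pts):
    """Least-squares fit of y = slope * x + intercept in pure Python."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    var_x = sum((x - mean_x) ** 2 for x, _ in pts)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


py_slope, py_intercept = fit_line(points)

print(f"SQL:    slope={sql_slope:.4f} intercept={sql_intercept:.4f}")
print(f"Python: slope={py_slope:.4f} intercept={py_intercept:.4f}")
```

Both routes should agree up to floating-point rounding, which is the kind of cross-check the R and Python documents in this repository perform against the SQL results.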

Where do I start?

To understand what I've done, I recommend working through these pieces in the order listed above, starting with the slide deck.

Future Work

Given time, and as Julia's database libraries mature, I hope to add a parallel example in Julia soon.

Disclaimer

This work and the opinions expressed here are my own, and do not purport to represent the views of my current or former employers.
