This repository contains materials for my talk at the Data Wranglers DC meetup on August 6, 2014, a follow-on to my talk at the same meetup on June 4, 2014. Materials for the prior talk are in the GitHub repo nihonjinrxs/dwdc-june2014.
The talk covers two major directions:
- Using more advanced SQL techniques in a database system (examples in PostgreSQL) to script auto-updating computations
- Using SQL on data frames in R and in Python (also maybe Julia?)
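
As a rough illustration of the first direction, here is a minimal sketch of a view acting as an auto-updating computation. It is not taken from the talk's scripts: it uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for PostgreSQL, and the `measurements` table, the `site_summary` view and the sample rows are all made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption, for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and sample rows, invented for this sketch.
    CREATE TABLE measurements (site TEXT, reading REAL);
    INSERT INTO measurements VALUES ('A', 1.0), ('A', 3.0), ('B', 10.0);

    -- A view stores the query, not its result, so it is re-evaluated on every read.
    CREATE VIEW site_summary AS
        SELECT site, COUNT(*) AS n, AVG(reading) AS mean_reading
        FROM measurements
        GROUP BY site;
""")


def summary(connection):
    """Read the current contents of the view, ordered by site."""
    query = "SELECT site, n, mean_reading FROM site_summary ORDER BY site"
    return connection.execute(query).fetchall()


before = summary(conn)
# Changing the base table is enough; the view needs no refresh step.
conn.execute("INSERT INTO measurements VALUES ('B', 20.0)")
after = summary(conn)

print(before)
print(after)
```

The second read reflects the new row without any recomputation step, which is the sense in which a view "auto-updates". The `CREATE VIEW` statement itself is plain SQL, though PostgreSQL offers features this sketch does not touch.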
Folders are as follows:
- A slide deck (`./slides`) in Apple Keynote, PDF and HTML formats
- A set of SQL scripts (`./sql`) that create the local PostgreSQL database objects demonstrating creation and use of views, custom functions and indexes for use in data analysis
- An RMarkdown document (`./R`), published on RPubs, that demonstrates using `sqldf` in R to perform SQL queries on data frames as if they are tables
- An IPython notebook (`./python`), available at IPython nbviewer, that demonstrates using `sqldf` from the `pandasql` package to perform SQL queries on Pandas DataFrame objects as if they are tables
- An IJulia notebook document (`./julia`) that demonstrates using `sqldf` from the `SQLite.jl` package in Julia to perform SQL queries on data frames as if they are tables (in progress, and not working yet)
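
To give a feel for what "SQL queries on data frames as if they are tables" means, here is a standard-library-only sketch of the general idea: copy tabular data into a throwaway in-memory SQLite database, run the query, and hand back the result. The helper name `sqldf_sketch`, its signature and the `people` data are hypothetical and invented for this example. It is not the implementation or API of `sqldf` in R or of `pandasql`, which work with real data frame objects.

```python
import sqlite3


def sqldf_sketch(query, tables):
    """Run `query` against in-memory copies of `tables`.

    `tables` maps a table name to a (column_names, rows) pair.
    Hypothetical helper for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for name, (columns, rows) in tables.items():
            col_list = ", ".join(f'"{c}"' for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            # Each "data frame" becomes a temporary SQLite table of the same name.
            conn.execute(f'CREATE TABLE "{name}" ({col_list})')
            conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)
        cursor = conn.execute(query)
        header = [description[0] for description in cursor.description]
        return header, cursor.fetchall()
    finally:
        conn.close()


# Made-up sample data standing in for a data frame.
people = (["name", "age"], [("Ann", 31), ("Bob", 45), ("Cy", 28)])

header, rows = sqldf_sketch(
    "SELECT name, age FROM people WHERE age > 30 ORDER BY age DESC",
    {"people": people},
)
print(header)
print(rows)
```

The notebooks in this repository use the real packages, which accept and return data frames directly; the sketch only shows why the query can refer to `people` as though it were a database table.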
I recommend that anyone wishing to understand what I've done start with the prior talk materials, then tackle these pieces in order, starting with the slide deck.
Given time, I hope to get `sqldf` working in Julia as well; Julia is a young language, and it's a little finicky at the moment. Also, a few examples of SQL views with INSERT and UPDATE rules and a SQL trigger or two would be a nice addition.
This work and the opinions expressed here are my own, and do not purport to represent the views of my current or former employers.