Links

Preface

Data Science Venn Diagram

1. Introduction

OkCupid Questions

Facebook on coordinated migration

Facebook on NFL fandom

Target's predictive modeling

Making government more effective

Helping homelessness

Improving public health

2. A Crash Course in Python

http://python.org

Anaconda

pip

IPython

the Zen of Python

official Python tutorial

official IPython tutorial

IPython videos and presentations

Python for Data Analysis

3. Visualizing Data

matplotlib

seaborn

D3.js

Bokeh

ggplot

4. Linear Algebra

Linear Algebra, from UC Davis

Linear Algebra, from Saint Michael's College

Linear Algebra Done Wrong

SciPy linear algebra module

5. Statistics

Non-obvious tricks for computing medians

Almost "average squared deviation from the mean"

"angrily accused of experimenting on your users"

SciPy stats

pandas

StatsModels

OpenIntro Statistics

OpenStax Introductory Statistics

6. Probability

the Monty Hall Problem

error function

binary search

SciPy stats

Introduction to Probability

7. Hypothesis and Inference

continuity correction

P-hacking

"The Earth Is Round (p < .05)"

conjugate priors

Coursera -- Data Analysis and Statistical Inference

8. Gradient Descent

Active Calculus

scikit-learn stochastic gradient descent

9. Getting Data

running Python scripts without the Python command

opening csv files in binary mode

BeautifulSoup

requests

GitHub API

http://www.pythonapi.com/

http://www.pythonforbeginners.com/development/list-of-python-apis/

http://www.programmableweb.com/

Twython

https://apps.twitter.com/

Twitter Search API

unicode

Twitter Streaming API

scrapy

pandas

10. Working With Data

pandas

Python for Data Analysis

scikit-learn matrix decomposition

11. Machine Learning

prevalence of "Luke"

prevalence of leukemia

harmonic mean

Coursera -- Machine Learning

Caltech -- Machine Learning

The Elements of Statistical Learning

12. Nearest Neighbors

the length represented by a degree of longitude

scikit-learn nearest neighbor models

13. Naive Bayes

SpamAssassin public corpus

7-Zip

the Porter stemmer

"A Plan for Spam"

"Better Bayesian Filtering"

scikit-learn Naive Bayes

14. Simple Linear Regression

15. Multiple Regression

scikit-learn linear model

StatsModels

16. Logistic Regression

scikit-learn logistic regression

scikit-learn support vector machines

libsvm

17. Decision Trees

Twenty Questions

scikit-learn decision trees

scikit-learn ensembles

http://en.wikipedia.org/wiki/Decision_tree_learning

18. Neural Networks

Coursera -- Neural Networks for Machine Learning

Neural Networks and Deep Learning

pybrain

19. Clustering

RGB color model

SciPy

20. Natural Language Processing

"What is Data Science"

Natural Language Toolkit

NLTK book

gensim

21. Network Analysis

Centrality

NetworkX

Gephi

22. Recommender Systems

Crab

GraphLab recommender toolkit

Netflix prize

23. Databases

SQLite

MySQL

PostgreSQL

MongoDB

NoSQL

24. Map-Reduce

Hadoop

Elastic MapReduce

mrjob

Spark

Storm

25. Go Forth And Do Data Science

IPython

NumPy

pandas

scikit-learn

many, many scikit-learn examples

matplotlib examples

matplotlib gallery

seaborn

D3.js

D3 gallery

Bokeh

Data.gov

r/datasets and r/data

Amazon public data sets

100 Interesting Data Sets

Kaggle

Hacker News

Hacker News Story Classifier

Seattle Real-Time 911

social network analysis of fire trucks

machine learning on t-shirts