"Machine Learning Resources"
-
Rules of Machine Learning: Best Practices for ML Engineering
-
MIT Deep Learning Basics: Introduction and Overview with TensorFlow
-
Deep Learning in a nutshell [part 1][part 2][part 3][part 4]
-
Attention:
-
Recommender Systems
- Collaborative Filtering
- Content Based
- Social Recommenders
-
Bagging (Bootstrap Aggregating)
-
Boosting Methods
-
AdaBoost AKA Adaptive Boosting:
- Types:
- Discrete AdaBoost AKA AdaBoost.M1
- AdaBoost.SAMME
- AdaBoost.SAMME.R
- AdaBoost.R2
- Weights are adaptive: misclassified samples get larger and larger weights, while correctly classified samples get smaller and smaller weights
- Each classifier has an adaptive weight as well
- Uses decision stumps (decision trees of depth 1) as weak learners, i.e., learners only slightly better than chance (accuracy > 50%)
- Steps:
- Initialize sample weights to be 1/N
- Fit the m-th classifier h_m on training data weighted by w_i
- Compute the classifier's weighted error e_m = sum( w_i * I(h_m(x_i) != y_i) ) / sum(w_i), i.e., the weight-normalized fraction of misclassified samples
- Compute the classifier's weight a_m = log( (1-e_m)/e_m )
- Update the sample weights: w_i <- w_i * exp( a_m * I(h_m(x_i) != y_i) ), then renormalize -- only misclassified samples are upweighted
- FINAL CLASSIFIER: H(x) = sign( sum_m( a_m * h_m(x) ) ) -- implemented in the sketch at the end of this section
- Caveats:
- weak learner too complex -> overfitting
- weak learner too weak -> underfitting
- susceptible to noise
- can be viewed as coordinate descent on the exponential loss
- A Step by Step AdaBoost Example: with numerical examples and Python code
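- A minimal sketch of the steps above (discrete AdaBoost / AdaBoost.M1), assuming labels in {-1, +1} and sklearn depth-1 trees as the decision stumps; n_rounds and the epsilon guard are illustrative choices, not part of any canonical implementation:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_rounds=50):
        # y is assumed to be in {-1, +1}
        N = len(y)
        w = np.full(N, 1.0 / N)                  # initialize sample weights to 1/N
        stumps, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)   # decision stump
            stump.fit(X, y, sample_weight=w)              # fit h_m on weighted data
            miss = stump.predict(X) != y                  # indicator I(h_m(x_i) != y_i)
            e_m = np.sum(w * miss) / np.sum(w)            # weighted error
            if e_m >= 0.5:                                # weak learner must beat chance
                break
            a_m = np.log((1 - e_m) / (e_m + 1e-10))       # classifier weight (guard e_m = 0)
            w = w * np.exp(a_m * miss)                    # upweight misclassified samples
            w = w / w.sum()                               # renormalize
            stumps.append(stump)
            alphas.append(a_m)
        return stumps, alphas

    def adaboost_predict(stumps, alphas, X):
        # H(x) = sign( sum_m( a_m * h_m(x) ) )
        scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
        return np.sign(scores)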
-
Gradient Boosting:
- Regression and Classification
- Optimizes an arbitrary differentiable loss function (in contrast to AdaBoost, which is tied to the exponential loss)
- ensemble of weak models
- Next Model = Previous Model + learning_rate * some_multiplier * h(x) -- the learning rate applies shrinkage
- The goal is to make Next Model = Previous Model + h(x) = actual value, hence h(x) is fit on the residuals (y - Previous Model); for squared loss these residuals are exactly the negative gradient of the loss (see the regression sketch after this section)
- Example Algorithms:
- Gradient Tree Boosting:
- Trees are built sequentially (unlike random forests)
- Weak trees are built non-randomly.
- Prediction is fast and memory efficient
- Learning rate parameter (random forests don't have one). Higher LR: more emphasis on correcting the previous tree's errors (a more complex model), and vice versa
    from sklearn.ensemble import GradientBoostingClassifier
    clf = GradientBoostingClassifier(learning_rate=0.01, max_depth=2).fit(X_train, y_train)
- To avoid overfitting, reduce max_depth, learning_rate, and n_estimators
- No feature scaling needed
- Hard to interpret; training is computationally expensive; tuning is hard; bad for text classification and other problems with high-dimensional sparse features
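- As referenced above, a minimal sketch of gradient tree boosting for regression under squared loss, where each stage fits a shallow tree h(x) to the current residuals (y - previous model) and adds it with shrinkage; all parameter values are illustrative:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gbm_fit(X, y, n_stages=100, learning_rate=0.1, max_depth=2):
        f0 = y.mean()                        # initial model: constant prediction
        pred = np.full(len(y), f0)
        trees = []
        for _ in range(n_stages):
            residual = y - pred              # negative gradient of squared loss
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X, residual)            # h(x) approximates the residual
            pred += learning_rate * tree.predict(X)   # shrinkage step
            trees.append(tree)
        return f0, trees

    def gbm_predict(f0, trees, X, learning_rate=0.1):
        return f0 + learning_rate * sum(t.predict(X) for t in trees)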
-
Fast, scalable GBM implementations:
- XGBoost
- LightGBM
- CatBoost
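- All three expose a scikit-learn style estimator API; a minimal XGBoost example on synthetic data (hyperparameter values are illustrative, not tuned):

    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # synthetic binary classification data, just for demonstration
    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = xgb.XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
    model.fit(X_train, y_train)              # same fit/predict interface as sklearn
    print(model.score(X_test, y_test))       # mean accuracy on held-out data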
-
Naive Bayes Classifiers
-
Random Forests
-
Manifold Learning
-
- Google Machine Learning Crash Course
- 6.S191: Introduction to Deep Learning
- Geoffrey Hinton's Neural Networks for Machine Learning
- Data Visualization
- Modern Regression - CMU
-
- Lucid is a collection of infrastructure and tools for research in neural network interpretability.
- Distill
- Skymind A.I. Wiki
- Tensorflow Playground
- Shan Carter's website
- Kevin Quealy's website
- Pandas Trick by Kevin Markham