"Machine Learning Resources"
-
Rules of Machine Learning: Best Practices for ML Engineering
-
MIT Deep Learning Basics: Introduction and Overview with TensorFlow
-
Deep Learning in a nutshell [part 1][part 2][part 3][part 4]
-
Attention:
-
Recommender Systems
- Collaborative Filtering
- Content Based
- Social Recommenders
-
Bagging (Bootstrap Aggregating)
-
Boosting Methods
-
AdaBoost AKA Adaptive Boosting:
- Types:
- Discrete AdaBoost AKA AdaBoost.M1
- AdaBoost.SAMME
- AdaBoost.SAMME.R
- AdaBoost.R2
- Weights are adaptive: misclassified samples get larger and larger weights, while correctly classified samples get smaller and smaller weights
- Each classifier has an adaptive weight as well
- Uses decision stumps (decision trees of depth 1) as weak learners, i.e., learners only slightly better than chance (accuracy > 50%)
- Steps:
- Initialize sample weights to be 1/N
- Fit the m-th classifier h_m on training data weighted by w_i
- Compute the classifier's weighted error e_m = sum( w_i * I(h_m(x_i) != y_i) ) / sum(w_i), i.e., the weight-normalized fraction of misclassified samples
- Compute the classifier's weight a_m = log( (1-e_m)/e_m )
- Update the sample weights: w_i <- w_i * exp( a_m * I(h_m(x_i) != y_i) ), then renormalize -- only misclassified samples are upweighted
- FINAL CLASSIFIER: H(x) = sign( sum_m( a_m * h_m(x) ) ) -- implemented in the sketch at the end of this section
- Caveats:
- weak learner too complex -> overfitting
- weak learner too weak -> underfitting
- susceptible to noise
- can be viewed as coordinate descent on the exponential loss
- A Step by Step AdaBoost Example: with numerical examples and Python code
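- A minimal sketch of the steps above (discrete AdaBoost / AdaBoost.M1), assuming labels in {-1, +1} and sklearn depth-1 trees as the decision stumps; n_rounds and the epsilon guard are illustrative choices, not part of any canonical implementation:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_rounds=50):
        # y is assumed to be in {-1, +1}
        N = len(y)
        w = np.full(N, 1.0 / N)                  # initialize sample weights to 1/N
        stumps, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)   # decision stump
            stump.fit(X, y, sample_weight=w)              # fit h_m on weighted data
            miss = stump.predict(X) != y                  # indicator I(h_m(x_i) != y_i)
            e_m = np.sum(w * miss) / np.sum(w)            # weighted error
            if e_m >= 0.5:                                # weak learner must beat chance
                break
            a_m = np.log((1 - e_m) / (e_m + 1e-10))       # classifier weight (guard e_m = 0)
            w = w * np.exp(a_m * miss)                    # upweight misclassified samples
            w = w / w.sum()                               # renormalize
            stumps.append(stump)
            alphas.append(a_m)
        return stumps, alphas

    def adaboost_predict(stumps, alphas, X):
        # H(x) = sign( sum_m( a_m * h_m(x) ) )
        scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
        return np.sign(scores)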
-
Gradient Boosting:
- Regression and Classification
- Optimizes an arbitrary differentiable loss function (in contrast to AdaBoost, which is tied to the exponential loss)
- ensemble of weak models
- Next Model = Previous Model + learning_rate * some_multiplier * h(x) -- the learning rate applies shrinkage
- The goal is to make Next Model = Previous Model + h(x) = actual value, hence h(x) is fit on the residuals (y - Previous Model); for squared loss these residuals are exactly the negative gradient of the loss (see the regression sketch after this section)
- Example Algorithms:
- Gradient Tree Boosting:
- Trees are built sequentially (unlike random forests)
- Weak trees are built non-randomly.
- Prediction is fast and memory efficient
- Learning rate parameter (random forests don't have one). Higher LR: more emphasis on correcting the previous tree's errors (a more complex model), and vice versa
    from sklearn.ensemble import GradientBoostingClassifier
    clf = GradientBoostingClassifier(learning_rate=0.01, max_depth=2).fit(X_train, y_train)
- To avoid overfitting, reduce max_depth, learning_rate, and n_estimators
- No feature scaling needed
- Hard to interpret; training is computationally expensive; tuning is hard; bad for text classification and other problems with high-dimensional sparse features
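- As referenced above, a minimal sketch of gradient tree boosting for regression under squared loss, where each stage fits a shallow tree h(x) to the current residuals (y - previous model) and adds it with shrinkage; all parameter values are illustrative:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gbm_fit(X, y, n_stages=100, learning_rate=0.1, max_depth=2):
        f0 = y.mean()                        # initial model: constant prediction
        pred = np.full(len(y), f0)
        trees = []
        for _ in range(n_stages):
            residual = y - pred              # negative gradient of squared loss
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X, residual)            # h(x) approximates the residual
            pred += learning_rate * tree.predict(X)   # shrinkage step
            trees.append(tree)
        return f0, trees

    def gbm_predict(f0, trees, X, learning_rate=0.1):
        return f0 + learning_rate * sum(t.predict(X) for t in trees)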
-
Fast, scalable GBM implementations:
- XGBoost
- LightGBM
- CatBoost
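- All three expose a scikit-learn style estimator API; a minimal XGBoost example on synthetic data (hyperparameter values are illustrative, not tuned):

    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # synthetic binary classification data, just for demonstration
    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = xgb.XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
    model.fit(X_train, y_train)              # same fit/predict interface as sklearn
    print(model.score(X_test, y_test))       # mean accuracy on held-out data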
-
Naive Bayes Classifiers
-
Random Forests
-
Manifold Learning
-
- Google Machine Learning Crash Course
- 6.S191: Introduction to Deep Learning
- Geoffrey Hinton's Neural Networks for Machine Learning
- Data Visualization
- Modern Regression - CMU
-
- Lucid is a collection of infrastructure and tools for research in neural network interpretability.
- Distill
- Skymind A.I. Wiki
- Tensorflow Playground
- Shan Carter's website
- Kevin Quealy's website
- Pandas Trick by Kevin Markham