hmchen47/DataScience

Data Science, AI, Machine Learning

Data Science

Machine Learning

Mathematics

Misc

Reference Cards

Probability and Statistics

Topic Sub-topics
Statistics Help Cards
General Notations Inference Scenarios $\chi^2$ Test
Basic Formula Summary Measures Probability Rules Discrete R.V. Binomial R.V.
Normal R.V. Binomial Approx. Sample Proportion Sample Means
Inference Population Proportion Population Mean One-Way ANOVA
Regression Linear Model Parameter Estimators Residuals Correlation
Estimate of $\sigma$ Sample Slope Sample Intercept
CI of Mean Response Prediction for Mean Response
Probability
Basics Concepts Tree
Sets & Counting Basics Relations Operations Counting
Disjoint Union General Union Cartesian Product Cartesian Power
Probability Basics Uniform Space Events Experiments
Axioms Inequalities Conditional Independence
Sequential Total Bayes' Rule
Random Variables Discrete R.V. Continuous R.V. Functions of R.V. CDF
Expectation Variance Modification Expectation of
Functions of R.V.
Multivariate Linearity Covariance
Statistics Basics
General Terminology Bias & Variance Margin of Error Analysis
Multiple Distributions
Inference Overview Mean Variance Unbiased Variance
Standard Deviation CI w/ known $\sigma$ CI w/ unknown $\sigma$
Hypothesis Overview
Distributions Bernoulli Binomial Beta Poisson
Geometric Beta-Binomial Dirichlet Uniform
Exponential Gaussian Wishart Pareto
Regression Analysis Overview Assumption Violation Plots
Study Design Overview Cohort Case-Control
Bayesian Approaches
General Overview Categories Reporting Modeling
Prior Improper Prior Likelihood Likelihood Principle
Posterior Exchangeability Large Samples Conflicts
Freedman's Theory
Model Measurement Credibility Odds Ratios Bayes Factor Model Selection
Analysis Modeling Binary Data Normal Prediction
Decision Theory Hierarchical Models Bayesian Linear Models
Inference Point Estimate Region Estimate Hypothesis Testing Nuisance Parameters
Result Interpretation
Conjugate Prior Overview Exponential Family Uniform-Bernoulli Beta-Bernoulli
Dirichlet-Multinomial Gamma-Poisson Gamma-Exponential Gamma-Geometric
InvGamma-Gaussian ScaledInv-$\chi^2$-Gaussian InvWishart-Gaussian Pareto-Uniform
Time Series Analysis
Basics Overview
Numerical Methods
Basics Overview Methods Integrals Importance Sampling
Bayesian Hierarchical Empirical Deterministic
Monte Carlo Methods Basics Integrals Gaussian Two Binomials
Multiparameters
Markov Chain Monte Carlo Methods Basics Metropolis-Hastings (MH) Random-Walk-MH Independence-MH
Gibbs Sampling

Machine Learning

Topic Sub-topics
General and Preprocessing
Basics Overview Decision Tree Clustering
Modeling Model representation Pipeline Model Selection
Feature Engineering
General Summary Overview Variable Types Common Issues
Mutual Information (MI)
Imputing Missing Values Overview Mean & Median Arbitrary Value End of Tail
Frequent Category Missing Category Complete Case Analysis Missing Indicator
Random Sample Iterative KNN
Encoding Categorical Variables Overview One-Hot Integer (Target) Count / Frequency
Ordered Label Mean (Target) Weight of Evidence Probability Ratio
Rare Label Binary Catboost Leave-One-Out
James-Stein
Transforming Variables Overview Logarithmic Square Root Reciprocal
Exponential & Power Box-Cox Yeo-Johnson
Variable Discretization Overview Equal-Width Equal-Frequency K-Means
Decision Trees Custom
Outliers Overview Detection IQR Proximity Rule DBSCAN
Isolation Forests Local Outlier Factor Trimming Censoring
Imputer Transformation
Feature Scaling Overview Mean Normalization Standardization Robust
Min-Max Maximum Absolute Vector Unit Norm
Date/Time & Mixed Variables Date & Time Mixed Periodicity
Advanced Topics Automated Geospatial Resampling Imbalanced
Feature Selection
General Overview
Filter Method Overview Basic Correlation Statistical & Ranking
Wrapper Method Overview Forward Backward Exhaustive
Step Forward / Backward Bidirectional Search
Embedded Method for FSelect Overview Regularization Feature Importance Permutation Importance
Hybrid Methods for FSelect Overview Filter & Wrapper Embedded & Wrapper Recursive Elimination
Recursive Addition
Advanced Methods for FSelect Dimensionality Reduction Heuristic Search Deep Learning
Supervised Machine Learning
Linear Regression Model Cost Function Gradient Descent Vectorization
Polynomial Regression Normal Equation
Logistic Regression Model Cost Function Gradient Descent Vectorization
Neural Networks Model Forward Propagation Back Propagation Vectorization
Logic Operators Initialization Training
Support Vector Machine (SVM) Overview Modeling Decision Boundary Kernels
Binary Classification Multiclass Classification Linear SVM Non-linear SVM
Kernel Trick Popular Kernel Functions
Unsupervised Machine Learning
K-Means
Model Algorithm Initialization Parameter
Expectation-Maximization (EM) Model (NA) Algorithm
Principal Component Analysis (PCA) Model Algorithm Reconstruct PA Number
Advice Vectorization
Anomaly Detection Problem Gaussian Distribution Algorithm System
Advice on Machine Learning System
System Considerations Learning Rate $\alpha$ Optimization One-vs-all Bias/Variance
Evaluation Learning Curve Diagnostic Error Analysis
Ceiling Analysis Performance Dimensionality Reduction Artificial Data
Special Applications / Special Topics
Applications Spam Classifier Recommender Systems Large Scale Machine Learning Online Learning
Map Reduce Photo OCR

Neural Networks

Topic Sub-topics
Basic Neural Networks
NN in Machine Learning Forward Propagation Back Propagation Vectorization Logic Operators
Initialization Training
Fundamentals Motivation Anatomy Concepts & NN Learning Types
Learning Methods Bias & Variance Considerations Backpropagation
BP Math Derivation General Algorithm
Architectures Types Simple Neuron Model Perceptrons
Activation Functions Overview Sigmoid & Softmax Hyperbolic Tangent Softplus
Rectified Linear Unit Maxout Self-gated
Loss/Cost Function & Gradient Descent Overview Gradient Descent Delta Rule Mini-batch
Tricks of Mini-batch
Output Unit Overview
Hyperparameters Summary Overview Batch Size Weight Decay
Linear Neurons Model Cost Function Error Surface Backpropagation
Logistic Neurons Model Backpropagation Softmax Loss Function Softmax Gradient Descent
Overfitting & Underfitting Overview Underfitting Overfitting Meta Parameters
Combined Models Mixture of Experts Early Stopping Weight Decay
Adding Noise Dropout Inverted Dropout
Bayesian Approach Overview Weight Decay Full Bayesian
Optimization Issues & Algorithms Challenges Local Optima & Saddle Points Poor Conditioning Vanishing/Exploding Gradients
Adaptive Learning Rates Momentum Parameter Initialization Normalization
Beale's Function for Assessment Keras Implementation Cross-validation Implementation
Second-order Backpropagation Overview Derivatives Hessian Calculation Conclusions
Momentum Classical Backpropagation Nesterov Cyclical
Parameter Initialization Strategies Xavier He Normal Bias
Pre-initialization
Normalization Internal Covariate Shift Batch Normalization
Second-order Algorithms General Representation QuickProp QRProp
Relaxation Methods Overview Weight & Node Permutation Symmetric & Asymmetric
Adaptive Learning Rates Overview Cyclical Learning Rates Estimating the Learning Rates SGD w/ Warm Restarts
Snapshot Ensembles Polyak-Ruppert Averaging Silva & Almeida Delta-bar-Delta
AdaGrad RProp RMSProp Adam
Dynamic Adaptation QuickProp QRProp
Parameter Initialization Xavier He Normal Bias Pre-initialization
Applications Family Tree Speech Recognition Architecture for NLP
Joint Model Coordinate Frames Hyperparameter
Convolutional Neural Networks
General Topics Issues Viewpoint Invariance Replicated Features Transfer Learning
Hyperparameters Stride & Padding ReLU Pooling Dropout
Network in Network
Other Models Region Based GAN Generating Image Description Finding Roads (2012)
Hand-written Recognition LeNet Brute Force Measurement Spatial Transform
Object Classification Problem Space Modeling Training Testing
Object Recognition Overview AlexNet (2012) ZF net (2013) VGG Net (2014)
GoogLeNet (2015)
Recurrent Neural Networks
General Topics Overview Training Binary Addition Long Short-term Memory
Optimization Hessian-free
Applications Text Characters Predicting Next Character Echo State Networks
Hopfield Networks & Boltzmann Machines
Hopfield Networks Overview Energy Function Memories Spurious Minima
Issues Searching Simulated Annealing
Boltzmann Machines Overview Causal Generative Model Boltzmann Machine Model Learning
Phases Statistics Mean Field
Restricted Boltzmann Machines Model Persistent Contrastive Divergence Contrastive Divergence Collaborative Filtering
Belief / Bayesian Networks
Belief Overview Sigmoid Explaining Away Factorial
Bayesian Modeling Weight Decay Full
Learning General Rule Wake-Sleep Algorithm
Deep Belief Nets Overview Contrastive Fine-Tune Real-Valued Data
Infinite Sigmoid Belief
Autoencoder Overview PCA Deep Nets Document Retrieval
Image Retrieval

Python for Data Science

Topic Sub-topics
Common Functions
Import Data
Matplotlib Official Pyplot API Environment & Modules Classes Official Docs
Methods Line Style & Marker Color abbreviations
Seaborn Seaborn API
Probability & Statistics
General Sets Numpy Common Functions Pandas Common Functions
Related Functions Random Number Generator Random Sampling Statistical Distributions
Numpy Statistics Numpy Math
Data Science
General Open CSV File Methods
Date & Time Import Files Attributes Methods
SciPy Import Files Statistical Module
Numpy w/ Data Science Course Import Files General Array Creation Combining Array
Array Operations Math Functions Indexing/Slicing Random Number Generator
Pandas w/ Data Science Course Import File General Timestamp
DataFrame w/ Data Science Course Class Loading File Attributes Indexing/Slicing
Methods
Pandas DataFrame APIs Attributes Indexing & Iteration Binary Operators Functions
Statistics Indexing & Manipulation Missing Data Reshaping, Sorting & Transposing
DataFrames Manipulation Time Series Plotting
Pandas TimeStamps Class Attributes Methods

scikit-learn API Reference

Topic Sub-topics
scikit-learn Base Classes Probability Calibration Clustering Composite Estimators
Covariance Estimators Cross Decomposition Datasets Matrix Decomposition
Discriminant Analysis Dummy Estimators Ensemble Methods Exceptions & Warnings
Experimental Feature Extraction Feature Selection Gaussian Process
Impute Isotonic Regression Kernel Approximation Linear Models
Manifold Learning Metrics Mixture Models Model Selection
Multiclass & Multilabel Classification Naive Bayes Nearest Neighbors Neural Network Models
Pipeline Preprocessing & Normalization Random Projection Semi-Supervised Learning
Support Vector Machines Decision Trees Utilities


Reading Notes

Topic Sub-topics
General Topics Visualization
Probability & Statistics Basics Study Design Bayesian Approaches Time Series Analysis
Linear Algebra General Operations & Properties Eigenvalue and Eigenvector
Artificial Intelligence
Machine Learning General Topics Feature Engineering Feature Selection Models
Applications
Neural Networks General Topics Activation Functions CNN Deep Learning
Database
Python Implementation

NB: keywords for Git Commits

| Symbol | Description |
|--------|-------------|
| feat | new feature |
| fix | a bug fix |
| docs | changes to documentation |
| style | format, missing semicolons, etc.; no code change |
| refact | refactoring production code |
| test | add tests, refactoring test; no production code change |
| chore | updating build tasks, package manager config, etc.; no production code change |
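
The keywords above could be checked mechanically before a commit is accepted. The sketch below is only an illustration: the keyword list comes from the table, but the `<keyword>: <summary>` layout, the function name `has_valid_keyword`, and the sample messages are assumptions, since the repository does not state how the keyword is separated from the rest of the message.

```python
import re

# Keywords taken from the commit keyword table.
COMMIT_KEYWORDS = ("feat", "fix", "docs", "style", "refact", "test", "chore")

# Assumed layout (not specified by the repository): "<keyword>: <summary>".
_SUBJECT_PATTERN = re.compile(r"^(%s): \S.*$" % "|".join(COMMIT_KEYWORDS))


def has_valid_keyword(subject: str) -> bool:
    """Return True if the commit subject starts with a known keyword and a summary."""
    return _SUBJECT_PATTERN.match(subject) is not None


if __name__ == "__main__":
    # Hypothetical commit subjects, for illustration only.
    for subject in ("docs: add notes on Gibbs sampling", "update notes"):
        print(subject, "->", has_valid_keyword(subject))
```

A check like this could be called from a `commit-msg` hook, with the separator adjusted to whatever convention the repository actually follows.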

About

Collections of Materials for Data Science, AI, Machine Learning
