Skip to content

bjpcjp/scikit-learn

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

scikit-learn contents

Section Title Contents
00 Getting Started Estimators, Transformers, Preprocessors, Pipelines, Model Evaluation, Parameter Searches, Next Steps
01 Linear Models OLS, Ridge, Lasso, Elastic-Net, Least Angle Regression (LARS), LARS Lasso, OMP, Naive Bayes, Generalized Linear Models (GLM), Tweedie Models, Stochastic Gradient Descent (SGD), Perceptrons, Passive-Aggressive Algos, Polynomial Regression
01a Logistic Regression Basics, Examples
01b Splines Polynomial Regression & Basis Functions, Periodic splines
01c Quantile Regression Examples, QR vs linear regression
01d Outliers Robustness, RANSAC, Huber, Thiel-Sen
02 Discriminant Analysis LDA, QDA, Math Foundations, Shrinkage, Estimators
03 Kernel Ridge Regression KRR vs SVR
04 Support Vector Machines (SVMs) Classifiers, Regressors, Scoring, Weights, Complexity, Kernels
05 Stochastic Gradient Descent (SGD) Classifiers, Solvers, Regressors, Sparse Data; Complexity; Stopping/Convergence; Tips
06 K Nearest Neighbors (KNN) Algos (Ball Tree, KD Tree, Brute Force), Radius-based KNN, Nearest Centroid Classifiers, Caching, Neighborhood Components Analysis (NCA)
07 Gaussian Processes (GPs) Regressors, Classifiers, Kernels
08 Cross Decomposition Partial Least Squares (PLS), Canonical PLS, SVD PLS, PLS Regression, Canonical Correlation Analysis (CCA)
09 Naive Bayes (NB) Gaussian NB, Multinomial NB, Complement NB, Bernoulli NB, Categorical NB, Out-of-core fitting
10 Decision Trees (DTs) Classifiers, Graphviz, Regressions, Multiple Outputs, Extra Trees, Complexity, Algorithms, Gini, Entropy, Misclassification, Minimal cost-complexity Pruning
11a Ensembles/Bagging Methods, Random Forests, Extra Trees, Parameters, Parallel Execution, Feature Importance, Random Tree Embedding
11b Ensembles/Boosting Gradient Boosting (GBs), GB Classifiers, GB Regressions, Tree Sizes, Loss Functions, Shrinkage, Subsampling, Feature Importance, Histogram Gradient Boosting (HGB), HGB - Monotonic Constraints
11ba Ensembles/Boosting/Adaboost examples
11c Ensembles/Voting Hard Voting, Soft Voting, Voting Regressor
11d Ensembles/General Stacking Summary
12 Multiclass/Multioutput Problems Label Binarization, One vs Rest (OvR), One vs One (OvO) Classification, Output Codes, Multilabel, Multioutput Classification, Classifier Chains, Multioutput Regressions, Regression Chains
13 Feature Selection (FS) Removing Low-Variance Features, Univariate FS,
14 Semi-Supervised Self-Training Classifier, Label Propagation, Label Spreading
15 Isotonic Regression Example
16 Calibration Curves Intro/Example, Cross-Validation, Metrics, Regressors
17 Perceptrons Intro, Classification, Regression, Regularization, Training, Complexity, Tips
21 Gaussian Mixtures (GMs) Expectation Maximization, Variational Bayes GM
22 Manifolds Isomap, Locally Linear Embedding (LLE), Modified LLE, Hessian LLE, Local Tangent Space Alignment (LTSA), Multidimensional Scaling (MDS), Random Trees Embedding, Spectral Embedding, t-SNE, Neighborhood Components Analysis (NCA)
23 Clustering K-Means, Voronoi Diagrams, Affinity Propagation, Mean Shift, Spectral Clustering, Agglomerative Clustering, Dendrograms, Connectivity Constraints, Distance Metrics, DBSCAN, Optics, Birch
23a Clustering Metrics Rand Index, Mutual Info Score, Homogeneity, Completeness, V-Measure, Fowlkes-Mallows, Silhouette Coefficient, Calinski-Harabasz, Davies-Bouldin, Contingency Matrix, Pair Confusion Matrix
24 Biclustering Spectral Co-Clustering, Spectral Bi-Clustering, Metrics
25 Component Analysis / Matrix Factorization PCA, Incremental PCA, PCA w/ Random SVD, PCA w/ Sparse Data, Kernel PCA, Dimension Reduction Comparison, Truncated SVD / LSA, Dictionary Learning, Factor Analysis, Independent Component Analysis, Non-Negative Matrix Factorization (NNMF), Latent Dirichlet Allocation (LDA)
26 Covariance Empirical CV, Shrunk CV, Max Likelihood Estimation (MLE), Ledoit-Wolf Shrinkage, Oracle Approximating Shrinkage, Sparse Inverse CV, aka Precision Matrix, Mahalanobis Distance
27 Novelties & Outliers One-Class SVMs, Elliptic Envelope, Isolation Forest, Local Outlier Factor
28 Density Estimation (DE) Histograms, Kernel DE
29 Restricted Boltzmann Machines (RBMs) Intro, Training
31 Cross Validation (CV) Intro, Metrics, Parameter Estimation, Pipelines, Prediction Plots, Nesting, K-Fold, Stratified K-Fold, Leave One Out, Leave P Out, Class Label CV, Grouped Data CV, Predefined Splits, Time Series Splits, Permutation Testing, Visualizations
32 Parameter Tuning Grid Search, Randomized Optimization, Successive Halving, Composite Estimators & Parameter Spaces, Alternative to Brute Force, Info Criteria (AIC, BIC)
33 Metrics & Scoring (Intro) scoring, make_scorer
33a Classification Metrics Accuracy, Top-K Accuracy, Balanced Accuracy, Cohen's Kappa, Confusion Matrix, Classification Report, Hamming Loss, Precision, Recall, F-Measure, Precision-Recall Curve, Average Precision, Jaccard Similarity, Hinge Loss, Log Loss, Matthews Correlation Coefficient, Receiver Operating Characteristic (ROC) Curves, ROC-AUC, Detection Error Tradeoff (DET), Zero One Loss, Brier Score
33b Multilabel Ranking Metrics Coverage Error, Label Ranking Avg Precision (LRAP), Label Ranking Loss, Discounted Cumulative Gain (DCG), Normalized DCG
33c Regression Metrics Explained Variance, Max Error, Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Squared Log Error (MSLE), Mean Absolute Pct Error (MAPE), R^2 score, aka Coefficient of Determination , Tweedie Deviances
33d Dummy Metrics Dummy Classifiers, Dummy Regressors
34 Viz/Validation Validation Curve, Learning Curve
41 Viz/Inspection 2D PDPs, 3D PDPs, Individual Conditional Expectation (ICE) Plot
42 Viz/Permutations Permutation Feature Importance (PFI), Impurity vs Permutation Metrics
50a Viz/ROC Curves ROC Curve
50b Viz/custom PDP Plots Example
50c Vis/Classification metrics Confusion Matrix, ROC Curve, Precision-Recall Curve
61 Composite Transformers Pipelines, Caching, Regression Target xforms, Feature Unions, Column Transformers
62a Text Feature Extraction Bag of Words (BoW), Sparsity, Count Vectorizer, Stop Words, Tf-Idf, Binary Markers, Text file decoding, Hashing Trick, Out-of-core Scaling, Custom Vectorizers
62b Image Patch Extraction Extract from Patches, Reconstruct from Patches, Connectivity Graphs
63 Data Preprocessing Scaling, Quantile Transforms, Power Maps (Box-Cox, Yeo-Johnson), Category Coding, One-Hot Coding, Quantization aka Binning, Feature Binarization
64 Missing Value Imputation Univariate, Multivariate, Multiple-vs-Single, Nearest-Neighbors, Marking Imputed Values
66 Random Projections Johnson-Lindenstrauss lemma, Gaussian RP, Sparse RP Empirical Validation
67 Kernel Approximations Nystroem, RBF Sampler, Additive Chi-Squared Sampler, Skewed Chi-Squared Sampler, Polynomial Sampling - Tensor Sketch
68 Pairwise Ops Distances vs Kernels, Cosine Similarity, Kernels
69 Transforming Prediction Targets Label Binarization, Multilabel Binarization, Label Encoding
71 Toy Datasets Boston, Iris, Diabetes, Digits, Linnerud, Wine, Breast Cancer, Olivetti faces, 20 newsgroups, Labeled faces, Forest covertypes, Reuters corpus, KDD, Cal housing
73 Artificial Data random-nclass-data, Gaussian blobs, Gaussian quantiles, Circles, Moons, Multilabel class data, Hastie data, BiClusters, Checkerboards, Regression, Friedman1/2/3, S-Curve, Swiss Roll, Low-Rank Matrix, Sparse Coded Signal, Sparse Symmetric Positive Definite (SPD) Matrix
74 Other Data Sample images, SVMlight/LibSVM formats, OpenML, pandas.io, scipy.io, numpy.routines.io, scikit-image, imageio, scipy.io.wavfile
81 Scaling Out-of-core ops (BUG = TODO)
82 Latency Bulk-vs-atomic ops, Latency vs Validation, Latency vs #Features, Latency vs Datatype, Latency vs Feature Extraction, Linear Algebra Libs (BLAS, LAPACK, ATLAS, OpenBLAS, MKL, vecLib)
83 Parallelism JobLib, OpenMP, NumPy, Oversubscription, config switches
90 Persistence Pickle, Joblib

About

Updates in progress. Jupyter workbooks will be added as time allows.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published