sklearn.base: Base classes and utility functions
Class/Function | Description |
---|---|
Basic classes | |
base.BaseEstimator | Base class for all estimators in scikit-learn. |
base.BiclusterMixin | Mixin class for all bicluster estimators in scikit-learn. |
base.ClassifierMixin | Mixin class for all classifiers in scikit-learn. |
base.ClusterMixin | Mixin class for all cluster estimators in scikit-learn. |
base.DensityMixin | Mixin class for all density estimators in scikit-learn. |
base.RegressorMixin | Mixin class for all regression estimators in scikit-learn. |
base.TransformerMixin | Mixin class for all transformers in scikit-learn. |
feature_selection.SelectorMixin | Transformer mixin that performs feature selection given a support mask. |
Functions | |
base.clone(estimator, *[, safe]) | Construct a new estimator with the same parameters. |
base.is_classifier(estimator) | Return True if the given estimator is (probably) a classifier. |
base.is_regressor(estimator) | Return True if the given estimator is (probably) a regressor. |
config_context(**new_config) | Context manager for global scikit-learn configuration. |
get_config() | Retrieve current values for configuration set by set_config. |
set_config([assume_finite, …]) | Set global scikit-learn configuration. |
show_versions() | Print useful debugging information. |
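As a quick usage sketch of the functions above, here is a minimal example of clone and config_context (the estimator choice and parameter values are illustrative):

```python
from sklearn.base import clone
from sklearn import config_context
from sklearn.cluster import KMeans

est = KMeans(n_clusters=3, random_state=0)
est_copy = clone(est)          # new, unfitted estimator with identical parameters
assert est_copy.get_params()["n_clusters"] == 3

# Configuration changes apply only inside the with block.
with config_context(assume_finite=True):
    pass  # fit/predict calls here skip NaN/inf input validation
```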
sklearn.calibration: Probability Calibration
Class/Function | Description |
---|---|
calibration.CalibratedClassifierCV | Probability calibration with isotonic regression or logistic regression. |
calibration.calibration_curve(y_true, y_prob, *) | Compute true and predicted probabilities for a calibration curve. |
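A minimal sketch of calibrating a margin classifier and reading off a reliability curve (the synthetic dataset and bin count are illustrative):

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, random_state=0)
clf = CalibratedClassifierCV(LinearSVC(max_iter=10000), method="sigmoid", cv=3)
clf.fit(X, y)
prob_pos = clf.predict_proba(X)[:, 1]            # calibrated probabilities
frac_pos, mean_pred = calibration_curve(y, prob_pos, n_bins=10)
```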
sklearn.cluster: Clustering
Class/Function | Description |
---|---|
Classes | |
cluster.AffinityPropagation(*[, damping, …]) | Perform Affinity Propagation Clustering of data. |
cluster.AgglomerativeClustering | Agglomerative Clustering. |
cluster.Birch(*[, threshold, …]) | Implements the Birch clustering algorithm. |
cluster.DBSCAN([eps, min_samples, metric, …]) | Perform DBSCAN clustering from vector array or distance matrix. |
cluster.FeatureAgglomeration([n_clusters, …]) | Agglomerate features. |
cluster.KMeans([n_clusters, init, n_init, …]) | K-Means clustering. |
cluster.MiniBatchKMeans([n_clusters, init, …]) | Mini-Batch K-Means clustering. |
cluster.MeanShift(*[, bandwidth, seeds, …]) | Mean shift clustering using a flat kernel. |
cluster.OPTICS(*[, min_samples, max_eps, …]) | Estimate clustering structure from vector array. |
cluster.SpectralClustering([n_clusters, …]) | Apply clustering to a projection of the normalized Laplacian. |
cluster.SpectralBiclustering([n_clusters, …]) | Spectral biclustering (Kluger, 2003). |
cluster.SpectralCoclustering([n_clusters, …]) | Spectral Co-Clustering algorithm (Dhillon, 2001). |
Functions | |
cluster.affinity_propagation(S, *[, …]) | Perform Affinity Propagation Clustering of data. |
cluster.cluster_optics_dbscan | Performs DBSCAN extraction for an arbitrary epsilon. |
cluster.cluster_optics_xi(*, reachability, …) | Automatically extract clusters according to the Xi-steep method. |
cluster.compute_optics_graph(X, *, …) | Computes the OPTICS reachability graph. |
cluster.dbscan(X[, eps, min_samples, …]) | Perform DBSCAN clustering from vector array or distance matrix. |
cluster.estimate_bandwidth(X, *[, quantile, …]) | Estimate the bandwidth to use with the mean-shift algorithm. |
cluster.k_means(X, n_clusters, *[, …]) | K-means clustering algorithm. |
cluster.mean_shift(X, *[, bandwidth, seeds, …]) | Perform mean shift clustering of data using a flat kernel. |
cluster.spectral_clustering(affinity, *[, …]) | Apply clustering to a projection of the normalized Laplacian. |
cluster.ward_tree(X, *[, connectivity, …]) | Ward clustering based on a feature matrix. |
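A minimal sketch contrasting a centroid-based and a density-based clusterer from this module (the blob layout and eps value are illustrative):

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels_km = KMeans(n_clusters=3, random_state=42).fit_predict(X)
labels_db = DBSCAN(eps=0.9, min_samples=5).fit_predict(X)  # label -1 marks noise
```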
sklearn.compose: Composite Estimators
Class/Function | Description |
---|---|
compose.ColumnTransformer | Applies transformers to columns of an array or pandas DataFrame. |
compose.TransformedTargetRegressor | Meta-estimator to regress on a transformed target. |
compose.make_column_transformer | Construct a ColumnTransformer from the given transformers. |
compose.make_column_selector | Create a callable to select columns to be used with ColumnTransformer. |
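A minimal sketch of make_column_transformer routing different preprocessing to different columns (the toy DataFrame is illustrative):

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"city": ["London", "Paris", "London"],
                   "temp": [20.0, 25.0, 18.0]})
ct = make_column_transformer(
    (OneHotEncoder(), ["city"]),      # categorical column -> one-hot
    (StandardScaler(), ["temp"]),     # numeric column -> standardized
)
X = ct.fit_transform(df)
```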
sklearn.covariance: Covariance Estimators
Class/Function | Description |
---|---|
covariance.EmpiricalCovariance(*[, …]) | Maximum likelihood covariance estimator. |
covariance.EllipticEnvelope(*[, …]) | An object for detecting outliers in a Gaussian distributed dataset. |
covariance.GraphicalLasso([alpha, mode, …]) | Sparse inverse covariance estimation with an l1-penalized estimator. |
covariance.GraphicalLassoCV(*[, alphas, …]) | Sparse inverse covariance with cross-validated choice of the l1 penalty. |
covariance.LedoitWolf(*[, store_precision, …]) | Ledoit-Wolf estimator. |
covariance.MinCovDet(*[, store_precision, …]) | Minimum Covariance Determinant (MCD): robust estimator of covariance. |
covariance.OAS(*[, store_precision, …]) | Oracle Approximating Shrinkage estimator. |
covariance.ShrunkCovariance(*[, …]) | Covariance estimator with shrinkage. |
covariance.empirical_covariance(X, *[, …]) | Compute the maximum likelihood covariance estimator. |
covariance.graphical_lasso(emp_cov, alpha, *) | L1-penalized covariance estimator. |
covariance.ledoit_wolf(X, *[, …]) | Estimate the shrunk Ledoit-Wolf covariance matrix. |
covariance.oas(X, *[, assume_centered]) | Estimate covariance with the Oracle Approximating Shrinkage algorithm. |
covariance.shrunk_covariance(emp_cov[, …]) | Calculate a covariance matrix shrunk on the diagonal. |
sklearn.cross_decomposition: Cross decomposition
Class | Description |
---|---|
cross_decomposition.CCA([n_components, …]) | CCA Canonical Correlation Analysis. |
cross_decomposition.PLSCanonical | PLSCanonical implements the 2-block canonical PLS of the original Wold algorithm [Tenenhaus 1998] p. 204, referred to as PLS-C2A in [Wegelin 2000]. |
cross_decomposition.PLSRegression | PLS regression. |
cross_decomposition.PLSSVD([n_components, …]) | Partial Least Square SVD. |
sklearn.datasets: Datasets
Class/Function | Description |
---|---|
Loaders | |
datasets.clear_data_home([data_home]) | Delete all the content of the data home cache. |
datasets.dump_svmlight_file(X, y, f, *[, …]) | Dump the dataset in svmlight / libsvm file format. |
datasets.fetch_20newsgroups(*[, data_home, …]) | Load the filenames and data from the 20 newsgroups dataset (classification). |
datasets.fetch_20newsgroups_vectorized | Load the 20 newsgroups dataset and vectorize it into token counts (classification). |
datasets.fetch_california_housing | Load the California housing dataset (regression). |
datasets.fetch_covtype(*[, data_home, …]) | Load the covertype dataset (classification). |
datasets.fetch_kddcup99(*[, subset, …]) | Load the kddcup99 dataset (classification). |
datasets.fetch_lfw_pairs(*[, subset, …]) | Load the Labeled Faces in the Wild (LFW) pairs dataset (classification). |
datasets.fetch_lfw_people(*[, data_home, …]) | Load the Labeled Faces in the Wild (LFW) people dataset (classification). |
datasets.fetch_olivetti_faces(*[, …]) | Load the Olivetti faces data-set from AT&T (classification). |
datasets.fetch_openml([name, version, …]) | Fetch dataset from openml by name or dataset id. |
datasets.fetch_rcv1(*[, data_home, subset, …]) | Load the RCV1 multilabel dataset (classification). |
datasets.fetch_species_distributions | Loader for the species distribution dataset from Phillips et al. (2006). |
datasets.get_data_home([data_home]) | Return the path of the scikit-learn data dir. |
datasets.load_boston(*[, return_X_y]) | Load and return the Boston house-prices dataset (regression). |
datasets.load_breast_cancer(*[, return_X_y, …]) | Load and return the breast cancer Wisconsin dataset (classification). |
datasets.load_diabetes(*[, return_X_y, as_frame]) | Load and return the diabetes dataset (regression). |
datasets.load_digits(*[, n_class, …]) | Load and return the digits dataset (classification). |
datasets.load_files(container_path, *[, …]) | Load text files with categories as subfolder names. |
datasets.load_iris(*[, return_X_y, as_frame]) | Load and return the iris dataset (classification). |
datasets.load_linnerud(*[, return_X_y, as_frame]) | Load and return the physical exercise Linnerud dataset. |
datasets.load_sample_image(image_name) | Load the numpy array of a single sample image. |
datasets.load_sample_images() | Load sample images for image manipulation. |
datasets.load_svmlight_file(f, *[, …]) | Load datasets in the svmlight / libsvm format into sparse CSR matrix. |
datasets.load_svmlight_files(files, *[, …]) | Load dataset from multiple files in SVMlight format. |
datasets.load_wine(*[, return_X_y, as_frame]) | Load and return the wine dataset (classification). |
Sample generators | |
datasets.make_biclusters(shape, n_clusters, *) | Generate an array with constant block diagonal structure for biclustering. |
datasets.make_blobs([n_samples, n_features, …]) | Generate isotropic Gaussian blobs for clustering. |
datasets.make_checkerboard(shape, n_clusters, *) | Generate an array with block checkerboard structure for biclustering. |
datasets.make_circles([n_samples, shuffle, …]) | Make a large circle containing a smaller circle in 2d. |
datasets.make_classification([n_samples, …]) | Generate a random n-class classification problem. |
datasets.make_friedman1([n_samples, …]) | Generate the “Friedman #1” regression problem. |
datasets.make_friedman2([n_samples, noise, …]) | Generate the “Friedman #2” regression problem. |
datasets.make_friedman3([n_samples, noise, …]) | Generate the “Friedman #3” regression problem. |
datasets.make_gaussian_quantiles(*[, mean, …]) | Generate isotropic Gaussian samples and label them by quantile. |
datasets.make_hastie_10_2([n_samples, …]) | Generate data for binary classification used in Hastie et al. 2009, Example 10.2. |
datasets.make_low_rank_matrix([n_samples, …]) | Generate a mostly low-rank matrix with bell-shaped singular values. |
datasets.make_moons([n_samples, shuffle, …]) | Make two interleaving half circles. |
datasets.make_multilabel_classification | Generate a random multilabel classification problem. |
datasets.make_regression([n_samples, …]) | Generate a random regression problem. |
datasets.make_s_curve([n_samples, noise, …]) | Generate an S curve dataset. |
datasets.make_sparse_coded_signal(n_samples, …) | Generate a signal as a sparse combination of dictionary elements. |
datasets.make_sparse_spd_matrix([dim, …]) | Generate a sparse symmetric positive-definite matrix. |
datasets.make_sparse_uncorrelated | Generate a random regression problem with sparse uncorrelated design. |
datasets.make_spd_matrix(n_dim, *[, …]) | Generate a random symmetric, positive-definite matrix. |
datasets.make_swiss_roll([n_samples, noise, …]) | Generate a swiss roll dataset. |
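A minimal sketch of the two dataset styles above, one bundled loader and one synthetic generator (the sizes and noise level are illustrative):

```python
from sklearn.datasets import load_iris, make_regression

# Bundled dataset: return_X_y=True gives plain (data, target) arrays.
X, y = load_iris(return_X_y=True)

# Synthetic dataset with known generative structure.
X_syn, y_syn = make_regression(n_samples=100, n_features=5, noise=0.1,
                               random_state=0)
```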
sklearn.decomposition: Matrix Decomposition
Class | Description |
---|---|
decomposition.DictionaryLearning | Dictionary learning. |
decomposition.FactorAnalysis([n_components, …]) | Factor Analysis (FA). |
decomposition.FastICA([n_components, …]) | FastICA: a fast algorithm for Independent Component Analysis. |
decomposition.IncrementalPCA([n_components, …]) | Incremental principal components analysis (IPCA). |
decomposition.KernelPCA([n_components, …]) | Kernel Principal component analysis (KPCA). |
decomposition.LatentDirichletAllocation | Latent Dirichlet Allocation with online variational Bayes algorithm. |
decomposition.MiniBatchDictionaryLearning | Mini-batch dictionary learning. |
decomposition.MiniBatchSparsePCA | Mini-batch Sparse Principal Components Analysis. |
decomposition.NMF([n_components, init, …]) | Non-Negative Matrix Factorization (NMF). |
decomposition.PCA([n_components, copy, …]) | Principal component analysis (PCA). |
decomposition.SparsePCA([n_components, …]) | Sparse Principal Components Analysis (SparsePCA). |
decomposition.SparseCoder(dictionary, *[, …]) | Sparse coding. |
decomposition.TruncatedSVD([n_components, …]) | Dimensionality reduction using truncated SVD (aka LSA). |
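A minimal PCA sketch; passing a float for n_components keeps enough components to explain that fraction of variance (dataset choice is illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
pca = PCA(n_components=0.95)        # keep enough components for 95% variance
X_reduced = pca.fit_transform(X)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```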
sklearn.discriminant_analysis: Discriminant Analysis
Class | Description |
---|---|
discriminant_analysis.LinearDiscriminantAnalysis | Linear Discriminant Analysis. |
discriminant_analysis.QuadraticDiscriminantAnalysis | Quadratic Discriminant Analysis. |
sklearn.dummy: Dummy estimators
Class | Description |
---|---|
dummy.DummyClassifier(*[, strategy, …]) | DummyClassifier is a classifier that makes predictions using simple rules. |
dummy.DummyRegressor(*[, strategy, …]) | DummyRegressor is a regressor that makes predictions using simple rules. |
sklearn.ensemble: Ensemble Methods
Class | Description |
---|---|
ensemble.AdaBoostClassifier | An AdaBoost classifier. |
ensemble.AdaBoostRegressor([base_estimator, …]) | An AdaBoost regressor. |
ensemble.BaggingClassifier([base_estimator, …]) | A Bagging classifier. |
ensemble.BaggingRegressor([base_estimator, …]) | A Bagging regressor. |
ensemble.ExtraTreesClassifier | An extra-trees classifier. |
ensemble.ExtraTreesRegressor([n_estimators, …]) | An extra-trees regressor. |
ensemble.GradientBoostingClassifier | Gradient Boosting for classification. |
ensemble.GradientBoostingRegressor | Gradient Boosting for regression. |
ensemble.IsolationForest(*[, n_estimators, …]) | Isolation Forest Algorithm. |
ensemble.RandomForestClassifier | A random forest classifier. |
ensemble.RandomForestRegressor | A random forest regressor. |
ensemble.RandomTreesEmbedding | An ensemble of totally random trees. |
ensemble.StackingClassifier(estimators[, …]) | Stack of estimators with a final classifier. |
ensemble.StackingRegressor(estimators[, …]) | Stack of estimators with a final regressor. |
ensemble.VotingClassifier(estimators, *[, …]) | Soft Voting/Majority Rule classifier for unfitted estimators. |
ensemble.VotingRegressor(estimators, *[, …]) | Prediction voting regressor for unfitted estimators. |
ensemble.HistGradientBoostingRegressor | Histogram-based Gradient Boosting Regression Tree. |
ensemble.HistGradientBoostingClassifier | Histogram-based Gradient Boosting Classification Tree. |
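A minimal random forest sketch (the dataset and n_estimators value are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.feature_importances_[:5])   # impurity-based feature importances
```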
sklearn.exceptions: Exceptions and warnings
Class | Description |
---|---|
exceptions.ChangedBehaviorWarning | Warning class used to notify the user of any change in the behavior. |
exceptions.ConvergenceWarning | Custom warning to capture convergence problems. |
exceptions.DataConversionWarning | Warning used to notify implicit data conversions happening in the code. |
exceptions.DataDimensionalityWarning | Custom warning to notify potential issues with data dimensionality. |
exceptions.EfficiencyWarning | Warning used to notify the user of inefficient computation. |
exceptions.FitFailedWarning | Warning class used if there is an error while fitting the estimator. |
exceptions.NotFittedError | Exception class to raise if estimator is used before fitting. |
exceptions.NonBLASDotWarning | Warning used when the dot operation does not use BLAS. |
exceptions.UndefinedMetricWarning | Warning used when the metric is invalid. |
sklearn.experimental: Experimental
Module | Description |
---|---|
experimental.enable_hist_gradient_boosting | Enables histogram-based gradient boosting estimators. |
experimental.enable_iterative_imputer | Enables IterativeImputer. |
sklearn.feature_extraction: Feature Extraction
Class/Function | Description |
---|---|
Basics | |
feature_extraction.DictVectorizer | Transforms lists of feature-value mappings to vectors. |
feature_extraction.FeatureHasher | Implements feature hashing, aka the hashing trick. |
From images | |
feature_extraction.image.extract_patches_2d | Reshape a 2D image into a collection of patches. |
feature_extraction.image.grid_to_graph(n_x, n_y) | Graph of the pixel-to-pixel connections. |
feature_extraction.image.img_to_graph | Graph of the pixel-to-pixel gradient connections. |
feature_extraction.image.reconstruct_from_patches_2d | Reconstruct the image from all of its patches. |
feature_extraction.image.PatchExtractor | Extracts patches from a collection of images. |
From text | |
feature_extraction.text.CountVectorizer | Convert a collection of text documents to a matrix of token counts. |
feature_extraction.text.HashingVectorizer | Convert a collection of text documents to a matrix of token occurrences. |
feature_extraction.text.TfidfTransformer | Transform a count matrix to a normalized tf or tf-idf representation. |
feature_extraction.text.TfidfVectorizer | Convert a collection of raw documents to a matrix of TF-IDF features. |
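A minimal text vectorization sketch (the toy corpus is illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "the dog sat", "cats and dogs"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)       # sparse document-term matrix
print(sorted(vec.vocabulary_))    # learned vocabulary
```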
sklearn.feature_selection: Feature Selection
Class/Function | Description |
---|---|
feature_selection.GenericUnivariateSelect | Univariate feature selector with configurable strategy. |
feature_selection.SelectPercentile | Select features according to a percentile of the highest scores. |
feature_selection.SelectKBest([score_func, k]) | Select features according to the k highest scores. |
feature_selection.SelectFpr([score_func, alpha]) | Filter: Select the p-values below alpha based on a FPR test. |
feature_selection.SelectFdr([score_func, alpha]) | Filter: Select the p-values for an estimated false discovery rate. |
feature_selection.SelectFromModel(estimator, *) | Meta-transformer for selecting features based on importance weights. |
feature_selection.SelectFwe([score_func, alpha]) | Filter: Select the p-values corresponding to family-wise error rate. |
feature_selection.RFE(estimator, *[, …]) | Feature ranking with recursive feature elimination. |
feature_selection.RFECV(estimator, *[, …]) | Feature ranking with recursive feature elimination and cross-validated selection of the best number of features. |
feature_selection.VarianceThreshold([threshold]) | Feature selector that removes all low-variance features. |
feature_selection.chi2(X, y) | Compute chi-squared stats between each non-negative feature and class. |
feature_selection.f_classif(X, y) | Compute the ANOVA F-value for the provided sample. |
feature_selection.f_regression(X, y, *[, center]) | Univariate linear regression tests. |
feature_selection.mutual_info_classif | Estimate mutual information for a discrete target variable. |
feature_selection.mutual_info_regression | Estimate mutual information for a continuous target variable. |
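A minimal univariate selection sketch (the dataset and k are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_new = selector.transform(X)     # keeps the 2 highest-scoring features
print(selector.get_support())     # boolean mask of selected columns
```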
sklearn.gaussian_process: Gaussian Processes
Class | Description |
---|---|
General | |
gaussian_process.GaussianProcessClassifier | Gaussian process classification (GPC) based on Laplace approximation. |
gaussian_process.GaussianProcessRegressor | Gaussian process regression (GPR). |
Kernels | |
gaussian_process.kernels.CompoundKernel(kernels) | Kernel which is composed of a set of other kernels. |
gaussian_process.kernels.ConstantKernel | Constant kernel. |
gaussian_process.kernels.DotProduct | Dot-Product kernel. |
gaussian_process.kernels.ExpSineSquared | Exp-Sine-Squared kernel (aka periodic kernel). |
gaussian_process.kernels.Exponentiation(kernel, exponent) | The Exponentiation kernel takes one base kernel and a scalar parameter p and combines them via k_exp(X, Y) = k(X, Y) ** p. |
gaussian_process.kernels.Hyperparameter | A kernel hyperparameter’s specification in form of a namedtuple. |
gaussian_process.kernels.Kernel | Base class for all kernels. |
gaussian_process.kernels.Matern | Matern kernel. |
gaussian_process.kernels.PairwiseKernel | Wrapper for kernels in sklearn.metrics.pairwise. |
gaussian_process.kernels.Product(k1, k2) | The Product kernel takes two kernels k1 and k2 and combines them via k_prod(X, Y) = k1(X, Y) * k2(X, Y). |
gaussian_process.kernels.RBF([length_scale, …]) | Radial-basis function kernel (aka squared-exponential kernel). |
gaussian_process.kernels.RationalQuadratic | Rational Quadratic kernel. |
gaussian_process.kernels.Sum(k1, k2) | The Sum kernel takes two kernels k1 and k2 and combines them via k_sum(X, Y) = k1(X, Y) + k2(X, Y). |
gaussian_process.kernels.WhiteKernel | White kernel. |
sklearn.impute: Impute
Class | Description |
---|---|
impute.SimpleImputer(*[, missing_values, …]) | Imputation transformer for completing missing values. |
impute.IterativeImputer([estimator, …]) | Multivariate imputer that estimates each feature from all the others. |
impute.MissingIndicator(*[, missing_values, …]) | Binary indicators for missing values. |
impute.KNNImputer(*[, missing_values, …]) | Imputation for completing missing values using k-Nearest Neighbors. |
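A minimal imputation sketch (the toy array is illustrative):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])
imp = SimpleImputer(strategy="mean")
print(imp.fit_transform(X))       # NaNs replaced by per-column means
```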
sklearn.inspection: Inspection
Class/Function | Description |
---|---|
General | |
inspection.partial_dependence(estimator, X, …) | Partial dependence of features. |
inspection.permutation_importance(estimator, …) | Permutation importance for feature evaluation. |
Plotting | |
inspection.PartialDependenceDisplay | Partial Dependence Plot (PDP) visualization. |
sklearn.isotonic: Isotonic regression
Class/Function | Description |
---|---|
isotonic.IsotonicRegression(*[, y_min, …]) | Isotonic regression model. |
isotonic.check_increasing(x, y) | Determine whether y is monotonically correlated with x. |
isotonic.isotonic_regression(y, *[, …]) | Solve the isotonic regression model. |
sklearn.kernel_approximation: Kernel Approximation
Class | Description |
---|---|
kernel_approximation.AdditiveChi2Sampler | Approximate feature map for additive chi2 kernel. |
kernel_approximation.Nystroem([kernel, …]) | Approximate a kernel map using a subset of the training data. |
kernel_approximation.RBFSampler(*[, gamma, …]) | Approximates feature map of an RBF kernel by Monte Carlo approximation of its Fourier transform. |
kernel_approximation.SkewedChi2Sampler | Approximates feature map of the “skewed chi-squared” kernel by Monte Carlo approximation of its Fourier transform. |
sklearn.kernel_ridge: Kernel Ridge Regression
Class | Description |
---|---|
kernel_ridge.KernelRidge([alpha, kernel, …]) | Kernel ridge regression. |
sklearn.linear_model: Linear Models
Class/Function | Description |
---|---|
Linear classifiers | |
linear_model.LogisticRegression([penalty, …]) | Logistic Regression (aka logit, MaxEnt) classifier. |
linear_model.LogisticRegressionCV(*[, Cs, …]) | Logistic Regression CV (aka logit, MaxEnt) classifier. |
linear_model.PassiveAggressiveClassifier | Passive Aggressive Classifier. |
linear_model.Perceptron(*[, penalty, alpha, …]) | Linear perceptron classifier. |
linear_model.RidgeClassifier([alpha, …]) | Classifier using Ridge regression. |
linear_model.RidgeClassifierCV([alphas, …]) | Ridge classifier with built-in cross-validation. |
linear_model.SGDClassifier([loss, penalty, …]) | Linear classifiers (SVM, logistic regression, etc.) with SGD training. |
Classical linear regressors | |
linear_model.LinearRegression(*[, …]) | Ordinary least squares Linear Regression. |
linear_model.Ridge([alpha, fit_intercept, …]) | Linear least squares with l2 regularization. |
linear_model.RidgeCV([alphas, …]) | Ridge regression with built-in cross-validation. |
linear_model.SGDRegressor([loss, penalty, …]) | Linear model fitted by minimizing a regularized empirical loss with SGD. |
Regressors with variable selection | |
linear_model.ElasticNet([alpha, l1_ratio, …]) | Linear regression with combined L1 and L2 priors as regularizer. |
linear_model.ElasticNetCV(*[, l1_ratio, …]) | Elastic Net model with iterative fitting along a regularization path. |
linear_model.Lars(*[, fit_intercept, …]) | Least Angle Regression model, a.k.a. LAR. |
linear_model.LarsCV(*[, fit_intercept, …]) | Cross-validated Least Angle Regression model. |
linear_model.Lasso([alpha, fit_intercept, …]) | Linear Model trained with L1 prior as regularizer (aka the Lasso). |
linear_model.LassoCV(*[, eps, n_alphas, …]) | Lasso linear model with iterative fitting along a regularization path. |
linear_model.LassoLars([alpha, …]) | Lasso model fit with Least Angle Regression, a.k.a. Lars. |
linear_model.LassoLarsCV(*[, fit_intercept, …]) | Cross-validated Lasso, using the LARS algorithm. |
linear_model.LassoLarsIC([criterion, …]) | Lasso model fit with Lars using BIC or AIC for model selection. |
linear_model.OrthogonalMatchingPursuit | Orthogonal Matching Pursuit model (OMP). |
linear_model.OrthogonalMatchingPursuitCV | Cross-validated Orthogonal Matching Pursuit model (OMP). |
Bayesian regressors | |
linear_model.ARDRegression(*[, n_iter, tol, …]) | Bayesian ARD regression. |
linear_model.BayesianRidge(*[, n_iter, tol, …]) | Bayesian ridge regression. |
Multi-task linear regressors with variable selection | |
linear_model.MultiTaskElasticNet([alpha, …]) | Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer. |
linear_model.MultiTaskElasticNetCV | Multi-task L1/L2 ElasticNet with built-in cross-validation. |
linear_model.MultiTaskLasso([alpha, …]) | Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer. |
linear_model.MultiTaskLassoCV(*[, eps, …]) | Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer, with built-in cross-validation. |
Outlier-robust regressors | |
linear_model.HuberRegressor(*[, epsilon, …]) | Linear regression model that is robust to outliers. |
linear_model.RANSACRegressor | RANSAC (RANdom SAmple Consensus) algorithm. |
linear_model.TheilSenRegressor(*[, …]) | Theil-Sen Estimator: robust multivariate regression model. |
Generalized linear models (GLM) for regression | |
linear_model.PoissonRegressor(*[, alpha, …]) | Generalized Linear Model with a Poisson distribution. |
linear_model.TweedieRegressor(*[, power, …]) | Generalized Linear Model with a Tweedie distribution. |
linear_model.GammaRegressor(*[, alpha, …]) | Generalized Linear Model with a Gamma distribution. |
Miscellaneous | |
linear_model.PassiveAggressiveRegressor | Passive Aggressive Regressor. |
linear_model.enet_path(X, y, *[, l1_ratio, …]) | Compute elastic net path with coordinate descent. |
linear_model.lars_path(X, y[, Xy, Gram, …]) | Compute Least Angle Regression or Lasso path using the LARS algorithm. |
linear_model.lars_path_gram(Xy, Gram, *, …) | lars_path in the sufficient stats mode. |
linear_model.lasso_path(X, y, *[, eps, …]) | Compute Lasso path with coordinate descent. |
linear_model.orthogonal_mp(X, y, *[, …]) | Orthogonal Matching Pursuit (OMP). |
linear_model.orthogonal_mp_gram(Gram, Xy, *) | Gram Orthogonal Matching Pursuit (OMP). |
linear_model.ridge_regression(X, y, alpha, *) | Solve the ridge equation by the method of normal equations. |
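A minimal sketch of one regressor and one classifier from this module (the synthetic data and alpha value are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.randn(100)

reg = Ridge(alpha=1.0).fit(X, y)          # l2-regularized least squares
print(reg.coef_)

clf = LogisticRegression().fit(X, (y > 0).astype(int))
print(clf.predict_proba(X[:2]))           # class probabilities
```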
sklearn.manifold: Manifold Learning
Class | Description |
---|---|
manifold.Isomap(*[, n_neighbors, …]) | Isomap Embedding. |
manifold.LocallyLinearEmbedding(*[, …]) | Locally Linear Embedding. |
manifold.MDS([n_components, metric, n_init, …]) | Multidimensional scaling. |
manifold.SpectralEmbedding([n_components, …]) | Spectral embedding for non-linear dimensionality reduction. |
manifold.TSNE([n_components, perplexity, …]) | t-distributed Stochastic Neighbor Embedding. |
sklearn.metrics: Metrics
Function | Description |
---|---|
Model Selection Interface | |
metrics.check_scoring(estimator[, scoring, …]) | Determine scorer from user options. |
metrics.get_scorer(scoring) | Get a scorer from string. |
metrics.make_scorer(score_func, *[, …]) | Make a scorer from a performance metric or loss function. |
Classification metrics | |
metrics.accuracy_score(y_true, y_pred, *[, …]) | Accuracy classification score. |
metrics.auc(x, y) | Compute Area Under the Curve (AUC) using the trapezoidal rule. |
metrics.average_precision_score(y_true, …) | Compute average precision (AP) from prediction scores. |
metrics.balanced_accuracy_score(y_true, …) | Compute the balanced accuracy. |
metrics.brier_score_loss(y_true, y_prob, *) | Compute the Brier score. |
metrics.classification_report(y_true, y_pred, *) | Build a text report showing the main classification metrics. |
metrics.cohen_kappa_score(y1, y2, *[, …]) | Cohen’s kappa: a statistic that measures inter-annotator agreement. |
metrics.confusion_matrix(y_true, y_pred, *) | Compute confusion matrix to evaluate the accuracy of a classification. |
metrics.dcg_score(y_true, y_score, *[, k, …]) | Compute Discounted Cumulative Gain. |
metrics.f1_score(y_true, y_pred, *[, …]) | Compute the F1 score, also known as balanced F-score or F-measure. |
metrics.fbeta_score(y_true, y_pred, *, beta) | Compute the F-beta score. |
metrics.hamming_loss(y_true, y_pred, *[, …]) | Compute the average Hamming loss. |
metrics.hinge_loss(y_true, pred_decision, *) | Average hinge loss (non-regularized). |
metrics.jaccard_score(y_true, y_pred, *[, …]) | Jaccard similarity coefficient score. |
metrics.log_loss(y_true, y_pred, *[, eps, …]) | Log loss, aka logistic loss or cross-entropy loss. |
metrics.matthews_corrcoef(y_true, y_pred, *) | Compute the Matthews correlation coefficient (MCC). |
metrics.multilabel_confusion_matrix(y_true, …) | Compute a confusion matrix for each class or sample. |
metrics.ndcg_score(y_true, y_score, *[, k, …]) | Compute Normalized Discounted Cumulative Gain. |
metrics.precision_recall_curve(y_true, …) | Compute precision-recall pairs for different probability thresholds. |
metrics.precision_recall_fscore_support(…) | Compute precision, recall, F-measure and support for each class. |
metrics.precision_score(y_true, y_pred, *[, …]) | Compute the precision. |
metrics.recall_score(y_true, y_pred, *[, …]) | Compute the recall. |
metrics.roc_auc_score(y_true, y_score, *[, …]) | Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores. |
metrics.roc_curve(y_true, y_score, *[, …]) | Compute Receiver operating characteristic (ROC). |
metrics.zero_one_loss(y_true, y_pred, *[, …]) | Zero-one classification loss. |
Regression metrics | |
metrics.explained_variance_score(y_true, …) | Explained variance regression score function. |
metrics.max_error(y_true, y_pred) | max_error metric calculates the maximum residual error. |
metrics.mean_absolute_error(y_true, y_pred, *) | Mean absolute error regression loss. |
metrics.mean_squared_error(y_true, y_pred, *) | Mean squared error regression loss. |
metrics.mean_squared_log_error(y_true, y_pred, *) | Mean squared logarithmic error regression loss. |
metrics.median_absolute_error(y_true, y_pred, *) | Median absolute error regression loss. |
metrics.r2_score(y_true, y_pred, *[, …]) | R^2 (coefficient of determination) regression score function. |
metrics.mean_poisson_deviance(y_true, y_pred, *) | Mean Poisson deviance regression loss. |
metrics.mean_gamma_deviance(y_true, y_pred, *) | Mean Gamma deviance regression loss. |
metrics.mean_tweedie_deviance(y_true, y_pred, *) | Mean Tweedie deviance regression loss. |
Multilabel ranking metrics | |
metrics.coverage_error(y_true, y_score, *[, …]) | Coverage error measure. |
metrics.label_ranking_average_precision_score | Compute ranking-based average precision. |
metrics.label_ranking_loss(y_true, y_score, *) | Compute Ranking loss measure. |
Clustering metrics | |
metrics.adjusted_mutual_info_score | Adjusted Mutual Information between two clusterings. |
metrics.adjusted_rand_score(labels_true, …) | Rand index adjusted for chance. |
metrics.calinski_harabasz_score(X, labels) | Compute the Calinski and Harabasz score. |
metrics.davies_bouldin_score(X, labels) | Computes the Davies-Bouldin score. |
metrics.completeness_score(labels_true, …) | Completeness metric of a cluster labeling given a ground truth. |
metrics.cluster.contingency_matrix | Build a contingency matrix describing the relationship between labels. |
metrics.fowlkes_mallows_score(labels_true, …) | Measure the similarity of two clusterings of a set of points. |
metrics.homogeneity_completeness_v_measure | Compute the homogeneity, completeness and V-Measure scores at once. |
metrics.homogeneity_score(labels_true, …) | Homogeneity metric of a cluster labeling given a ground truth. |
metrics.mutual_info_score(labels_true, …) | Mutual Information between two clusterings. |
metrics.normalized_mutual_info_score | Normalized Mutual Information between two clusterings. |
metrics.silhouette_score(X, labels, *[, …]) | Compute the mean Silhouette Coefficient of all samples. |
metrics.silhouette_samples(X, labels, *[, …]) | Compute the Silhouette Coefficient for each sample. |
metrics.v_measure_score(labels_true, …[, beta]) | V-measure cluster labeling given a ground truth. |
Biclustering metrics | |
metrics.consensus_score(a, b, *[, similarity]) | The similarity of two sets of biclusters. |
Pairwise metrics | |
metrics.pairwise.additive_chi2_kernel(X[, Y]) | Computes the additive chi-squared kernel between observations in X and Y. |
metrics.pairwise.chi2_kernel(X[, Y, gamma]) | Computes the exponential chi-squared kernel between X and Y. |
metrics.pairwise.cosine_similarity(X[, Y, …]) | Compute cosine similarity between samples in X and Y. |
metrics.pairwise.cosine_distances(X[, Y]) | Compute cosine distance between samples in X and Y. |
metrics.pairwise.distance_metrics() | Valid metrics for pairwise_distances. |
metrics.pairwise.euclidean_distances(X[, Y, …]) | Considering the rows of X (and Y=X) as vectors, compute the distance matrix between each pair of vectors. |
metrics.pairwise.haversine_distances(X[, Y]) | Compute the Haversine distance between samples in X and Y. |
metrics.pairwise.kernel_metrics() | Valid metrics for pairwise_kernels. |
metrics.pairwise.laplacian_kernel(X[, Y, gamma]) | Compute the laplacian kernel between X and Y. |
metrics.pairwise.linear_kernel(X[, Y, …]) | Compute the linear kernel between X and Y. |
metrics.pairwise.manhattan_distances(X[, Y, …]) | Compute the L1 distances between the vectors in X and Y. |
metrics.pairwise.nan_euclidean_distances | Calculate the euclidean distances in the presence of missing values. |
metrics.pairwise.pairwise_kernels(X[, Y, …]) | Compute the kernel between arrays X and optional array Y. |
metrics.pairwise.polynomial_kernel(X[, Y, …]) | Compute the polynomial kernel between X and Y. |
metrics.pairwise.rbf_kernel(X[, Y, gamma]) | Compute the rbf (gaussian) kernel between X and Y. |
metrics.pairwise.sigmoid_kernel(X[, Y, …]) | Compute the sigmoid kernel between X and Y. |
metrics.pairwise.paired_euclidean_distances(X, Y) | Computes the paired euclidean distances between X and Y. |
metrics.pairwise.paired_manhattan_distances(X, Y) | Compute the L1 distances between the vectors in X and Y. |
metrics.pairwise.paired_cosine_distances(X, Y) | Computes the paired cosine distances between X and Y. |
metrics.pairwise.paired_distances(X, Y, *[, …]) | Computes the paired distances between X and Y. |
metrics.pairwise_distances(X[, Y, metric, …]) | Compute the distance matrix from a vector array X and optional Y. |
metrics.pairwise_distances_argmin(X, Y, *[, …]) | Compute minimum distances between one point and a set of points. |
metrics.pairwise_distances_argmin_min(X, Y, *) | Compute minimum distances between one point and a set of points, returning both the indices and the distances. |
metrics.pairwise_distances_chunked(X[, Y, …]) | Generate a distance matrix chunk by chunk with optional reduction. |
Plotting | |
metrics.plot_confusion_matrix(estimator, X, …) | Plot Confusion Matrix. |
metrics.plot_precision_recall_curve(estimator, …) | Plot Precision Recall Curve for binary classifiers. |
metrics.plot_roc_curve(estimator, X, y, *[, …]) | Plot Receiver operating characteristic (ROC) curve. |
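A minimal sketch of the classification metrics above (the toy labels are illustrative):

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy_score(y_true, y_pred))   # 0.8
print(f1_score(y_true, y_pred))         # 0.8
print(confusion_matrix(y_true, y_pred))
```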
sklearn.mixture: Gaussian Mixture Models
Class | Description |
---|---|
mixture.BayesianGaussianMixture(*[, …]) | Variational Bayesian estimation of a Gaussian mixture. |
mixture.GaussianMixture([n_components, …]) | Gaussian Mixture. |
sklearn.model_selection: Model Selection
Class/Function | Description |
---|---|
Splitter Classes | |
model_selection.GroupKFold([n_splits]) | K-fold iterator variant with non-overlapping groups. |
model_selection.GroupShuffleSplit | Shuffle-Group(s)-Out cross-validation iterator. |
model_selection.KFold([n_splits, shuffle, …]) | K-Folds cross-validator. |
model_selection.LeaveOneGroupOut() | Leave One Group Out cross-validator. |
model_selection.LeavePGroupsOut(n_groups) | Leave P Group(s) Out cross-validator. |
model_selection.LeaveOneOut() | Leave-One-Out cross-validator. |
model_selection.LeavePOut(p) | Leave-P-Out cross-validator. |
model_selection.PredefinedSplit(test_fold) | Predefined split cross-validator. |
model_selection.RepeatedKFold(*[, n_splits, …]) | Repeated K-Fold cross-validator. |
model_selection.RepeatedStratifiedKFold | Repeated Stratified K-Fold cross-validator. |
model_selection.ShuffleSplit([n_splits, …]) | Random permutation cross-validator. |
model_selection.StratifiedKFold([n_splits, …]) | Stratified K-Folds cross-validator. |
model_selection.StratifiedShuffleSplit | Stratified ShuffleSplit cross-validator. |
model_selection.TimeSeriesSplit([n_splits, …]) | Time Series cross-validator. |
Splitter Functions | |
model_selection.check_cv([cv, y, classifier]) | Input checker utility for building a cross-validator. |
model_selection.train_test_split(*arrays, …) | Split arrays or matrices into random train and test subsets. |
Hyper-parameter optimizers | |
model_selection.GridSearchCV(estimator, …) | Exhaustive search over specified parameter values for an estimator. |
model_selection.ParameterGrid(param_grid) | Grid of parameters with a discrete number of values for each. |
model_selection.ParameterSampler(…[, …]) | Generator on parameters sampled from given distributions. |
model_selection.RandomizedSearchCV | Randomized search on hyper parameters. |
Model validation | |
model_selection.cross_validate(estimator, X) | Evaluate metric(s) by cross-validation and also record fit/score times. |
model_selection.cross_val_predict(estimator, X) | Generate cross-validated estimates for each input data point. |
model_selection.cross_val_score(estimator, X) | Evaluate a score by cross-validation. |
model_selection.learning_curve(estimator, X, …) | Learning curve. |
model_selection.permutation_test_score | Evaluate the significance of a cross-validated score with permutations. |
model_selection.validation_curve(estimator, …) | Validation curve. |
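A minimal sketch combining a train/test split with a grid search (the dataset, grid and cv value are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5).fit(X_tr, y_tr)
print(search.best_params_, search.score(X_te, y_te))
```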
sklearn.multiclass: Multiclass and multilabel classification
Class | Description |
---|---|
multiclass.OneVsRestClassifier(estimator, *) | One-vs-the-rest (OvR) multiclass/multilabel strategy. |
multiclass.OneVsOneClassifier(estimator, *) | One-vs-one multiclass strategy. |
multiclass.OutputCodeClassifier(estimator, *) | (Error-Correcting) Output-Code multiclass strategy. |
sklearn.naive_bayes: Naive Bayes
Class | Description |
---|---|
naive_bayes.BernoulliNB(*[, alpha, …]) | Naive Bayes classifier for multivariate Bernoulli models. |
naive_bayes.CategoricalNB(*[, alpha, …]) | Naive Bayes classifier for categorical features. |
naive_bayes.ComplementNB(*[, alpha, …]) | The Complement Naive Bayes classifier described in Rennie et al. (2003). |
naive_bayes.GaussianNB(*[, priors, …]) | Gaussian Naive Bayes (GaussianNB). |
naive_bayes.MultinomialNB(*[, alpha, …]) | Naive Bayes classifier for multinomial models. |
sklearn.neighbors: Nearest Neighbors
Class | Description |
---|---|
neighbors.BallTree(X[, leaf_size, metric]) | BallTree for fast generalized N-point problems. |
neighbors.DistanceMetric | DistanceMetric class. |
neighbors.KDTree(X[, leaf_size, metric]) | KDTree for fast generalized N-point problems. |
neighbors.KernelDensity(*[, bandwidth, …]) | Kernel Density Estimation. |
neighbors.KNeighborsClassifier | Classifier implementing the k-nearest neighbors vote. |
neighbors.KNeighborsRegressor([n_neighbors, …]) | Regression based on k-nearest neighbors. |
neighbors.KNeighborsTransformer(*[, mode, …]) | Transform X into a (weighted) graph of k nearest neighbors. |
neighbors.LocalOutlierFactor([n_neighbors, …]) | Unsupervised Outlier Detection using Local Outlier Factor (LOF). |
neighbors.RadiusNeighborsClassifier | Classifier implementing a vote among neighbors within a given radius. |
neighbors.RadiusNeighborsRegressor([radius, …]) | Regression based on neighbors within a fixed radius. |
neighbors.RadiusNeighborsTransformer | Transform X into a (weighted) graph of neighbors nearer than a radius. |
neighbors.NearestCentroid([metric, …]) | Nearest centroid classifier. |
neighbors.NearestNeighbors(*[, n_neighbors, …]) | Unsupervised learner for implementing neighbor searches. |
neighbors.NeighborhoodComponentsAnalysis | Neighborhood Components Analysis. |
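A minimal sketch of the supervised and unsupervised neighbor APIs (the dataset and neighbor counts are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict(X[:3]))

nn = NearestNeighbors(n_neighbors=3).fit(X)
dist, idx = nn.kneighbors(X[:1])   # distances/indices of the 3 closest points
```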
sklearn.neural_network: Neural network models
Class | Description |
---|---|
neural_network.BernoulliRBM([n_components, …]) | Bernoulli Restricted Boltzmann Machine (RBM). |
neural_network.MLPClassifier | Multi-layer Perceptron classifier. |
neural_network.MLPRegressor | Multi-layer Perceptron regressor. |
sklearn.pipeline: Pipeline
Class/Function | Description |
---|---|
pipeline.FeatureUnion(transformer_list, *[, …]) | Concatenates results of multiple transformer objects. |
pipeline.Pipeline(steps, *[, memory, verbose]) | Pipeline of transforms with a final estimator. |
pipeline.make_pipeline(*steps, **kwargs) | Construct a Pipeline from the given estimators. |
pipeline.make_union(*transformers, **kwargs) | Construct a FeatureUnion from the given transformers. |
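A minimal pipeline sketch (the step choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
print(pipe.score(X, y))   # scaler and model applied as one estimator
```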
sklearn.preprocessing: Preprocessing and Normalization
Class/Function | Description |
---|---|
preprocessing.Binarizer(*[, threshold, copy]) | Binarize data (set feature values to 0 or 1) according to a threshold. |
preprocessing.FunctionTransformer([func, …]) | Constructs a transformer from an arbitrary callable. |
preprocessing.KBinsDiscretizer([n_bins, …]) | Bin continuous data into intervals. |
preprocessing.KernelCenterer() | Center a kernel matrix. |
preprocessing.LabelBinarizer(*[, neg_label, …]) | Binarize labels in a one-vs-all fashion. |
preprocessing.LabelEncoder | Encode target labels with value between 0 and n_classes-1. |
preprocessing.MultiLabelBinarizer | Transform between iterable of iterables and a multilabel format. |
preprocessing.MaxAbsScaler(*[, copy]) | Scale each feature by its maximum absolute value. |
preprocessing.MinMaxScaler([feature_range, copy]) | Transform features by scaling each feature to a given range. |
preprocessing.Normalizer([norm, copy]) | Normalize samples individually to unit norm. |
preprocessing.OneHotEncoder(*[, categories, …]) | Encode categorical features as a one-hot numeric array. |
preprocessing.OrdinalEncoder(*[, …]) | Encode categorical features as an integer array. |
preprocessing.PolynomialFeatures([degree, …]) | Generate polynomial and interaction features. |
preprocessing.PowerTransformer([method, …]) | Apply a power transform featurewise to make data more Gaussian-like. |
preprocessing.QuantileTransformer | Transform features using quantiles information. |
preprocessing.RobustScaler(*[, …]) | Scale features using statistics that are robust to outliers. |
preprocessing.StandardScaler(*[, copy, …]) | Standardize features by removing the mean and scaling to unit variance. |
preprocessing.add_dummy_feature(X[, value]) | Augment dataset with an additional dummy feature. |
preprocessing.binarize(X, *[, threshold, copy]) | Boolean thresholding of array-like or scipy.sparse matrix. |
preprocessing.label_binarize(y, *, classes) | Binarize labels in a one-vs-all fashion. |
preprocessing.maxabs_scale(X, *[, axis, copy]) | Scale each feature to the [-1, 1] range without breaking the sparsity. |
preprocessing.minmax_scale(X[, …]) | Transform features by scaling each feature to a given range. |
preprocessing.normalize(X[, norm, axis, …]) | Scale input vectors individually to unit norm (vector length). |
preprocessing.quantile_transform(X, *[, …]) | Transform features using quantiles information. |
preprocessing.robust_scale(X, *[, axis, …]) | Standardize a dataset along any axis. |
preprocessing.scale(X, *[, axis, with_mean, …]) | Standardize a dataset along any axis. |
preprocessing.power_transform(X[, method, …]) | Power transforms are a family of parametric, monotonic transformations that are applied to make data more Gaussian-like. |
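A minimal sketch of scaling and one-hot encoding (the toy arrays are illustrative; the sparse parameter was renamed sparse_output in newer scikit-learn versions):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(StandardScaler().fit_transform(X))   # zero mean, unit variance per column

# sparse=False here; newer scikit-learn versions use sparse_output=False.
enc = OneHotEncoder(sparse=False).fit([["red"], ["green"], ["red"]])
print(enc.transform([["green"]]))          # [[1., 0.]]; categories sort alphabetically
```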
sklearn.random_projection: Random projection
Class | Description |
---|---|
random_projection.GaussianRandomProjection | Reduce dimensionality through Gaussian random projection. |
random_projection.SparseRandomProjection | Reduce dimensionality through sparse random projection. |
sklearn.semi_supervised: Semi-Supervised Learning
Class | Description |
---|---|
semi_supervised.LabelPropagation([kernel, …]) | Label Propagation classifier. |
semi_supervised.LabelSpreading([kernel, …]) | LabelSpreading model for semi-supervised learning. |
sklearn.svm: Support Vector Machines
Class/Function | Description |
---|---|
Estimators | |
svm.LinearSVC([penalty, loss, dual, tol, C, …]) | Linear Support Vector Classification. |
svm.LinearSVR(*[, epsilon, tol, C, loss, …]) | Linear Support Vector Regression. |
svm.NuSVC(*[, nu, kernel, degree, gamma, …]) | Nu-Support Vector Classification. |
svm.NuSVR(*[, nu, C, kernel, degree, gamma, …]) | Nu Support Vector Regression. |
svm.OneClassSVM(*[, kernel, degree, gamma, …]) | Unsupervised Outlier Detection. |
svm.SVC(*[, C, kernel, degree, gamma, …]) | C-Support Vector Classification. |
svm.SVR(*[, kernel, degree, gamma, coef0, …]) | Epsilon-Support Vector Regression. |
svm.l1_min_c(X, y, *[, loss, fit_intercept, …]) | Return the lowest bound for C such that for C in (l1_min_C, infinity) the model is guaranteed not to be empty. |
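A minimal SVC sketch (the dataset and hyperparameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(clf.predict(X[:3]))
print(clf.n_support_)     # number of support vectors per class
```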
sklearn.tree: Decision Trees
Class/Function | Description |
---|---|
tree.DecisionTreeClassifier(*[, criterion, …]) | A decision tree classifier. |
tree.DecisionTreeRegressor(*[, criterion, …]) | A decision tree regressor. |
tree.ExtraTreeClassifier(*[, criterion, …]) | An extremely randomized tree classifier. |
tree.ExtraTreeRegressor(*[, criterion, …]) | An extremely randomized tree regressor. |
tree.export_graphviz(decision_tree[, …]) | Export a decision tree in DOT format. |
tree.export_text(decision_tree, *[, …]) | Build a text report showing the rules of a decision tree. |
tree.plot_tree(decision_tree, *[, …]) | Plot a decision tree. |
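A minimal sketch of fitting a tree and exporting its rules as text (the dataset and depth are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))   # plain-text view of the learned split rules
```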
sklearn.utils: Utilities
Function | Description |
---|---|
utils.arrayfuncs.min_pos(X) | Find the minimum value of an array over positive values. |
utils.as_float_array(X, *[, copy, …]) | Convert an array-like to an array of floats. |
utils.assert_all_finite(X, *[, allow_nan]) | Throw a ValueError if X contains NaN or infinity. |
utils.Bunch(**kwargs) | Container object exposing keys as attributes. |
utils.check_X_y(X, y[, accept_sparse, …]) | Input validation for standard estimators. |
utils.check_array(array[, accept_sparse, …]) | Input validation on an array, list, sparse matrix or similar. |
utils.check_scalar(x, name, target_type, *) | Validate scalar parameters type and value. |
utils.check_consistent_length(*arrays) | Check that all arrays have consistent first dimensions. |
utils.check_random_state(seed) | Turn seed into a np.random.RandomState instance. |
utils.class_weight.compute_class_weight | Estimate class weights for unbalanced datasets. |
utils.class_weight.compute_sample_weight | Estimate sample weights by class for unbalanced datasets. |
utils.deprecated([extra]) | Decorator to mark a function or class as deprecated. |
utils.estimator_checks.check_estimator(Estimator) | Check if estimator adheres to scikit-learn conventions. |
utils.estimator_checks.parametrize_with_checks | Pytest-specific decorator for parametrizing estimator checks. |
utils.estimator_html_repr(estimator) | Build an HTML representation of an estimator. |
utils.extmath.safe_sparse_dot(a, b, *[, …]) | Dot product that handles the sparse matrix case correctly. |
utils.extmath.randomized_range_finder | Computes an orthonormal matrix whose range approximates the range of A. |
utils.extmath.randomized_svd(M, n_components, *) | Computes a truncated randomized SVD. |
utils.extmath.fast_logdet(A) | Compute log(det(A)) for A symmetric. |
utils.extmath.density(w, **kwargs) | Compute density of a sparse vector. |
utils.extmath.weighted_mode(a, w, *[, axis]) | Returns an array of the weighted modal (most common) value in a. |
utils.gen_even_slices(n, n_packs, *[, n_samples]) | Generator to create n_packs slices going up to n. |
utils.graph.single_source_shortest_path_length | Return the shortest path length from source to all reachable nodes. |
utils.graph_shortest_path.graph_shortest_path | Perform a shortest-path graph search on a positive directed or undirected graph. |
utils.indexable(*iterables) | Make arrays indexable for cross-validation. |
utils.metaestimators.if_delegate_has_method | Create a decorator for methods that are delegated to a sub-estimator. |
utils.multiclass.type_of_target(y) | Determine the type of data indicated by the target. |
utils.multiclass.is_multilabel(y) | Check if y is in a multilabel format. |
utils.multiclass.unique_labels(*ys) | Extract an ordered array of unique labels. |
utils.murmurhash3_32 | Compute the 32bit murmurhash3 of key at seed. |
utils.resample(*arrays, **options) | Resample arrays or sparse matrices in a consistent way. |
utils._safe_indexing(X, indices, *[, axis]) | Return rows, items or columns of X using indices. |
utils.safe_mask(X, mask) | Return a mask which is safe to use on X. |
utils.safe_sqr(X, *[, copy]) | Element-wise squaring of array-likes and sparse matrices. |
utils.shuffle(*arrays, **options) | Shuffle arrays or sparse matrices in a consistent way. |
utils.sparsefuncs.incr_mean_variance_axis | Compute incremental mean and variance along an axis on a CSR or CSC matrix. |
utils.sparsefuncs.inplace_column_scale(X, scale) | Inplace column scaling of a CSC/CSR matrix. |
utils.sparsefuncs.inplace_row_scale(X, scale) | Inplace row scaling of a CSR or CSC matrix. |
utils.sparsefuncs.inplace_swap_row(X, m, n) | Swaps two rows of a CSC/CSR matrix in-place. |
utils.sparsefuncs.inplace_swap_column(X, m, n) | Swaps two columns of a CSC/CSR matrix in-place. |
utils.sparsefuncs.mean_variance_axis(X, axis) | Compute mean and variance along an axis on a CSR or CSC matrix. |
utils.sparsefuncs.inplace_csr_column_scale(X, scale) | Inplace column scaling of a CSR matrix. |
utils.sparsefuncs_fast.inplace_csr_row_normalize_l1(X) | Inplace row normalize using the l1 norm. |
utils.sparsefuncs_fast.inplace_csr_row_normalize_l2(X) | Inplace row normalize using the l2 norm. |
utils.random.sample_without_replacement | Sample integers without replacement. |
utils.validation.check_is_fitted(estimator) | Perform is_fitted validation for estimator. |
utils.validation.check_memory(memory) | Check that memory is joblib.Memory-like. |
utils.validation.check_symmetric(array, *[, …]) | Make sure that array is 2D, square and symmetric. |
utils.validation.column_or_1d(y, *[, warn]) | Ravel column or 1d numpy array, else raise an error. |
utils.validation.has_fit_parameter(estimator, parameter) | Check whether the estimator’s fit method supports the given parameter. |
utils.all_estimators([type_filter]) | Get a list of all estimators from sklearn. |
utils.parallel_backend(backend[, n_jobs, …]) | Change the default backend used by Parallel inside a with block. |
utils.register_parallel_backend(name, factory) | Register a new Parallel backend factory. |
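A minimal sketch of the random-state and resampling helpers (the array shapes are illustrative):

```python
from sklearn.utils import check_random_state, resample, shuffle

rng = check_random_state(0)        # int, None, or RandomState -> RandomState
X = rng.randn(5, 2)
y = rng.randint(0, 2, size=5)

X_s, y_s = shuffle(X, y, random_state=rng)               # same permutation for both
X_b, y_b = resample(X, y, n_samples=3, random_state=0)   # bootstrap-style sample
```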