meta-issue: frameless-ml #215

Open
8 of 68 tasks
atamborrino opened this issue Nov 30, 2017 · 0 comments

Meta-issue to list what has been done in frameless-ml and what remains to be done.

Spark ML docs: https://spark.apache.org/docs/latest/ml-guide.html

Abstractions

  • TypedTransformer, the type-safe equivalent of Spark ML Transformer
  • TypedEstimator, the type-safe equivalent of Spark ML Estimator
  • TypedPipeline, the type-safe equivalent of Spark ML Pipeline
  • TypedEvaluator, the type-safe equivalent of Spark ML Evaluator
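
To make the idea behind these abstractions concrete, here is a minimal, Spark-free sketch of what "type-safe" can mean for a transformer and an estimator. All names in it (`SketchTransformer`, `SketchEstimator`, `WhitespaceTokenizer`, `MaxAbsEstimator`, and the case classes) are hypothetical and chosen for illustration only; this is not the frameless-ml API, and it uses plain `Seq` where the real library would work on typed datasets.

```scala
// Illustrative sketch only: every name here is hypothetical, not frameless-ml's API.
object TypedMlSketch {

  // A transformer whose input and output row types are fixed at compile time,
  // so feeding it the wrong kind of row is a compile error, not a runtime one.
  trait SketchTransformer[In, Out] {
    def transform(rows: Seq[In]): Seq[Out]
  }

  // An estimator learns from rows of type In and produces a fitted transformer.
  trait SketchEstimator[In, Out] {
    def fit(rows: Seq[In]): SketchTransformer[In, Out]
  }

  final case class Doc(text: String)
  final case class Tokens(words: Seq[String])
  final case class Scaled(value: Double)

  // Stateless transformer: lowercases and splits on whitespace.
  object WhitespaceTokenizer extends SketchTransformer[Doc, Tokens] {
    def transform(rows: Seq[Doc]): Seq[Tokens] =
      rows.map(d => Tokens(d.text.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq))
  }

  // Estimator: learns the maximum absolute value, then scales by it.
  object MaxAbsEstimator extends SketchEstimator[Double, Scaled] {
    def fit(rows: Seq[Double]): SketchTransformer[Double, Scaled] = {
      val maxAbs = rows.map(r => math.abs(r)).foldLeft(0.0)((a, b) => math.max(a, b))
      val divisor = if (maxAbs == 0.0) 1.0 else maxAbs // avoid division by zero
      new SketchTransformer[Double, Scaled] {
        def transform(rs: Seq[Double]): Seq[Scaled] = rs.map(r => Scaled(r / divisor))
      }
    }
  }
}
```

With this shape, `WhitespaceTokenizer.transform(Seq(1.0))` would be rejected by the compiler, whereas the untyped Spark ML equivalent would typically only fail at runtime on a missing or mistyped column.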

Typed transformers

  • TF-IDF
  • Word2Vec
  • CountVectorizer
  • Tokenizer
  • StopWordsRemover
  • n-gram
  • Binarizer
  • PCA
  • PolynomialExpansion
  • Discrete Cosine Transform (DCT)
  • StringIndexer
  • IndexToString
  • OneHotEncoder
  • VectorIndexer
  • Interaction
  • Normalizer
  • StandardScaler
  • MinMaxScaler
  • MaxAbsScaler
  • Bucketizer
  • ElementwiseProduct
  • SQLTransformer
  • VectorAssembler
  • QuantileDiscretizer
  • Imputer
  • Feature Selectors
      • VectorSlicer
      • RFormula
      • ChiSqSelector
  • Locality Sensitive Hashing
      • LSH Operations
          • Approximate Similarity Join
          • Approximate Nearest Neighbor Search
      • LSH Algorithms
          • Bucketed Random Projection for Euclidean Distance
          • MinHash for Jaccard Distance

Typed estimators

  • Binomial logistic regression
  • Multinomial logistic regression
  • Decision tree classifier
  • Random forest classifier
  • Gradient-boosted tree classifier
  • Multilayer perceptron classifier
  • Linear Support Vector Machine
  • One-vs-Rest classifier (a.k.a. One-vs-All)
  • Naive Bayes
  • Linear regression
  • Generalized linear regression
      • Available families
  • Decision tree regression
  • Random forest regression
  • Gradient-boosted tree regression
  • Survival regression
  • Isotonic regression
  • K-means
  • Latent Dirichlet allocation (LDA)
  • Bisecting k-means
  • Gaussian Mixture Model (GMM)
  • ALS
  • FP-Growth
  • CrossValidator
  • TrainValidationSplit

Typed evaluators

  • RegressionEvaluator
  • BinaryClassificationEvaluator
  • MulticlassClassificationEvaluator