
Roadmap #7

Open · 20 of 24 tasks

LukeMathWalker opened this issue Dec 1, 2019 · 80 comments

Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@LukeMathWalker (Contributor) commented Dec 1, 2019

In terms of functionality, the mid-term end goal is to achieve an offering of ML algorithms and pre-processing routines comparable to what is currently available in Python's scikit-learn.

These algorithms can either be:

  • re-implemented in Rust;
  • re-exported from an existing Rust crate, if one is available on crates.io with a compatible interface (see the adapter sketch after the list below).

In no particular order, focusing on the main gaps:

  • Clustering:

    • [x] DBSCAN
    • [x] Spectral clustering
    • [x] Hierarchical clustering
    • [x] OPTICS
  • Preprocessing:

    • [x] PCA
    • [x] ICA
    • [x] Normalisation
    • [x] CountVectoriser
    • [x] TFIDF
    • [x] t-SNE
  • Supervised Learning:

    • [x] Linear regression
    • [x] Ridge regression
    • [x] LASSO
    • [x] ElasticNet
    • [x] Support vector machines
    • [x] Nearest Neighbours
    • [ ] Gaussian processes (integrating friedrich - tracking issue: nestordemeure/friedrich#1)
    • [x] Decision trees
    • [ ] Random Forest
    • [x] Naive Bayes
    • [x] Logistic Regression
    • [ ] Ensemble Learning
    • [ ] Least Angle Regression
    • [x] PLS

The collection is deliberately loose and non-exhaustive, and it will evolve over time - if there is an ML algorithm you find yourself using day to day, please feel free to contribute it 💯
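For the re-export path mentioned above, a thin adapter is usually all that is needed. A hypothetical sketch (the `Estimator` trait and its method names below are illustrative, not linfa's actual API):

```rust
// Hypothetical adapter sketch: expose an external crate's model behind a small
// local trait so all algorithms share one interface. Names are illustrative.
use ndarray::{Array1, Array2};

pub trait Estimator {
    fn fit(&mut self, x: &Array2<f64>, y: &Array1<f64>);
    fn predict(&self, x: &Array2<f64>) -> Array1<f64>;
}

// A re-exported algorithm would implement `Estimator` by delegating to the
// external crate (e.g. a Gaussian process from `friedrich`), keeping that
// crate an implementation detail behind linfa's facade.
```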

@LukeMathWalker added the enhancement, good first issue, and help wanted labels on Dec 1, 2019
@LukeMathWalker pinned this issue on Dec 1, 2019
@Nimpruda commented Dec 2, 2019

Hi, I'm eager to help. I'll take linear regression, LASSO, and ridge.

@LukeMathWalker (Contributor, Author)

Cool @Nimpruda! I worked a bit on linear regression a while ago - you can find a very vanilla implementation of it here: https://github.com/rust-ndarray/ndarray-examples/tree/master/linear_regression
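For anyone starting from scratch, the core of such an example boils down to ordinary least squares. A toy single-feature version, just to illustrate the closed-form fit (not taken from the linked code):

```rust
// Toy closed-form least squares for y ≈ a + b·x with a single feature.
// The linked ndarray example solves the general multi-feature case instead.
fn fit_simple_ols(x: &[f64], y: &[f64]) -> (f64, f64) {
    assert_eq!(x.len(), y.len());
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;
    // Slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², intercept a = ȳ - b·x̄.
    let cov: f64 = x.iter().zip(y).map(|(xi, yi)| (xi - mean_x) * (yi - mean_y)).sum();
    let var: f64 = x.iter().map(|xi| (xi - mean_x).powi(2)).sum();
    let b = cov / var;
    (mean_y - b * mean_x, b)
}
```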

@InCogNiTo124 (Contributor) commented Dec 2, 2019

What does Normalization mean? Is it like sklearn's StandardScaler, or something else?

@LukeMathWalker (Contributor, Author)

Exactly @InCogNiTo124.
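For reference, a minimal sketch of what a StandardScaler-style transform does, assuming ndarray as the backing array type (not linfa's eventual API):

```rust
// Column-wise standardisation sketch: subtract each column's mean and divide
// by its standard deviation. Constant columns (std == 0) are not handled here.
use ndarray::{Array1, Array2, Axis};

fn standard_scale(data: &Array2<f64>) -> (Array2<f64>, Array1<f64>, Array1<f64>) {
    let mean = data.mean_axis(Axis(0)).expect("data must be non-empty");
    let std = data.std_axis(Axis(0), 0.0); // population std (ddof = 0)
    // Broadcasting handles the row-wise subtraction and division.
    let scaled = (data - &mean) / &std;
    (scaled, mean, std)
}
```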

@ADMoreau commented Dec 2, 2019

This is an interesting project; I will work on the PCA implementation.

@nestordemeure

I am the author of the friedrich crate, which implements Gaussian processes.

While it is still a work in progress, it is fully featured, and I would be happy to help integrate it into the project if you can give directions on how to do so.

@LukeMathWalker (Contributor, Author)

That would be awesome @nestordemeure - I'll have a look at the project and I'll get back to you! Should I open an issue on friedrich's repository when I am ready? Or would you prefer it to be tracked here on the linfa repository?

@nestordemeure

Both are OK with me.

An issue in friedrich's repository might help avoid overcrowding linfa with issues, but do as you prefer.

@mstallmo commented Dec 2, 2019

I'd love to take the Nearest Neighbors implementation.

@milesgranger commented Dec 3, 2019

I think this is really great. I just started on an sklearn-like implementation of their pipelines (here), but it's more or less for experimentation, nothing serious. I'll be sure to keep an eye on the issues/goals here and help out where I can. Thanks for the initiative! 👏

@ChristopherRabotin

Hi there! First off, I don't have any experience in ML, but I read a lot about it (and listen to way too many podcasts on the topic), and I'm interested in jumping in. I have quite a bit of experience developing in Rust, specifically high-fidelity simulation tools (cf. nyx and hifitime).

I wrote an Ant Colony Optimizer in Rust. ACOs are great for traversing graphs that represent a solution space, a problem which is NP-hard if I'm not mistaken. Is that something used at all in ML? If so, would it be of interest to this library, or is there greater interest (for now) in focusing on the problems listed in the first post?

Cheers

@Nimpruda commented Dec 4, 2019

Hi @ChristopherRabotin, I've never heard of ACOs, but since they relate to graphs you could check whether they have any uses alongside Markov chains.

@ChristopherRabotin

So far, I haven't found how the two can be used together. The closest I've found is several papers that use Markov chains to analyze ACOs.

@onehr commented Dec 6, 2019

I would like to take the Naive Bayes one.

@tyfarnan commented Dec 8, 2019

I'll take on Gaussian Processes.

@bplevin36 commented Dec 8, 2019

I'll put some work towards the text tokenization algorithms (CountVectorizer and TFIDF). I'm also extremely interested in a good SVM implementation in Rust. Whoever is working on that, let me know if you'd like some help or anything.
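For the tokenization side, here is a rough sketch of the counting step (naive whitespace tokenisation, independent of whatever API linfa ends up with); TF-IDF then just reweights these counts by an inverse document frequency term:

```rust
// Bag-of-words sketch: build a vocabulary, then count token occurrences per
// document. Tokenisation is naive (lowercased whitespace splitting).
use std::collections::HashMap;

fn count_vectorise(docs: &[&str]) -> (HashMap<String, usize>, Vec<Vec<usize>>) {
    // token -> column index
    let mut vocab: HashMap<String, usize> = HashMap::new();
    for doc in docs {
        for tok in doc.split_whitespace() {
            let next = vocab.len();
            vocab.entry(tok.to_lowercase()).or_insert(next);
        }
    }
    // One count row per document.
    let mut counts = vec![vec![0usize; vocab.len()]; docs.len()];
    for (row, doc) in docs.iter().enumerate() {
        for tok in doc.split_whitespace() {
            counts[row][vocab[&tok.to_lowercase()]] += 1;
        }
    }
    (vocab, counts)
}
```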

@LukeMathWalker (Contributor, Author) commented Dec 8, 2019

Please take a look at what is already out there before diving head first into a reimplementation, @tyfarnan - I haven't had time to look at friedrich by @nestordemeure yet (taking a break after the final push to release the blog post and related code 😅), but we should definitely start from there, as well as from the GP sub-module in rusty-machine.

@nestordemeure

@tyfarnan, don't hesitate to contact me via an issue on friedrich's repository once @LukeMathWalker has spelled out what is expected of code that is integrated into linfa and how that integration will be done.

@DallasC commented Dec 11, 2019

I did a quick round-up of crates that implement the algorithms listed on the roadmap. I probably missed quite a few, but this can be a good starting point.

It was just a quick search, so I don't know how relevant each crate is, but I tried to make a note when a crate was old and unmaintained. Hopefully this is useful for algorithm design, or for saving us from reimplementing something that is already there.

Algo ecosystem gist

@LukeMathWalker (Contributor, Author)

Tracking friedrich<>linfa integration here: nestordemeure/friedrich#1

@LukeMathWalker (Contributor, Author)

I have updated the issue to make sure it's immediately clear who is working on what and which items are still looking for an owner 👍

@InCogNiTo124 (Contributor)

Hey @LukeMathWalker, could you add me next to Normalisation? I plan to have it done by New Year's, as I'm still not very experienced with Rust, but I have an idea of how to implement it.

@LukeMathWalker (Contributor, Author)

Done @InCogNiTo124 🙏

@xd009642 (Member)

Started implementing DBSCAN in #12.

Also, if we're taking suggestions, Gaussian mixture models would be cool.

@LukeMathWalker (Contributor, Author)

Implementation of DBSCAN merged to master - thanks @xd009642 🙏

@bytesnake (Member)

> hi all, I'd like to help implement too. What's the best way to pick up a task?

Not difficult - just mention your interest here and I will add you to the list once you've submitted the initial draft :)

@Clara322

Is there any interest in linfa supporting model-selection algorithms such as grid search or hyperparameter tuning?

@xd009642 (Member)

@Clara322 I personally think that would be a good candidate for a new linfa crate. If you want to open an issue for it specifically, there can be some discussion on what the design will look like and the steps to implement it 👍
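As a rough illustration of the shape such a crate might take (names and signatures are hypothetical, not an existing linfa API), grid search is essentially an exhaustive loop over candidate parameter sets with a scoring callback:

```rust
// Hypothetical grid-search sketch: score every candidate parameter set with a
// user-supplied closure (e.g. cross-validated accuracy) and keep the best one.
fn grid_search<P: Clone>(candidates: &[P], score: impl Fn(&P) -> f64) -> Option<(P, f64)> {
    candidates
        .iter()
        .map(|p| (p.clone(), score(p)))
        .max_by(|a, b| a.1.partial_cmp(&b.1).expect("scores must be comparable"))
}
```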

@vaijira commented Dec 18, 2021

Could "causal inference" be added, like the https://github.com/microsoft/dowhy library, or would that be out of scope for linfa?

@erkasc01

Hi everyone, I've implemented the semi-supervised learning algorithm called dynamic label propagation in Rust. I'm getting accuracy scores of up to 98% on one of the datasets I've been using. I don't think this algorithm is very well known, but could it be added to the linfa library?

@bytesnake (Member) commented Jan 28, 2022

> Could "causal inference" be added, like the https://github.com/microsoft/dowhy library, or would that be out of scope for linfa?

There are many interesting patterns that linfa could learn from, but we would first need to support graphical models.

> Hi everyone, I've implemented the semi-supervised learning algorithm called dynamic label propagation in Rust. I'm getting accuracy scores of up to 98% on one of the datasets I've been using. I don't think this algorithm is very well known, but could it be added to the linfa library?

Cool, sure! Once you have a working prototype, submit a PR and I will review the integration. We'll have to see how to add support for incomplete datasets, though.

@vaijira commented Feb 5, 2022

@bytesnake I'm playing with it, adding graph and identification support. If one day I feel it's ready, I'll submit a PR: https://github.com/vaijira/linfa/tree/causal/algorithms/linfa-causal-inference

@YuhanLiin (Collaborator)

Infrastructure Goals

Aside from just adding new algorithms, there are also some infrastructure tasks that will significantly improve the ergonomics and performance of Linfa. They are listed here in descending order of importance, in my opinion:

@oojo12 (Contributor) commented Oct 21, 2022

Can I work on adding Linear Discriminant Analysis to linfa? Here is a link to the Sklearn analog

@bernardo-sb (Contributor)

I've been working on some features such as categorical encoding, MAPE, and random forest. How can I contribute?

@YuhanLiin (Collaborator)

> Can I work on adding Linear Discriminant Analysis to linfa? Here is a link to the Sklearn analog

Does LDA output the dimensionally reduced data at all? If so, it should go into linfa-reduction.

@YuhanLiin (Collaborator)

> I've been working on some features such as categorical encoding, MAPE, and random forest. How can I contribute?

Random forests are covered by this PR.

Categorical encoding would go into linfa-preprocessing. I'm pretty sure we don't have it, but please check to make sure.

MAPE is a simple function that would go into linfa/src/metrics_regression.rs.
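For illustration, MAPE is roughly the following (a free-function sketch; the actual signature in linfa's metrics module may differ):

```rust
// Mean absolute percentage error sketch: mean of |(true - pred) / true|.
// Zero targets are not handled here and would need special-casing.
fn mean_absolute_percentage_error(y_true: &[f64], y_pred: &[f64]) -> f64 {
    assert_eq!(y_true.len(), y_pred.len());
    let n = y_true.len() as f64;
    y_true
        .iter()
        .zip(y_pred)
        .map(|(t, p)| ((t - p) / t).abs())
        .sum::<f64>()
        / n
}
```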

@oojo12 (Contributor) commented Oct 23, 2022

> > Can I work on adding Linear Discriminant Analysis to linfa? Here is a link to the Sklearn analog
>
> Does LDA output the dimensionally reduced data at all? If so, it should go into linfa-reduction.

It can perform dimensionality reduction (transform), and it can also be used just to predict classes (predict); the parentheses hold the sklearn method analogues. Is there a preference for which should be implemented? Also, I am still getting familiar with Rust, so it may take a few weeks to get done.

@YuhanLiin (Collaborator)

Preferably implement both if possible.

@oojo12 (Contributor) commented Oct 24, 2022 via email

@LundinMachine

Are there plans to implement ridge regression in the linear sub-package? I'm looking for models to contribute.

@YuhanLiin (Collaborator)

Ridge regression should already be in linfa-elasticnet
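For context (not from the thread): the elastic net objective interpolates between the L1 and L2 penalties, so a pure L2 penalty recovers ridge regression, which is why it lives in linfa-elasticnet.

```math
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw\rVert_2^2 + \lambda\left(\alpha\,\lVert w\rVert_1 + \frac{1-\alpha}{2}\,\lVert w\rVert_2^2\right)
```

Here α = 0 gives ridge regression and α = 1 gives the LASSO.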

@LundinMachine

What about imputation, similar to scikit-learn's impute module?

@YuhanLiin (Collaborator)

We don't have that; it can go in linfa-preprocessing.

@HridayM25 commented Mar 12, 2023

Hi!
Can I take up Random Forests?
Also, can we look at implementing XGBoost and AdaBoost?

@YuhanLiin (Collaborator)

#229 implements bootstrap aggregation, which is a generalization of random forests, so you could work on that.

XGBoost and AdaBoost both seem to be ensemble algorithms that are not necessarily tied to decision trees (correct me if I'm wrong), so we should probably put them in a new algorithm crate called linfa-ensemble or something. Bootstrap aggregation should probably go in there as well.
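As a rough illustration of the bagging idea (a hypothetical helper, not tied to #229 or any existing linfa API): each base learner is fit on a bootstrap sample drawn with replacement, and predictions are combined by voting or averaging.

```rust
// Bootstrap-sampling sketch for bagging: draw n row indices with replacement;
// one such sample is used to fit each base learner in the ensemble.
use rand::Rng;

fn bootstrap_indices<R: Rng>(n: usize, rng: &mut R) -> Vec<usize> {
    (0..n).map(|_| rng.gen_range(0..n)).collect()
}
```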

@sebasv commented Sep 25, 2023

I'd like to contribute quantile regression

@MarekJReid

I'd love to take on Random Forest! I have previously implemented it (simplistically) in Go, but I'd love to make it happen in Rust. This is my first open-source contribution - let me know how I can make it happen :)

@giorgiozoppi

I'd also like to help with this.

@AndersonYin

I'm interested in least angle regression (LARS). It seems that PR #115 was trying to implement it, but it has been paused for 3 years, so I guess it's basically abandoned. I'm going to pick it up.

@giorgiozoppi

I am interested in random forests.

@zenconnor

@MarekJReid @giorgiozoppi did either of you get a chance to work on random forests?

@giorgiozoppi commented Mar 9, 2024

I'll look into it. We covered this at school this week. For Python bindings, maturin is perfect. @zenconnor, should I look inside scikit-learn? I was looking at the scikit-learn implementation; as soon as I can, I'll provide a class diagram of it.
