Sparse data category #3

Open
maciejkula opened this issue Aug 24, 2016 · 4 comments

Comments

@maciejkula
Contributor

I think it might be worth creating a category for packages that handle sparse data well. Sparse data support is extremely important for all sorts of NLP and recommendation applications, where very large, very sparse matrices are commonplace.

Obviously, rustlearn supports this for all of its classifiers (including random forests!) :)

@anowell
Owner

anowell commented Aug 24, 2016

I'll be the first to admit that I don't think I got the categories exactly right (nor could I find any two resources that agreed on a way to categorize ML that is useful for describing a language ecosystem).

Would you be willing to draft the initial category overview and point me to any other crates you think might be candidates for it (either in this issue or as a PR)?

@maciejkula
Contributor Author

Yes, it's far from clear. I'll have a think and see if I can come up with something.


@bluss

bluss commented Nov 13, 2016

Possibly look at https://github.com/vbarrielle/sprs too.

@xpe
Contributor

xpe commented Nov 23, 2018

I'll chime in. I agree that "sparse data support" is a thread that runs through many matrix, ML, and NLP libraries. That said, I'm not sure if it warrants its own category.

I currently lean towards saying "maybe not", for two reasons.

  1. I think of sparse data support as something ML practitioners look for after they've chosen an approach. In other words, practitioners search for certain primary functionality or capabilities first, and only afterwards look for sparsity support. (I'm not sure how often a practitioner would say, "I'm only going to choose from ML approaches that already include sparse data support.")

  2. If we add "sparse data" as a new category, we might be getting into the weeds (i.e. an excessive level of detail). Would we be inviting an explosion of relatively minor categories? Just something to think about.

anowell added a commit that referenced this issue Dec 9, 2021
GH Actions publishing: take #3

4 participants