Sparse data category #3

maciejkula · 2016-08-24T04:33:59Z

I think it might be worth creating a category for packages that deal with sparse data well --- this is extremely important for all sorts of NLP and recommendation applications, where very large and very sparse matrices are commonplace.

Obviously, rustlearn supports this for all of its classifiers (including random forests!) :)

anowell · 2016-08-24T04:52:09Z

I'll be the first to admit that I don't think I got the categories exactly correct (or that I could find any 2 resources that agreed on a way to categorize ML that is useful for describing a language ecosystem)

Would you be willing to draft up the initial category overview and point me to any other crates you think might be candidates for living there? (either in this issue or as a PR)

maciejkula · 2016-08-24T04:53:19Z

Yes, it's far from clear. I'll have a think and see if I can come up with something.

bluss · 2016-11-13T14:38:36Z

possibly look at https://github.com/vbarrielle/sprs too

xpe · 2018-11-23T03:32:18Z

I'll chime in. I agree that "sparse data support" is a thread that runs through many matrix, ML, and NLP libraries. That said, I'm not sure if it warrants its own category.

I currently lean towards saying "maybe not". Here are two reasons why.

1. I tend to think of sparse data support as something that ML practitioners look for after they've chosen an approach. Put another way, practitioners search for certain primary functionality or capabilities first, and only then look for sparsity support. (I'm not sure how often a practitioner would say, "I'm only going to choose from ML approaches that already include sparse data support.")
2. If we add "sparse data" as a new category, we might be getting into the weeds (i.e. an excessive level of detail). Would we be inviting an explosion of relatively minor categories? Just something to think about.

anowell added a commit that referenced this issue Dec 9, 2021

GH Actions publishing: take #3

9965b32

anowell added a commit that referenced this issue Dec 9, 2021

Merge pull request #110 from anowell/ci

9bf0b9f
