
Proposal to increase the tool bar for the list #383

Open
zhimin-z opened this issue Mar 3, 2023 · 7 comments

Comments

@zhimin-z
Contributor

zhimin-z commented Mar 3, 2023

I recommend raising the bar for tools on the list, given the bloated number of entries (~471) nowadays. By raising the bar we would keep only the more impactful tools, since I notice that a sizable share of the tools on our list (~13%) have been unmaintained for several years (and, of course, their repos' stars have stopped growing as well).

How about raising the minimum requirement for GitHub repo stars to 500?

Why 500 stars rather than 300 or 200?

Empirically, I notice that a large number of tools lose popularity and eventually become obsolete (or unmaintained) before they ever reach 500 stars.

What do you think? @axsaucedo

@zhimin-z
Contributor Author

zhimin-z commented Mar 4, 2023

Also, some tools show up multiple times in the list:

[Screenshots: the same tools listed more than once]

As mentioned in #352, we should define a clearer standard: either we keep only a single (most impactful) entry for a given tool, or we explicitly allow multiple entries in the list. Right now the standard seems very inconsistent. @axsaucedo
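
As a rough illustration of how such duplicate entries could be surfaced automatically, here is a minimal sketch that counts how often each GitHub repo is linked in the list. The README.md path and the link pattern are assumptions about how the list is stored, not details from this thread.

```python
# Sketch: report any GitHub repo that is linked more than once in the list.
# README.md and the link regex are assumptions about how the list is stored.
import re
from collections import Counter

readme = open("README.md", encoding="utf-8").read()
links = re.findall(r"github\.com/([\w.-]+/[\w.-]+)", readme)
for repo, count in Counter(links).most_common():
    if count > 1:
        print(f"{repo} appears {count} times")
```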

@axsaucedo
Collaborator

This is a really good idea, and I do like the concept; however, there are a couple of points discussed, so answering each:

  • Duplicates = yes let's definitely remove - we can have a separate issue to ensure we address that one, good find!
  • Raising the number of stars - that's an interesting suggestion, and I definitely see where you are coming from
    • I do agree that 100 stars is quite low; I do also find that stars are an important metric but not the only one
    • I would certainly be keen to revisit the guidelines to see what we could raise the bar on
    • I also wonder if we could think of a way to automatically validate this (ie do a quick automated check once we merge the release PR)
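
To make the "automated check" idea concrete, below is a minimal sketch of what such a CI step might look like. It assumes the list lives in README.md as plain GitHub links, uses a hypothetical 500-star threshold, and reads an optional GITHUB_TOKEN environment variable; none of these details are specified in this thread.

```python
# Sketch of an automated star-threshold check, e.g. run after a release PR is
# merged. Threshold, file path, and token handling are illustrative assumptions.
import json
import os
import re
import sys
import urllib.request

THRESHOLD = 500  # hypothetical minimum star count


def repo_stars(owner_repo, token=None):
    """Return the current star count of a GitHub repository."""
    req = urllib.request.Request(f"https://api.github.com/repos/{owner_repo}")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["stargazers_count"]


def main():
    token = os.environ.get("GITHUB_TOKEN")
    readme = open("README.md", encoding="utf-8").read()
    repos = sorted(set(re.findall(r"github\.com/([\w.-]+/[\w.-]+)", readme)))
    stars = {r: repo_stars(r, token) for r in repos}
    below = {r: s for r, s in stars.items() if s < THRESHOLD}
    for repo, count in below.items():
        print(f"{repo}: {count} stars (< {THRESHOLD})")
    # Fail the CI job if any listed tool falls below the bar.
    sys.exit(1 if below else 0)


if __name__ == "__main__":
    main()
```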

@zhimin-z
Contributor Author

zhimin-z commented Mar 5, 2023

  • I also wonder if we could think of a way to automatically validate this (ie do a quick automated check once we merge the release PR)

Thanks for your feedback. What would it mean to automatically validate this (i.e. to do a quick automated check once we merge the release PR)? What do you have in mind? @axsaucedo

zhimin-z mentioned this issue Mar 5, 2023
@zhimin-z
Contributor Author

zhimin-z commented Mar 5, 2023

This is a really good idea, and I do like the concept; however, there are a couple of points discussed, so answering each:

  • Duplicates = yes let's definitely remove - we can have a separate issue to ensure we address that one, good find!

  • Raising the number of stars - that's an interesting suggestion, and I definitely see where you are coming from

    • I do agree that 100 stars is quite low; I do also find that stars are an important metric but not the only one
    • I would certainly be keen to revisit the guidelines to see what we could raise the bar on
    • I also wonder if we could think of a way to automatically validate this (ie do a quick automated check once we merge the release PR)

I know that repo stars may not be the best criterion to differentiate impactful from less impactful (or unimpactful) tools, but it is perhaps the easiest and most efficient way to do so. Considering the fast-changing pace of ML tools, it has the potential to eventually become the de facto most appropriate way to decide whether or not to include a tool. Do you agree, @axsaucedo?

@axsaucedo
Collaborator

I don't see why we shouldn't increase it at least slightly, to 200 or 300; let's give it a try at that amount. I do agree that it's a good heuristic.

@zhimin-z
Contributor Author

zhimin-z commented Mar 6, 2023

How about the update frequency? If a tool has not been updated or maintained for one or two years, should we still keep it on our list? We need to specify the required update frequency explicitly in the guidelines to exclude obsolete tools. @axsaucedo
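
For the update-frequency idea, a similar sketch could flag repos whose last push is older than some cutoff. The two-year window and the README.md path below are assumptions for illustration, not an agreed guideline.

```python
# Sketch of a staleness check: flag repos not pushed to within MAX_AGE.
# The 730-day cutoff and README.md path are illustrative assumptions.
import json
import re
import urllib.request
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=730)  # hypothetical "unmaintained for two years" cutoff


def last_push(owner_repo):
    """Return the timestamp of the most recent push to a repository."""
    with urllib.request.urlopen(f"https://api.github.com/repos/{owner_repo}") as resp:
        pushed_at = json.load(resp)["pushed_at"]  # e.g. "2023-03-07T12:34:56Z"
    return datetime.fromisoformat(pushed_at.replace("Z", "+00:00"))


readme = open("README.md", encoding="utf-8").read()
now = datetime.now(timezone.utc)
for repo in sorted(set(re.findall(r"github\.com/([\w.-]+/[\w.-]+)", readme))):
    pushed = last_push(repo)
    if now - pushed > MAX_AGE:
        print(f"{repo}: last push {pushed:%Y-%m-%d}, candidate for removal")
```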

@zhimin-z
Contributor Author

zhimin-z commented Mar 7, 2023

I don't see why we shouldn't increase it at least slightly, to 200 or 300; let's give it a try at that amount. I do agree that it's a good heuristic.

The core logic for why we should set a high bar (>500, or even 1000) rather than a medium one (>200, or 300) is this: if an OSS tool is used for production-level ML deployment, it usually draws enough attention from industry. If not, we should perhaps reevaluate the tool, since not many practitioners pay attention to (or use) it in real-world practice.

In reality, we seldom think highly of an OSS tool nowadays when it is little known or little used by practitioners (particularly given how widespread OSS culture is worldwide; we can almost treat the OSS community as a perfectly competitive market). The most common case we observe is that when a tool is widely recognized and used by practitioners, its repo stars grow at a steady pace and eventually reach a high level (>>1k). Thus, I still recommend a high bar rather than a medium one. @axsaucedo
