Scaling this effort up #32

Open
syhw opened this issue May 7, 2019 · 2 comments

@syhw (Owner) commented May 7, 2019

What should be included in wer_are_we?

When I started this repo, I only put in numbers that I trusted: a (very) few of them I had reproduced myself, or I knew they were reproducible. Now it seems (from the past year's issues and pull requests) that people want it to be exhaustive. In this regard, there are two broad families of solutions:

  1. extensive editorial work, which means vetting the results somehow, leveraging any of:
    1. reproduced work
    2. cited work
    3. a network of trust ("editorial board")
    4. $your_suggestion
  2. being exhaustive, but giving metrics/indicators so readers can form their own judgement.

I don't want to be the gatekeeper for the community, but I do care about trustworthy numbers and reproducible science. Classic concerns are validating on the test set, having a language model that includes the test set, and plain human error. Still, that doesn't mean that even slightly bogus numbers (plainly wrong ones are still banished) are not interesting; they should just be taken with a grain of salt. Otherwise, I subscribe to "trust, but verify", a.k.a. optimism in the face of uncertainty. Thus, I am leaning towards 2. and adding a column for "trustability" that gives (if there is one) an argument for trusting the number. It can be a link to a GitHub repo, a paper reproducing it (e.g. OpenSeq2Seq for DeepSpeech2 and wav2letter), a high number of citations, or a noteworthy citation. What do you think of that?
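
To make the idea concrete, here is a rough sketch of what a row with such a column could look like. The exact column names and the placeholder entries are just assumptions for illustration, not a final format:

```markdown
| WER   | Paper         | Notes        | Trustability                                 |
| ----- | ------------- | ------------ | -------------------------------------------- |
| x.x % | (paper title) | (short note) | reproduced in OpenSeq2Seq (link)             |
| x.x % | (paper title) | (short note) | no known repro; noteworthy / many citations  |
```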

I am also going to include baselines from Kaldi, unless a good argument against it comes up in #31.

Do you want to help?

#28 raises the question of my (lack of) responsiveness lately. If you're interested in helping maintain the repo and you agree with the above, feel free to submit PRs, of course. A good PR is not just the number(s) and the paper title, but also a note explaining what is special/specific about the paper's approach. It's even better if your PR includes a note in slightly longer form than the "Notes" column, one that shows you understood the paper.

I will also consider adding a few trusted maintainers with push access.

Let me know in the comments if you have suggestions on how to scale this better while still informing readers about the trustworthiness of the results we list.

@tlikhomanenko (Collaborator) commented:

Hi @syhw,

thanks a lot for your huge work on this!

I prefer option 2, with the full picture prepared (it can still be helpful for understanding what people are doing and how realistic it is) :).

However, my suggestion is to have two different tables:

  1. one for verified results, with links to code/reproductions/proofs (like golden test results)
  2. one for other results which still need to be verified somehow. In this case the information will be organized in a clear way: easy to cite, to check where we are now, and to see what still needs to be reproduced.

Also, I would prefer to have two separate columns: reproducible and #citations (a lot of citations is not always equivalent to reproducibility).
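
A rough sketch of how those two separate columns could look (column names and placeholder values are just assumptions for illustration):

```markdown
| WER   | Paper         | Reproducible        | #Citations |
| ----- | ------------- | ------------------- | ---------- |
| x.x % | (paper title) | yes (link to repro) | (count)    |
| x.x % | (paper title) | not yet             | (count)    |
```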

@lunixbochs commented Jun 8, 2019

The most interesting form of "reproducible" for me is when a third party (unaffiliated with the original team) has posted a blog post or whitepaper claiming results similar to the original, a.k.a. "findings were actually reproduced".

We could have a reproduced column with confidence values such as "the editor repro'd", "link to third party repro", "no repro but we trust the source", and "low confidence".
