
Separate results with/without additional datasets #419

Open
bfelbo opened this issue Feb 16, 2020 · 2 comments

Comments


bfelbo commented Feb 16, 2020

This repo is amazing for quickly finding recent papers on a particular NLP problem. Thanks a lot for creating it! However, the leaderboards would be even more useful if they distinguished between methods that use additional data and those that only use the supplied dataset. See this blog post by Anna Rogers for a more detailed discussion of why this is important.

For instance, consider the IMDB leaderboard on http://nlpprogress.com/english/sentiment_analysis.html. The top few methods use 100GB of text, whereas virtual adversarial training (Miyato et al., 2016) at #7 only uses the 25K labeled and 50K unlabeled observations supplied as part of the IMDB task. That's very useful to know.

Pushing bigger models to the limit is important, as they can teach us how far we can take the current paradigm and what its fundamental limitations are; at the same time, it's just as valuable to know the best approaches for training in a low-resource setting without a large pretrained model. Perhaps a simple solution would be to split the leaderboards into an "all tricks allowed" section and an "only supplied dataset" section, as sketched below?
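For concreteness, a rough sketch of how such a split could look in the repo's markdown tables (the column layout and the "..." entries are placeholders, not the actual leaderboard contents):

```markdown
### IMDb (all tricks allowed)

| Model | Accuracy | Paper / Source |
| ----- | :------: | -------------- |
| XLNet | ...      | ...            |

### IMDb (only supplied dataset)

| Model                                               | Accuracy | Paper / Source |
| --------------------------------------------------- | :------: | -------------- |
| Virtual adversarial training (Miyato et al., 2016)  | ...      | ...            |
```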

@sebastianruder
Owner

Hi Bjarke, thanks a lot for the suggestion! I think that's a really important distinction to make and I'd be really happy if we could surface this going forward. Alternatively, for some tasks people have already been using an asterisk next to the method name to highlight particular settings of a method. It might make sense to go with that approach, as it's also helpful to know which additional resources a method employed, and it might be less obtrusive.


bfelbo commented Feb 22, 2020

That's great to hear, Sebastian!

The asterisk approach makes sense. For some benchmarks, though, it would require many different symbols. For instance, on the IMDB sentiment analysis leaderboard, XLNet uses Wikipedia, BooksCorpus, Giga5, ClueWeb, and Common Crawl, whereas BERT_large and BERT_base use Wikipedia and BooksCorpus. If this repo went with an asterisk-style approach, perhaps the simplest option would be to use a different symbol for each resource?

These symbols could be reused across different methods, similar to the author affiliation symbols in research publications. That would make it easy to compare across methods without having to read footnotes for each one; see the sketch below. What do you think?
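Something like this, for example (a rough sketch; the particular symbols and the "..." entries are placeholders):

```markdown
| Model          | Accuracy | Paper / Source |
| -------------- | :------: | -------------- |
| XLNet †‡§¶♯    | ...      | ...            |
| BERT_large †‡  | ...      | ...            |
| BERT_base †‡   | ...      | ...            |

† Wikipedia &nbsp; ‡ BooksCorpus &nbsp; § Giga5 &nbsp; ¶ ClueWeb &nbsp; ♯ Common Crawl
```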
