
Separate results with/without additional datasets #419

Open
bfelbo opened this issue Feb 16, 2020 · 2 comments

Comments


bfelbo commented Feb 16, 2020

This repo is amazing for quickly finding recent papers on a particular NLP problem. Thanks a lot for creating it! However, the leaderboards would be even more useful if they distinguished between methods that use additional data and those that only use the supplied dataset. See this blog post by Anna Rogers for a more detailed discussion of why this is important.

For instance, consider the IMDB leaderboard on http://nlpprogress.com/english/sentiment_analysis.html. The top few methods use 100GB of text, whereas virtual adversarial training (Miyato et al., 2016) at #7 only uses the 25K labeled and 50K unlabeled observations supplied as part of the IMDB task. That's very useful to know.

Pushing bigger models to the limit is important, as they can teach us how far we can take the current paradigm and what its fundamental limitations are; at the same time, it's just as valuable to know the best approaches for training in a low-resource setting without a large pretrained model. Perhaps a simple solution would be to split the leaderboards into an "all tricks allowed" section and an "only supplied dataset" section, as sketched below?
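For concreteness, a rough sketch of how such a split could look in the repo's markdown tables (the column layout and the "..." entries are placeholders, not the actual leaderboard contents):

```markdown
### IMDb (all tricks allowed)

| Model | Accuracy | Paper / Source |
| ----- | :------: | -------------- |
| XLNet | ...      | ...            |

### IMDb (only supplied dataset)

| Model                                               | Accuracy | Paper / Source |
| --------------------------------------------------- | :------: | -------------- |
| Virtual adversarial training (Miyato et al., 2016)  | ...      | ...            |
```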

@sebastianruder
Owner

Hi Bjarke, thanks a lot for the suggestion! I think that's a really important distinction to make and I'd be really happy if we could surface this going forward. Alternatively, for some tasks people have already been using an asterisk next to the method name to highlight particular settings of a method. It might make sense to go with that approach, as it's also helpful to know which additional resources a method employed, and it might be less obtrusive.


bfelbo commented Feb 22, 2020

That's great to hear, Sebastian!

The asterisk approach makes sense. For some benchmarks, though, it would require many different symbols. For instance, on the IMDB sentiment analysis leaderboard, XLNet uses Wikipedia, BooksCorpus, Giga5, ClueWeb, and Common Crawl, whereas BERT_large and BERT_base use Wikipedia and BooksCorpus. If this repo went with an asterisk-style approach, perhaps the simplest option would be to use a different symbol for each resource?

These symbols could be reused across different methods, similar to the author affiliation symbols in research publications. That would make it easy to compare across methods without having to read footnotes for each one; see the sketch below. What do you think?
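Something like this, for example (a rough sketch; the particular symbols and the "..." entries are placeholders):

```markdown
| Model          | Accuracy | Paper / Source |
| -------------- | :------: | -------------- |
| XLNet †‡§¶♯    | ...      | ...            |
| BERT_large †‡  | ...      | ...            |
| BERT_base †‡   | ...      | ...            |

† Wikipedia &nbsp; ‡ BooksCorpus &nbsp; § Giga5 &nbsp; ¶ ClueWeb &nbsp; ♯ Common Crawl
```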
