
IBM/ne-table-datasets


Overview

Many Natural Language Processing (NLP) tasks depend on using Named Entities (NEs) that are contained in texts and in external knowledge sources. While humans handle such entities easily, current neural methods that rely on learned word embeddings may not perform well on these tasks, especially in the presence of Out-Of-Vocabulary (OOV) or rare NEs. These datasets contain extended versions of dialog bAbI tasks 1, 2 and 4 and OOV versions of the CBT test set.

NE-Table: A Neural key-value table for Named Entities, RANLP 2019
Janarthanan Rajendran*, Jatin Ganhotra*, Xiaoxiao Guo, Mo Yu, Satinder Singh and Lazaros Polymenakos
https://dblp.org/rec/conf/ranlp/RajendranGGYSP19
(*Equal Contribution)

Extended Dialog bAbI tasks

Adaptation of the "Dialog bAbI tasks data" dataset released by Facebook, available at https://research.fb.com/downloads/babi/, under the CC BY 3.0 Unported license, available at https://creativecommons.org/licenses/by/3.0/legalcode

CBT-OOV datasets

Adaptation of the "The Children's Book Test (CBT)" dataset released by Facebook, available at https://research.fb.com/downloads/babi/, under the GNU Free Documentation License (Version 1.3, 3 November 2008), available at https://www.gnu.org/licenses/fdl-1.3.en.html

License

The dataset is released under the CC BY-SA 4.0 license. For the full license, see LICENSE.txt. Please cite the following paper if you use this dataset in your work:

@inproceedings{DBLP:conf/ranlp/RajendranGGYSP19,
  author    = {Janarthanan Rajendran and
               Jatin Ganhotra and
               Xiaoxiao Guo and
               Mo Yu and
               Satinder Singh and
               Lazaros Polymenakos},
  editor    = {Ruslan Mitkov and
               Galia Angelova},
  title     = {NE-Table: {A} Neural key-value table for Named Entities},
  booktitle = {Proceedings of the International Conference on Recent Advances in
               Natural Language Processing, {RANLP} 2019, Varna, Bulgaria, September
               2-4, 2019},
  pages     = {980--993},
  publisher = {{INCOMA} Ltd.},
  year      = {2019},
  url       = {https://doi.org/10.26615/978-954-452-056-4\_114},
  doi       = {10.26615/978-954-452-056-4\_114},
  timestamp = {Fri, 31 Jan 2020 12:36:51 +0100},
  biburl    = {https://dblp.org/rec/conf/ranlp/RajendranGGYSP19.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contact

For more details on the datasets, see the paper cited above: NE-Table: A Neural key-value table for Named Entities, RANLP 2019 (https://dblp.org/rec/conf/ranlp/RajendranGGYSP19).

For questions on the Extended Dialog bAbI tasks, contact Janarthanan Rajendran: rjana (at) umich (dot) edu
For questions on the CBT-OOV datasets, contact Jatin Ganhotra: jatinganhotra (at) us (dot) ibm (dot) com

Dataset Metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

| property | value |
|---|---|
| name | Extended dialog bAbI tasks and CBT-OOV datasets |
| alternateName | Extended dialog bAbI tasks 1, 2 and 4 and OOV versions of the CBT test set |
| url | |
| sameAs | https://github.com/IBM/ne-table-datasets |
| description | Many Natural Language Processing (NLP) tasks depend on using Named Entities (NEs) that are contained in texts and in external knowledge sources. While this is easy for humans, the present neural methods that rely on learned word embeddings may not perform well for these NLP tasks, especially in the presence of Out-Of-Vocabulary (OOV) or rare NEs. The datasets contain extended versions of dialog bAbI tasks 1, 2 and 4 and OOV versions of the CBT test set. |
| provider | name: IBM; sameAs: https://en.wikipedia.org/wiki/IBM |
| citation | https://dblp.org/rec/conf/ranlp/RajendranGGYSP19 |
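
Dataset search engines typically read this kind of metadata as schema.org `Dataset` markup. As a rough sketch only, the properties listed above could be expressed as JSON-LD along the following lines. The function name `build_dataset_jsonld`, the `@context` and `@type` keys, and the choice of `Organization` for the provider are assumptions made for illustration and are not taken from this repository. The `url` property is left out because no value is given for it above.

```python
import json


def build_dataset_jsonld():
    """Return the metadata properties above as a schema.org-style dict.

    Hypothetical helper: the '@context' and '@type' entries are assumed,
    all other values are copied from the metadata table.
    """
    description = (
        "Many Natural Language Processing (NLP) tasks depend on using "
        "Named Entities (NEs) that are contained in texts and in external "
        "knowledge sources. While this is easy for humans, the present "
        "neural methods that rely on learned word embeddings may not "
        "perform well for these NLP tasks, especially in the presence of "
        "Out-Of-Vocabulary (OOV) or rare NEs. The datasets contain "
        "extended versions of dialog bAbI tasks 1, 2 and 4 and OOV "
        "versions of the CBT test set."
    )
    return {
        "@context": "https://schema.org/",  # assumed
        "@type": "Dataset",  # assumed
        "name": "Extended dialog bAbI tasks and CBT-OOV datasets",
        "alternateName": (
            "Extended dialog bAbI tasks 1, 2 and 4 and OOV versions "
            "of the CBT test set"
        ),
        "sameAs": "https://github.com/IBM/ne-table-datasets",
        "description": description,
        "provider": {
            "@type": "Organization",  # assumed
            "name": "IBM",
            "sameAs": "https://en.wikipedia.org/wiki/IBM",
        },
        "citation": "https://dblp.org/rec/conf/ranlp/RajendranGGYSP19",
    }


if __name__ == "__main__":
    # Print the JSON-LD document, e.g. for embedding in a web page.
    print(json.dumps(build_dataset_jsonld(), indent=2))
```

Building the record as a plain dictionary keeps it easy to validate or extend before serializing it with `json.dumps`.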