# NLP - Lexicon and Datasets

| Paper | Venue | Remarks |
| --- | --- | --- |
| The SEMAINE database: annotated multimodal records of emotionally coloured conversations between a person and a limited agent | IEEE TAC 2012 | 1. Introduces a large audiovisual database as part of an iterative approach to building agents that can engage a person in a sustained, emotionally coloured conversation, using the Sensitive Artificial Listener (SAL) paradigm. 2. The dataset contains 150 participants and a total of 959 conversations with individual SAL characters, each lasting approximately 5 minutes. |
| Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs | CMCL Workshop 2011 | 1. Provides the Cornell Movie-Dialogs Corpus. 2. Finds significant coordination across many families of function words in a large movie-script corpus. |
| Norms of valence, arousal, and dominance for 13,915 English lemmas | Behavior Research Methods 2013 | 1. Identifies four lines of research that use emotional ratings of words: the study of emotions themselves; the impact of emotional features on the processing and memory of words; the estimation of sentiment expressed by entire messages or texts; and the automatic estimation of emotional values of new words by comparing them to those of validated words. 2. Provides a dataset comprising 13,915 lemma-VAD pairs. |
| Linguistic Inquiry and Word Count: LIWC2015 | www.liwc.net | Provides the user manual for the LIWC software. |
| The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems | SIGDIAL 2015 | Introduces a dataset containing 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. |
| Modelling Valence and Arousal in Facebook posts | WASSA Workshop 2016 | Introduces a new dataset of 2,895 social media posts rated by two psychologically trained annotators on two separate ordinal nine-point scales (valence and arousal). |
| A Survey of Available Corpora for Building Data-Driven Dialogue Systems | arXiv 2017 | 1. A broad survey of publicly available datasets suitable for data-driven learning of dialogue systems. 2. Discusses important characteristics of these datasets, how they can be used to learn diverse dialogue strategies, and their other potential uses. 3. Examines methods for transfer learning between datasets and the use of external knowledge. 4. Discusses appropriate choices of evaluation metrics for the learning objective. |
| AMIGOS: A Dataset for Affect, Personality and Mood Research on Individuals and Groups | IEEE TAC 2017 | 1. A dataset for multimodal research on affect, personality traits, and mood of individuals and groups. 2. Contains videos and neuro-physiological signals. 3. Participants' emotions are annotated both with self-assessments of affective levels (valence, arousal, control, familiarity, liking, and basic emotions) felt during the videos and with external assessments of valence and arousal. |
| A Sentiment-and-Semantics-Based Approach for Emotion Detection in Textual Conversations | arXiv 2018 | 1. Introduces a collection of more than 30K three-turn tweets annotated with four emotion classes: happy, sad, angry, and others. 2. Proposes four types of sequence-based convolutional neural network models with attention that leverage the sequence information encapsulated in dialogue. |
| Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks | AAAI Workshop 2018 | Presents a new corpus providing annotations of seven emotions on consecutive utterances in dialogues extracted from the TV show Friends. |
| EmotionLines: An Emotion Corpus of Multi-Party Conversations | LREC 2018 | A total of 29,245 utterances from 2,000 dialogues are labelled in EmotionLines; each utterance is labelled with one of seven emotions: Ekman's six basic emotions plus neutral. |
| MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations | arXiv 2018 | 1. An updated version of EmotionLines. 2. Every utterance in MELD is associated with both an emotion and a sentiment label; utterances in MELD are multimodal, encompassing audio and visual modalities along with text. |
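
The fourth line of research noted for the VAD norms above, estimating emotional values of new words by comparing them to validated words, can be sketched as a toy nearest-neighbour lookup. Everything below (the tiny lexicon, the word vectors, and the ratings) is made up for illustration and is not taken from the actual norms:

```python
import math

# Hypothetical "validated" lexicon: word -> (embedding, (valence, arousal, dominance)).
# Both the vectors and the nine-point-scale ratings are invented for this sketch.
lexicon = {
    "happy":  ([0.9, 0.1, 0.2], (8.5, 6.0, 7.0)),
    "joyful": ([0.8, 0.2, 0.1], (8.2, 6.5, 6.8)),
    "sad":    ([-0.8, 0.1, 0.3], (2.1, 3.8, 3.5)),
    "angry":  ([-0.7, 0.8, 0.4], (2.5, 7.0, 5.5)),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def estimate_vad(vector, k=2):
    """Estimate VAD for an unrated word by averaging the ratings
    of its k most similar validated words."""
    neighbours = sorted(lexicon.values(),
                        key=lambda entry: cosine(vector, entry[0]),
                        reverse=True)[:k]
    ratings = [vad for _, vad in neighbours]
    return tuple(sum(dim) / k for dim in zip(*ratings))

# A made-up vector for an unrated word lying close to "happy" and "joyful":
# the estimate is simply the mean of those two words' ratings.
print(estimate_vad([0.85, 0.15, 0.15]))  # → (8.35, 6.25, 6.9)
```

Real systems replace the toy vectors with distributional word embeddings and often weight neighbours by similarity rather than averaging uniformly, but the comparison-to-validated-words idea is the same.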

Back to index