Skip to content

pgurunath/slu_confusion2vec

Repository files navigation

Spoken Language Intent Detection using Confusion2Vec

Introduction

This repo contains the data used in our paper: Spoken Language Intent Detection using Confusion2Vec. In our paper, we proposed an approach for robust intent detection on noisy ASR transcripts using Confusion2Vec embeddings. The data in this repo contains the normalized reference transcripts, the corresponding intent and slot labels for the reference transcripts, and the mapped audio samples in ATIS. The mapped audio samples were used to generate ASR outputs on which we evaluated our model's intent detection performance.

Data Convention

The data convention is as follows: <reference transcript><slot label><intent label><path to the audio file in ATIS>, delimited by tab charaters. Our data is consistent with the JointSLU repo, with the mapped audio samples added. Note, for these two utterances:

  • Utterance 1: i need information on flights for tuesday leaving baltimore for dallas dallas to boston and boston to baltimore
  • Utterance 2: round trip flights from minneapolis to san diego coach economy fare

Utterance 1 appears twice, but only one matched audio transcript in ATIS was found. We assigned the same audio sample for these two identical utterances. Utterance 2 cannot be matched to any of the audio transcripts in ATIS after our every effort. We assigned a '<UNMATCHED>' token as the <path to the audio file in ATIS>.

Citation

If you used our data, please cite our paper:

@online{1904.03576,
Author = {Prashanth Gurunath Shivakumar and Mu Yang and Panayiotis Georgiou},
Title = {Spoken Language Intent Detection using Confusion2Vec},
Year = {2019},
Eprint = {1904.03576},
Eprinttype = {arXiv},
}

About

Spoken Language Intent Detecting using Confusion2vec

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published