Unbabel/BConTrasT

BConTrasT

This repository contains the train, dev, and test sets of the BConTrasT corpus used in the WMT20 Chat Translation shared task. It is based on the Taskmaster-1 corpus, which contains monolingual (i.e., English) task-based dialogs in six domains: (i) ordering pizza, (ii) creating auto repair appointments, (iii) setting up ride service, (iv) ordering movie tickets, (v) ordering coffee drinks, and (vi) making restaurant reservations. A subset of the Taskmaster-1 corpus was selected and translated into German at Unbabel.

Each conversation in the data file has the following structure:

  • ConversationID: A unique identifier for each conversation.
  • Utterances: An array of utterances that make up the conversation. Each utterance has the following fields:
    • UtteranceID: A 0-based index indicating the order of the utterances in the conversation.
    • Speaker: Either customer or agent, indicating which role generated this utterance.
    • Source: The utterance in the original source language.
    • Target: The utterance in the translated target language.
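
The structure above can be sketched as a small Python loader. This is a minimal sketch, not the repository's own tooling: it assumes each conversation has already been read into a dict whose keys match the field names listed above (the actual key spelling and file layout of the released files may differ), and the names `Utterance`, `Conversation`, and `parse_conversation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical loader: the dict keys mirror the README field names; the real
# on-disk JSON layout of the BConTrasT files is an assumption, not verified.

VALID_SPEAKERS = {"customer", "agent"}


@dataclass(frozen=True)
class Utterance:
    utterance_id: int  # 0-based position within the conversation
    speaker: str       # "customer" or "agent"
    source: str        # text in the speaker's original language
    target: str        # translation into the other party's language


@dataclass(frozen=True)
class Conversation:
    conversation_id: str
    utterances: list[Utterance]


def parse_conversation(raw: dict[str, Any]) -> Conversation:
    """Build a Conversation from a dict that uses the README's field names."""
    utterances = [
        Utterance(
            utterance_id=int(u["UtteranceID"]),
            speaker=u["Speaker"],
            source=u["Source"],
            target=u["Target"],
        )
        for u in raw["Utterances"]
    ]
    # Put the utterances in conversation order using their 0-based index.
    utterances.sort(key=lambda u: u.utterance_id)

    # Sanity checks: indices should be 0, 1, 2, ... with no gaps.
    ids = [u.utterance_id for u in utterances]
    if ids != list(range(len(ids))):
        raise ValueError(f"UtteranceIDs are not 0-based and contiguous: {ids}")

    # Every utterance should come from one of the two roles.
    for u in utterances:
        if u.speaker not in VALID_SPEAKERS:
            raise ValueError(f"unexpected speaker: {u.speaker!r}")

    return Conversation(str(raw["ConversationID"]), utterances)
```

Validating the index sequence up front makes a missing or duplicated utterance fail loudly instead of silently shifting the dialogue context.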

Note: Because the customer and the agent are assumed to speak their own languages, the source and target texts may be in either English or German, depending on the speaker's role.
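
One way to see the consequence of this note: all of one role's (Source, Target) pairs share a single translation direction, and each participant reads their own turns in Source and the other party's turns in Target. The sketch below assumes utterances are plain dicts keyed by the README field names; the helper names `pairs_by_speaker` and `conversation_view` are hypothetical, and the code deliberately does not hard-code which language belongs to which role.

```python
from collections import defaultdict
from typing import Any, Iterable

# Hypothetical helpers; utterance dicts are assumed to use the README keys
# "UtteranceID", "Speaker", "Source", and "Target".


def pairs_by_speaker(
    utterances: Iterable[dict[str, Any]],
) -> dict[str, list[tuple[str, str]]]:
    """Group (source, target) pairs by role, in conversation order.

    Each role speaks its own language, so every pair in one group has the
    same translation direction.
    """
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for utt in sorted(utterances, key=lambda u: u["UtteranceID"]):
        groups[utt["Speaker"]].append((utt["Source"], utt["Target"]))
    return dict(groups)


def conversation_view(role: str, utterances: Iterable[dict[str, Any]]) -> list[str]:
    """Return the conversation as `role` would read it.

    The role's own turns come from Source (its own language); the other
    party's turns come from Target (translated into this role's language).
    """
    ordered = sorted(utterances, key=lambda u: u["UtteranceID"])
    return [u["Source"] if u["Speaker"] == role else u["Target"] for u in ordered]
```

Splitting by role like this gives one parallel corpus per translation direction, while the per-role view shows the bilingual context a chat translation system would see for each participant.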

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The dataset in this repository, used for the WMT20 shared task, is provided under the terms of the CC BY-SA 4.0 license.

Citation

If you use this dataset, please cite:

M. Amin Farajian, António V. Lopes, André F. T. Martins, Sameen Maruf, Gholamreza Haffari (2020). Findings of the WMT 2020 Shared Task on Chat Translation. In Proceedings of the Fifth Conference on Machine Translation, pages 65-75. https://aclanthology.org/2020.wmt-1.3/

@inproceedings{farajian-etal-2020-findings,
    title = "Findings of the {WMT} 2020 Shared Task on Chat Translation",
    author = "Farajian, M. Amin  and
      Lopes, Ant{\'o}nio V.  and
      Martins, Andr{\'e} F. T.  and
      Maruf, Sameen  and
      Haffari, Gholamreza",
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.3",
    pages = "65--75",
}
