nlpcl-lab/ted-talks-annotation

ted-talks-annotation

This repository contains the code for the EMNLP-IJCNLP 2019 AnnoNLP Workshop paper "Computer Assisted Annotation of Tension Development in TED Talks through Crowdsourcing".

Annotation Tool

The annotation tool used in the paper to annotate tension development.

Pre-requisites

  1. Install and run MongoDB.

Setup

  1. To connect to MongoDB, create your own config.py: cp config.sample.py config.py

    • If the default MongoDB settings have not been changed, you don't need to modify config.py.
  2. Install python requirements: pip install -r requirements.txt

  3. Download the TED talk videos listed in data/video_list.csv: python downloader.py

Usage

  1. Run the web-based annotation tool: export PYTHONPATH=.; python annotation/app.py

  2. Annotate! 😵

    • Click one of the given options for each video clip. The selected value is saved to the database automatically, together with the sentence information.
    • We provided the annotators with this guideline document when using Amazon Mechanical Turk.
  3. Export the annotation data: export PYTHONPATH=.; python annotation/dbscript.py --run=export_data

    • The data is exported to: data/output/tension.json
    • Example:
    {
        "doc_id": "5db7ac1c88e6da63a07a9c2e",
        "doc_title": "The power of vulnerability",
        "source": "https://www.youtube.com/watch?v=iCvmsMzlF7o",
        "sent_id": "5db7ac3388e6da63a07a9c6a",
        "sent_index": 60,
        "text": "And it turned out to be shame.",
        "labels": [1, 1, 1],
        "start_ts": 279364,
        "end_ts": 281543
    }
    ...
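
As a rough illustration of how the exported records might be consumed, the sketch below summarizes one record by taking the majority vote over its labels and computing the clip length. It is not part of this repository: the helper names `summarize` and `load_records` are hypothetical, and it assumes that `labels` holds one value per annotator, that `start_ts` and `end_ts` share a unit (the values look like milliseconds, but that is a guess), and that `tension.json` parses as either a single object or a list of such objects.

```python
import json
from collections import Counter


def summarize(record):
    """Summarize one exported record (hypothetical helper, not in the repo)."""
    # Assumption: each entry in "labels" is one annotator's choice.
    counts = Counter(record["labels"])
    label, votes = counts.most_common(1)[0]
    return {
        "sent_id": record["sent_id"],
        "majority_label": label,
        # Fraction of annotators who chose the majority label.
        "agreement": votes / len(record["labels"]),
        # Same unit as start_ts/end_ts (assumed to be milliseconds).
        "duration": record["end_ts"] - record["start_ts"],
    }


def load_records(path="data/output/tension.json"):
    """Load exported records; assumes the file is one JSON object or a JSON list."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data if isinstance(data, list) else [data]


# The example record shown above, used here so the sketch runs without the export file.
example = {
    "doc_id": "5db7ac1c88e6da63a07a9c2e",
    "doc_title": "The power of vulnerability",
    "source": "https://www.youtube.com/watch?v=iCvmsMzlF7o",
    "sent_id": "5db7ac3388e6da63a07a9c6a",
    "sent_index": 60,
    "text": "And it turned out to be shame.",
    "labels": [1, 1, 1],
    "start_ts": 279364,
    "end_ts": 281543,
}

print(summarize(example))
```

After running the export step, something like `[summarize(r) for r in load_records()]` would apply the same summary to every record, provided the file layout matches the assumption above.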