rstodden/TS_annotation_tool

TS-ANNO: An Annotation Tool to Build, Annotate and Evaluate Text Simplification Corpora

We introduce TS-ANNO, an open-source web application for the manual creation and evaluation of parallel corpora for text simplification (TS). TS-ANNO can be used for (i) sentence-wise alignment, (ii) rating alignment pairs (e.g., with respect to grammaticality, meaning preservation, ...), (iii) annotating alignment pairs with respect to simplification transformations (e.g., lexical substitution, sentence splitting, ...), and (iv) manual simplification of complex documents. For evaluation, TS-ANNO calculates inter-annotator agreement of alignments (i) and annotations (ii).
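To make the idea of inter-annotator agreement concrete, the sketch below computes Cohen's kappa for two annotators who rated the same sentence pairs. This README does not state which agreement coefficient TS-ANNO computes, so the choice of Cohen's kappa, the function name `cohen_kappa`, and the example ratings are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Illustrative only: not taken from the TS-ANNO code base.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same, non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of five sentence pairs on a 1-3 scale.
ratings_a = [1, 2, 3, 3, 2]
ratings_b = [1, 2, 3, 2, 2]
kappa = cohen_kappa(ratings_a, ratings_b)
print(round(kappa, 4))
```

A value of 1 means perfect agreement, and a value near 0 means agreement no better than chance.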

Demo

You can test the annotation tool in our live demo (user: test, password: TS_anno22) or watch our demonstration video on YouTube.

Installation

Main Functionalities

Upload Data

TS-ANNO supports several methods and settings for data upload. To upload data, log in with your superuser account and click "Data Upload" in the navigation bar on the left. You have the following choices:

  • Upload and pre-process parallel online documents with TS-ANNO's web crawler, e.g., to align and annotate some web data.
  • Upload local data:
    • Upload aligned texts (e.g., previously aligned or generated by a TS system). These data can then be annotated or rated in TS-ANNO.
    • Upload plain but parallel texts to align, rate or annotate them.
    • Upload plain but non-parallel texts to manually simplify them.

For further instructions and settings, see .demo/Upload_Data.md.

Manual Sentence Alignment

One main functionality is manual sentence-wise alignment. The uploaded texts are split into sentences by spaCy, so that one or more sentences of the complex document can be aligned with one or more sentences of the simple document. TS-ANNO supports n:m alignments, e.g., those produced by splitting or merging sentences. The wand button shows the simple sentences most similar to the current complex sentence.

ts_anno_align.mp4
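As a rough illustration of n:m alignments and of ranking simple sentences by similarity, consider the sketch below. It is not TS-ANNO's implementation: the `Alignment` class and the `most_similar` function are hypothetical names, and character-level matching with `difflib` stands in for whatever similarity measure sits behind the wand button, which this README does not specify.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Alignment:
    """One n:m alignment between sentence indices (hypothetical structure)."""
    complex_ids: tuple
    simple_ids: tuple

    @property
    def kind(self):
        # "1:2" is a sentence split, "2:1" a merge, "1:1" a plain rewrite.
        return f"{len(self.complex_ids)}:{len(self.simple_ids)}"


def most_similar(complex_sentence, simple_sentences, top_k=3):
    """Return indices of the top_k simple sentences, most similar first."""
    scored = [
        (SequenceMatcher(None, complex_sentence.lower(), s.lower()).ratio(), i)
        for i, s in enumerate(simple_sentences)
    ]
    # Highest similarity first; ties are broken by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:top_k]]


simple_doc = [
    "Cats sleep a lot.",
    "The committee delayed the decision.",
    "It rained.",
]
suggestions = most_similar(
    "The committee postponed the decision until next year.", simple_doc
)
split = Alignment(complex_ids=(0,), simple_ids=(0, 1))
print(suggestions[0], split.kind)
```

Storing index tuples on both sides lets the same record type cover 1:1, 1:n, n:1, and n:m cases.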

Manual Rating

TS-ANNO supports manual rating of aligned sentence pairs. The video below shows the default evaluation aspects; both the aspects and the scale size can be changed in ./settings_annotation/config_rating.py.

ts_anno_rating.mp4

Manual Annotation of Rewriting Transformations

Another functionality of TS-ANNO is the annotation of rewriting transformations performed during simplification. The annotator marks the tokens affected by a rewriting in one or both of the texts, then chooses the level of the transformation and, if identifiable, its class and sub-transformation. This process can be repeated for every transformation in the pair. The names of the classes and transformations can be changed in ./settings_annotation/config_transformation.py.

ts_anno_transformation.mp4
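One way to picture a single transformation annotation is the record sketched below. The class name, the field names, and the example level and class labels are hypothetical and are not taken from TS-ANNO's data model or from config_transformation.py. The sketch only mirrors the description above: tokens marked in one or both texts, a required level, and an optional class and sub-transformation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TransformationAnnotation:
    """A single rewriting transformation on an aligned pair (illustrative)."""
    level: str                                   # always chosen by the annotator
    complex_token_ids: List[int] = field(default_factory=list)
    simple_token_ids: List[int] = field(default_factory=list)
    transformation_class: Optional[str] = None   # only if identifiable
    sub_transformation: Optional[str] = None     # only if identifiable

    def __post_init__(self):
        # Tokens must be marked in at least one of the two texts.
        if not self.complex_token_ids and not self.simple_token_ids:
            raise ValueError("mark tokens in the complex text, the simple text, or both")


# A pair can carry several annotations, one per transformation.
pair_annotations = [
    TransformationAnnotation(
        level="word",                            # hypothetical label
        complex_token_ids=[3],
        simple_token_ids=[3],
        transformation_class="lexical substitution",
    ),
    TransformationAnnotation(level="sentence", complex_token_ids=[0, 1, 2]),
]
print(len(pair_annotations))
```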

Simplification

Another kind of annotation supported by TS-ANNO is manual simplification. Annotators can choose as many sentences as they want to simplify at once. The system provides simplification guidelines and a suggestion generated by a text simplification system. To use the suggestion function, set load_simplification_model in ./settings_annotation/config_simplification.py to True and obtain a copy of a TS system, e.g., MUSS.

Screenshot of Manual Simplification in TS-ANNO
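Enabling the suggestion function as described above amounts to a one-line change in ./settings_annotation/config_simplification.py. The fragment below assumes the setting is a plain module-level variable; the rest of the file is not shown here and may contain further options.

```python
# ./settings_annotation/config_simplification.py (fragment, assumed layout)
# Set to True to load a TS system and show its suggestions to annotators.
load_simplification_model = True
```

A TS system such as MUSS still has to be obtained separately for the suggestions to work.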

License:

The annotation tool is licensed under GNU General Public License v3.0.

Citation

If you use TS-ANNO in your research, please cite our paper:

@inproceedings{stodden-kallmeyer-2022-ts,
    title = "{TS}-{ANNO}: An Annotation Tool to Build, Annotate and Evaluate Text Simplification Corpora",
    author = "Stodden, Regina  and
      Kallmeyer, Laura",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-demo.14",
    pages = "145--155",
}

Contact:

Feel free to contact Regina Stodden if you have any comments or problems with the annotation tool.