Evaluation of and comparison between different models for natural language sentence generation from structured data

Templates to Language Evaluation

Final project for the Database Systems course, academic year 2019-2020

Authors:


Description

In this project we evaluate and compare the following 3 models, which generate natural language sentences from structured data templates, and measure their performance on the E2E and WikiBio datasets:

  1. neural-template-gen, written by Wiseman et al. (2018) (we used our own fork of this software)
  2. TGen, written by Dušek and Jurčíček (2016) (we used our own fork of this software, but all of our changes have since been upstreamed to the original project)
  3. wiki2bio, written by Liu et al. (2018) (we used our own fork of this software)

For the evaluation, we have also created a custom Dockerfile, which should be used to build a Docker image and container for re-running all of our experiments.

To make the whole procedure clearer and more straightforward, we have created a text-based user interface, which guides the user through every step.
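
As a rough illustration of the kind of Docker invocations a UI like this wraps, the following Python sketch assembles a `docker build` and a `docker run` command line and prints them for review. The image tag `t2l-eval`, the container name `t2l-eval-run`, the Dockerfile path `docker/Dockerfile`, and the use of the repository root as build context are all assumptions made for illustration; the commands actually issued by the project's UI may differ.

```python
import shlex

# Hypothetical names, chosen for illustration only.
IMAGE_TAG = "t2l-eval"
CONTAINER_NAME = "t2l-eval-run"


def build_command(dockerfile="docker/Dockerfile", context="."):
    """Return the argument list for building an image from a Dockerfile."""
    # The default Dockerfile path and build context are assumptions.
    return ["docker", "build", "-t", IMAGE_TAG, "-f", dockerfile, context]


def run_command():
    """Return the argument list for starting an interactive container."""
    return ["docker", "run", "-it", "--name", CONTAINER_NAME, IMAGE_TAG]


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed first.
    for cmd in (build_command(), run_command()):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch free of side effects; passing the same lists to `subprocess.run` would execute them.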


Details

On top of the original projects as provided by their respective authors, we have made the following modifications and additions:

  • several bug fixes in all 3 software projects
  • ported code to newer versions of toolkits (PyTorch 1.0 and TensorFlow 1.15)
  • created custom Bash scripts to group together several commands that are needed for a single operation
  • created custom Python scripts for post-processing outputs as well as evaluating the models
  • created a Dockerfile, which creates an all-in-one image and takes care of everything in a straightforward and automatic way
  • created a User Interface that abstracts the complexity of the Docker commands and the Bash scripts

Contents

  • docker: directory which contains a Dockerfile, as well as some instructions on how to set up and use Docker on a terminal in Ubuntu
  • e2e: directory which contains Bash and Python scripts that help with evaluating the models on the E2E challenge using the E2E NLG Challenge evaluation metrics (the code developed to automatically score any model tested on the E2E dataset), as well as the measured scores
  • ntg: directory which contains Bash and Python scripts that help with using the neural-template-gen software, as well as our own re-trained models, segmentations and generations
  • tgen: directory which contains a Bash script that helps with using the TGen software, as well as our own re-trained model, segmentations and outputs
  • w2b: directory which contains Bash and Python scripts that help with using the wiki2bio software, as well as our own re-trained model and outputs

Guidelines

  1. Install Ubuntu 20.04 (preferably server edition) which can be downloaded from the official Ubuntu website
  2. Install Git and Python version 3
     sudo apt-get -y install git python3
  3. Install Docker following our own Docker Instructions
  4. Clone this repository locally with Git
     git clone https://github.com/nbishdev/templates_to_language_evaluation.git
  5. Change the working directory to the locally stored copy of the repository
     cd ./templates_to_language_evaluation/
  6. Launch the UI, which will guide you through the rest of the setup
     python3 ./ui.py
  7. Follow the instructions presented in the UI, to build a Docker image and a Docker container, then execute the whole project through the container
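
Before launching the UI, it can help to confirm that the tools installed in the steps above are actually on the `PATH`. The following is a minimal sketch of such a check, not part of the repository; the function name `missing_tools` and the tool list are assumptions made for illustration.

```python
import shutil

# Tools named in the steps above (Git, Python 3, and Docker).
REQUIRED_TOOLS = ["git", "python3", "docker"]


def missing_tools(tools):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found; you can run: python3 ./ui.py")
```

The check only verifies that each executable can be located; it does not confirm versions or that the Docker daemon is running.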
