Skip to content

Repository for the paper "Collecting Visually-Grounded Dialogue with A Game Of Sorts"

License

Notifications You must be signed in to change notification settings

willemsenbram/a-game-of-sorts

Repository files navigation



Repository for the paper "Collecting Visually-Grounded Dialogue with A Game Of Sorts" presented at LREC 2022. Please cite the following work if you use anything from this repository or from our paper:

@inproceedings{willemsen-etal-2022-collecting,
    title = "Collecting Visually-Grounded Dialogue with A Game Of Sorts",
    author = "Willemsen, Bram  and
      Kalpakchi, Dmytro  and
      Skantze, Gabriel",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.242",
    pages = "2257--2268"
}

📜 Overview


💬 The Task

A Game Of Sorts is a collaborative image ranking task. The default version of this "game" has two players tasked with reaching agreement on how to rank a set of nine images given some sorting criterion. Communication between players takes the form of a written dialogue which takes place under partial knowledge, as the position of the images on the gameboard is randomized and players cannot see each other's perspective. The game is played over four rounds with a recurring set of images. For a more detailed explanation of the game, we refer the reader to Section 3 of our paper.

Game UI 💻

Figure 1: User interface of the game during gameplay.

Figure 1 provides an overview of the game's user interface. The gameboard consists of a sorting criterion, embedded in a scenario, and a set of nine images1, laid out in a 3x3 grid. When a user hovers over an image with their mouse cursor, the image will pop over the other images, revealing a slightly zoomed-in version of the image in its original aspect ratio for improved visibility. To the right of the gameboard is the chat area. Before being able to send a message, the player is expected to indicate whether they are referencing one or more images in their message (i.e., self-annotation). To do so, they either select the image(s) in question on the gameboard by clicking on the referenced image(s) or they select the "I'm not referring to an image" option in case their message does not contain a reference to any of the images. Above the chat area, a red button is visible that reads "LOCK": when the players have managed to reach an agreement on how to rank one or more of the images, they will need to click this button after having selected the image in question on the gameboard; note that images can only be locked in one at a time. If both players lock the same image, the image is assigned a rank; ranks are assigned in descending order. The assigned rank of a successfully ranked image cannot be changed.

Dialogue Excerpt 🔎

Figure 2: Excerpt of a dialogue.

Figure 2 shows an excerpt of a dialogue in which the players discussed the scenario presented in Figure 1.


🍝 The Code

The game's codebase can be found in the ./game directory. The application (client and server-side) is written in JavaScript and makes use of Embedded JavaScript (EJS) templates. The database is MongoDB. We use Docker to provide a containerized version of the application.

🚧 NOTE: We are refactoring the code to improve readability and usability; the repo will be updated and code added once necessary improvements have been made.


📄 The Data

The collected data, as reported in our paper, can be found in the ./dataset directory. These include the dialogues, player actions, answers to the post-game questionnaire, the specific game configurations and other meta data required to reconstruct the interactions. This data is stored in JSON Lines-formatted files.

Message 🔎

{
    "interaction_id": "61c09b4ee90a3cc04fdce194", 
    "game_id": "game_1_dogs_1", 
    "board_id": "game_1_dogs_1_board_1", 
    "round_number": 1, 
    "round_active": true, 
    "message_number": 30, 
    "username": "tiny-elephant-9291", 
    "body_text": "Ok, how about the white curly dog. Looks like it has a strange haircut.", 
    "image_ids": [
        "n02093647_2585"
    ], 
    "timestamp": 1640013899727
}
Name Type Description
interaction_id str Interaction ID
game_id str Game ID
board_id str Gameboard ID
round_number int Round number; starts at 0 if practice round, otherwise starts at 1
round_active bool Indicates whether the message was sent during (true) or after (false) a round
message_number int Message number; resets each round, starts at 1
username str Username of the player that sent this message
body_text str Text content of this message
image_ids array Array of image IDs (as str) that were selected by the player to indicate which images were referenced in the message: self-annotation
timestamp int Unix timestamp indicating the time at which the server received the message

More examples and an explanation of the content of each file can be found in the ./dataset README.

Images 📷

The 45 images used in the data collection (as reported in our paper) were taken from various publicly-available datasets. Pointers are provided in the ./dataset README. A bash script is provided to automatically download the images to ./dataset/images.

Annotations 📑

Mentions 🔗

The paper "Resolving References in Visually-Grounded Dialogue via Text Generation" contributes, among other things, mention annotations at the span level and has all annotated mentions aligned with the images they denote.

git clone https://github.com/willemsenbram/reference-resolution-via-text-generation.git

See the ./annotations directory. Please cite their work if you make use of these annotations or other material from their repo or paper.

Footnotes

  1. The images shown here were taken from Open Images V6.