This repository contains code associated with the article "Semiotically-grounded distant viewing of diagrams: insights from two multimodal corpora" by Tuomo Hiippala and John Bateman, published in Digital Scholarship in the Humanities (open access).
To reproduce the results reported in the article, you must first download the following data:
- The Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset (direct download)
- The AI2D-RST corpus (direct download)
You should also create a fresh virtual environment for Python 3.8+ and install the libraries defined in requirements.txt using the following command:
pip install -r requirements.txt
01_extract_blobs_from_ai2d.py extracts graphic elements classified as "blobs" from the AI2D corpus and stores them in a directory named png_blobs.
02_extract_colour_texture_features.py extracts colour histograms and local binary patterns from the blobs. The results are stored in an HDF5 file named blob_features_ycbcr_hsv_gray_lbp.h5.
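To illustrate the kind of features involved, here is a minimal NumPy-only sketch of a per-channel colour histogram and a basic 8-neighbour local binary pattern (LBP) histogram. It is not the code in 02_extract_colour_texture_features.py: the function names, the bin counts and the random test image are assumptions made for illustration, and the resulting vector does not reproduce the 90-dimensional layout used in the article.

```python
import numpy as np


def colour_histogram(channel, bins=10):
    """Normalised histogram of one 8-bit colour channel (bin count is an assumption)."""
    hist, _ = np.histogram(channel, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)


def lbp_histogram(gray):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes."""
    gray = np.asarray(gray, dtype=np.int32)
    h, w = gray.shape
    centre = gray[1:-1, 1:-1]
    codes = np.zeros_like(centre)
    # Offsets of the eight neighbours, clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set the bit when the neighbour is at least as bright as the centre.
        codes |= (neighbour >= centre).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256)
    return hist / max(hist.sum(), 1)


# Hypothetical stand-in for a blob image: random 32x32 RGB pixels.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
gray = image.mean(axis=2)

# Concatenate three colour histograms and one texture histogram.
features = np.concatenate(
    [colour_histogram(image[..., c]) for c in range(3)] + [lbp_histogram(gray)]
)
```

In this sketch each histogram is normalised to sum to one, so blobs of different sizes yield comparable feature vectors.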
03_plot_2d_umap.py learns a two-dimensional UMAP embedding of the 90-dimensional colour and texture features and plots the result.
04_plot_2d_umap+kde.py plots kernel density estimations for UMAP features across selected diagram categories (e.g. cross-sections, cut-outs and illustrations).
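For readers unfamiliar with kernel density estimation, the idea can be sketched with a hand-rolled, fixed-bandwidth Gaussian estimator evaluated on a grid. This is an illustrative assumption rather than the method used in 04_plot_2d_umap+kde.py, which may rely on a plotting or statistics library with automatic bandwidth selection. The function name `gaussian_kde_2d`, the bandwidth value and the synthetic points standing in for UMAP coordinates are all hypothetical.

```python
import numpy as np


def gaussian_kde_2d(points, grid_x, grid_y, bandwidth=0.5):
    """Evaluate a fixed-bandwidth Gaussian KDE of 2D points on a regular grid."""
    points = np.asarray(points, dtype=float)
    xx, yy = np.meshgrid(grid_x, grid_y)
    grid = np.stack([xx.ravel(), yy.ravel()], axis=1)
    # Squared distance from every grid location to every data point.
    sq_dist = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-sq_dist / (2 * bandwidth ** 2))
    # Average the kernels and apply the 2D Gaussian normalising constant.
    density = kernel.sum(axis=1) / (len(points) * 2 * np.pi * bandwidth ** 2)
    return density.reshape(xx.shape)


# Synthetic stand-in for the 2D UMAP coordinates of one diagram category.
rng = np.random.default_rng(0)
embedding = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

axis = np.linspace(-6, 6, 61)
density = gaussian_kde_2d(embedding, axis, axis)

# The density should integrate to roughly one over the grid.
cell_area = (axis[1] - axis[0]) ** 2
total_mass = density.sum() * cell_area
```

Computing one such density per diagram category and drawing its contours over the scatter plot shows where each category concentrates in the embedding.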
05_plot_alluvial.R plots an alluvial graph that maps the AI2D diagram categories to AI2D-RST diagram categories. You will need the ggplot2 and ggalluvial libraries to run this script.
Questions? Open an issue on GitHub or e-mail me at tuomo dot hiippala @ helsinki dot fi.