hibernator11/notebook-texts-metadata

notebook-texts-metadata

This project includes several notebooks that explore and reuse GLAM (galleries, libraries, archives, and museums) datasets containing text and metadata.

dataset-extraction-images

This notebook extracts a dataset as a CSV file from a digital collection described using MARCXML files.
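
As a rough illustration of this kind of extraction, the sketch below reads MARCXML with Python's standard library and writes selected fields to CSV. It is not the notebook's own code: the sample record is invented, the choice of fields (control field 001, 245$a for the title, 520$a for the summary) is an assumption, and the helper names `subfield`, `marcxml_to_rows`, and `rows_to_csv` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") documents.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record for illustration only; not taken from the real catalogue.
SAMPLE_MARCXML = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">0001</controlfield>
    <datafield tag="245" ind1=" " ind2=" ">
      <subfield code="a">Example film title</subfield>
    </datafield>
    <datafield tag="520" ind1=" " ind2=" ">
      <subfield code="a">Example summary.</subfield>
    </datafield>
  </record>
</collection>"""


def subfield(record, tag, code):
    """Return the text of the first matching subfield, or an empty string."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text.strip() if node is not None and node.text else ""


def marcxml_to_rows(xml_text):
    """Turn each MARCXML record into a flat dictionary (hypothetical field choice)."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter("{http://www.loc.gov/MARC21/slim}record"):
        control = record.find("marc:controlfield[@tag='001']", MARC_NS)
        rows.append({
            "id": control.text.strip() if control is not None and control.text else "",
            "title": subfield(record, "245", "a"),
            "summary": subfield(record, "520", "a"),
        })
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "title", "summary"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = marcxml_to_rows(SAMPLE_MARCXML)
csv_text = rows_to_csv(rows)
```

With real data, the sample string would be replaced by the contents of the MARCXML files, and the list of tags would be adjusted to the fields the collection actually uses.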

We use the Data Package format, a simple container for describing a coherent collection of data in a single package. It provides the basis for convenient delivery, installation, and management of datasets.
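
To give a sense of what such a package looks like, here is a minimal sketch that builds a Data Package descriptor (the content of a `datapackage.json` file) as a plain dictionary. The package name, the CSV path, the field names, and the helper `build_descriptor` are all assumptions made for illustration; the notebook may build its descriptor differently, for example with a dedicated library.

```python
import json


def build_descriptor(name, csv_path, field_names):
    """Build a minimal Data Package descriptor for a single CSV resource.

    All fields are typed as strings here, which is a simplifying assumption.
    """
    return {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": csv_path,
                "format": "csv",
                "schema": {
                    "fields": [{"name": f, "type": "string"} for f in field_names]
                },
            }
        ],
    }


# Hypothetical package name, file path, and column names.
descriptor = build_descriptor(
    "moving-image-archive", "records.csv", ["id", "title", "summary"]
)
descriptor_json = json.dumps(descriptor, indent=2)
```

Writing `descriptor_json` to a file named `datapackage.json` next to the CSV file is what turns the pair into a single package.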

This notebook uses a dataset of descriptive metadata from the Moving Image Archive catalogue, which is Scotland’s national collection of moving images.

topic-modeling-billing

This notebook extracts the most common words in a corpus of text documents. It is an example of topic modeling based on digitised volumes of English, Scottish, and Irish theatrical playbills dated between 1600 and 1902, from data.bl.uk.
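
The word-counting step can be sketched with the standard library alone. This is a simple frequency count, not a full topic model such as LDA, and it is not the notebook's own code: the two sample texts are invented, the stopword list is a small hand-picked assumption, and `tokenize` and `most_common_words` are hypothetical helper names.

```python
import re
from collections import Counter

# Small hand-picked stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "at", "on", "will", "be", "by", "for"}

# Invented sample texts standing in for OCR output from the playbills.
documents = [
    "Theatre Royal. This evening will be performed the comedy of The Rivals.",
    "Theatre Royal. On Monday evening the tragedy of Hamlet, Prince of Denmark.",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]


def most_common_words(docs, n=3):
    """Count word occurrences across all documents and return the top n."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts.most_common(n)


top_words = most_common_words(documents)
```

OCR text tends to contain misrecognised words, so some extra cleaning (for instance, dropping very rare tokens) is usually worth adding before counting.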

References

This example was inspired by the GLAM Workbench, in particular the notebook "Exploring metadata harvested from the Tribune negative collection in the State Library of NSW".

Theatrical playbills from Britain and Ireland (OCR text only). Optically Character Recognised (OCR)-derived text for the playbills, encoded in UTF-8. DOI: https://doi.org/10.21250/pb2