notebook-texts-metadata

This project includes several notebooks for exploring and reusing GLAM (galleries, libraries, archives and museums) datasets that contain text and metadata.

dataset-extraction-images

This notebook extracts a dataset as a CSV file from a digital collection described using MARCXML files.
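As a rough illustration of the MARCXML-to-CSV step, the sketch below uses only Python's standard library (`xml.etree.ElementTree` and `csv`). The choice of columns (control field 001 and subfields 245$a and 260$c), the function names `record_to_row` and `marcxml_to_csv`, and the sample record are assumptions for illustration; the notebook itself may select different fields or use another library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace used by MARC 21 slim (MARCXML) records
MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}

# Hypothetical mapping: CSV column -> (MARC tag, subfield code or None for control fields)
FIELDS = {
    "id": ("001", None),    # control number
    "title": ("245", "a"),  # title statement, subfield $a
    "date": ("260", "c"),   # publication/release date, subfield $c
}


def record_to_row(record):
    """Turn one <record> element into a dict keyed by CSV column."""
    row = {}
    for column, (tag, code) in FIELDS.items():
        if code is None:
            path = f"marc:controlfield[@tag='{tag}']"
        else:
            path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
        node = record.find(path, MARC_NS)
        # Missing fields become empty cells so every row matches the header
        row[column] = node.text.strip() if node is not None and node.text else ""
    return row


def marcxml_to_csv(xml_text):
    """Convert a MARCXML document (a <collection> or a single <record>) to CSV text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(FIELDS))
    writer.writeheader()
    for record in root.iter(f"{{{MARC_URI}}}record"):
        writer.writerow(record_to_row(record))
    return out.getvalue()


# Made-up sample record, not taken from the Moving Image Archive
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">rec-1</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Example film title</subfield>
    </datafield>
  </record>
</collection>"""

print(marcxml_to_csv(SAMPLE))
```

Records that lack one of the selected fields simply get an empty cell, which keeps the CSV rectangular even when cataloguing is uneven.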

We use Data Package as a simple container format for describing a coherent collection of data in a single 'package'. It provides the basis for convenient delivery, installation and management of datasets.
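A minimal sketch of packaging an extracted CSV as a Data Package, assuming the CSV already exists on disk. It writes a `datapackage.json` descriptor with `name`, `resources`, `path` and `schema` keys, following the Frictionless Data Package specification, and then zips the descriptor together with the data. The package name, the helper functions `build_descriptor` and `write_package`, and typing every column as `string` are illustrative assumptions rather than what the notebook necessarily does.

```python
import json
import zipfile
from pathlib import Path


def build_descriptor(csv_name, columns, package_name="moving-image-archive"):
    """Build a minimal Data Package descriptor for a single CSV resource.

    package_name is a hypothetical default; every column is typed as "string".
    """
    # Resource names should be lower-case, using only letters, digits, ".", "-", "_"
    resource_name = Path(csv_name).stem.replace("_", "-").lower()
    return {
        "name": package_name,
        "resources": [
            {
                "name": resource_name,
                "path": csv_name,
                "format": "csv",
                "mediatype": "text/csv",
                "schema": {"fields": [{"name": c, "type": "string"} for c in columns]},
            }
        ],
    }


def write_package(directory, csv_name, columns, zip_name="datapackage.zip"):
    """Write datapackage.json next to an existing CSV and zip both files."""
    directory = Path(directory)
    descriptor = build_descriptor(csv_name, columns)
    descriptor_path = directory / "datapackage.json"
    descriptor_path.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    with zipfile.ZipFile(directory / zip_name, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(descriptor_path, arcname="datapackage.json")
        zf.write(directory / csv_name, arcname=csv_name)
    return directory / zip_name


print(json.dumps(build_descriptor("marc_records.csv", ["id", "title", "date"]), indent=2))
```

Keeping the descriptor as plain JSON means any Data Package-aware tool can read the column list without opening the CSV itself.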

This notebook uses a dataset of descriptive metadata from the catalogue of the Moving Image Archive, Scotland’s national collection of moving images.

topic-modeling-billing

This notebook extracts the most common words in a corpus of text documents. It is an example of topic modelling based on digitised volumes of English, Scottish and Irish theatrical playbills dating from between 1600 and 1902, available from data.bl.uk.
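Counting the most frequent words is often a first step before fitting a topic model. The sketch below covers only that step, using `collections.Counter`; the regular-expression tokenizer, the small stop-word list, the function names and the sample playbill lines are hypothetical. Fitting a topic model proper (for example LDA with scikit-learn's `LatentDirichletAllocation` or with gensim) is a separate step not shown here.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real analysis would use a fuller one
STOPWORDS = {"the", "and", "of", "a", "to", "in", "by", "at", "for", "with", "on", "is"}


def tokenize(text):
    """Lower-case the text and keep alphabetic runs (inner apostrophes allowed)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def most_common_words(documents, n=10, min_length=3):
    """Return the n most frequent non-stop-words across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(
            tok for tok in tokenize(doc)
            if tok not in STOPWORDS and len(tok) >= min_length
        )
    # Ties keep the order in which words were first encountered
    return counts.most_common(n)


# Made-up lines in the style of OCR'd playbill text
docs = [
    "Theatre Royal. This evening the tragedy of Hamlet.",
    "Theatre Royal, Drury Lane. Hamlet, with a new farce.",
    "The comedy of the Rivals at the Theatre Royal.",
]

print(most_common_words(docs, n=3))  # [('theatre', 3), ('royal', 3), ('hamlet', 2)]
```

On real OCR output, a larger stop-word list and a minimum word length help suppress short fragments produced by recognition errors.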

References

The GLAM Workbench was used as inspiration for this example, in particular the notebook "Exploring metadata harvested from the Tribune negative collection in the State Library of NSW".

Theatrical playbills from Britain and Ireland (OCR text only): text derived by optical character recognition (OCR) from the playbills, encoded in UTF-8. DOI: https://doi.org/10.21250/pb2

hibernator11/notebook-texts-metadata
