Accessing National Library of Estonia Collections workshop

This repository presents simple tools and a workflow to access the Digitized Newspaper Collections at the National Library of Estonia. The collection currently includes 386,950 issues from 2,200 periodical publications (including journals, magazines and newspapers). This amounts to a total of 3,635,503 pages of text, from which 6,298,369 articles have been segmented. Some of these texts are free to access; others are subject to access restrictions due to copyright and other legal requirements. This workflow connects to the freely accessible issues in the collection that have been segmented, amounting to ~3,833,000 articles.

NB! These texts have the licence: Free access - restricted use. This publication's copyright protection has expired but the rights of works contained in the publication may be protected. The works may be used for private purposes or study and research purposes. In other cases please ascertain that the copyright term has expired.

The workflow can be followed in Jupyter or RStudio. To run it, click on one of the following links.

Jupyter+R:

RStudio:

Usage

There are three script files currently in this repository.

data_overview.Rmd - gives an overview of the collection and allows further exploration of the metadata.
access_data.Rmd - links to the archived full-text files on data.digar.ee.
search_and_concordance.Rmd - a simple workflow to search texts and export concordances for later study.
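
To give a rough idea of what a concordance export involves, here is a minimal sketch in base R. It is not the code from search_and_concordance.Rmd: the function name `concordance`, its arguments and the example sentences are assumptions made for illustration. The search is literal and case-sensitive, and `window` is the number of characters of context kept on each side of a match.

```r
# Hypothetical helper: keyword-in-context (KWIC) rows for a literal search term.
# `texts` is a character vector of article texts.
concordance <- function(texts, term, window = 30) {
  rows <- list()
  for (i in seq_along(texts)) {
    # fixed = TRUE treats `term` as plain text, not as a regular expression
    hits <- gregexpr(term, texts[i], fixed = TRUE)[[1]]
    if (hits[1] == -1) next  # no match in this text
    for (start in hits) {
      end <- start + nchar(term) - 1
      rows[[length(rows) + 1]] <- data.frame(
        doc   = i,
        left  = substr(texts[i], max(1, start - window), start - 1),
        match = substr(texts[i], start, end),
        right = substr(texts[i], end + 1, end + window),
        stringsAsFactors = FALSE
      )
    }
  }
  if (length(rows) == 0) {
    # Return an empty table with the same columns when nothing matches
    return(data.frame(doc = integer(0), left = character(0),
                      match = character(0), right = character(0),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, rows)
}

# Made-up example sentences, for illustration only
texts <- c("Uus leht ilmus eile.", "Vana leht ja uus leht.")
kwic <- concordance(texts, "leht", window = 10)
print(kwic)

# The table could then be saved for later study, for example:
# write.csv(kwic, "concordance.csv", row.names = FALSE)
```

Each row holds one match with its left and right context, so the result can be sorted or filtered like any other data frame before it is exported.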

The notebooks are currently run in a MyBinder environment, which has memory limitations: it guarantees at least 1 GB of RAM, but allows no more than 2 GB (if the memory limit is exceeded, the system is restarted). For this reason, it is recommended to work with no more than 200 data files at the same time. Binder is meant for short-term usage. It will automatically shut down user sessions that have more than 10 minutes of inactivity (if you leave your window open, this will be counted as “activity”) and can be expected to work reliably for 12 hours.
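
One way to stay within the 200-file recommendation is to process the files in batches and keep only a small result from each batch. The sketch below uses base R only; the helper `make_batches` and the file names are hypothetical and do not come from the repository's scripts.

```r
# Hypothetical helper: split a vector of file paths into batches
# of at most `batch_size` files each.
make_batches <- function(files, batch_size = 200) {
  if (length(files) == 0) return(list())
  split(files, ceiling(seq_along(files) / batch_size))
}

# Made-up file names, for illustration only
files <- sprintf("file_%04d.txt", 1:450)
batches <- make_batches(files)

# Process one batch at a time. Here the "result" is just the batch size;
# in practice this step would read the files, extract what is needed,
# and discard the full texts before moving on to the next batch.
sizes <- unname(vapply(batches, length, integer(1)))
print(sizes)
```

Keeping only the extracted results, rather than all of the full texts, is what keeps memory use below the Binder limit.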

However, the script files can easily be used on your own computer (just install R and then RStudio), and you are welcome to work with any and all texts locally. Download the whole GitHub repository here.

License

The workflow has been built with the principles of Open Science in scientific computing in mind (Wilson et al. 2017), aiming for a maximally transparent and simple interface. The code is licensed under the MIT license and is free to use for any purpose.

The workflow is registered at Zenodo, and can be cited as

Tinits, Peeter (2020). Eesti Rahvusraamatukogu digilabori tööriistad tekstimaterjali ligipääsuks ja töötlemiseks. Zenodo. http://doi.org/10.5281/zenodo.3953795
Tinits, Peeter (2020). GLAMlab toolkit to access and analyse texts at the National Library of Estonia. Zenodo. http://doi.org/10.5281/zenodo.3953795

Stable location now at https://github.com/peeter-t2/RR_GLAMlab_pilot/.
