Trove newspapers

Current version: v1.3.4

This repository contains Jupyter notebooks to work with data from Trove's newspapers zone. For more information see the Trove Newspapers section of the GLAM Workbench.

Notebook topics

Trove newspapers in context

Visualise the total number of newspaper articles in Trove by year and state – explore how Trove's newspaper articles are distributed over time, and by state
Analyse rates of OCR correction – explore patterns in OCR text correction; how many corrections are there and where have they been made?
Finding non-English newspapers in Trove – use automated language detection to identify non-English language newspapers in Trove
Beyond the copyright cliff of death – find newspapers with content published after 1954
Gathering historical data about the addition of newspaper titles to Trove – find when newspaper titles were added to Trove by extracting lists from web archives

Visualising searches

QueryPic – simple app to visualise newspaper searches over time; this is the latest version, with many new features
QueryPic Deconstructed – an older version of QueryPic that lets you build queries using keywords, states, or newspapers
Visualise Trove newspaper searches over time – use facets to slice up newspaper search results and visualise over time
Map Trove newspaper results by state – create a choropleth map to visualise search results by state
Map Trove newspaper results by place of publication – links newspapers to their place of publication and maps the results
Map Trove newspaper results by place of publication over time – adds a time dimension to the example above
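
As a rough illustration of what "slicing results by a facet and visualising them over time" involves, the sketch below reshapes a flat list of counts into one series per facet value, ready for charting. The input shape (facet value, year, count) and the numbers are made up for this example; they are not the format returned by the Trove API or used in the notebooks.

```python
from collections import defaultdict


def counts_by_facet_and_year(rows):
    """Group (facet_value, year, count) rows into {facet_value: {year: count}}."""
    table = defaultdict(dict)
    for facet_value, year, count in rows:
        # Add counts together if the same facet value and year appear more than once
        table[facet_value][year] = table[facet_value].get(year, 0) + count
    # Sort facet values and years so that each series is in chart order
    return {
        facet_value: dict(sorted(years.items()))
        for facet_value, years in sorted(table.items())
    }


# Illustrative counts only, not real search results
rows = [
    ("Victoria", 1915, 120),
    ("New South Wales", 1914, 80),
    ("Victoria", 1914, 50),
    ("Victoria", 1915, 5),
]
series = counts_by_facet_and_year(rows)
```

Each inner dictionary can then be passed to whatever charting library you prefer as a single line or bar series.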

Harvesting data

See the Trove Newspaper and Gazette Harvester if you want to harvest all the articles from a search.

Harvest information about newspaper issues – get information about available issues for each newspaper from the Trove API
Harvest the issues of a newspaper as PDFs – harvest available issues of a newspaper as PDFs
Harvest Australian Women's Weekly covers (or the front pages of any newspaper) – harvest the front pages of any newspaper, including covers from the Australian Women's Weekly

Useful tools

Save a Trove newspaper article as an image – grabs the page on which an article was published, and then crops the page image to the boundaries of the article to create a complete, intact image of the article as it was originally published
Download a page image – a simple app that lets you download page images as complete, high-resolution JPG files
Generate an article thumbnail – generate a nice square thumbnail image for a newspaper article
Upload Trove newspaper articles to Omeka-S – steps through the process of uploading Trove newspaper articles to your own Omeka-S instance via the API
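
Cropping a page image to the boundaries of an article comes down to finding one rectangle that encloses every block of the article on the page. The sketch below shows that step only. It assumes each block is given as (left, top, width, height) in pixels, which is a hypothetical format chosen for illustration rather than the one used in the notebook.

```python
def union_box(boxes):
    """Return (left, top, right, bottom) enclosing all (left, top, width, height) boxes."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("At least one box is required")
    left = min(box[0] for box in boxes)
    top = min(box[1] for box in boxes)
    right = max(box[0] + box[2] for box in boxes)
    bottom = max(box[1] + box[3] for box in boxes)
    return left, top, right, bottom


# Two made-up blocks of the same article, in neighbouring columns
article_blocks = [(100, 200, 300, 400), (400, 250, 300, 500)]
crop_area = union_box(article_blocks)
```

The resulting tuple is in the (left, top, right, bottom) order that image libraries such as Pillow expect for a crop.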

Tips and tricks

Today’s news yesterday – uses the date index and the firstpageseq parameter to find articles from exactly 100 years ago that were published on the front page
Create a Trove OCR corrections ticker – uses the has:corrections parameter to get the total number of newspaper articles with OCR corrections
Get a list of Trove newspapers that doesn't include government gazettes – workaround for a problem with the newspaper/titles endpoint of the API
Get the page coordinates of a digitised newspaper article from Trove – demonstrates how to find the coordinates of a newspaper article on a digitised page
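
To give a sense of how the date index and firstpageseq can be combined, here is a minimal sketch that builds a search query for front-page articles published 100 years before a given day. The exact range syntax, and starting the range on the previous day, are assumptions made for this example; check the notebook for the query it actually uses.

```python
from datetime import date, timedelta


def same_day_years_ago(today, years=100):
    """Return the same calendar day a number of years earlier."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # 29 February has no match in a non-leap year, so fall back to 28 February
        return today.replace(year=today.year - years, day=28)


def front_page_query(today, years=100):
    """Build a query string limited to one day and to the first page."""
    day = same_day_years_ago(today, years)
    # Assumption: the range starts on the previous day to capture a single date
    day_before = day - timedelta(days=1)
    return (
        f"date:[{day_before.isoformat()}T00:00:00Z TO {day.isoformat()}T00:00:00Z] "
        "firstpageseq:1"
    )


query = front_page_query(date(2024, 3, 5))
```

The query string would then be sent as the search term in a Trove API request for newspaper articles.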

Get creative

Make composite images from lots of Trove newspaper thumbnails – creates thumbnails from a search and compiles them into a mega image
Create 'scissors and paste' messages from Trove newspaper articles – snip words out of page images and compile them into the message of your choice
Create large composite images from snipped words – harvest multiple versions of a list of words and compile them all into one big image
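
Compiling many thumbnails into one large image is mostly a layout problem: decide how big the canvas needs to be, and where each tile goes. The sketch below works out both for square tiles. The function name and parameters are hypothetical, and pasting the actual images onto the canvas is left to an image library.

```python
import math


def grid_layout(n_tiles, tile_size, columns):
    """Return the canvas size and the top-left corner of each tile in a grid."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    rows = math.ceil(n_tiles / columns)
    canvas_size = (columns * tile_size, rows * tile_size)
    # Fill the grid left to right, then top to bottom
    positions = [
        ((index % columns) * tile_size, (index // columns) * tile_size)
        for index in range(n_tiles)
    ]
    return canvas_size, positions


# Five 200-pixel thumbnails arranged in three columns
canvas_size, positions = grid_layout(5, 200, 3)
```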

See the GLAM Workbench for more details.

Data files

CSV formatted lists of newspaper titles in Trove
- trove_newspaper_titles_2009_2021.csv – complete dataset of captures and titles
- trove_newspaper_titles_first_appearance_2009_2021.csv – filtered dataset, showing only the first appearance of each title / place / date range combination
- There is also an alphabetical list of newspaper titles, showing approximately when they first appeared in Trove.
CSV formatted list of Australian Women's Weekly issues, 1933 to 1982
Australian Women's Weekly front covers, 1933 to 1982 (2,566 images on Cloudstor). For easy browsing, I've compiled the images into a set of PDF files, one for each decade, available from Dropbox.
Trove newspapers with non-English language content
Trove newspapers with articles published after 1954
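
The "first appearance" dataset listed above is described as a filtered version of the complete dataset of captures. A filter of that kind could look something like the sketch below, which keeps the earliest capture for each title / place / date range combination. The column names and sample rows are invented for this example, so check the header row of the real CSV files before adapting it.

```python
import csv
import io


def first_appearances(rows, key_fields, date_field):
    """Keep the earliest row for each combination of key_fields."""
    earliest = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        # ISO-formatted dates (YYYY-MM-DD) sort correctly as plain strings
        if key not in earliest or row[date_field] < earliest[key][date_field]:
            earliest[key] = row
    return sorted(earliest.values(), key=lambda row: row[date_field])


# Made-up sample data with hypothetical column names
sample_csv = """title,place,date_range,capture_date
Example Gazette,Sampletown,1850-1950,2010-05-01
Example Gazette,Sampletown,1850-1950,2009-11-20
Example Gazette,Sampletown,1850-1954,2009-11-20
Example Herald,Testville,1860-1954,2011-02-14
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
first_rows = first_appearances(rows, ["title", "place", "date_range"], "capture_date")
```

To run this against a downloaded file, replace the `io.StringIO` sample with an opened file handle.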

Run these notebooks

There are a number of different ways to use these notebooks. Binder is quickest and easiest, but it doesn't save your data. I've listed the options below from easiest to most complicated (requiring more technical knowledge).

Using Binder

Click on the button above to launch the notebooks in this repository using the Binder service (it might take a little while to load). This is a free service, but note that sessions will close if you stop using the notebooks, and no data will be saved. Make sure you download any changed notebooks or harvested data that you want to save.

See Using Binder for more details.

Using Reclaim Cloud

Reclaim Cloud is a paid hosting service, aimed particularly at supporting digital scholarship in the humanities. Unlike Binder, the environments you create on Reclaim Cloud will save your data – even if you switch them off! To run this repository on Reclaim Cloud for the first time:

Create a Reclaim Cloud account and log in.
Click on the button above to start the installation process.
A dialogue box will ask you to set a password. This is used to limit access to your Jupyter installation.
Sit back and wait for the installation to complete!
Once the installation is finished click on the 'Open in Browser' button of your newly created environment (note that you might need to wait a few minutes before everything is ready).

See Using Reclaim Cloud for more details.

Using the Nectar Research Cloud

The Nectar Research Cloud (part of the Australian Research Data Commons) provides cloud computing services to researchers in Australian and New Zealand universities. Any university-affiliated researcher can log on to Nectar and receive up to 6 months of free cloud computing time. And if you need more, you can apply for a specific project allocation.

The GLAM Workbench is available in the Nectar Cloud as a pre-configured application. This means you can get it up and going without worrying about the technical infrastructure – just fill in a few details and you're away! To create an instance of this repository in the Nectar Cloud:

Log in to the Nectar Dashboard using your university credentials.
From the Dashboard choose Applications -> Browse Local.
Enter 'GLAM' in the filter box and hit Enter. You should see the GLAM Workbench application.
Click on the GLAM Workbench application's Quick Deploy button.
Step through the various configuration options. Some options are only available if you have a dedicated project allocation.
When asked to select a GLAM Workbench repository, choose 'Trove newspapers' from the dropdown.
Complete the configuration and deploy your GLAM Workbench instance.
The URL to access your instance will be displayed once it's ready. Click on the URL!

See Using Nectar for more details.

Using Docker

You can use Docker to run a pre-built computing environment on your own computer. It will set up everything you need to run the notebooks in this repository. This is free, but requires more technical knowledge – you'll have to install Docker on your computer, and be able to use the command line.

Install Docker Desktop.
Create a new directory for this repository and open it from the command line.

From the command line, run the following command:

docker run -p 8888:8888 --name trove-newspapers -v "$PWD":/home/jovyan/work quay.io/glamworkbench/trove-newspapers repo2docker-entrypoint jupyter lab --ip 0.0.0.0 --NotebookApp.token='' --LabApp.default_url='/lab/tree/index.ipynb'

It will take a while to download and configure the Docker image. Once it's ready you'll see a message saying that Jupyter Notebook is running.
Point your web browser to http://127.0.0.1:8888

See Using Docker for more details.

Setting up on your own computer

If you know your way around the command line and are comfortable installing software, you might want to set up your own computer to run these notebooks.

Assuming you have recent versions of Python and Git installed, the steps might be something like:

Create a virtual environment, e.g.: python -m venv trove-newspapers
Open the new directory: cd trove-newspapers
Activate the environment: source bin/activate
Clone the repository: git clone https://github.com/GLAM-Workbench/trove-newspapers.git notebooks
Open the new notebooks directory: cd notebooks
Install the necessary Python packages: pip install -r requirements.txt
Run Jupyter: jupyter lab

See the GLAM Workbench for [more details](https://glam-workbench.net/getting-started/#using-python-on-your-own-computer).

Cite as

See the GLAM Workbench or Zenodo for up-to-date citation details.

This repository is part of the GLAM Workbench.
If you think this project is worthwhile, you might like to sponsor me on GitHub.
