Current version: v1.3.4
This repository contains Jupyter notebooks to work with data from Trove's newspapers zone. For more information see the Trove Newspapers section of the GLAM Workbench.
- Visualise the total number of newspaper articles in Trove by year and state – explore how Trove's newspaper articles are distributed over time, and by state
- Analyse rates of OCR correction – explore patterns in OCR text correction; how many corrections are there and where have they been made?
- Finding non-English newspapers in Trove – use automated language detection to identify non-English language newspapers in Trove
- Beyond the copyright cliff of death – find newspapers with content published after 1954
- Gathering historical data about the addition of newspaper titles to Trove – find when newspaper titles were added to Trove by extracting lists from web archives
- QueryPic – simple app to visualise newspaper searches over time; this is the latest version, with many new features
- QueryPic Deconstructed – an older version of QueryPic that lets you build queries using keywords, states, or newspapers
- Visualise Trove newspaper searches over time – use facets to slice up newspaper search results and visualise over time
- Map Trove newspaper results by state – create a choropleth map to visualise search results by state
- Map Trove newspaper results by place of publication – links newspapers to their place of publication and maps the results
- Map Trove newspaper results by place of publication over time – adds a time dimension to the example above
See the Trove Newspaper and Gazette Harvester if you want to harvest all the articles from a search.
- Harvest information about newspaper issues – get information about available issues for each newspaper from the Trove API
- Harvest the issues of a newspaper as PDFs – harvest available issues of a newspaper as PDFs
- Harvest Australian Women's Weekly covers (or the front pages of any newspaper) – harvest the front pages of any newspaper, including covers from the Australian Women's Weekly
- Save a Trove newspaper article as an image – grabs the page on which an article was published, and then crops the page image to the boundaries of the article to create a complete, intact image of the article as it was originally published
- Download a page image – a simple app that lets you download page images as complete, high-resolution JPG files
- Generate an article thumbnail – generate a nice square thumbnail image for a newspaper article
- Upload Trove newspaper articles to Omeka-S – steps through the process of uploading Trove newspaper articles to your own Omeka-S instance via the API
- Today’s news yesterday – uses the `date` index and the `firstpageseq` parameter to find articles from exactly 100 years ago that were published on the front page
- Create a Trove OCR corrections ticker – uses the `has:corrections` parameter to get the total number of newspaper articles with OCR corrections
- Get a list of Trove newspapers that doesn't include government gazettes – workaround for a problem with the `newspaper/titles` endpoint of the API
- Get the page coordinates of a digitised newspaper article from Trove – demonstrates how to find the coordinates of a newspaper article on a digitised page
- Make composite images from lots of Trove newspaper thumbnails – creates thumbnails from a search and compiles them into a mega image
- Create 'scissors and paste' messages from Trove newspaper articles – snip words out of page images and compile them into the message of your choice
- Create large composite images from snipped words – harvest multiple versions of a list of words and compile them all into one big image
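As a rough illustration of the 'Today’s news yesterday' idea listed above, the sketch below works out the date exactly 100 years before a given day and builds a search query string from it. It is not taken from the notebook: the function names are hypothetical, and both the `date:[... TO ...]` range syntax and the use of `firstpageseq:1` as a query term are assumptions that should be checked against the Trove API documentation.

```python
from datetime import date, timedelta


def same_day_years_ago(today: date, years: int = 100) -> date:
    """Return the same calendar day `years` earlier.

    29 February falls back to 28 February when the target year
    is not a leap year.
    """
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)


def front_page_query(today: date, years: int = 100) -> str:
    """Build a query for front-page articles from a single day (hypothetical helper)."""
    end = same_day_years_ago(today, years)
    start = end - timedelta(days=1)
    # Assumed syntax: a range on the `date` index running from the
    # previous day to the target day.
    date_range = (
        f"date:[{start.isoformat()}T00:00:00Z TO {end.isoformat()}T00:00:00Z]"
    )
    # Assumed syntax: `firstpageseq:1` limits results to the front page.
    return f"firstpageseq:1 {date_range}"


if __name__ == "__main__":
    print(front_page_query(date.today()))
```

The resulting string would be passed as the search query in a request to the Trove API, along with your own API key.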
See the GLAM Workbench for more details.
- CSV formatted lists of newspaper titles in Trove
- trove_newspaper_titles_2009_2021.csv – complete dataset of captures and titles
- trove_newspaper_titles_first_appearance_2009_2021.csv – filtered dataset, showing only the first appearance of each title / place / date range combination
- There is also an alphabetical list of newspaper titles, showing approximately when they first appeared in Trove.
- CSV formatted list of Australian Women's Weekly issues, 1933 to 1982
- Australian Women's Weekly front covers, 1933 to 1982 (2,566 images on Cloudstor). For easy browsing, I've compiled the images into a set of PDF files, one for each decade, available from Dropbox.
- Trove newspapers with non-English language content
- Trove newspapers with articles published after 1954
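To give a sense of how a 'first appearance' dataset like the one listed above could be derived from the complete list of captures, here is a minimal sketch using only the Python standard library. The column names (`capture_date`, `title`, `place`, `date_range`) and the sample rows are invented for illustration; the actual CSV files may be structured differently.

```python
import csv
import io


def first_appearances(
    rows,
    key_fields=("title", "place", "date_range"),
    date_field="capture_date",
):
    """Keep the earliest row for each combination of the key fields."""
    earliest = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        # ISO-formatted dates (YYYY-MM-DD) sort correctly as plain strings.
        if key not in earliest or row[date_field] < earliest[key][date_field]:
            earliest[key] = row
    return sorted(earliest.values(), key=lambda row: row[date_field])


# Made-up sample data; the column names are hypothetical.
SAMPLE = """capture_date,title,place,date_range
2010-03-01,Example Gazette,Exampletown,1900 - 1910
2009-11-15,Example Gazette,Exampletown,1900 - 1910
2012-06-30,Example Gazette,Exampletown,1900 - 1925
2011-01-20,Sample Advertiser,Sampleville,1880 - 1890
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in first_appearances(rows):
    print(row["capture_date"], row["title"], row["date_range"])
```

In this sketch a change to a title's date range counts as a new combination, so the same title can appear more than once in the filtered output.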
There are a number of different ways to use these notebooks. Binder is quickest and easiest, but it doesn't save your data. I've listed the options below from easiest to most complicated (requiring more technical knowledge).
Click on the button above to launch the notebooks in this repository using the Binder service (it might take a little while to load). This is a free service, but note that sessions will close if you stop using the notebooks, and no data will be saved. Make sure you download any changed notebooks or harvested data that you want to save.
See Using Binder for more details.
Reclaim Cloud is a paid hosting service, aimed particularly at supporting digital scholarship in the humanities. Unlike Binder, the environments you create on Reclaim Cloud will save your data – even if you switch them off! To run this repository on Reclaim Cloud for the first time:
- Create a Reclaim Cloud account and log in.
- Click on the button above to start the installation process.
- A dialogue box will ask you to set a password. This is used to limit access to your Jupyter installation.
- Sit back and wait for the installation to complete!
- Once the installation is finished click on the 'Open in Browser' button of your newly created environment (note that you might need to wait a few minutes before everything is ready).
See Using Reclaim Cloud for more details.
The Nectar Research Cloud (part of the Australian Research Data Commons) provides cloud computing services to researchers in Australian and New Zealand universities. Any university-affiliated researcher can log on to Nectar and receive up to 6 months of free cloud computing time. And if you need more, you can apply for a specific project allocation.
The GLAM Workbench is available in the Nectar Cloud as a pre-configured application. This means you can get it up and going without worrying about the technical infrastructure – just fill in a few details and you're away! To create an instance of this repository in the Nectar Cloud:
- Log in to the Nectar Dashboard using your university credentials.
- From the Dashboard choose Applications -> Browse Local.
- Enter 'GLAM' in the filter box and hit Enter. You should see the GLAM Workbench application.
- Click on the GLAM Workbench application's Quick Deploy button.
- Step through the various configuration options. Some options are only available if you have a dedicated project allocation.
- When asked to select a GLAM Workbench repository, choose 'Trove newspapers' from the dropdown.
- Complete the configuration and deploy your GLAM Workbench instance.
- The URL to access your instance will be displayed once it's ready. Click on the URL!
See Using Nectar for more details.
You can use Docker to run a pre-built computing environment on your own computer. It will set up everything you need to run the notebooks in this repository. This is free, but requires more technical knowledge – you'll have to install Docker on your computer, and be able to use the command line.
- Install Docker Desktop.
- Create a new directory for this repository and open it from the command line.
- From the command line, run the following command:
docker run -p 8888:8888 --name trove-newspapers -v "$PWD":/home/jovyan/work quay.io/glamworkbench/trove-newspapers repo2docker-entrypoint jupyter lab --ip 0.0.0.0 --NotebookApp.token='' --LabApp.default_url='/lab/tree/index.ipynb'
- It will take a while to download and configure the Docker image. Once it's ready you'll see a message saying that Jupyter Notebook is running.
- Point your web browser to
http://127.0.0.1:8888
See Using Docker for more details.
If you know your way around the command line and are comfortable installing software, you might want to set up your own computer to run these notebooks.
Assuming you have recent versions of Python and Git installed, the steps might be something like:
- Create a virtual environment, eg:
python -m venv trove-newspapers
- Open the new directory:
cd trove-newspapers
- Activate the environment:
source bin/activate
- Clone the repository:
git clone https://github.com/GLAM-Workbench/trove-newspapers.git notebooks
- Open the new `notebooks` directory:
cd notebooks
- Install the necessary Python packages:
pip install -r requirements.txt
- Run Jupyter:
jupyter lab
See the GLAM Workbench for [more details](https://glam-workbench.net/getting-started/#using-python-on-your-own-computer).
See the GLAM Workbench or Zenodo for up-to-date citation details.
This repository is part of the GLAM Workbench.
If you think this project is worthwhile, you might like to sponsor me on GitHub.