{
"language": "eng",
"license": "MIT",
"title": "GLAM-Workbench/trove-newspapers",
"related_identifiers": [
{
"scheme": "url",
"identifier": "https://github.com/GLAM-Workbench/trove-newspapers/tree/v1.3.4",
"relation": "isDerivedFrom",
"resource_type": "software"
},
{
"scheme": "url",
"identifier": "https://glam-workbench.github.io/trove-newspapers/",
"relation": "isDocumentedBy",
"resource_type": "publication-softwaredocumentation"
},
{
"scheme": "url",
"identifier": "https://glam-workbench.github.io/",
"relation": "isPartOf",
"resource_type": "other"
}
],
"version": "v1.3.4",
"upload_type": "software",
"keywords": [
"digital humanities",
"Trove",
"Jupyter",
"newspapers",
"GLAM Workbench"
],
"publication_date": "2022-06-26",
"creators": [
{
"orcid": "0000-0001-7956-4498",
"name": "Sherratt, Tim"
}
],
"access_right": "open",
"description": "<p>Current version: <a href=\"https://github.com/GLAM-Workbench/trove-newspapers/releases/tag/v1.3.4\">v1.3.4</a></p> <p>This repository contains Jupyter notebooks to work with data from Trove’s newspapers zone. For more information see the <a href=\"https://glam-workbench.net/trove-newspapers/\">Trove Newspapers</a> section of the GLAM Workbench.</p> <h2 id=\"notebook-topics\">Notebook topics</h2> <h3 id=\"trove-newspapers-in-context\">Trove newspapers in context</h3> <ul> <li><strong>Visualise the total number of newspaper articles in Trove by year and state</strong> – explore how Trove’s newspaper articles are distributed over time, and by state</li> <li><strong>Analyse rates of OCR correction</strong> – explore patterns in OCR text correction; how many corrections are there and where have they been made?</li> <li><strong>Finding non-English newspapers in Trove</strong> – use automated language detection to identify non-English language newspapers in Trove</li> <li><strong>Beyond the copyright cliff of death</strong> – find newspapers with content published after 1954</li> <li><strong>Gathering historical data about the addition of newspaper titles to Trove</strong> – find when newspaper titles were added to Trove by extracting lists from web archives</li> </ul> <h3 id=\"visualising-searches\">Visualising searches</h3> <ul> <li><strong>QueryPic</strong> – simple app to visualise newspaper searches over time, this is the latest version with many new features</li> <li><strong>QueryPic Deconstructed</strong> – an older version of QueryPic that lets you build queries using keywords, states, or newspapers</li> <li><strong>Visualise Trove newspaper searches over time</strong> – use facets to slice up newspaper search results and visualise over time</li> <li><strong>Map Trove newspaper results by state</strong> – create a choropleth map to visualise search results by state</li> <li><strong>Map Trove newspaper results by place of publication</strong> – 
links newspapers to their place of publication and maps the results</li> <li><strong>Map Trove newspaper results by place of publication over time</strong> – adds a time dimension to the example above</li> </ul> <h3 id=\"harvesting-data\">Harvesting data</h3> <p>See the <a href=\"https://glam-workbench.net/trove-harvester/\">Trove Newspaper and Gazette Harvester</a> if you want to harvest all the articles from a search.</p> <ul> <li><strong>Harvest information about newspaper issues</strong> – get information about available issues for each newspaper from the Trove API</li> <li><strong>Harvest the issues of a newspaper as PDFs</strong> – harvest available issues of a newspaper as PDFs</li> <li><strong>Harvest Australian Women’s Weekly covers (or the front pages of any newspaper)</strong> – harvest the front pages of any newspaper, including covers from the Australian Women’s Weekly</li> </ul> <h3 id=\"useful-tools\">Useful tools</h3> <ul> <li><strong>Save a Trove newspaper article as an image</strong> – grabs the page on which an article was published, and then crops the page image to the boundaries of the article to create a complete, intact image of the article as it was originally published</li> <li><strong>Download a page image</strong> – a simple app that lets you download page images as complete, high-resolution JPG files</li> <li><strong>Generate an article thumbnail</strong> – generate a nice square thumbnail image for a newspaper article</li> <li><strong>Upload Trove newspaper articles to Omeka-S</strong> – steps through the process of uploading Trove newspaper articles to your own Omeka-S instance via the API</li> </ul> <h3 id=\"tips-and-tricks\">Tips and tricks</h3> <ul> <li><strong>Today’s news yesterday</strong> – uses the <code>date</code> index and the <code>firstpageseq</code> parameter to find articles from exactly 100 years ago that were published on the front page</li> <li><strong>Create a Trove OCR corrections ticker</strong> – uses the 
<code>has:corrections</code> parameter to get the total number of newspaper articles with OCR corrections</li> <li><strong>Get a list of Trove newspapers that doesn’t include government gazettes</strong> – workaround for a problem with the <code>newspaper/titles</code> endpoint of the API</li> <li><strong>Get the page coordinates of a digitised newspaper article from Trove</strong> – demonstrates how to find the coordinates of a newspaper article on a digitised page</li> </ul> <h3 id=\"get-creative\">Get creative</h3> <ul> <li><strong>Make composite images from lots of Trove newspaper thumbnails</strong> – creates thumbnails from a search and compiles them into a mega image</li> <li><strong>Create ‘scissors and paste’ messages from Trove newspaper articles</strong> – snip words out of page images and compile them into the message of your choice</li> <li><strong>Create large composite images from snipped words</strong> – harvest multiple versions of a list of words and compile them all into one big image</li> </ul> <p>See the <a href=\"https://glam-workbench.github.io/trove-newspapers/\">GLAM Workbench for more details</a>.</p> <h3 id=\"data-files\">Data files</h3> <ul> <li>CSV formatted lists of newspaper titles in Trove <ul> <li><a href=\"trove_newspaper_titles_2009_2021.csv\">trove_newspaper_titles_2009_2021.csv</a> – complete dataset of captures and titles</li> <li><a href=\"trove_newspaper_titles_first_appearance_2009_2021.csv\">trove_newspaper_titles_first_appearance_2009_2021.csv</a> – filtered dataset, showing only the first appearance of each title / place / date range combination</li> <li>There is also an <a href=\"https://gist.github.com/wragge/7d80507c3e7957e271c572b8f664031a\">alphabetical list of newspaper titles</a>, showing approximately when they first appeared in Trove.</li> </ul></li> <li><a href=\"data/aww-issues.csv\">CSV formatted list of Australian Women’s Weekly issues, 1933 to 1982</a></li> <li><a 
href=\"https://cloudstor.aarnet.edu.au/plus/s/NaKjoKNFOGXXDNN\">Australian Women’s Weekly front covers, 1933 to 1982</a> (2,566 images on Cloudstor) For easy browsing, I’ve compiled the images into a set of PDF files, one for each decade, available from Dropbox: <ul> <li><a href=\"https://www.dropbox.com/s/0j6zpeuw6tbey5k/aww-1933-1939.pdf?dl=0\">1933 to 1939</a></li> <li><a href=\"https://www.dropbox.com/s/y1he8dd6h655weu/aww-1940-1949.pdf?dl=0\">1940 to 1949</a></li> <li><a href=\"https://www.dropbox.com/s/i9gp9i51nofmlqo/aww-1950-1959.pdf?dl=0\">1950 to 1959</a></li> <li><a href=\"https://www.dropbox.com/s/2of63tovcnphijo/aww-1960-1969.pdf?dl=0\">1960 to 1969</a></li> <li><a href=\"https://www.dropbox.com/s/f2yxpg8u4dx5uf2/aww-1970-1979.pdf?dl=0\">1970 to 1979</a></li> <li><a href=\"https://www.dropbox.com/s/xanohtas1fi7eu4/aww-1980-1982.pdf?dl=0\">1980 to 1982</a></li> </ul></li> <li><a href=\"https://gist.github.com/wragge/9aa385648cff5f0de0c7d4837896df97\">Trove newspapers with non-English language content</a></li> <li><a href=\"newspapers_post_54.csv\">Trove newspapers with articles published after 1954</a></li> </ul> <h2 id=\"cite-as\">Cite as</h2> <p>See the GLAM Workbench or <a href=\"https://doi.org/10.5281/zenodo.3521724\">Zenodo</a> for up-to-date citation details.</p> <hr /> <p>This repository is part of the <a href=\"https://glam-workbench.github.io/\">GLAM Workbench</a>.<br /> If you think this project is worthwhile, you might like <a href=\"https://github.com/sponsors/wragge?o=esb\">to sponsor me on GitHub</a>.</p>"
}