Notebook for bulk downloading of AJCP material #58
Comments
I'd like to be able to download sections from the AJCP digitised collection, for instance material from the Miscellaneous Series, London Missionary Society Collection. From here, it would be great to search by three categories: name, date, and geographical location.

The example below shows the general data in the finding aid: Letters mainly from missionaries in the Society, Hervey and Samoan Islands and also the New Hebrides, Loyalty Islands and Savage Island (Niue), 1862 - 1863 (File Box 29)

But what I'd really like is to harvest files based on the descriptive section of the file, as seen below. For my research, I would target information about Lawes.

The correspondents include Charles Barff (Huahine), P.G. Bird (Savaii, Apia), Stephen M. Creagh (Uea, Lifu), George Drummond (Upolu), Samuel Ella (Aneiteum), John Geddie (Aneiteum), Henry Gee (Apia), W. Wyatt Gill (Mangaia), James L. Green (Taha'a), William Howe (Papeete), John Jones (Mare), Ernst R.W. Krause (Rarotonga), William G. Lawes (Savage Island), Samuel Macfarlane (Lifu), George Morris (Raiatea), Archibald W. Murray (Malua), Henry Nisbet (Malua), George Platt (Raiatea), Thomas Powell (Tutuila), George Pratt (Matautu, Savage Island), Carl Schmidt (Apia), James Sleigh (Lifu) and George Turner (Sydney).
So to break this down:
Is that what you'd like?
Notes to self: Searching within a finding aid fires off a POST request that returns an HTML fragment. The params are something like this:

params = {"faIdentifier":"nla.obj-1126174847","term":"lawes","nuc":"ANL:AJCP","facets":"all","zone":"collection","selectedFacets":[],"pageSize":10,"cursorMark":"AoErc3UyMzcxMDI4Nzk=","start":1,"previous":["*"]}

These are posted as JSON to […]. The results are HTML, so the identifiers would need to be scraped from the HTML for further processing.
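A rough sketch of how the notes above might translate into code, using only the Python standard library. The parameter names and values come from the note. Everything else is an assumption: `SEARCH_URL` is a hypothetical placeholder (the real endpoint is not recorded here), the helper names are made up for illustration, the default `cursorMark` of `"*"` is a guess at the first-page value, and the scraper assumes item identifiers show up in the returned HTML in the same `nla.obj-<digits>` form as `faIdentifier`.

```python
import json
import re
import urllib.request

# Hypothetical placeholder: the real search endpoint is not given in the note.
SEARCH_URL = "https://example.org/finding-aid-search"


def build_params(term, fa_identifier="nla.obj-1126174847", cursor_mark="*"):
    """Build the JSON payload using the keys recorded in the note.

    cursor_mark="*" is an assumption about how the first page is requested;
    the note itself shows a later-page value.
    """
    return {
        "faIdentifier": fa_identifier,
        "term": term,
        "nuc": "ANL:AJCP",
        "facets": "all",
        "zone": "collection",
        "selectedFacets": [],
        "pageSize": 10,
        "cursorMark": cursor_mark,
        "start": 1,
        "previous": ["*"],
    }


def search_finding_aid(term, url=SEARCH_URL):
    """POST the params as JSON and return the HTML fragment as text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_params(term)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def extract_identifiers(html):
    """Scrape unique nla.obj identifiers, in order of first appearance.

    Assumes identifiers appear somewhere in the markup as nla.obj-<digits>.
    """
    identifiers = []
    for identifier in re.findall(r"nla\.obj-\d+", html):
        if identifier not in identifiers:
            identifiers.append(identifier)
    return identifiers
```

With a real endpoint in place of the placeholder, something like `extract_identifiers(search_finding_aid("lawes"))` would give a list of identifiers to pass on for downloading. A regex is used rather than an HTML parser because the structure of the returned fragment is not known here.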
Worth noting too that dezoomify (https://dezoomify.ophir.dev/) works a treat in downloading high-resolution versions of pages in the AJCP.
Thanks for the dezoomify link, Tim. Bart mentioned he spoke with you recently and just commented on how good the images are! As for the query above, I think that sounds good!
See: https://twitter.com/MichWatsonOz/status/1521725616735014912