Notebook for bulk downloading of AJCP material #58

Open
wragge opened this issue May 4, 2022 · 5 comments
Comments

wragge commented May 4, 2022

See: https://twitter.com/MichWatsonOz/status/1521725616735014912

dleetalb commented May 6, 2022

I'd like to be able to download sections from the AJCP digitised collection.

For instance, material from the Miscellaneous Series, London Missionary Society Collection.

From here, it would be great to search by three categories: name, date, and geographical location.

The example below shows the general data in the finding aid:

Letters mainly from missionaries in the Society, Hervey and Samoan Islands and also the New Hebrides, Loyalty Islands and Savage Island (Niue), 1862 - 1863 (File Box 29)

But what I'd really like is to harvest files based on the descriptive section of the file, as seen below. For my research, I would target information about Lawes.

The correspondents include Charles Barff (Huahine), P.G. Bird (Savaii, Apia), Stephen M. Creagh (Uea, Lifu), George Drummond (Upolu), Samuel Ella (Aneiteum), John Geddie (Aneiteum), Henry Gee (Apia), W. Wyatt Gill (Mangaia), James L. Green (Taha'a), William Howe (Papeete), John Jones (Mare), Ernst R.W. Krause (Rarotonga), William G. Lawes (Savage Island), Samuel Macfarlane (Lifu), George Morris (Raiatea), Archibald W. Murray (Malua), Henry Nisbet (Malua), George Platt (Raiatea), Thomas Powell (Tutuila), George Pratt (Matautu, Savage Island), Carl Schmidt (Apia), James Sleigh (Lifu) and George Turner (Sydney).

wragge commented May 6, 2022

So to break this down:

  • You'd provide the notebook with a finding aid URL and a search term
  • The notebook would then search for the term within the finding aid, getting a list of matching boxes/item groups
  • The notebook would then download all of the images in those boxes
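
As a rough sketch of how those three steps could hang together in Python: the `search` and `download` callables here are hypothetical placeholders for code that doesn't exist yet, so this only shows the shape of the workflow, not a working harvester.

```python
def harvest(finding_aid_url, term, search, download):
    """Search a finding aid for a term, then download images from each match.

    `search` and `download` are hypothetical callables supplied by the caller:
      search(finding_aid_url, term) -> list of box/item group identifiers
      download(identifier)          -> list of saved image file names
    """
    results = {}
    for identifier in search(finding_aid_url, term):
        # Keep the downloaded files grouped by the box they came from.
        results[identifier] = download(identifier)
    return results
```

Calling `harvest(url, "lawes", search=..., download=...)` would return a dictionary mapping each matching box to its downloaded files.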

Is that what you'd like?

wragge commented May 6, 2022

Notes to self:

Searching within a finding aid fires off a POST request that returns an HTML fragment.

The params are something like this:

params = {"faIdentifier":"nla.obj-1126174847","term":"lawes","nuc":"ANL:AJCP","facets":"all","zone":"collection","selectedFacets":[],"pageSize":10,"cursorMark":"AoErc3UyMzcxMDI4Nzk=","start":1,"previous":["*"]}

These are posted as JSON to https://nla.gov.au/tarkine/nla.obj-1126174847/findingaid/search and the results come back paginated. To get the next page, increment the start value by the page size, so with "pageSize": 10 the next page would be "start": 11. It looks like the number of results per page can be changed.
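
A minimal, untested sketch of paging through those results using only the standard library. Some of this is guesswork: the function names are made up, the payload leaves out `cursorMark` (which may turn out to be required), and stopping on an empty response is an assumption about how the service signals the last page.

```python
import json
from urllib.request import Request, urlopen

SEARCH_URL = "https://nla.gov.au/tarkine/{fa_id}/findingaid/search"


def build_payload(fa_id, term, start=1, page_size=10):
    """Build the JSON body for one page of finding aid search results."""
    return {
        "faIdentifier": fa_id,
        "term": term,
        "nuc": "ANL:AJCP",
        "facets": "all",
        "zone": "collection",
        "selectedFacets": [],
        "pageSize": page_size,
        "start": start,
        "previous": ["*"],
        # Assumption: cursorMark is omitted here; it may need to be supplied.
    }


def post_json(url, payload):
    """POST a payload as JSON and return the response body as text."""
    request = Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def iter_pages(fa_id, term, max_pages=5, page_size=10, fetch=post_json):
    """Yield one HTML fragment per page: start=1, then 11, 21, ... for page_size 10."""
    url = SEARCH_URL.format(fa_id=fa_id)
    for page in range(max_pages):
        start = 1 + page * page_size
        html = fetch(url, build_payload(fa_id, term, start, page_size))
        # Assumption: an empty response means there are no more results.
        if not html.strip():
            break
        yield html
```

Something like `for html in iter_pages("nla.obj-1126174847", "lawes"):` would then hand each fragment on for scraping. The `max_pages` cap is there so a wrong guess about the stop condition can't loop forever.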

Results are HTML so would need to scrape identifiers from the HTML for further processing.
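
I haven't looked closely at the markup of the fragment yet, so as a first pass a regular expression that pulls out anything shaped like an `nla.obj-` identifier might be enough. This is a sketch under that assumption; the sample HTML is invented for illustration and the real fragment may need a proper HTML parser.

```python
import re

# Identifiers look like "nla.obj-1126174847" (as in the finding aid id above).
OBJ_ID = re.compile(r"nla\.obj-\d+")


def extract_identifiers(html, exclude=()):
    """Return unique nla.obj identifiers in order of first appearance.

    `exclude` lets the caller drop known ids, such as the finding aid's own.
    """
    found = []
    for identifier in OBJ_ID.findall(html):
        if identifier not in found and identifier not in exclude:
            found.append(identifier)
    return found


# Made-up fragment, not real output from the search endpoint.
sample = (
    '<li><a href="https://nla.gov.au/nla.obj-111">File Box 29</a></li>'
    '<li><a href="https://nla.gov.au/nla.obj-222">File Box 30</a></li>'
    '<li><a href="https://nla.gov.au/nla.obj-111">File Box 29</a></li>'
)
print(extract_identifiers(sample))
```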

wragge commented May 30, 2022

Worth noting too that dezoomify (https://dezoomify.ophir.dev/) works a treat in downloading high-resolution versions of pages in the AJCP.

dleetalb commented

Thanks for the dezoomify link, Tim. Bart mentioned he spoke with you recently and just commented on how good the images are!

As for the query above, I think that sounds good!
