Harvesting data from NSW State Archives database #45

wragge · 2021-09-30T00:27:18Z

See: https://ozglam.chat/t/nsw-state-archive-record-series-without-index/648

My initial response:

It’s not an easy one. The Primo system that NSWSA now uses is not very friendly to scraping data. You can get data for individual items as XML once you know their ID, but I can’t figure out how to get a list of results. The interface uses a JavaScript framework to load the search results, so the results aren’t ‘in’ the HTML of the page. It’s so annoying (the new Trove interface does this too…).

I think the only way around this would be to use something like Selenium, which drives a real web browser and so lets the JavaScript-loaded results render, or to leverage Zotero’s Primo translator, which might at least get 50 results at a time…
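
As a rough illustration of the Selenium route, here is a minimal sketch (untested against the NSWSA site). It loads a page in headless Firefox, waits for the JavaScript-rendered results to appear, then pulls the links out of the rendered HTML. The search URL and the CSS selector are placeholders I've made up; you'd need to inspect the actual Primo page to find the real ones. The function names `fetch_rendered_html` and `extract_links` are also just for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return every href found in the given HTML, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def fetch_rendered_html(url, result_selector, timeout=30):
    """Load a page in headless Firefox and return the HTML after JS has run.

    result_selector is a CSS selector for an element that only exists once
    the search results have loaded (an assumption: it has to be found by
    inspecting the real page).
    """
    # Imported here so extract_links() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Block until at least one result element is present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, result_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


if __name__ == "__main__":
    # Placeholder values, NOT the real NSWSA search URL or selector.
    SEARCH_URL = "https://example.org/primo/search?query=placeholder"
    RESULT_SELECTOR = "div.placeholder-result-item"
    html = fetch_rendered_html(SEARCH_URL, RESULT_SELECTOR)
    for link in extract_links(html):
        print(link)
```

This needs Selenium 4 and Firefox with geckodriver available. It only grabs one page of results, so paging through a full result set would still have to be worked out.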

wragge added the "data source" label (Suggestions for new data sources to add the Workbench.) Sep 30, 2021
