Harvesting data from NSW State Archives database #45

Open
wragge opened this issue Sep 30, 2021 · 0 comments
Labels
data source: Suggestions for new data sources to add to the Workbench.

wragge commented Sep 30, 2021

See: https://ozglam.chat/t/nsw-state-archive-record-series-without-index/648

My initial response:

It’s not an easy one. The Primo system that NSWSA now uses is not very friendly to scraping. You can get data for individual items as XML once you know their id, but I can’t figure out how to get a list of results. The interface uses a JavaScript framework to load the search results, so the results aren’t ‘in’ the HTML of the page. It’s so annoying (the new Trove interface does this too…).

I think the only way around this would be either to use something like Selenium, which mimics a web browser and so lets the JavaScript-loaded results render, or to leverage Zotero’s Primo translator, which might at least get 50 results at a time…
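
As a rough illustration of the Selenium route, here is a minimal sketch rather than a tested harvester. It assumes Firefox and geckodriver are installed, and it assumes that each result in the rendered page links to its record with a `docid` query parameter; that parameter name, the `wait_selector` argument, and both function names are hypothetical and would need checking against the actual NSWSA pages.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_record_ids(rendered_html, id_param="docid"):
    """Return unique record ids found in link query strings, in page order.

    ASSUMPTION: result links carry the id in a query parameter named
    `id_param`. This has not been verified against the NSWSA site.
    """
    collector = _LinkCollector()
    collector.feed(rendered_html)
    ids = []
    for href in collector.hrefs:
        for value in parse_qs(urlparse(href).query).get(id_param, []):
            if value not in ids:
                ids.append(value)
    return ids


def fetch_rendered_html(search_url, wait_selector, timeout=30):
    """Load a page in headless Firefox and return the HTML after JavaScript has run.

    `wait_selector` is a CSS selector for an element that only exists once
    the results have loaded (hypothetical; inspect the page to choose one).
    """
    # Imported here so extract_record_ids() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(search_url)
        # Block until the JavaScript framework has inserted the results.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()
```

The ids returned by `extract_record_ids(fetch_rendered_html(url, selector))` could then be fed into the per-item XML requests mentioned above. Paging through the full result set would still need to be worked out separately.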
