photogrammar

Code for getting and exploring the photogrammar data.

For example, we download a list of all the photo ids (these uniquely define the urls for scraping the rest of the data) by running the following code:

python src/get_photo_ids.py

This creates pickle/all_urls.p, a Python pickle file. Now we can download the MARC records from the Library of Congress website for every photo id in all_urls.p. This is done by:

python src/get_marc_records.py
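
Before starting the MARC download, it can help to confirm that pickle/all_urls.p loads and looks sensible. The following is a minimal sketch and not part of the repository: the helper names load_photo_ids and describe are hypothetical, and since this README does not say what kind of object the pickle holds, the code only reports its type and length.

```python
import pickle
from pathlib import Path


def load_photo_ids(path="pickle/all_urls.p"):
    """Load the pickled object written by src/get_photo_ids.py.

    The default path is the one named in this README; the structure of
    the stored object is an assumption left open here.
    """
    with Path(path).open("rb") as handle:
        # encoding="latin1" lets Python 3 read pickles written by Python 2
        # and does not affect pickles written by Python 3.
        return pickle.load(handle, encoding="latin1")


def describe(obj):
    """Return (type name, length), with length None for unsized objects."""
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size
```

Run from the repository root, print(describe(load_photo_ids())) shows what the first step produced. Only unpickle files you generated yourself, since loading a pickle can execute arbitrary code.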

When get_marc_records.py finishes, there should be files in the marc_records directory, such as 'marc_recordsfsa1997000988.csv'. Now, to finish the first stage of the scrape, we download the image urls using a similar command:

python src/get_img_urls.py

This creates text files in the directory 'img_url', such as 'img_url/fsa1997000987.txt', which contain the urls of the photo images.
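
Once the img_url files exist, they can be gathered into a single lookup for later exploration. This is a rough sketch under stated assumptions: the function name read_image_urls is hypothetical, and the code assumes each .txt file holds one url per line, which this README does not confirm.

```python
from pathlib import Path


def read_image_urls(directory="img_url"):
    """Map each photo id (the file name without .txt) to its list of urls.

    Assumes one url per line; blank lines and surrounding whitespace
    are dropped. Files without a .txt extension are ignored.
    """
    urls = {}
    for path in sorted(Path(directory).glob("*.txt")):
        lines = path.read_text().splitlines()
        urls[path.stem] = [line.strip() for line in lines if line.strip()]
    return urls
```

For example, read_image_urls().get("fsa1997000987") would return the urls stored in 'img_url/fsa1997000987.txt', or None if that file has not been downloaded yet.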
