a commandline command (Python3 program) that reads depiction information (image URLs) from given EntityFacts sheets (as line-delimited JSON records) and retrieves and stores the pictures and thumbnails contained in this information

slub/entityfactspicturesharvester

entityfactspicturesharvester - EntityFacts pictures harvester

entityfactspicturesharvester is a commandline command (Python3 program) that reads depiction information (image URLs) from given EntityFacts sheets* (as line-delimited JSON records) and retrieves and stores the pictures and thumbnails contained in this information.

*) EntityFacts are "fact sheets" on entities of the Integrated Authority File (GND), which is provided by the German National Library (DNB).

Usage

It reads EntityFacts sheets as line-delimited JSON records from stdin.

It retrieves the pictures (and thumbnails) linked in the depiction information of the EntityFacts sheets one by one and stores each of them as a file in the given directory.
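The input handling described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `iter_depictions` and the `Depiction` tuple are hypothetical, and the field layout (a top-level `@id` ending in the GND identifier, a `depiction` object whose `@id` is the picture URL and whose `thumbnail.@id` is the thumbnail URL) is an assumption about the EntityFacts JSON that should be checked against real records.

```python
import json
from typing import Iterable, Iterator, NamedTuple, Optional


class Depiction(NamedTuple):
    """Hypothetical container for the URLs found in one EntityFacts sheet."""
    gnd_id: str
    image_url: Optional[str]
    thumbnail_url: Optional[str]


def iter_depictions(lines: Iterable[str]) -> Iterator[Depiction]:
    """Yield one Depiction per line-delimited JSON record that has depiction info."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        sheet = json.loads(line)
        depiction = sheet.get("depiction")
        if not isinstance(depiction, dict):
            continue  # sheet without depiction information
        # Assumption: the sheet's "@id" is a URI that ends with the GND identifier.
        gnd_id = str(sheet.get("@id", "")).rstrip("/").rsplit("/", 1)[-1]
        thumbnail = depiction.get("thumbnail")
        thumbnail_url = thumbnail.get("@id") if isinstance(thumbnail, dict) else None
        yield Depiction(gnd_id, depiction.get("@id"), thumbnail_url)
```

Passing `sys.stdin` as `lines` would mirror the way the command consumes its input; any other iterable of strings (for example an open file) works the same way.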

entityfactspicturesharvester

optional arguments:
  -h, --help                           show this help message and exit
  • example:
    entityfactspicturesharvester < [INPUT LINE-DELIMITED JSON FILE WITH ENTITYFACTS SHEETS]

Note

Each (found) picture will be stored with the following pattern: image_[GND IDENTIFIER].[ORIGINAL FILE ENDING], e.g., image_116458461.jpg (GND identifier = 116458461; file ending = jpg)

Each (found) thumbnail will be stored with the following pattern: thumbnail_[GND IDENTIFIER].[ORIGINAL FILE ENDING], e.g., thumbnail_172323940.png (GND identifier = 172323940; file ending = png)
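The naming pattern above can be illustrated with a small helper. This is only a sketch under assumptions: `target_filename` is a hypothetical name, the file ending is taken from the path part of the URL (ignoring any query string), and the fallback ending used when the URL has none is an invented default, not documented behaviour of the tool.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def target_filename(kind: str, gnd_id: str, url: str, default_ending: str = "jpg") -> str:
    """Build '[kind]_[GND IDENTIFIER].[ORIGINAL FILE ENDING]' for a picture URL.

    kind is 'image' or 'thumbnail'. default_ending is an assumption for URLs
    whose path carries no file ending.
    """
    # Use only the URL path so that query parameters do not leak into the ending.
    path = unquote(urlsplit(url).path)
    ending = posixpath.splitext(path)[1].lstrip(".")
    return f"{kind}_{gnd_id}.{ending or default_ending}"
```

For example, `target_filename("image", "116458461", "https://example.org/pictures/portrait.jpg")` yields `image_116458461.jpg`, matching the example given in the note.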

429 responses

If you run into '429' responses ("too many requests"; see, e.g., HTTP status code 429 at httpstatuses.com), you can try reducing the number of threads of the thread pool schedulers (lines 31 and 32) and/or enabling (and, optionally, adjusting) the time delays before emitting the picture/thumbnail URLs (lines 68 and 146) and/or before doing a request (line 157).
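The two knobs mentioned above (fewer worker threads, a delay before each request) can be sketched with the standard library. This is not the tool's implementation: it uses `concurrent.futures.ThreadPoolExecutor` in place of the thread pool schedulers the program configures, and `harvest`, `fetch`, `max_workers`, and `request_delay` are hypothetical names chosen for illustration. The `fetch` callable is injected so that the sketch stays independent of any particular HTTP library.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

# Hypothetical fetch signature: takes a URL, returns (HTTP status code, body bytes).
Fetch = Callable[[str], Tuple[int, bytes]]


def harvest(
    urls: Iterable[str],
    fetch: Fetch,
    max_workers: int = 2,        # fewer threads means fewer parallel requests
    request_delay: float = 0.5,  # seconds to wait before each request
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[int, bytes]]:
    """Fetch all URLs with a small pool, pausing before every request."""

    def task(url: str) -> Tuple[int, bytes]:
        sleep(request_delay)  # throttle: delay before doing the request
        return fetch(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns the results in the order of the input URLs
        return list(pool.map(task, urls))
```

Lowering `max_workers` and raising `request_delay` both reduce the request rate seen by the server; which combination avoids 429 responses depends on the server's limits and has to be found by trial.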

Run

  • clone this git repo or just download the entityfactspicturesharvester.py file
  • run ./entityfactspicturesharvester.py
  • for a hackish way to use entityfactspicturesharvester system-wide, copy it to /usr/local/bin

Install system-wide via pip

sudo -H pip3 install --upgrade [ABSOLUTE PATH TO YOUR LOCAL GIT REPOSITORY OF ENTITYFACTSPICTURESHARVESTER]

(which provides you with entityfactspicturesharvester as a system-wide commandline command)

See Also

  • entityfactssheetsharvester - a commandline command (Python3 program) that retrieves EntityFacts sheets from a given CSV with GND identifiers and returns them as line-delimited JSON records
  • entityfactspicturesmetadataharvester - a commandline command (Python3 program) that reads depiction information (image URLs) from given EntityFacts sheets (as line-delimited JSON records) and retrieves the (Wikimedia Commons file) metadata of these pictures (as line-delimited JSON records)
