
Osprey

Osprey is a system that checks the images produced by vendors in mass digitization projects run by the Collections Digitization program of the Digitization Program Office (DPO), OCIO, Smithsonian.


https://dpo.si.edu/

The system checks that the files pass a number of tests and displays the results in a web dashboard. This allows the vendor, the project manager, and the unit to monitor the progress and detect problems early.

Osprey Dashboard

This repo hosts the code for the dashboard, which presents the progress in each project and highlights any issues in the files.

(Screenshots: Main dashboard; Example Project)

File Checks

The Osprey Worker runs on Linux and updates the dashboard via an API (see below). The Worker can be configured to run one or more of these checks:

  • unique_file - Unique file name in the project
  • raw_pair - There is a matching raw file in a paired subfolder (e.g., tifs and raws subfolders, with raw files in .eip or .iiq format)
  • jhove - The file is a valid image according to JHOVE
  • tifpages - The TIF file contains a single image: no embedded thumbnail and no additional pages
  • magick - The file is a valid image according to ImageMagick
  • tif_compression - The TIF file is LZW-compressed, to save disk space
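The Worker's own implementation of these checks is not part of this repo. As a rough illustration of what the tifpages and tif_compression conditions look at, the sketch below uses only the Python standard library to walk a TIFF file's chain of top-level image file directories (IFDs, one per image) and read the Compression tag (259), where the value 5 means LZW. The function names are hypothetical; the sketch handles classic TIFF only (not BigTIFF), reads the whole file into memory, and does not look inside SubIFDs, where some cameras store thumbnails.

```python
from __future__ import annotations

import struct

COMPRESSION = 259   # TIFF tag number for the compression scheme
LZW = 5             # Compression value meaning LZW
SHORT, LONG = 3, 4  # TIFF field types decoded here


def read_ifds(data: bytes) -> list[dict[int, int]]:
    """Return {tag: value} for every top-level IFD (one per image) in a classic TIFF.

    Only single-valued SHORT/LONG tags are decoded; BigTIFF and SubIFDs are not handled.
    """
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little- or big-endian
    if order is None:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses 43)")
    ifds: list[dict[int, int]] = []
    seen: set[int] = set()
    while offset and offset not in seen:  # offset 0 ends the chain; `seen` stops loops
        seen.add(offset)
        (count,) = struct.unpack(order + "H", data[offset:offset + 2])
        tags: dict[int, int] = {}
        for i in range(count):
            start = offset + 2 + 12 * i  # each IFD entry is 12 bytes
            tag, typ, n = struct.unpack(order + "HHI", data[start:start + 8])
            if n != 1:
                continue  # multi-valued tags are not needed for these checks
            if typ == SHORT:  # value is left-justified in the 4-byte field
                (tags[tag],) = struct.unpack(order + "H", data[start + 8:start + 10])
            elif typ == LONG:
                (tags[tag],) = struct.unpack(order + "I", data[start + 8:start + 12])
        ifds.append(tags)
        next_pos = offset + 2 + 12 * count  # 4-byte offset of the next IFD
        (offset,) = struct.unpack(order + "I", data[next_pos:next_pos + 4])
    return ifds


def tifpages_ok(data: bytes) -> bool:
    """Pass if the file holds exactly one image, i.e. no extra page or thumbnail IFD."""
    return len(read_ifds(data)) == 1


def tif_compression_ok(data: bytes) -> bool:
    """Pass if every image in the file is LZW-compressed."""
    ifds = read_ifds(data)
    return bool(ifds) and all(ifd.get(COMPRESSION) == LZW for ifd in ifds)
```

To use it on a file, read the bytes first, e.g. `with open(path, "rb") as f: data = f.read()`, then call `tifpages_ok(data)` and `tif_compression_ok(data)`.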

Other file checks can be added; documentation on how to do so is still to be written.
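One common way to make checks pluggable is a registry that maps each check name to a function, so a configuration can enable checks by name. The sketch below is hypothetical and is not the Osprey Worker's design: the `register` decorator, the `(passed, message)` return shape, judging uniqueness on the file name without its extension, and the sibling tifs/ and raws/ folder layout are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path
from typing import Callable, Dict, Tuple

# Hypothetical check signature: (file path, stem counts for the whole project) -> (passed, message)
CheckFn = Callable[[Path, Counter], Tuple[bool, str]]
CHECKS: Dict[str, CheckFn] = {}

RAW_SUFFIXES = (".eip", ".iiq")  # raw formats named in the check list


def register(name: str) -> Callable[[CheckFn], CheckFn]:
    """Decorator that adds a check function to the registry under `name`."""
    def wrap(fn: CheckFn) -> CheckFn:
        CHECKS[name] = fn
        return fn
    return wrap


@register("unique_file")
def unique_file(path: Path, stems: Counter) -> Tuple[bool, str]:
    # Assumption: uniqueness is judged on the file name without its extension.
    n = stems[path.stem]
    return n == 1, f"{n} file(s) named {path.stem!r} in the project"


@register("raw_pair")
def raw_pair(path: Path, stems: Counter) -> Tuple[bool, str]:
    # Assumed layout: <folder>/tifs/<name>.tif pairs with <folder>/raws/<name>.eip or .iiq
    raw_dir = path.parent.parent / "raws"
    for suffix in RAW_SUFFIXES:
        candidate = raw_dir / (path.stem + suffix)
        if candidate.is_file():
            return True, f"paired with {candidate.name}"
    return False, f"no raw file for {path.name} in {raw_dir}"


def run_checks(files: list[Path], enabled: list[str]) -> dict[str, dict[str, Tuple[bool, str]]]:
    """Run each enabled check on every file: {file path: {check name: (passed, message)}}."""
    stems = Counter(p.stem for p in files)
    return {str(p): {name: CHECKS[name](p, stems) for name in enabled} for p in files}
```

Adding a new check then means writing one function and decorating it with `@register("new_name")`; nothing else in the runner changes.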

Setup

The app is written in Python with the Flask framework and requires a MySQL database. Create and populate the database following the instructions in database/tables.sql.

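To install the required environment and modules in the default location (/var/www/app), run the commands below. The repository files, including requirements.txt and app.py, need to be in that folder before the pip install step: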

mkdir /var/www/app
cd /var/www/app
python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt

Then, test the API by running the main file:

./app.py

or:

python3 app.py

which will start the service at http://localhost:5000/.

Then deactivate the virtual environment and give the web server user (apache) ownership of the app folder:

deactivate
sudo chown -R apache:apache /var/www/app

Set up apache2/httpd as described in the web_server folder.

API

The application includes an API with these routes:

  • /api/: List the available routes, in JSON
  • /api/files/<file_id>: Get the details of a file by its file_id
  • /api/folders/<folder_id>: Get the details of a folder and the list of files
  • /api/folders/qc/<folder_id>: Get the details of a folder and the list of files from QC
  • /api/projects/: Get the list of projects in the system
  • /api/projects/<project_alias>: Get the details of a project by specifying the project_alias
  • /api/reports/<report_id>/: Get the data from a project report
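For scripting against these routes, a small read-only helper can fill in the route templates listed above and decode the responses. This is a sketch using only the standard library; it assumes the routes return JSON and that the service runs at the development address started by app.py (http://localhost:5000/). The names `api_url`, `get_json`, and `BASE_URL` are illustrative, not part of the app.

```python
import json
import re
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # development server started by app.py


def api_url(route: str, base_url: str = BASE_URL, **params: object) -> str:
    """Fill a route template such as "/api/projects/<project_alias>" with URL-encoded values.

    The route is used exactly as written, so trailing slashes (e.g. "/api/reports/<report_id>/")
    are preserved.
    """
    def fill(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing value for <{name}>")
        return quote(str(params[name]), safe="")

    return base_url.rstrip("/") + re.sub(r"<(\w+)>", fill, route)


def get_json(route: str, base_url: str = BASE_URL, timeout: float = 10.0, **params: object):
    """GET a route and decode its JSON body (assumes the response is JSON)."""
    with urlopen(api_url(route, base_url, **params), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the dashboard to be running locally.
    print(get_json("/api/projects/"))
```

For example, `get_json("/api/projects/<project_alias>", project_alias="my_project")` would request the details of a project whose alias is `my_project` (a placeholder value).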

Components

The system has two related repos:

  • Osprey Worker - Python tool that runs a series of checks on folders. Results are sent to the dashboard via an HTTP API to be saved to the database.
  • Osprey Misc - Database and scripts.
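The Worker sends its results to the dashboard over an HTTP API, but the route it writes to is not among the read routes listed in the API section. The sketch below only shows the general shape of such a call with the standard library; `RESULT_ROUTE`, `build_result_request`, `send_result`, and the payload field names are hypothetical placeholders, and any authentication the real API requires is not modeled.

```python
"""Hypothetical sketch: a worker reporting one check result to the dashboard over HTTP.

The route and JSON field names are placeholders, not Osprey's documented API.
"""
import json
from urllib.request import Request, urlopen

RESULT_ROUTE = "/api/hypothetical/file_check/"  # placeholder, not a documented route


def build_result_request(base_url: str, file_id: int, check: str,
                         passed: bool, details: str = "") -> Request:
    """Build a POST request carrying one file's check result as JSON."""
    payload = {"file_id": file_id, "check": check, "passed": passed, "details": details}
    return Request(
        base_url.rstrip("/") + RESULT_ROUTE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_result(req: Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code (urlopen raises HTTPError on 4xx/5xx)."""
    with urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything goes over the network.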

License

Available under the Apache License 2.0. Consult the LICENSE file for details.

About

Dashboard that displays the file validation results in mass digitization projects. Digitization Program Office, OCIO, Smithsonian.
