Documentation: what are the actual "tests" this software performs? #42

Open
nickdos opened this issue Jun 8, 2023 · 2 comments
@nickdos
nickdos commented Jun 8, 2023

I've read through all the README files in this project, but I can't find the answer to a simple question: what are the tests that are run against the images?

I'm trying to determine whether it's worth the effort of installing and running this software, but there is not enough information provided (that I could easily find) for me to make this call.

All I found were a number of instances of:

The system checks that the files pass a number of tests and displays the results in a Shiny dashboard.

It would be very useful to have some indication of what these tests are and what sort of errors they identify, either by listing the various classes of tests or by listing them all. I have no idea how many there are or how detailed they are.

Background: we have 700K herbarium sheets scanned by Picturae that have not yet been QA'ed or processed into any downstream system.

@villanueval
Member

Hi,

The software is still under heavy development, so it is not optimized for easy deployment yet. I'll make a note to add these details to the documentation.

Deploying it currently requires knowledge of Python, Flask, Linux, and MySQL.

The tests are configurable per project, but include:

  • raw_pair: Check that there is a raw file in the 'raw_files_path'
  • valid_name: Check that the filename is in the list of allowed names
  • unique_file: Check that the name is not repeated in the project
  • dupe_elsewhere: Check the name against the dupe_elsewhere table, which holds names from other projects
  • jhove: Run JHOVE validation
  • magick: Run a validation test with ImageMagick
  • tifpages: Check the number of pages in the tif, typically a thumbnail
  • tif_compression: Check if the tif is compressed using LZW
  • derivative: Check for a derivative file in 'derivative_files_path'
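
As a rough illustration only (this is not the project's actual code), several of the file-level tests listed above could be sketched in plain Python as follows. Everything here is an assumption made for the example: the `check_file` and `read_tiff_ifds` helpers, the layout of the `project` dictionary, the `expected_pages` setting, and matching raw and derivative files by filename stem. The `dupe_elsewhere`, `jhove`, and `magick` tests are left out because they depend on a database table and external tools.

```python
import struct
from pathlib import Path

COMPRESSION_TAG = 259  # TIFF "Compression" tag number
LZW = 5                # Compression value for LZW in the TIFF 6.0 spec


def read_tiff_ifds(path):
    """Return one {tag: value} dict per IFD (page) of a classic TIFF.

    Only single inline SHORT values are decoded, which is enough to
    read the Compression tag. BigTIFF is not handled in this sketch.
    """
    data = Path(path).read_bytes()
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError(f"{path} is not a classic TIFF file")
    pages, seen = [], set()
    (offset,) = struct.unpack(order + "I", data[4:8])
    while offset and offset not in seen:  # 'seen' guards against IFD loops
        seen.add(offset)
        (count,) = struct.unpack_from(order + "H", data, offset)
        tags = {}
        for i in range(count):
            entry = offset + 2 + 12 * i  # each IFD entry is 12 bytes
            tag, ftype, n = struct.unpack_from(order + "HHI", data, entry)
            if ftype == 3 and n == 1:  # one SHORT, stored in the value field
                (tags[tag],) = struct.unpack_from(order + "H", data, entry + 8)
        pages.append(tags)
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * count)
    return pages


def check_file(tif_path, project):
    """Run the tests named in project['tests']; return {test_name: passed}."""
    tif_path = Path(tif_path)
    stem = tif_path.stem
    checks = {
        # Filename (without extension) is in the list of allowed names
        "valid_name": lambda: stem in project["allowed_names"],
        # Name occurs exactly once among the project's files
        "unique_file": lambda: project["all_stems"].count(stem) == 1,
        # A file with the same stem exists in the raw / derivative folder
        "raw_pair": lambda: any(Path(project["raw_files_path"]).glob(stem + ".*")),
        "derivative": lambda: any(
            Path(project["derivative_files_path"]).glob(stem + ".*")
        ),
        # Number of pages (IFDs) matches what the project expects
        "tifpages": lambda: len(read_tiff_ifds(tif_path)) == project["expected_pages"],
        # Every page declares LZW compression
        "tif_compression": lambda: all(
            page.get(COMPRESSION_TAG) == LZW for page in read_tiff_ifds(tif_path)
        ),
    }
    return {name: checks[name]() for name in project["tests"]}
```

Calling `check_file("sheet_001.tif", project)` with a `project` dictionary that names the tests to run would return a pass/fail flag per test, which mirrors the idea of tests being configurable per project.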

The mention of Shiny is from an old version. The current version uses a Python/Flask application for the dashboard.

We are doing a major overhaul this summer and will have a more stable version in a few weeks.

Hope this helps.

@nickdos
Author

nickdos commented Jun 29, 2023

Thanks for that; it's very useful information. I'll keep an eye on the project.
