Cohort Packager

Primary Author: Ruairidh MacLeod

Contents

  1. Overview
  2. Setup / Installation
  3. Exchange and Queue Settings
  4. Config
  5. Expectations
  6. Reports

1. Overview

Collects all information regarding an extraction job, and monitors the filesystem for the anonymised files. Persists all information to a MongoDB collection.
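
As an illustration of the filesystem check described above, the following Python sketch reports which of the anonymised files expected for a job are not yet present. It is not the service's implementation (CohortPackager is a .NET application). The function names, the directory layout, and the idea of tracking expected files as relative paths are all hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def missing_anonymised_files(extraction_root: Path, expected: Iterable[str]) -> List[str]:
    """Return the expected relative paths that do not yet exist under extraction_root."""
    return [rel for rel in expected if not (extraction_root / rel).is_file()]


def is_extraction_complete(extraction_root: Path, expected: Iterable[str]) -> bool:
    """Treat an extraction as complete once no expected file is missing."""
    return not missing_anonymised_files(extraction_root, expected)
```

A caller would pass the job's extraction directory and the list of files it expects the anonymiser to write, then act once `is_extraction_complete` returns `True`.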

Produces validation reports for each extraction, suitable for review by research coordinators before the extraction files are released (see the Reports section). Reports are created automatically when an extraction is detected as complete. They can also be recreated manually from the CLI by passing the -r or --recreate-reports flag with the corresponding extraction GUID.

2. Setup / Installation

  • Clone the project and build it. Any NuGet dependencies should be downloaded automatically
  • Set up a YAML file with the configuration for your environment
  • Run CohortPackager.exe with your YAML config

3. Exchange and Queue Settings

| Read/Write | Type | Config setting |
| --- | --- | --- |
| Read | ExtractRequestMessage | DicomReprocessorOptions.ExtractRequestInfoOptions |
| Read | ExtractRequestInfoMessage | DicomReprocessorOptions.ExtractFilesInfoOptions |
| Read | ExtractFileStatusMessage | DicomReprocessorOptions.AnonImageStatusOptions |

4. Config

| YAML Section | Purpose |
| --- | --- |
| JobWatcherTickrate | How often the filesystem is checked for anonymised files (in seconds) |
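
To illustrate how a tick rate expressed in seconds can drive a periodic check, here is a minimal Python sketch of a polling loop. It shows the general pattern only and is not taken from CohortPackager. `poll_until`, `check`, and `max_ticks` are hypothetical names, and the `sleep` parameter is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def poll_until(check: Callable[[], bool],
               tickrate_seconds: float,
               max_ticks: int,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() once per tick and return True as soon as it succeeds.

    Returns False if check() has not succeeded after max_ticks attempts.
    """
    for tick in range(max_ticks):
        if check():
            return True
        # Wait one tick before looking again, except after the final attempt.
        if tick < max_ticks - 1:
            sleep(tickrate_seconds)
    return False
```

A smaller tick rate detects completed extractions sooner at the cost of more frequent filesystem checks.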

5. Expectations

Errors are logged as normal for a MicroserviceHost.

6. Reports

When an extraction is completed, a set of reports is created detailing any errors or validation failures relating to the files that have been produced.

For a standard extraction, 4 files are produced:

  • README.md - A summary file containing metadata about the extraction job
  • rejected_files.csv - A list of any requested IDs which generated a rejection (a file was blocked etc.)
  • processing_errors.csv - A summary of any errors from the anonymiser or other components in the pipeline. This report should be inspected by a developer before data are released
  • verification_failures.csv - A full listing of all Failures generated by IsIdentifiable when scanning files after anonymisation

For an identifiable extraction, the verification_failures.csv is not produced.
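
The report sets listed above can be checked with a short script before review. The Python sketch below uses the file names from this section. The helper names and the assumption that all reports for one extraction sit in a single directory are hypothetical.

```python
from pathlib import Path
from typing import List

STANDARD_REPORTS = [
    "README.md",
    "rejected_files.csv",
    "processing_errors.csv",
    "verification_failures.csv",
]


def expected_reports(identifiable: bool) -> List[str]:
    """List the report file names expected for an extraction."""
    if identifiable:
        # Identifiable extractions do not produce verification_failures.csv.
        return [name for name in STANDARD_REPORTS if name != "verification_failures.csv"]
    return list(STANDARD_REPORTS)


def missing_reports(reports_dir: Path, identifiable: bool = False) -> List[str]:
    """Return the expected report files that are absent from reports_dir."""
    return [name for name in expected_reports(identifiable)
            if not (reports_dir / name).is_file()]
```

An empty list from `missing_reports` means every expected report is present. It says nothing about the contents, so processing_errors.csv still needs inspection by a developer.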