Primary Author: Ruairidh MacLeod
Collects all information regarding an extraction job, and monitors the filesystem for the anonymised files. Persists all information to a MongoDB collection.
Produces validation reports for each extraction suitable for review by research coordinators before the extraction files are released. See reports section. Reports are created automatically when an extraction is detected as being complete, and can also be manually recreated on the CLI by passing the `-r` or `--recreate-reports` flag with the corresponding extraction GUID.
- Clone the project and build. Any NuGet dependencies should be downloaded automatically.
- Set up a YAML file with the configuration for your environment.
- Run `CohortPackager.exe` with your YAML config.
Read/Write | Type | Config setting |
---|---|---|
Read | ExtractRequestMessage | DicomReprocessorOptions.ExtractRequestInfoOptions |
Read | ExtractRequestInfoMessage | DicomReprocessorOptions.ExtractFilesInfoOptions |
Read | ExtractFileStatusMessage | DicomReprocessorOptions.AnonImageStatusOptions |
YAML Section | Purpose |
---|---|
JobWatcherTickrate | How often the filesystem is checked for anonymised files (in seconds) |
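The role of `JobWatcherTickrate` can be pictured as the interval of a simple polling loop. The sketch below is illustrative only and is not the CohortPackager implementation (which is a .NET service); the function names `missing_files` and `watch_job`, the `max_ticks` limit, and the idea of passing in a list of expected relative paths are all assumptions made for the example.

```python
import os
import time
from typing import Callable, Iterable, Set


def missing_files(expected: Iterable[str], root: str) -> Set[str]:
    """Return the expected relative paths that do not yet exist under root."""
    return {p for p in expected if not os.path.isfile(os.path.join(root, p))}


def watch_job(
    expected: Iterable[str],
    root: str,
    tickrate_seconds: float,
    max_ticks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the filesystem once per tick until every expected file exists.

    Returns True as soon as nothing is missing, or False if max_ticks
    checks have been made without the job completing. Both names are
    hypothetical and exist only for this sketch.
    """
    expected = list(expected)
    for tick in range(max_ticks):
        if not missing_files(expected, root):
            return True
        # Wait one tick before checking again (skip the wait after the last check).
        if tick < max_ticks - 1:
            sleep(tickrate_seconds)
    return False
```

A smaller tick rate detects completed extractions sooner at the cost of more frequent filesystem checks.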
Errors are logged as normal for a MicroserviceHost.
When an extraction is completed, a set of reports is created detailing any errors or validation failures relating to the set of files that have been produced.
For a standard extraction, 4 files are produced:
- `README.md`: A summary file containing metadata about the extraction job
- `rejected_files.csv`: A list of any requested IDs which generated a rejection (a file was blocked etc.)
- `processing_errors.csv`: A summary of any errors from the anonymiser or other components in the pipeline. This report should be inspected by a developer before data are released
- `verification_failures.csv`: A full listing of all `Failure`s generated by IsIdentifiable when scanning files after anonymisation

For an identifiable extraction, the `verification_failures.csv` is not produced.
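As a quick check before release, the expected set of report files could be compared against what is actually present in a reports directory. This is a minimal sketch assuming the four file names listed above; the helper names `expected_reports` and `missing_reports` are hypothetical and are not part of CohortPackager.

```python
from typing import Iterable, List

# Report file names taken from the list above.
STANDARD_REPORTS = [
    "README.md",
    "rejected_files.csv",
    "processing_errors.csv",
    "verification_failures.csv",
]


def expected_reports(identifiable: bool) -> List[str]:
    """Return the report file names expected for an extraction.

    Identifiable extractions do not produce verification_failures.csv.
    """
    if identifiable:
        return [n for n in STANDARD_REPORTS if n != "verification_failures.csv"]
    return list(STANDARD_REPORTS)


def missing_reports(present: Iterable[str], identifiable: bool) -> List[str]:
    """Return the expected report names that are absent from `present`."""
    present_set = set(present)
    return [n for n in expected_reports(identifiable) if n not in present_set]
```

For example, `missing_reports(os.listdir(report_dir), identifiable=False)` (with `report_dir` being wherever the reports were written) returns an empty list when all four files exist.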