DeidentiFHIR-Pipeline

With the DeidentiFHIR-Pipeline, you can transfer FHIR-based data from a source (e.g. a FHIR server) to a target (e.g. another FHIR server) and pseudonymize the data in between. Pseudonymization is based on the DeidentiFHIR library. The transfer consists of four steps:

  1. Cohort selection: Select the IDs of FHIR resources (e.g. Patients) that should be transferred
  2. Data selection: Fetch FHIR data that belongs to the selected cohort IDs
  3. Pseudonymization: Pseudonymize the FHIR data based on DeidentiFHIR profiles
  4. Data storing: Store the data in a target system

Each step can have multiple implementations. For example, cohort selection can be based on consent policies stored in gICS or on a list of IDs (e.g. Patient identifiers). The implementation to use for each step is configured in application.yaml.
The available implementations can be found in the transfer folder.
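
The four steps and their interchangeable implementations can be pictured with a minimal sketch. It is not taken from the pipeline's source code: every name in it (Transfer, cohort_via_ids, fetch_bundle, replace_ids) is hypothetical, and the pseudonymization step is a toy stand-in for DeidentiFHIR.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical type aliases, one per step described above.
CohortSelection = Callable[[], Iterable[str]]
DataSelection = Callable[[str], dict]
Pseudonymization = Callable[[dict], dict]
DataStoring = Callable[[dict], None]


@dataclass
class Transfer:
    """Chains one implementation per step (hypothetical class, for illustration)."""
    select_cohort: CohortSelection
    select_data: DataSelection
    pseudonymize: Pseudonymization
    store: DataStoring

    def run(self) -> int:
        """Run all four steps per cohort ID and return the number of stored bundles."""
        count = 0
        for resource_id in self.select_cohort():
            bundle = self.select_data(resource_id)
            self.store(self.pseudonymize(bundle))
            count += 1
        return count


def cohort_via_ids() -> list:
    # Toy cohort selection based on a fixed list of IDs.
    return ["1234"]


def fetch_bundle(resource_id: str) -> dict:
    # Toy data selection; a real implementation would query a FHIR server.
    patient = {"resourceType": "Patient", "id": resource_id}
    return {"resourceType": "Bundle", "entry": [{"resource": patient}]}


def replace_ids(bundle: dict) -> dict:
    # Toy stand-in for DeidentiFHIR: prefixes every resource id with "psn-".
    entries = [
        {"resource": {**e["resource"], "id": "psn-" + e["resource"]["id"]}}
        for e in bundle["entry"]
    ]
    return {**bundle, "entry": entries}


target = []  # toy target system: an in-memory list
transfer = Transfer(cohort_via_ids, fetch_bundle, replace_ids, target.append)
stored = transfer.run()
```

Swapping one function for another (for example a consent-based cohort selection instead of cohort_via_ids) leaves the other three steps untouched, which is the idea behind selecting implementations per step in the configuration.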

Architecture

Quickstart

Log in to the GitHub Container Registry:

docker login ghcr.io

Start with:

docker compose up -d

Post the test bundles to the FHIR server:

./post-testbundles-to-fhir-server.sh

Start transfer:

./start-configured-process.sh

Check if transfer was completed:

curl http://localhost:8042/transfer/all

Check that the bundle was transferred to the other FHIR server and that a pseudonymized Patient resource exists:

curl http://localhost:8083/fhir/Patient
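
A FHIR search such as the one above returns a searchset Bundle. As a rough sketch, the Patient ids in such a response can be listed with a few lines of Python. The function name patient_ids and the sample response are made up for illustration; only the Bundle/entry/resource layout comes from the FHIR specification.

```python
import json


def patient_ids(search_result: str) -> list:
    """Return the ids of all Patient resources in a FHIR searchset Bundle (JSON text)."""
    bundle = json.loads(search_result)
    return [
        entry["resource"]["id"]
        for entry in bundle.get("entry", [])
        if entry.get("resource", {}).get("resourceType") == "Patient"
    ]


# Made-up response body; a real one would come from the curl call above.
sample = json.dumps({
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "example-pseudonym"}},
        {"resource": {"resourceType": "OperationOutcome", "id": "warn"}},
    ],
})
ids = patient_ids(sample)
```

An empty list means that no Patient resource has arrived at the target server yet.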

Configuration

The pipeline is configured in application.yaml; see the example in src/main/resources/application.yaml:

projects:
  "test-project1":
    parallelism: 16
    cohort-selection:
      via-ids:
        ids: ["1234"]
    data-selection:
      fhir-server:
        url: http://localhost:8082/fhir
        fhirIdQueryPlaceholder: <id>
        fhirIdQuery: Patient?identifier=<id>
        bundleQueryPlaceholder: <fhir-id>
        bundleQuery: Patient/<fhir-id>/$everything?_count=100000
    pseudonymization:
      deidentifhir:
        scraperConfigFile: <path/to/deidentiFHIR.profile>
        pseudonymizationConfigFile: <path/to/deidentiFHIR.profile>
        generateIDScraperConfig: true
        dateShiftingInMillis: 2419200000 # equals +/-14 days
        hashmap: # Pseudonyms are stored in local hashmap
          domain: test-project-domain
#        gpas: # Alternative: Pseudonyms are stored in gPAS
#          domain: test-project1-gpas-domain
#          gpasServiceWsdlUrl: http://localhost:8081/gpas/gpasService?wsdl
#          domainServiceWsdlUrl: http://localhost:8081/gpas/DomainService?wsdl
    data-storing:
      fhir-server:
        url: http://localhost:8083/fhir
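
Two details of this example can be checked with a short sketch. It assumes that the query placeholders are filled in by plain string substitution and that dateShiftingInMillis describes a total window that is split evenly around the original date; neither assumption is confirmed here, and expand_query is a hypothetical helper, not part of the pipeline.

```python
from datetime import timedelta


def expand_query(template: str, placeholder: str, value: str) -> str:
    """Replace every occurrence of the placeholder in a query template (assumed behaviour)."""
    return template.replace(placeholder, value)


# fhirIdQuery with fhirIdQueryPlaceholder "<id>" and the cohort ID "1234"
id_query = expand_query("Patient?identifier=<id>", "<id>", "1234")

# bundleQuery with bundleQueryPlaceholder "<fhir-id>"; "abc" is a made-up FHIR id
bundle_query = expand_query(
    "Patient/<fhir-id>/$everything?_count=100000", "<fhir-id>", "abc"
)

# 2419200000 ms is a 28-day window, i.e. at most 14 days in either direction
window = timedelta(milliseconds=2419200000)
max_shift = window / 2
```

Under these assumptions, id_query is "Patient?identifier=1234" and max_shift is 14 days, which matches the comment in the example configuration.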

Requirements for compilation:

  • Java 21
  • Maven
  • Docker
  • Docker Compose

Configure Maven for the GitHub Packages registry:

Generate a GitHub personal access token (classic) with the read:packages scope and add it to your ~/.m2/settings.xml:

<servers>
  ...
  <server>
    <id>github-ume</id>
    <username>insert-github-username-here</username>
    <password>insert-token-here</password>
  </server>
  ...
</servers>

Docs and details can be found here: https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-apache-maven-registry

Start with:

mvn spring-boot:run

Compile project with:

./build-jar.sh

Run with:

java -jar target/deidentifhir-pipeline-0.1.3.jar --spring.config.location=src/main/resources/application.yaml

Create docker image with:

./build-docker-image.sh

Endpoints

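POST:
See start-configured-process.sh for an example request.
The response contains a UUID that you can use to query the status of the transfer via GET requests.

GET:
For example, http://localhost:8042/transfer/all, which the Quickstart uses to check whether a transfer was completed.

Endpoint documentation is also available as Swagger UI at http://localhost:8042/swagger-ui/index.html.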
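
Because the POST request returns a UUID and the status is read with GET requests, a client typically polls until the transfer has finished. The sketch below shows one way to do that. The status values "RUNNING", "COMPLETED" and "FAILED" and the helper wait_for_transfer are assumptions for illustration; the actual paths and response fields are listed in the Swagger UI.

```python
import time
import uuid
from typing import Callable


def wait_for_transfer(
    transfer_id: str,
    get_status: Callable[[str], str],
    done: frozenset = frozenset({"COMPLETED", "FAILED"}),  # hypothetical status names
    interval_s: float = 1.0,
    max_polls: int = 60,
) -> str:
    """Poll get_status until it reports a final status, then return that status."""
    uuid.UUID(transfer_id)  # raises ValueError if the ID is not a valid UUID
    for _ in range(max_polls):
        status = get_status(transfer_id)
        if status in done:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"transfer {transfer_id} not finished after {max_polls} polls")


# Fake status source standing in for the GET request against the pipeline.
_answers = iter(["RUNNING", "RUNNING", "COMPLETED"])
result = wait_for_transfer(str(uuid.uuid4()), lambda _id: next(_answers), interval_s=0)
```

In a real client, get_status would issue the GET request and extract the status from the response. Passing it in as a function keeps the polling logic independent of the HTTP library.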

Development setup

See integrationtests/docker-compose.yml

Integration tests

Integration tests are executed with Hurl.
Run them with:

cd integrationtests && ./start-integrationtests.sh

Examples

Post all bundles to the FHIR server:

./post-all-bundles-to-fhir-server.sh

Start transfer:

./start-all-bundles-process.sh

Check the transfer with:
http://localhost:8042/transfer/all