This repository has been archived by the owner on Dec 18, 2023. It is now read-only.

Export a subset of data from a TranSMART instance based on a query, and upload them to a different (empty) TranSMART instance using transmart-copy. Short for TranSMART hypercube dicer.

transmart-hyper-dicer slicing tool for tranSMART

transmart-hyper-dicer is a data slicing tool that reads data from one TranSMART instance and uploads it to another.

⚠️ Note: this is a very preliminary version, still under development. Issues can be reported at https://github.com/thehyve/transmart-hyper-dicer/issues.

Configuration

The connection to the Keycloak identity provider and to tranSMART is configured through the environment variables below:

Variable Description
TRANSMART_URL URL of the TranSMART back-end application, e.g. https://transmart.example.com
KEYCLOAK_SERVER_URL URL of the Keycloak identity provider, e.g. https://keycloak.example.com
KEYCLOAK_REALM Keycloak realm, e.g. dev
KEYCLOAK_CLIENT_ID Keycloak client ID, e.g. transmart-client
OFFLINE_TOKEN An offline token, used as a refresh token to communicate with TranSMART
VERIFY_CERT Either a boolean, in which case it controls whether the server’s TLS certificate is verified, or a string, in which case it must be a path to a CA bundle to use. Defaults to True.
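
As an illustration of how these variables fit together, the sketch below reads them from the environment and interprets VERIFY_CERT as either a boolean or a CA bundle path. It is not the tool's own code: the helper names, the accepted boolean spellings, and the assumption that the first five variables are all required are hypothetical.

```python
import os

# Assumption: all five connection variables are mandatory.
REQUIRED = (
    "TRANSMART_URL",
    "KEYCLOAK_SERVER_URL",
    "KEYCLOAK_REALM",
    "KEYCLOAK_CLIENT_ID",
    "OFFLINE_TOKEN",
)


def parse_verify_cert(value):
    """Hypothetical helper: map VERIFY_CERT to a bool or a CA bundle path."""
    if value is None or value.strip() == "":
        return True  # documented default
    lowered = value.strip().lower()
    # The accepted spellings below are an assumption, not taken from the tool.
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    return value.strip()  # anything else is treated as a path to a CA bundle


def read_config(environ=os.environ):
    """Collect the connection settings, failing early if one is missing."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    config = {name: environ[name] for name in REQUIRED}
    config["VERIFY_CERT"] = parse_verify_cert(environ.get("VERIFY_CERT"))
    return config
```

Passing a plain dictionary instead of os.environ makes the helper easy to try out without touching the real environment.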

To generate an offline token for the user USERNAME, the following curl command can be used. To obtain the token, the user needs the realm-level role mapping offline_access. Before running the command, substitute the words in uppercase with the actual values.

curl \
  -d 'client_id=KEYCLOAK_CLIENT_ID' \
  -d 'username=USERNAME' \
  -d 'password=PASSWORD' \
  -d 'grant_type=password' \
  -d 'scope=offline_access' \
  'https://KEYCLOAK_SERVER_URL/auth/realms/KEYCLOAK_REALM/protocol/openid-connect/token'

The value of the refresh_token field in the response is the offline token.
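
The same request can be sketched in Python with only the standard library. This mirrors the curl command above and is not part of transmart-hyper-dicer; the function names are hypothetical, and server_url is assumed to include the scheme, as in the KEYCLOAK_SERVER_URL example.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(server_url, realm, client_id, username, password):
    """Build the POST request that asks Keycloak for an offline token."""
    url = "{}/auth/realms/{}/protocol/openid-connect/token".format(
        server_url.rstrip("/"), urllib.parse.quote(realm, safe="")
    )
    form = urllib.parse.urlencode(
        {
            "client_id": client_id,
            "username": username,
            "password": password,
            "grant_type": "password",
            "scope": "offline_access",
        }
    ).encode("ascii")
    return urllib.request.Request(url, data=form, method="POST")


def fetch_offline_token(request):
    """Send the request and return the refresh_token field of the response."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["refresh_token"]
```

Building the request separately from sending it allows the URL and form fields to be inspected before any credentials leave the machine.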

All variables can be specified in the .env file as key-value pairs. They are set automatically as environment variables when the application starts. Example of the .env file:

KEYCLOAK_CLIENT_ID=transmart-client
KEYCLOAK_SERVER_URL=https://keycloak.example.com
KEYCLOAK_REALM=dev
OFFLINE_TOKEN=<refresh_token value from the curl response>
TRANSMART_URL=https://transmart.example.com
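
To make the key-value format concrete, here is a minimal standard-library sketch of reading such a file. How the tool itself loads .env is not shown in this README; the function names are hypothetical, and the choice to let already-set environment variables take precedence is an assumption.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split at the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_env_file(path=".env", environ=os.environ):
    """Read the file and export its entries as environment variables."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env_lines(handle)
    for key, value in values.items():
        # Assumption: variables that are already set are not overwritten.
        environ.setdefault(key, value)
    return values
```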

Installation

The package requires Python 3.6+.

To install transmart-hyper-dicer, do:

pip install transmart-hyper-dicer

Or from source:

git clone https://github.com/thehyve/transmart-hyper-dicer.git
cd transmart-hyper-dicer
pip install .

Run tests (including coverage) with:

python setup.py test

Usage

Read a subset of data from the configured tranSMART instance, based on the constraint specified in an input JSON file, and write the output in transmart-copy format to /path/to/output. The output directory should be empty or not exist yet (in which case it will be created).

The input constraint has to be a valid tranSMART constraint. Example content of the <input.json> file:

{
  "type": "study_name",
  "studyId": "EHR"
}
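
If the constraint is produced by a script, the input file can be written with Python's json module. The sketch below is only a convenience: write_constraint is a hypothetical helper, and its check is a shallow sanity test, not a validation against the tranSMART constraint format.

```python
import json


def write_constraint(path, constraint):
    """Write a constraint dictionary to a JSON file for the dicer to read."""
    # Shallow check only: tranSMART itself decides whether the constraint is valid.
    if not isinstance(constraint, dict) or "type" not in constraint:
        raise ValueError("a constraint must be a JSON object with a 'type' field")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(constraint, handle, indent=2)


# The study constraint from the example above.
study_constraint = {"type": "study_name", "studyId": "EHR"}
```

Calling write_constraint("input.json", study_constraint) produces a file equivalent to the example.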

Run:

transmart-hyper-dicer <input.json> /path/to/output

This generates the directories i2b2metadata and i2b2demodata in the output directory. The generated data can be loaded using transmart-copy:

# Download transmart-copy:
curl -f -L https://repo.thehyve.nl/service/local/repositories/releases/content/org/transmartproject/transmart-copy/17.1-HYVE-6.2/transmart-copy-17.1-HYVE-6.2.jar -o transmart-copy.jar
# Load data
PGUSER=tm_cz PGPASSWORD=tm_cz java -jar transmart-copy.jar -d output
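
When the export is scripted, the output-directory rule and the expected result described in this section can be checked before and after the run. The following is a sketch with hypothetical helper names; it only inspects the file system and does not call the tool.

```python
import os

# Directory names taken from the description of the generated output.
EXPECTED_DIRS = ("i2b2metadata", "i2b2demodata")


def output_dir_is_usable(path):
    """Return True if path does not exist yet or is an empty directory."""
    if not os.path.exists(path):
        return True
    return os.path.isdir(path) and not os.listdir(path)


def missing_output_dirs(path):
    """Return the expected subdirectories that are absent after an export."""
    return [
        name
        for name in EXPECTED_DIRS
        if not os.path.isdir(os.path.join(path, name))
    ]
```

An empty list from missing_output_dirs suggests the export wrote both directories; it says nothing about whether their contents are complete.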

Limitations

transmart-hyper-dicer reads all selected data into memory at once, so the amount of data it can process is limited by the available memory. It is therefore not suited for very large data sets.

Package management and dependencies

This project uses pip for installing dependencies and package management.

  • Dependencies should be added to setup.py in the install_requires list.

Acknowledgement

This project was funded by the German Ministry of Education and Research (BMBF) as part of the project DIFUTURE - Data Integration for Future Medicine within the German Medical Informatics Initiative (grant no. 01ZZ1804D).

License

Copyright (c) 2019 The Hyve B.V.

The Transmart Hyper Dicer is licensed under the MIT License. See the file LICENSE.
