cannin/enhance_nlp_interaction_network_gsoc2020

Enhance NLP Interaction Network

This repository contains the code used to get information required for analysis of Reactome failed queries.

Requirements

Extracting MeSH terms requires a UMLS license/account. If you do not have an account, register at https://utslogin.nlm.nih.gov/cas/login and set the credentials in the configuration YAML file.

Notebooks

  1. Python - Reactome_PMID_Metadata_Extraction generates reactome_pmid_metadata.tsv, which contains metadata for the PMIDs present in Reactome.
  2. Python - Reactome_Failed_Query_Analysis generates failed_query_analysis_output.tsv, which contains details about the failed query terms.
  3. R - Reactome_Analysis performs the analysis using the files generated above. If those files are not available, they are downloaded.

Supporting files

The MTI WebAPI is used to get MeSH terms through its batch processing. Its code is written in Java, so pyjnius is used to run the JAR files, which are located in /lib.
These JAR files can be found in the ziy/skr-webapi repository.

The following files are generated by the Python notebooks. If you only want to run the analysis with the R code, they are downloaded automatically from the links below:

File                              Generated by                             Source
reactome_pmid_metadata.tsv        Reactome_PMID_Metadata_Extraction.ipynb  Link
failed_query_analysis_output.tsv  Reactome_Failed_Query_Analysis.ipynb     Link
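Before handing these files to the R analysis, it can help to check that they exist and look complete. The following is a minimal sketch using only the Python standard library; it assumes each TSV starts with a header row, which this README does not confirm, and the helper name summarize_tsv is hypothetical.

```python
import csv
import os


def summarize_tsv(path):
    """Return (column_names, row_count) for a tab-separated file.

    Assumption: the first line of the file is a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        row_count = sum(1 for _ in reader)  # counts data rows, not the header
        return reader.fieldnames, row_count


# File names are taken from the table above; files that are absent are skipped.
for name in ("reactome_pmid_metadata.tsv", "failed_query_analysis_output.tsv"):
    if os.path.exists(name):
        columns, rows = summarize_tsv(name)
        print(f"{name}: {rows} rows, columns = {columns}")
```

A row count far below what you expect may indicate a partially generated file, which should not be passed to the R analysis.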

Steps to follow

  1. Binder

  2. Make a copy of parameters_sample.yml named parameters.yml and set the configuration in it. The following parameters must be set in the YML file:

    • MTI Credentials, register at https://utslogin.nlm.nih.gov/cas/login

        mti:
          email_id : "example@example.com"
          username : "username"
          password : "password"
      
    • INDRA Database REST URL
      indra_db_rest_url : "SET_INDRA_DB_URL"

    • Reactome Parameters
      reactome_organism: "Homo sapiens"

    • User Query
      query: "MATN2"

    Please note: If you want to skip creating the metadata files and only run the analysis, skip steps 3 and 4 and continue from step 5. The required files will be downloaded.

  3. Binder
    Execute Reactome_PMID_Metadata_Extraction.ipynb. This generates the reactome_pmid_metadata.tsv file, which is used in step 5.

  4. Binder
    Execute Reactome_Failed_Query_Analysis.ipynb. This generates the failed_query_analysis_output.tsv file, which is required in step 5.

Do NOT perform step 5 with partially generated output files from steps 3 and 4. If you have partial files, delete them; the Rmd code will download any missing files, already preprocessed, if required.

  5. Curators' UI Binder

Please note: This step requires the complete TSV files generated by steps 3 and 4. If these files are not present in your directory, or you skipped steps 3 and 4, they will be downloaded.
In the RStudio console, enter the following:
rmarkdown::render('Reactome_Analysis.Rmd', output_file = 'analysis_output.nb.html')
OR
Open Reactome_Analysis.Rmd in RStudio and run all the chunks with Ctrl + Alt + R (Run All) to generate the analysis.

Output Files:

  • indra_output.html
    Contains Statements from INDRA containing interactions for the query term
  • analysis_output.nb.html
    Contains the analysis performed using Rmd file.
    This file will not be generated if you use the 'Run All' approach in the previous step. To get the HTML output, use Knit -> Knit to HTML in RStudio.

To run all notebooks and R code

  1. Installation (required when running without Docker)
    pip install --no-cache-dir -r ./dependencies/requirements.txt
    R -e 'source("./dependencies/installPackages.R")'
    
  2. Make a copy of parameters_sample.yml named parameters.yml and set the configuration in it. The following parameters must be set in the YML file:
    • MTI Credentials, register at https://utslogin.nlm.nih.gov/cas/login

        mti:
          email_id : "example@example.com"
          username : "username"
          password : "password"
      
    • INDRA Database REST URL
      indra_db_rest_url : "SET_INDRA_DB_URL"

    • Reactome Parameters
      reactome_organism: "Homo sapiens"

    • User Query
      query: "MATN2"

  3. Execute the Python Notebooks and R file
    bash startup.sh path/to/parameters.yml

Output Files:

  • indra_output.html
    Contains Statements from INDRA containing interactions for the query term
  • analysis_output.nb.html
    Contains the analysis performed using Rmd file.

How to run locally using Docker Image pritishaw/reactome-failed-query-analysis

  1. Pull Docker Image
    docker pull pritishaw/reactome-failed-query-analysis:latest
  2. Start Notebooks
    docker run --name reactome-failed-query-analysis pritishaw/reactome-failed-query-analysis:latest
  3. Follow sequence of execution as mentioned above

How to run locally using jupyter/repo2docker (Docker)

  1. Installation
    pip install jupyter-repo2docker
  2. Build and Start Notebooks
    jupyter-repo2docker https://github.com/cannin/enhance_nlp_interaction_network_gsoc2020
    Note: Docker needs to be running on the local machine.
  3. A URL with a token will be printed in the terminal. You can access Jupyter Notebooks and RStudio using that link as follows:
    Jupyter Notebooks : Open the link directly, all Notebooks will be visible at /notebooks
    RStudio : Go to /rstudio to open RStudio
  4. Follow sequence of execution as mentioned above
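The path change in step 3 can be sketched in Python. This is a minimal illustration with the standard library: the example URL, port, and token are made up, and it is an assumption that the token query string is accepted on the /rstudio path (once the browser session is authenticated, the token may not be needed at all).

```python
from urllib.parse import urlsplit, urlunsplit


def service_url(token_url, path):
    """Swap the path of the printed URL while keeping host, port, and token query."""
    parts = urlsplit(token_url)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


# Hypothetical URL of the kind printed by jupyter-repo2docker.
printed = "http://127.0.0.1:8888/?token=abc"
print(service_url(printed, "/rstudio"))
```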

Parameters

A sample file is available as parameters_sample.yml. The following parameters can be configured through the file. To test the Python notebooks, use the template parameters_test.yml, which is configured to process a small subset of the query terms.

# PYTHON NOTEBOOK PARAMETERS ----
# Register at https://utslogin.nlm.nih.gov/cas/login for MTI credentials
mti:
  email_id : "example@example.com"
  username : "username"
  password : "password"

pmid_threshold : 20
indra_db_rest_url : "SET_INDRA_DB_URL"

reactome_failed_terms_link : "https://gist.githubusercontent.com/PritiShaw/03ce10747835390ec8a755fed9ea813d/raw/cc72cb5479f09b574e03ed22c8d4e3147e09aa0c/Reactome.csv"
failed_query_threshold : null # null Indicates all terms will be processed
failed_query_hits_threshold : 10

reactome_pmid_url : "https://reactome.org/download/current/ReactionPMIDS.txt"

failed_query_output_file_path : "failed_query_analysis_output.tsv"

pmid_chunk_limit : 0
pmid_metadata_output_path : "reactome_pmid_metadata.tsv"

# R NOTEBOOK (Rmd) PARAMETERS ----

# Notebook
max_dt_table_display : 100

# Python environment
python_virtualenv : "/srv/venv"

# General
min_failed_search_hits : 10

# Rank Terms
top_n_reactome_journals : 10
min_indra_query_term_count : 0
min_indra_statement_count : 0
min_pmc_citation_count : 0
min_oc_citation_count : 0

# Reactome Parameters
reactome_organism: "Homo sapiens"

# User Query
query: "MATN2"

# Output
all_mesh_by_top_level_pathways_file : "all_mesh_by_top_level_pathways_full.txt"
top_level_pathways_file : "top_level_pathways.txt"
indra_stmt_html_file : "indra_output.html"
indra_stmt_json_file : "indra_output.json"
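A common mistake is running the notebooks with the sample placeholders still in place. The sketch below checks an already parsed configuration (for example, the dictionary returned by a YAML loader) for the mandatory parameters listed earlier. The function names are hypothetical and not part of this repository; the placeholder values are copied from the sample file shown above.

```python
# Placeholder values as they appear in the sample configuration above.
PLACEHOLDERS = {
    ("mti", "email_id"): "example@example.com",
    ("mti", "username"): "username",
    ("mti", "password"): "password",
    ("indra_db_rest_url",): "SET_INDRA_DB_URL",
}
# Parameters that only need to be present and non-empty.
REQUIRED = [("reactome_organism",), ("query",)]


def lookup(params, path):
    """Follow a tuple of keys into a nested dict; return None if a key is missing."""
    value = params
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def find_unset_parameters(params):
    """Return dotted names of mandatory parameters that are missing, empty, or placeholders."""
    problems = []
    for path, placeholder in PLACEHOLDERS.items():
        value = lookup(params, path)
        if not value or value == placeholder:
            problems.append(".".join(path))
    for path in REQUIRED:
        if not lookup(params, path):
            problems.append(".".join(path))
    return problems


# Example: an unedited copy of the sample values is flagged.
sample = {
    "mti": {"email_id": "example@example.com", "username": "username", "password": "password"},
    "indra_db_rest_url": "SET_INDRA_DB_URL",
    "reactome_organism": "Homo sapiens",
    "query": "MATN2",
}
print(find_unset_parameters(sample))
```

An empty list means every mandatory parameter has been set to something other than its sample placeholder; it does not verify that the credentials or the URL are valid.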

How to use papermill

Papermill is used to parameterize the Python notebooks. To use it, follow the steps below:

  1. Install from requirements.txt
    pip install --no-cache-dir -r ./dependencies/requirements.txt

  2. Setup Config YAML file
    Create a copy of parameters_sample.yml and make the changes.

  3. To Run the Notebooks
    papermill Reactome_Failed_Query_Analysis.ipynb failed_query_analysis.ipynb --log-output -k python3 -f PATH/TO/CONFIG/FILE.yml
    papermill Reactome_PMID_Metadata_Extraction.ipynb pmid_metadata.ipynb --log-output -k python3 -f PATH/TO/CONFIG/FILE.yml
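The two commands in step 3 can also be driven from Python. The sketch below builds the same papermill command lines and, unless dry_run is disabled, only prints them. The function names are hypothetical, and the order (metadata extraction first) follows the step sequence described earlier rather than a documented requirement.

```python
import subprocess

# (input notebook, executed output notebook), names taken from the commands above.
NOTEBOOKS = [
    ("Reactome_PMID_Metadata_Extraction.ipynb", "pmid_metadata.ipynb"),
    ("Reactome_Failed_Query_Analysis.ipynb", "failed_query_analysis.ipynb"),
]


def papermill_command(notebook, output, config_path):
    """Build the papermill invocation shown in step 3 as an argument list."""
    return ["papermill", notebook, output, "--log-output", "-k", "python3", "-f", config_path]


def run_notebooks(config_path, dry_run=True):
    """Print (dry run) or execute the papermill commands; return the command lists."""
    commands = [papermill_command(nb, out, config_path) for nb, out in NOTEBOOKS]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first notebook that fails.
            subprocess.run(command, check=True)
    return commands


run_notebooks("parameters.yml")  # dry run: prints the two commands
```

Call run_notebooks("parameters.yml", dry_run=False) to execute the notebooks; this requires papermill to be installed as described in step 1.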
