Skip to content

IgemRNA is an open access toolbox for transcriptome data statistical and biochemical network topology-based analysis. IgemRNA was developed in the MATLAB environment in order to take advantage of the up-to-date and most commonly distributed GSM modelling tool Cobra Toolbox 3.0 and spreadsheet file capabilities.

BigDataInSilicoBiologyGroup/IgemRNA

Repository files navigation

IgemRNA

IgemRNA is transcriptome analysis MATLAB software with a graphical user interface (GUI) designed for the analysis of transcriptome data directly or the analysis of context-specific models generated from the provided model reconstruction, transciptome and optional medium composition data files. IgemRNA facilitates some of the Cobra Toolbox 3.0 constraint-based modelling functionalities for context-specific model generation and performing optimisation methods like FBA or FVA on them. Furthermore, IgemRNA can be used to validate transcriptomics data taking into account the interconnectivity of biochemical networks, steady state assumptions and Gene-Protein-Reaction (GPR) relationship. The result context-specific models can then be further analysed and compared to other phenotypes. The main advantage of IgemRNA software is that no previos programming skills are required due to the integrated user-friendly GUI which allows to select data files and data processing options in the IgemRNA form (see fig. 1).

image

fig. 1 - Full IgemRNA form

Folder structure description ********** The IgemRNA tool initially consists of 2 root folders (Data, Scripts), additional result folders are created when specific analysis tasks have been performed (Results non-optimization, Results post-optimization) and an IgemRNA.m file which calls the user graphical interface form. Data folder stores all input data files used for this tutorial and Scripts folder consists of all the script files that are being executed by the IgemRNA software according user-selected options in the IgemRNA form as well as the test cases provided in this demonstration. The Results non-optimization (results of direct transcriptome analysis) and Results post-optimization (results of context-specific model analysis) are folders where all the result files are saved.

Running IgemRNA software ********** In order to run IgemRNA the user must first have the MATLAB environment installed and started as well as have the IgemRNA software downloaded and the files extracted. Then the user can navigate to the root folder of IgemRNA in MATLAB where the IgemRNA.m file is located and run it.

Input data files ******** To access all options in the *IgemRNA* form, the user must supply input data files. This can be done by pressing the *Open* button in the corresponding file upload row and selecting the files via *File Explorer*.Transcriptomics data file** (see fig. 2B) is required to run non-optimization tasks but an additional model reconstruction file is required to access the post-optimization tasks. Medium composition data file (see fig. 2A) is optional, the selection of this data file does not extend the form, but specifies the provided exchange reaction constraints (upper and lower bounds) in the model for post-optimization tasks analysis.

Transcriptomics data and medium composition data can be provided in .xls or .xlsx formats, where columns are named respectively (see fig. 2) and sheet names correspond to dataset and phenotype names (dataSetName_phenotypeName). The model reconstruction file can be provided in .xls, .xml or other formats supported by Cobra Toolbox 3.0. In case the transcriptomics data consists of multiple datasets, the genes listed in each dataset must be in the same order!

image

fig. 2 - Input data file structure; A - Medium data file structure; B - Transcriptomics data file structure

Data files used for this tutorial can be found in the Data folder:

Non-optimization tasks **********

Non-optimization tasks include several transcriptomics data analysis tasks:

  • filter highly and lowly expressed genes,
  • filter lowly expressed genes,
  • filter up/down regulated genes between different phenotypes or data sets.

The results of each task are stored in a different folder within the Results non-optimization folder: Gene expression level comparison, Highly-lowly expressed genes, Lowly expressed genes (see fig. 3).

image

fig. 3 - Non-optimization results folder

******** 1. Filter highly and lowly expressed genes******** Non-optimization task Filter highly and lowly expressed genes generates result Excel files for each provided transcriptomics dataset. File names are assigned based on the provided dataset and phenotype names (from transcriptomics data), the applied thresholding approach (GT1, LT1, LT2, more on thresholding: ThresholdingGeneMappingOptions.docx) and provided global thresholds values (see fig. 4). Each result Excel file contains one sheet with the list of genes provided by transcriptomics data and 4 columns: GeneId, Data (expression value), ExpressionLevel and ThApplied. The ExpressionLevel column contains the expression levels determined based on the selected thresholding approach, provided global and for thresholding approaches LT1 and LT2 calculated local thresholds. Column ThApplied displays whether a local or a global threshold for a specific gene was applied (see fig. 5).

pic1 pic2
fig. 4 - Filter highly and lowly expressed genes folder fig. 5 - Filter highly and lowly expressed genes result file

There are two ways to perform this test case:

1. Using GUI - upload transciptomics data file, select a thresholding approach, input threshold value/s, select non-optimization tasks option Filter highly and lowly expressed genes and press OK.

2. Run test case files from the Scripts folder of IgemRNA tool:

******** 2. Filter lowly expressed genes******** Non-optimization task Filter lowly expressed genes generates separate Excel result files for each dataset provided in transcriptomics data file. These result files contain filtered gene lists including genes with expression value below the given threshold value/s based on the selected thresholding approach. File names include dataset and phenotype names (from transcriptomics data file), the applied thresholding approach (GT1, LT1, LT2, more on thresholding: ThresholdingGeneMappingOptions.docx) and provided global threshold value/s (see fig. 6). Each result file consists of 4 columns GeneId, Data (expression value), ExpressionLevel (in this case, only Low) and ThApplied to show whether a global or a local threshold was applied for a specific gene. Only those genes with expression values below the given threshold (depending on which thresholding approach is applied) are listed in the result files (see fig. 7).

pic3 pic4
fig. 6 - Filter lowly expressed genes folder fig. 7 - Filter lowly expressed genes result file

There are two ways to perform this test case:

1. Using GUI - upload transciptomics data file, select a thresholding approach, input threshold value/s, select non-optimization tasks option Filter lowly expressed genes and press OK.

2. Run test case files from the Scripts folder of IgemRNA tool:

******** 3. Filter up/down regulated genes between phenotypes******** Non-optimization task Filter up/down regulated genes between phenotypes generates result Excel files in the Gene expression level comparison folder. Result file name contains dataset and phenotype names for both transcriptomics datasets that have been compared (see fig. 8). Result Excel data files contain a full gene list from the target dataset, expression values for both target and source datasets as well as the determined up/down regulation status (see fig. 9).

pic5 pic6
fig. 8 - Filter up/down regulated genes between phenotypes folder fig. 9 - Filter up/down regulated genes between phenotypes result file

There are two ways to perform this test case:

1. Using GUI - upload transciptomics data file with multiple datasets, select a thresholding approach, input threshold value/s, select non-optimization tasks option Filter up/down regulated genes between phenotypes, choose phenotypes to compare, and press OK.

2. Run test case file from the Scripts folder of IgemRNA tool: TestCase_findUpDownRegulatedGenes.m.

Post-optimization tasks ********** Context-specific models generated by IgemRNA post-optimization tasks as well as the results of the analysis performed on these models are saved in the Results post-optimization folder (see fig. 10). The results of post-optimization tasks are saved in folders with corresponding names: Flux-shifts, Non-flux reactions and Rate limiting reactions.

image

fig. 10 - Post-optimization results folder

There are two ways to generate models used in this tutorial:

1. Using GUI - upload transciptomics data file, model reconstrucion file and optionally a medium composition file, select a thresholding approach, input threshold value/s, select gene mapping approach and constraining options, choose one or more post-optimization tasks options and press OK.

2. Run test case file from the Scripts folder of IgemRNA tool: TestCase_createContextSpecificModel.m

Since model generation and optimisation takes some time, especially for multiple transcriptomics datasets, context-specific model files used for this demonstration have already been provided in the Results post-optimization/Context-specific models folder.

******** 1. Filter non-flux reactions******** Post-optimization task Filter non-flux reactions performs an analysis on created context-specific models of the same phenotype, the name of the phenotype is included in the result file name (see fig. 11). Each result Excel file contains a list of reactions that carry no flux in the result context-specific model (lower and upper bound equals 0). An additional sheet for all the common non-flux reactions of the same phenotype is also provided in the sheet Common_(phenotypeName) (see fig. 12).

pic7 pic8
fig. 11 - Filter non-flux reactions folder fig. 12 - Filter non-flux reactions result file

There are two ways to perform this test case:

1. Using GUI - upload transciptomics data file, model reconstrucion file and optionally a medium composition file, select a thresholding approach, input threshold value/s, select gene mapping approach and constraining options and choose post-optimization task Filter non-flux reactions options. If context-specific models have already been generated, choose the Use existing context-specific models option. Press OK. More on thresholding and gene mapping: ThresholdingGeneMappingOptions.docx.

2. Run test case file from the Scripts folder of IgemRNA tool: TestCase_filterNonFluxReactions.m

******** 2. Filter rate limiting reactions******** Post-optimization task Filter rate limiting reactions performs analysis on the generated context-specific models of the same phenotype, the phenotype name is included in the result files (see fig. 13).

image

fig. 13 - Filter rate limiting reactions folder

An analysis of these context-specific models have been performed in order to filter reactions that have reached the maximal flux value (MaxFlux calculated by FVA) based on the upper bound constraint set according to transcriptomics data and GPR associations. An additional sheet Common_(phenotypeName) for common rate limiting reactions has also been added to the result file where rate limiting reactions that are present in all context-specific models of the same phenotype (see fig. 14).

image

fig. 14 - Filter rate limiting reactions result file

There are two ways to perform this test case:

1. Using GUI - upload transciptomics data file, model reconstrucion file and optionally a medium composition file, select a thresholding approach, input threshold value/s, select gene mapping approach and constraining options and choose post-optimization task Filter rate limiting reactions options. If context-specific models have already been generated, choose the Use existing context-specific models option. Press OK. More on thresholding and gene mapping: ThresholdingGeneMappingOptions.docx.

2. Run test case file from the Scripts folder of IgemRNA tool: TestCase_filterRateLimittingReactions.m.

******** 3. Flux shifts calculation between different phenotypesz******** Post-optimization task Calculate flux shifts between phenotypes compares flux values (calculated by FVA on the context-specific models) between two different phenotypes. In this demonstration flux shifts analysis task was performed on the S47D phenotype datasets using GT1 thresholding approach with the lower global threshold value of 0, phenotype SRR8994358_WT was compared to the wild type dataset SRR8994357_WT of the same thresholding approach and threshold values (see fig. 15).

image

fig. 15 - Calculate flux shifts between phenotypes folder

Each result file contains a full reaction list that corresponds to the Reaction List sheet in the provided model reconstruction file as well as additional columns for the calculation results: MinFlux and MaxFlux values (phenotype that is compared, fig. 16 L, M columns), MinFlux/MaxFlux(dataset name)_(phenotype name) phenotype that is used for comparison (fig. 16 N, O columns) and the MinFlux/MaxFluxRatio between these two phenotypes (fig. 16. P, Q columns).

image

fig. 16 - Calculate flux shifts between phenotypes result file

There are two ways to perform this test case:

1. Using GUI - upload transciptomics data file, model reconstrucion file and optionally a medium composition file, select a thresholding approach, input threshold value/s, select gene mapping approach and constraining options and choose post-optimization task Calculate flux shifts between phenotypes options. If context-specific models have already been generated, choose the Use existing context-specific models option. Press OK. More on thresholding and gene mapping: ThresholdingGeneMappingOptions.docx.

2. Run test case file from the Scripts folder of IgemRNA tool: TestCase_calculateFluxShifts.m.

How to cite the IgemRNA software ******** [1] Kristina Grausa, Ivars Mozga, Karlis Pleiko, Agris Pentjuss,Integrative Gene Expression and Metabolic Analysis tool IgemRNA**, https://doi.org/10.1101/2021.08.02.454732

About

IgemRNA is an open access toolbox for transcriptome data statistical and biochemical network topology-based analysis. IgemRNA was developed in the MATLAB environment in order to take advantage of the up-to-date and most commonly distributed GSM modelling tool Cobra Toolbox 3.0 and spreadsheet file capabilities.

Topics

Resources

Stars

Watchers

Forks