cNMF_fgsea_wrapper

A set of wrapper functions and a run script for running fgsea on the spectra scores output by scRNA-seq cNMF. It can be adapted to any input that can be preranked.
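As a rough illustration of what "preranked" means here: fgsea's `stats` argument takes a named numeric vector of gene-level scores, one value per gene. The sketch below builds such a vector for each GEP from a small made-up score matrix. The matrix layout (GEPs as rows, genes as columns), the toy values, and the helper name `make_ranks` are assumptions for illustration, not code from this repository.

```r
# Toy stand-in for a spectra score table. Assumed layout: one row per GEP,
# one column per gene. All values are made up for illustration.
scores <- matrix(
  c( 2.1, -0.3, 0.8,  NA,
    -1.0,  1.7, 0.2, 0.5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GEP1", "GEP2"),
                  c("GENE_A", "GENE_B", "GENE_C", "GENE_D"))
)

# Hypothetical helper: extract one GEP's scores as a named vector,
# drop genes with missing scores, and sort from highest to lowest.
make_ranks <- function(score_matrix, gep) {
  ranks <- score_matrix[gep, ]
  ranks <- ranks[!is.na(ranks)]
  sort(ranks, decreasing = TRUE)
}

# One ranked vector per GEP, kept in a named list.
gep_names <- rownames(scores)
ranks_by_gep <- lapply(setNames(gep_names, gep_names),
                       function(g) make_ranks(scores, g))

print(ranks_by_gep$GEP1)
```

Each element of `ranks_by_gep` has the shape that `fgsea::fgsea()` expects for `stats`, alongside a pathway list such as the one returned by `fgsea::gmtPathways()`.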

Software

All code was tested on R v4.2.2 with the fgsea R Bioconductor package v1.22.0, running on Ubuntu 20.04.5 LTS (Focal).
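If you want to compare your own setup against the tested versions, a minimal sketch along these lines may help. It is not part of the repository; the variable names are illustrative, and a version mismatch does not necessarily mean the scripts will fail.

```r
# Versions reported as tested in this README.
tested_r <- "4.2.2"
tested_fgsea <- "1.22.0"

# Compare the running R version against the tested one.
r_matches <- getRversion() == tested_r
message("Running R ", as.character(getRversion()),
        if (r_matches) " (matches" else " (differs from",
        " tested version ", tested_r, ")")

# Only query fgsea if it is installed, so this check never errors.
fgsea_installed <- requireNamespace("fgsea", quietly = TRUE)
if (fgsea_installed) {
  fgsea_version <- as.character(packageVersion("fgsea"))
  message("Installed fgsea ", fgsea_version,
          " (tested version ", tested_fgsea, ")")
} else {
  message("fgsea is not installed; it is distributed through Bioconductor, ",
          "e.g. BiocManager::install(\"fgsea\")")
}
```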

Setup and Usage

  1. Clone this repository:
git clone https://github.com/tbrunetti/cNMF_fgsea_wrapper

  2. In R, open fgsea_cNMF_ranks_run_script.R and, on the second line, replace the path in the source() call with the location of fgsea_cNMF_ranks_funcs.R on your system:
source("/path/to/cNMF_fgsea_wrapper/scripts/fgsea_cNMF_ranks_funcs.R")

  3. Update the section called user supplied arguments. You will need to supply the following arguments:
| User input parameter | Description |
| --- | --- |
| seed | An integer used to set the seed for fGSEA, which helps make results reproducible. |
| gmt_file_input | The full path to the gmt file you want to use for fGSEA. Examples: MSigDB provides downloadable gmt files. For custom gmt files, refer to the Broad Institute Wiki for how to format a gmt file. |
| output_prefix | The full path plus file-name prefix under which the software saves the result of each step, so you can load a result later without rerunning steps. Ex: ~/k3_allGeps saves your results in the home directory, and every generated file name contains the string k3_allGeps. |
| cnmf_spectra_scores_file | One of the output files generated by cNMF. Specify the file that contains the spectra scores. Ex: d4_cNMF.gene_spectra_score.k_5.dt_0_02.txt |
| adjp_thresh | A floating-point value between 0 and 1. It is used in step 2 of the run script as the exclusive upper bound on the adjusted p-value: only results below it are retained for downstream analysis. Ex: setting this to 0.05 keeps results with an adjusted p-value < 0.05. |
| nes_thresh | The normalized enrichment score (NES) threshold. It can only be set to one of three options: positive, negative, or both. For cNMF, I strongly recommend setting this to positive. Ex: setting this to positive keeps only results that have a positive NES, meaning those would be the top terms/pathways/genes driving your GEP. |
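For orientation, a filled-in arguments section might look roughly like the sketch below. The variable names come from the table above; the seed value and the `/path/to/` directories are placeholders, and the `check_args` helper is a hypothetical sanity check that is not part of this repository.

```r
# --- user supplied arguments (placeholder values) ---
seed <- 42                                    # any integer; 42 is arbitrary
gmt_file_input <- "/path/to/gene_sets.gmt"    # placeholder path to a gmt file
output_prefix <- "~/k3_allGeps"
cnmf_spectra_scores_file <-
  "/path/to/d4_cNMF.gene_spectra_score.k_5.dt_0_02.txt"  # placeholder directory
adjp_thresh <- 0.05
nes_thresh <- "positive"

# Hypothetical helper (not from the repo): fail early on values that fall
# outside the ranges described in the table.
check_args <- function(seed, adjp_thresh, nes_thresh) {
  stopifnot(is.numeric(seed), length(seed) == 1, seed == round(seed))
  stopifnot(is.numeric(adjp_thresh), length(adjp_thresh) == 1,
            adjp_thresh >= 0, adjp_thresh <= 1)
  stopifnot(is.character(nes_thresh), length(nes_thresh) == 1,
            nes_thresh %in% c("positive", "negative", "both"))
  invisible(TRUE)
}

args_ok <- check_args(seed, adjp_thresh, nes_thresh)
```

Checking the values before step 1 is a design choice for the sketch only; the run script itself may handle invalid values differently.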

Reusing prior results

Each time you run steps 1 and 2, two files are generated automatically:

  • <output_prefix>_unfiltered.Rdat (saved output of step1)
  • <output_prefix>_filtered_padj_<adjp_thresh>_nes_<nes_thresh>.Rdat (saved output of step2)
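The naming pattern above can be reproduced with `paste0()`, which is handy for working out which file to load later. This is a sketch of the pattern only; how the repository's functions build the names internally is an assumption.

```r
# Example argument values, matching the example file names in this section.
output_prefix <- "hallmark_gmt"
adjp_thresh <- 0.05
nes_thresh <- "positive"

# <output_prefix>_unfiltered.Rdat
unfiltered_file <- paste0(output_prefix, "_unfiltered.Rdat")

# <output_prefix>_filtered_padj_<adjp_thresh>_nes_<nes_thresh>.Rdat
filtered_file <- paste0(output_prefix,
                        "_filtered_padj_", adjp_thresh,
                        "_nes_", nes_thresh, ".Rdat")

print(unfiltered_file)
print(filtered_file)
```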

Because these are saved, you can skip steps 1 and 2 and go directly to step 3 without rerunning anything. If you want to apply different adjusted p-value and NES filters, you can load the step 1 object and start at step 2. For example, if my step 1 file was named hallmark_gmt_unfiltered.Rdat and my step 2 file was named hallmark_gmt_filtered_padj_0.05_nes_positive.Rdat, I can run the following commands:

load("hallmark_gmt_unfiltered.Rdat")
load("hallmark_gmt_filtered_padj_0.05_nes_positive.Rdat")

You should see both data objects reappear in your environment in RStudio, and you can go straight to step 3.
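The reuse workflow described in this section can be sketched end to end on toy data: save an unfiltered result, load it back, and apply two different filters without recomputing anything. The data frame below only mimics fgsea's `pathway`, `padj`, and `NES` columns with made-up values (real fgsea output is a data.table with more columns), and `filter_results` is a hypothetical stand-in for the repository's step 2, written from the parameter descriptions above.

```r
# Toy stand-in for an unfiltered step 1 result (values are made up).
step1_results <- data.frame(
  pathway = c("SET_A", "SET_B", "SET_C", "SET_D"),
  padj    = c(0.01, 0.20, 0.03, 0.04),
  NES     = c(1.8, 2.0, -1.5, 1.2)
)

# Save to a temporary file, remove the object, then load it back.
rdat <- file.path(tempdir(), "toy_unfiltered.Rdat")
save(step1_results, file = rdat)
rm(step1_results)
loaded_names <- load(rdat)  # load() returns the names of restored objects

# Hypothetical filter: keep rows with padj strictly below the threshold
# and an NES sign matching the requested option.
filter_results <- function(res, adjp_thresh,
                           nes_thresh = c("positive", "negative", "both")) {
  nes_thresh <- match.arg(nes_thresh)
  keep <- res$padj < adjp_thresh
  if (nes_thresh == "positive") keep <- keep & res$NES > 0
  if (nes_thresh == "negative") keep <- keep & res$NES < 0
  res[keep, , drop = FALSE]
}

# Two different filters applied to the same reloaded object.
strict <- filter_results(step1_results, adjp_thresh = 0.02, nes_thresh = "positive")
loose  <- filter_results(step1_results, adjp_thresh = 0.05, nes_thresh = "both")

print(strict)
print(loose)
```

Because `load()` restores objects under the names they were saved with, checking its return value (here `loaded_names`) is a quick way to see what reappeared in the environment.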
