To perform RNA-Seq data analysis and calculate length-scaled transcripts per million (TPM) values using the Salmon tool and the GenomicFeatures package in R.

sivkri/rna-Seq-ScaledLengthTPM

Calculate lengthScaledTPM from Salmon output files

RNA-Seq Data Analysis using ScaledLengthTPM.R

This repository contains the script "ScaledLengthTPM.R" for RNA-Seq data analysis. It calculates length-scaled transcripts per million (TPM) values from Salmon quantification output, using the tximport and GenomicFeatures packages in R.

Prerequisites

Before using the code, ensure that you have the following:

  • R installed (version 3.5 or higher)
  • Required R packages installed: readr and tidyr (from CRAN), tximport and GenomicFeatures (from Bioconductor)
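
Before running anything, it can help to confirm that the prerequisites are in place. The following is a minimal sketch using only base R; the variable names are illustrative and are not part of ScaledLengthTPM.R.

```r
# Packages named in the prerequisites list above.
required <- c("readr", "tidyr", "tximport", "GenomicFeatures")

# requireNamespace() returns FALSE (rather than raising an error)
# when a package is not installed.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}

# Check the running R version against the stated minimum (3.5).
r_ok <- getRversion() >= "3.5.0"
if (!r_ok) {
  message("R ", as.character(getRversion()), " is older than 3.5.")
}
```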

Usage

  1. Clone the repository or download the "ScaledLengthTPM.R" file to your local machine.

  2. Set the working directory:

setwd("path/to/your/directory")

Replace "path/to/your/directory" with the appropriate path to the directory containing your RNA-Seq data and the "transcriptome.gtf" file.

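  3. Install the required R packages. readr and tidyr are on CRAN; tximport and GenomicFeatures are Bioconductor packages and are installed through BiocManager:

install.packages(c("readr", "tidyr", "BiocManager"))
BiocManager::install(c("tximport", "GenomicFeatures"))

  4. Prepare your data: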

    • Place your Salmon output in the "siva_SALMON_OUT/WT" directory.
    • Ensure that each quantification file generated by Salmon is named "quant.sf" (Salmon's default output file name) and sits in the appropriate directory, typically one subdirectory per sample.
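
To confirm the layout before running the script, the quantification files can be listed from R. This is a small sketch, assuming one subdirectory per sample under "siva_SALMON_OUT/WT"; the helper name find_quant_files is hypothetical and does not come from ScaledLengthTPM.R.

```r
# Hypothetical helper: find every quant.sf below a Salmon output directory
# and label each file with the name of the directory that contains it.
find_quant_files <- function(salmon_dir) {
  files <- list.files(salmon_dir, pattern = "^quant\\.sf$",
                      recursive = TRUE, full.names = TRUE)
  # Assumption: the parent directory name identifies the sample.
  names(files) <- basename(dirname(files))
  files
}

quant_files <- find_quant_files(file.path("siva_SALMON_OUT", "WT"))

if (length(quant_files) == 0) {
  message("No quant.sf files found; check the directory layout.")
} else {
  print(quant_files)
}
```

An empty result usually means the working directory or the Salmon output path is not what the script expects.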
  5. Modify the code if necessary:

    • If your "transcriptome.gtf" file is named differently or located in a different directory, update the "gtf_file" variable in the code.
    • Adjust any other relevant paths or parameters according to your data and analysis requirements.
  6. Execute the code:

source("ScaledLengthTPM.R")

This runs the full analysis and writes the two CSV files described in the next step.
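
For orientation, a script of this kind typically follows the standard tximport workflow: build a transcript-to-gene table from the GTF, then import the Salmon files. The sketch below is not the contents of ScaledLengthTPM.R. The helper name run_length_scaled_tpm, the one-subdirectory-per-sample layout, and the choice of countsFromAbundance = "lengthScaledTPM" are assumptions made for illustration.

```r
# Hypothetical helper, not taken from ScaledLengthTPM.R.
# quant_files: named character vector of paths to quant.sf files.
# gtf_file:    path to the annotation used to build the Salmon index.
run_length_scaled_tpm <- function(quant_files, gtf_file) {
  # Build a transcript database from the GTF.
  # Note: recent Bioconductor releases provide makeTxDbFromGFF() in the
  # txdbmaker package rather than in GenomicFeatures.
  txdb <- GenomicFeatures::makeTxDbFromGFF(gtf_file, format = "gtf")

  # Map every transcript name to its gene ID.
  tx_names <- AnnotationDbi::keys(txdb, keytype = "TXNAME")
  tx2gene <- AnnotationDbi::select(txdb, keys = tx_names,
                                   columns = "GENEID", keytype = "TXNAME")

  # Import Salmon quantifications with length-scaled TPM counts.
  txi <- tximport::tximport(quant_files, type = "salmon",
                            tx2gene = tx2gene,
                            countsFromAbundance = "lengthScaledTPM")

  list(tx2gene = tx2gene, txi = txi)
}

# Guarded call: only runs when the expected inputs are present.
salmon_dir <- file.path("siva_SALMON_OUT", "WT")
gtf_file <- "transcriptome.gtf"
if (file.exists(gtf_file) && dir.exists(salmon_dir)) {
  quant_files <- list.files(salmon_dir, pattern = "^quant\\.sf$",
                            recursive = TRUE, full.names = TRUE)
  names(quant_files) <- basename(dirname(quant_files))

  res <- run_length_scaled_tpm(quant_files, gtf_file)
  write.csv(res$tx2gene, "tx2gene-WT.csv", row.names = FALSE)
  write.csv(as.data.frame(res$txi$counts), "txi_lengthscaledTPM_WT.csv")
}
```

When a tx2gene table is supplied, tximport summarizes to gene level by default; passing txOut = TRUE keeps one row per transcript instead. Check which of the two the actual script does before interpreting the output rows.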

  7. Review the outputs:

    • The code will generate two CSV files: "tx2gene-WT.csv" and "txi_lengthscaledTPM_WT.csv".
    • "tx2gene-WT.csv" contains the mapping of transcript IDs to gene IDs.
    • "txi_lengthscaledTPM_WT.csv" contains the calculated length-scaled TPM values for each transcript.
  8. Interpret the results and use them for downstream analysis or visualization as needed.
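
To make "length-scaled TPM" concrete when interpreting the results, here is the arithmetic on a toy example in base R. It follows the documented behaviour of tximport's countsFromAbundance = "lengthScaledTPM" option: TPM values are weighted by the average transcript length across samples, then each sample is rescaled to its original library size. All numbers are invented for illustration (the toy TPM columns are not normalized to one million), and whether ScaledLengthTPM.R uses exactly this option is an assumption.

```r
# Toy inputs (invented): 3 transcripts x 2 samples.
ids <- list(c("tx1", "tx2", "tx3"), c("s1", "s2"))
tpm        <- matrix(c(100, 300, 600,   200, 200, 600),  nrow = 3, dimnames = ids)
eff_len    <- matrix(c(1000, 2000, 500, 1200, 1800, 500), nrow = 3, dimnames = ids)
est_counts <- matrix(c(50, 150, 100,    80, 120, 100),   nrow = 3, dimnames = ids)

# Step 1: weight each TPM by the transcript's mean effective length
# across samples (row i is multiplied by mean length i).
scaled <- tpm * rowMeans(eff_len)

# Step 2: rescale each sample (column) so that its total matches the
# sample's original estimated read count.
lib_size <- colSums(est_counts)
length_scaled <- sweep(scaled, 2, lib_size / colSums(scaled), `*`)

print(round(length_scaled, 2))
print(colSums(length_scaled))  # matches colSums(est_counts)
```

The resulting values are on the scale of read counts rather than TPM, which is why they are commonly passed to count-based downstream tools.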

Additional Information

  • The code assumes that the necessary files (RNA-Seq data, "transcriptome.gtf") are correctly organized in the provided directories. Double-check the paths and file names to ensure they match your setup.

  • For more information on the functions and packages used in the code, refer to the official documentation of Salmon, tximport, and GenomicFeatures.

  • If you encounter any issues or have questions, feel free to open an issue in this repository.

