Skip to content

Data package for Rat liver RNA-seq data from SEQC Toxicogenomics project

Notifications You must be signed in to change notification settings

ttdtrang/data-rnaseq-Rnor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data package for Rat liver RNA-seq data from SEQC Toxicogenomics project

Sources

  • Experimental data were generated in SEQC Toxicogenomics project. Original citation: Gong B, Wang C, Su Z, Hong H et al. Transcriptomic profiling of rat liver samples in a comprehensive study design by RNA-Seq. Sci Data 2014;1:140021. PMID: 25977778
  • Processing:
    • Sequencing reads were downloaded from SRA, at PRJNA239561
    • Quantification was done by 2 alternative workflows:
      1. Using Kallisto 0.45.0 with an index built from Ensembl Rat genome (cdna) Rnor_6.0.99 and 92 ERCC sequences
      2. Using STAR 2.7.1a to align against Ensembl Rat genome Rnor_6.0.99 and 92 ERCC sequences, and RSEM to estimate abundance levels for genes/isoforms.

Usage

Install the package, import the library and load the data set

devtools::install_github('ttdtrang/data-rnaseq-Rnor')
library(data.rnaseq.Rnor)
data(rnor.rnaseq.gene.kallisto)
dim(rnor.rnaseq.gene@assayData$exprs)

The package includes 4 data sets.

rnor.rnaseq.gene.kallisto
rnor.rnaseq.transcript.kallisto
rnor.rnaseq.gene.star_rsem
rnor.rnaseq.transcript.star_rsem

For Kallisto workflow, transcript-level counts are direct output from Kallisto while gene-level counts are the total counts of all transcripts belonging to the same gene.

For STAR-RSEM workflow, transcript-level and gene-level counts are collected from RSEM output rsem.genes.results and rsem.isoforms.results, respectively.

Steps to re-produce data curation

  1. cd data-raw
  2. Download all necessary raw data files.
  3. Set the environment variable DBDIR to point to the path containing said files. It is assumed that files are organized into directories corresponding to workflow, e.g.
├── kallisto
│   ├── feature_attributes.tsv
│   ├── matrix.est_counts.RDS
│   ├── matrix.gene.est_counts.RDS
│   ├── matrix.gene.tpm.RDS
│   └── matrix.tpm.RDS
├── PRJNA239561_metadata_cleaned.tsv
├── star-rsem
│   ├── feature_attrs.rsem.transcripts.tsv
│   ├── matrix.gene.expected_count.RDS
│   ├── matrix.gene.tpm.RDS
│   ├── matrix.transcripts.expected_count.RDS
│   ├── matrix.transcripts.tpm.RDS
│   └── starLog.final.tsv
└── subread
    ├── feature_attrs.featureCounts.genes.tsv
    ├── featureCounts-summary.genes.tsv
    ├── featureCounts-summary.transcripts.tsv
    └── matrix.gene.featureCounts.RDS
  1. Run the R notebook make-data-package.Rmd to assemble parts into ExpressionSet objects.

You may need to change some code chunk setting from eval=FALSE to eval=TRUE to make sure all chunks would be run. These chunks are disabled by default to avoid overwriting existing data files in the folder.

About

Data package for Rat liver RNA-seq data from SEQC Toxicogenomics project

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published