ttdtrang/data-rnaseq-lymphoma

Data package for B-cell lymphoma RNA-seq data from PRJNA477352

Sources

  • Experimental data were generated by Zhao et al. Original citation: Zhao X, Ren Y, Lawlor M, Shah BD et al. BCL2 Amplicon Loss and Transcriptional Remodeling Drives ABT-199 Resistance in B Cell Lymphoma Models. Cancer Cell 2019 May 13;35(5):752-766.e9. PMID: 31085176
  • Processing:
    • Sequencing reads were downloaded from SRA, at PRJNA477352
    • Quantification was done by 2 alternative workflows:
      1. Using STAR 2.5.1a to align reads against the Gencode v27 human genome (GRCh38.p10) plus 92 ERCC sequences, then RSEM to estimate abundance levels for genes and isoforms.
      2. Similar to (1), but using STAR 2.7.1a.
  • Metadata were downloaded from SRA and cleaned up to use standard field names. GEO metadata were also checked, but no extra information was found.

Usage

Install the package, then load the ExpressionSet of interest, for example:

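# Install the data package from GitHub
devtools::install_github('ttdtrang/data-rnaseq-lymphoma')
# Load one of the bundled ExpressionSet objects (see the list of data sets below)
data(lymphoma.rnaseq.gene.star_rsem1, package='data.rnaseq.lymphoma')
# Dimensions of the expression matrix: features x samples
dim(lymphoma.rnaseq.gene.star_rsem1@assayData$exprs)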

The package includes 4 data sets: 2 processed with the STAR_2.5-RSEM workflow and 2 with the STAR_2.7-RSEM workflow.

lymphoma.rnaseq.gene.star_rsem1
lymphoma.rnaseq.transcript.star_rsem1
lymphoma.rnaseq.gene.star_rsem2
lymphoma.rnaseq.transcript.star_rsem2

Steps to reproduce data curation

  1. cd data-raw
  2. Download all necessary raw data files.
  3. Set the environment variable DBDIR to point to the path containing said files. It is assumed that files are organized into directories corresponding to workflow, e.g.
├── GSE116129_family.soft
├── make-data-package.nb.html
├── make-data-package.Rmd
├── parse_geo_metadata.nb.html
├── parse_geo_metadata.Rmd
├── PRJNA477352_metadata_cleaned.tsv
├── star_2.5-rsem
│   ├── feature_attrs.transcripts.tsv
│   ├── matrix.gene.expected_count.RDS
│   ├── matrix.gene.tpm.RDS
│   ├── matrix.transcripts.expected_count.RDS
│   ├── matrix.transcripts.tpm.RDS
│   └── starLog.final.tsv
└── star_2.7-rsem
    ├── feature_attrs.rsem.transcripts.tsv
    ├── matrix.gene.expected_count.RDS
    ├── matrix.gene.tpm.RDS
    ├── matrix.transcripts.expected_count.RDS
    ├── matrix.transcripts.tpm.RDS
    └── starLog.final.tsv
  4. Run the R notebook make-data-package.Rmd to assemble parts into ExpressionSet objects.
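
The steps above can be sketched as a small shell script. This is only an illustration under stated assumptions, not the author's tooling: `check_dbdir` and `run_curation` are hypothetical helpers, the paths in the example are placeholders, only the files common to both workflow directories in the listing are checked, and rendering the notebook non-interactively with `rmarkdown::render()` via `Rscript` is assumed to be an acceptable substitute for running it in an IDE.

```shell
# Hypothetical helper (not part of the repository): check that a directory
# contains the per-workflow files shown in the listing above.
# Uses $1 if given, otherwise the DBDIR environment variable.
check_dbdir() {
    dir="${1:-${DBDIR:-}}"
    if [ -z "$dir" ]; then
        echo "DBDIR is not set" >&2
        return 2
    fi
    missing=0
    for wf in star_2.5-rsem star_2.7-rsem; do
        for f in matrix.gene.expected_count.RDS matrix.gene.tpm.RDS \
                 matrix.transcripts.expected_count.RDS \
                 matrix.transcripts.tpm.RDS starLog.final.tsv; do
            if [ ! -f "$dir/$wf/$f" ]; then
                echo "missing: $wf/$f" >&2
                missing=1
            fi
        done
    done
    return "$missing"
}

# Hypothetical helper: enter data-raw, verify the layout, then knit the notebook.
# $1 is the path to data-raw inside a clone of the repository.
# DBDIR must be exported so that the R process can see it.
run_curation() {
    (
        cd "$1" &&
        check_dbdir &&
        Rscript -e 'rmarkdown::render("make-data-package.Rmd")'
    )
}

# Example (placeholder paths, assumptions only):
#   export DBDIR=/path/to/raw/files
#   run_curation data-raw
```

Checking the layout first means a missing matrix is reported by name before R starts, rather than surfacing as a read error partway through the notebook.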