This repository has been archived by the owner on Jan 31, 2020. It is now read-only.

HCC1395 WGS Exome RNA Seq Data

Malachi Griffith edited this page May 20, 2015 · 15 revisions

If you wish to use the HCC1395 and HCC1395/BL whole genome sequencing (WGS), exome, and/or RNA-seq data, you are welcome to do so, but please cite the GMS manuscript and website.

Refer to the following ATCC pages for details on the breast cancer cell line and its matched normal lymphoblastoid cell line: HCC1395 at ATCC and HCC1395/BL at ATCC.

All data are 2 x 100 bp paired-end reads generated on an Illumina HiSeq 2000 instrument. The exome data were generated using the NimbleGen SeqCap EZ Human Exome Library v3.0 reagent (download the annotation BED file here: NimbleGenExome_v3.bed). For precise details on how the libraries were prepared, please refer to the GMS manuscript.

Although these files are stored in BAM format for efficiency, each one contains ALL reads, not just those that align. Some reads will not align; the percentage will depend on your alignment strategy. We recommend using Picard SamToFastq to convert these BAMs back to FASTQ format.

| Data type | File name | Link |
|---|---|---|
| WGS Normal (lane 1) | gerald_D1VCPACXX_6.bam | download |
| WGS Normal (lane 2) | gerald_D1VCPACXX_7.bam | download |
| WGS Normal (lane 3) | gerald_D1VCPACXX_8.bam | download |
| WGS Tumor (lane 1) | gerald_D1VCPACXX_1.bam | download |
| WGS Tumor (lane 2) | gerald_D1VCPACXX_2.bam | download |
| WGS Tumor (lane 3) | gerald_D1VCPACXX_3.bam | download |
| WGS Tumor (lane 4) | gerald_D1VCPACXX_4.bam | download |
| WGS Tumor (lane 5) | gerald_D1VCPACXX_5.bam | download |
| Exome Normal (lane 1) | gerald_C1TD1ACXX_7_CGATGT.bam | download |
| Exome Tumor (lane 1) | gerald_C1TD1ACXX_7_ATCACG.bam | download |
| RNA-seq Normal (lane 1) | gerald_C2DBEACXX_3.bam | download |
| RNA-seq Tumor (lane 1) | gerald_C1TD1ACXX_8_ACAGTG.bam | download |

The BAMs are also available as bundled data sets, including several downsampled versions:

| Data set | Description | Link |
|---|---|---|
| Full sized | 12 complete WGS, exome, and RNA-seq BAMs | download |
| 1/100th | BAMs evenly downsampled to 1/100th of their original size | download |
| 1/1000th | BAMs downsampled to 1/1000th of their original size, then supplemented with extra coverage for some regions | download |
| Exome only | 2 complete exome BAMs | download |

After installing the GMS, prime the system with one of these data sets using a command like the following, run from the directory where you cloned the GMS repository:

```bash
# Replace X in --memory=Xgb with a memory size in GB
./setup/prime-system.pl --data=hcc1395_1tenth_percent --sync=tarball --low_resources --memory=Xgb
```

To select one of the data sets above, set --data to one of: hcc1395 (full sized), hcc1395_1percent (1/100th), hcc1395_1tenth_percent (1/1000th), or hcc1395_exome_only (exome only).
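If you script the setup, a small guard against a mistyped --data value can save a failed run. The bash sketch below is illustrative only: the function name prime_gms and the DRY_RUN variable are hypothetical, not part of the GMS, and the function passes through only the flags shown in the example command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate the data set key, then run (or, with
# DRY_RUN=1, only print) the prime-system.pl command from this page.
prime_gms() {
  local data_key="$1" memory_gb="$2"
  # Accept only the four data set keys documented on this page.
  case "$data_key" in
    hcc1395|hcc1395_1percent|hcc1395_1tenth_percent|hcc1395_exome_only) ;;
    *)
      echo "unknown data set: $data_key" >&2
      return 1
      ;;
  esac
  local -a cmd
  cmd=(./setup/prime-system.pl "--data=$data_key" --sync=tarball
       --low_resources "--memory=${memory_gb}gb")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 prime_gms hcc1395_exome_only 8` prints the command it would run, and `prime_gms hcc1395_exome_only 8` runs it from the repository root.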

Converting from BAM to FASTQ format for other uses of this data

Note: If you want to use these data elsewhere and need a complete copy in FASTQ format, you can create one easily with Picard SamToFastq.

For example:

```bash
# Write read 1 and read 2 of each pair to separate FASTQ files
java -Xmx2g -jar picard-tools-1.118/SamToFastq.jar INPUT=gerald_D1VCPACXX_6.bam FASTQ=gerald_D1VCPACXX_6_R1.fastq SECOND_END_FASTQ=gerald_D1VCPACXX_6_R2.fastq
```
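To convert all 12 BAMs rather than a single lane, the same SamToFastq call can be run in a loop. This is a minimal bash sketch, assuming the BAMs were downloaded to the current directory and that PICARD_DIR (a hypothetical variable) points at a picard-tools-1.118 style install with one jar per tool, as in the example above; fastq_pair_names is likewise an illustrative helper. Missing files are skipped with a warning, so the loop also works on a partial download.

```bash
#!/usr/bin/env bash
# Sketch: convert each BAM listed on this page to paired FASTQ files.
# Assumption: PICARD_DIR holds per-tool jars such as SamToFastq.jar.
set -euo pipefail

PICARD_DIR="${PICARD_DIR:-picard-tools-1.118}"

bams=(
  gerald_D1VCPACXX_{1..5}.bam   # WGS tumor, lanes 1-5
  gerald_D1VCPACXX_{6..8}.bam   # WGS normal, lanes 1-3
  gerald_C1TD1ACXX_7_CGATGT.bam # Exome normal
  gerald_C1TD1ACXX_7_ATCACG.bam # Exome tumor
  gerald_C2DBEACXX_3.bam        # RNA-seq normal
  gerald_C1TD1ACXX_8_ACAGTG.bam # RNA-seq tumor
)

# Print the read 1 and read 2 FASTQ names for a BAM, following the
# <name>_R1.fastq / <name>_R2.fastq convention used in the example above.
fastq_pair_names() {
  local prefix="${1%.bam}"
  echo "${prefix}_R1.fastq ${prefix}_R2.fastq"
}

for bam in "${bams[@]}"; do
  # Skip files that have not been downloaded.
  if [[ ! -f "$bam" ]]; then
    echo "skipping $bam: file not found" >&2
    continue
  fi
  read -r r1 r2 <<< "$(fastq_pair_names "$bam")"
  java -Xmx2g -jar "$PICARD_DIR/SamToFastq.jar" \
    INPUT="$bam" FASTQ="$r1" SECOND_END_FASTQ="$r2"
done
```

Run it as, for example, `PICARD_DIR=/opt/picard-tools-1.118 bash convert_all.sh`, where both the path and the script name are placeholders for your own setup.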

Incidentally, storing raw data in BAM format, as we do, is much more space-efficient than storing it as FASTQ. Many tools, including some aligners, now accept BAM as input. To convert FASTQ files to a BAM of unaligned reads, you can use Picard FastqToSam:

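```bash
# FastqToSam requires SAMPLE_NAME; the value below is only an example
java -Xmx2g -jar picard-tools-1.118/FastqToSam.jar FASTQ=gerald_D1VCPACXX_6_R1.fastq FASTQ2=gerald_D1VCPACXX_6_R2.fastq OUTPUT=gerald_D1VCPACXX_6.bam SAMPLE_NAME=HCC1395_BL
```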
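FastqToSam expects the two FASTQ files to describe the same read pairs, so a quick sanity check before converting is to confirm that read 1 and read 2 contain the same number of records. This bash sketch assumes uncompressed FASTQ with exactly four lines per record (the layout SamToFastq writes); the function names fastq_read_count and check_fastq_pair are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare record counts of a read 1 / read 2 FASTQ pair.
# Assumption: uncompressed FASTQ, four lines per record, no wrapping.
set -euo pipefail

# Number of FASTQ records in a file (line count divided by four).
fastq_read_count() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# Succeed only if both files hold the same number of reads.
check_fastq_pair() {
  local n1 n2
  n1=$(fastq_read_count "$1")
  n2=$(fastq_read_count "$2")
  if (( n1 != n2 )); then
    echo "read count mismatch: $1 has $n1, $2 has $n2" >&2
    return 1
  fi
  echo "$n1 read pairs"
}
```

For example, `check_fastq_pair gerald_D1VCPACXX_6_R1.fastq gerald_D1VCPACXX_6_R2.fastq && java -Xmx2g -jar picard-tools-1.118/FastqToSam.jar ...` runs the conversion only when the counts agree.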