ayoisio/rnaseq-nf-google-cloud

RNA-Seq and Protein Structure Prediction on GCP

Successful pipeline execution graph

Summary

We have developed an end-to-end pipeline for RNA-Seq analysis and protein structure prediction that uses BigQuery and Vertex AI to process terabyte-scale data. Our aim is to show how Google Cloud can be applied to computational challenges in modern biology and medicine, and to support new discoveries in those fields.

Data

FASTQ files are sourced from the public NCBI GEO dataset GSE181830.

Workflow

The steps of the RNA-Seq pipeline are:

  1. Adapter and quality trimming with Trim Galore
  2. Quality control readout with FastQC
  3. Estimation of gene and isoform expression with RSEM
  4. Loading of gene and isoform expression data into BigQuery
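
To make the hand-off between the last two steps concrete, here is a minimal sketch of how an RSEM `*.genes.results` table could be turned into rows suitable for a BigQuery load. It is not the repository's actual implementation: the function name `parse_gene_results`, the `sample_id` field, and the lower-cased column names are assumptions chosen for illustration. Only the input column names follow RSEM's standard gene-level output.

```python
import csv
import io

# Columns RSEM writes to *.genes.results, mapped to
# (hypothetical BigQuery-friendly column name, Python type).
GENE_COLUMNS = {
    "gene_id": ("gene_id", str),
    "transcript_id(s)": ("transcript_ids", str),
    "length": ("length", float),
    "effective_length": ("effective_length", float),
    "expected_count": ("expected_count", float),
    "TPM": ("tpm", float),
    "FPKM": ("fpkm", float),
}


def parse_gene_results(handle, sample_id):
    """Read a tab-separated RSEM gene table and return a list of row dicts.

    `sample_id` is an assumed extra column so that rows from several
    samples can share one table.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for record in reader:
        row = {"sample_id": sample_id}
        for source_name, (target_name, cast) in GENE_COLUMNS.items():
            row[target_name] = cast(record[source_name])
        rows.append(row)
    return rows


# Tiny made-up table, used only to exercise the parser.
example = (
    "gene_id\ttranscript_id(s)\tlength\teffective_length\t"
    "expected_count\tTPM\tFPKM\n"
    "GENE_A\tTX_A1,TX_A2\t1500.00\t1350.00\t42.00\t10.50\t8.25\n"
)
rows = parse_gene_results(io.StringIO(example), sample_id="sample_1")
print(rows[0]["gene_id"], rows[0]["tpm"])
```

Rows in this shape are plain JSON-compatible dictionaries, so they could be passed to a loader such as `load_table_from_json` in the `google-cloud-bigquery` client. The destination table and its schema are left out here because the source does not specify them.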