MAVEQC

An R package for quality control of saturation genome editing (SGE) data

Table of Contents

  1. Dependencies
  2. Installation
  3. File Format
  4. Import Data
  5. Plasmid QC
  6. Screen QC
  7. Others

Dependencies

install.packages("configr")
install.packages("vroom")
install.packages("data.table")
install.packages("Ckmeans.1d.dp")
install.packages("gplots")
install.packages("ggplot2")
install.packages("plotly")
install.packages("ggcorrplot")
install.packages("corrplot")
install.packages("see")
install.packages("ggbeeswarm")
install.packages("reactable")
install.packages("reshape2")
install.packages("htmltools")
install.packages("sparkline")
install.packages("dendextend")
install.packages("gtools")

install.packages("BiocManager")
BiocManager::install("DESeq2")
BiocManager::install("DEGreport")
BiocManager::install("apeglm")
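
To avoid reinstalling packages that are already present, the list above can be filtered first. The following is a minimal sketch using only base R; the helper name `find_missing_packages` is hypothetical and not part of MAVEQC.

```r
# Hypothetical helper (not part of MAVEQC): return the packages in `pkgs`
# that are not found in the current library paths.
find_missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Package names copied from the install commands above.
cran_pkgs <- c("configr", "vroom", "data.table", "Ckmeans.1d.dp", "gplots",
               "ggplot2", "plotly", "ggcorrplot", "corrplot", "see",
               "ggbeeswarm", "reactable", "reshape2", "htmltools",
               "sparkline", "dendextend", "gtools")
bioc_pkgs <- c("DESeq2", "DEGreport", "apeglm")

missing_cran <- find_missing_packages(cran_pkgs)
missing_bioc <- find_missing_packages(bioc_pkgs)
print(missing_cran)
print(missing_bioc)

# Uncomment to install only what is missing:
# if (length(missing_cran) > 0) install.packages(missing_cran)
# if (length(missing_bioc) > 0) BiocManager::install(missing_bioc)
```

The install calls are left commented out so that running the sketch only reports what is missing.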

Load dependencies if required

library(configr)
library(vroom)
library(data.table)
library(Ckmeans.1d.dp)
library(gplots)
library(ggplot2)
library(plotly)
library(ggcorrplot)
library(corrplot)
library(see)
library(ggbeeswarm)
library(reactable)
library(htmltools)
library(sparkline)
library(dendextend)
library(reshape2)
library(gtools)

library(DESeq2)
library(DEGreport)
library(apeglm)

(TOP)

Installation

Install from GitHub

install.packages("devtools")

library(devtools)
install_github("wtsi-hgi/MAVEQC")

Or

Install from the source package file

install.packages("/path/of/MAVEQC.tar.gz", type = "source")

(TOP)

File Format

sample sheet -- tsv

sample_name replicate condition ref_time_point library_independent_count library_dependent_count valiant_meta vep_anno adapt5 adapt3 per_r1_adaptor per_r2_adaptor library_name library_type
sample1 R1 D4 D4 s1.allcounts.tsv.gz s1.libcounts.tsv.gz meta.csv.gz meta_consequences.tsv.gz CTGACTGGCACCTCTTCCCCCAGGA CCCCGACCCCTCCCCAGCGTGAATG 0.21 0.10 libA screen
sample2 R2 D4 D4 s2.allcounts.tsv.gz s2.libcounts.tsv.gz meta.csv.gz meta_consequences.tsv.gz CTGACTGGCACCTCTTCCCCCAGGA CCCCGACCCCTCCCCAGCGTGAATG 0.11 0.02 libA screen
sample3 R3 D4 D4 s3.allcounts.tsv.gz s3.libcounts.tsv.gz meta.csv.gz meta_consequences.tsv.gz CTGACTGGCACCTCTTCCCCCAGGA CCCCGACCCCTCCCCAGCGTGAATG 0.01 0.18 libA screen
sample4 R1 D7 D4 s4.allcounts.tsv.gz s4.libcounts.tsv.gz meta.csv.gz meta_consequences.tsv.gz CTGACTGGCACCTCTTCCCCCAGGA CCCCGACCCCTCCCCAGCGTGAATG 0.21 0.10 libA screen
sample5 R2 D7 D4 s5.allcounts.tsv.gz s5.libcounts.tsv.gz meta.csv.gz meta_consequences.tsv.gz CTGACTGGCACCTCTTCCCCCAGGA CCCCGACCCCTCCCCAGCGTGAATG 0.11 0.02 libA screen
sample6 R3 D7 D4 s6.allcounts.tsv.gz s6.libcounts.tsv.gz meta.csv.gz meta_consequences.tsv.gz CTGACTGGCACCTCTTCCCCCAGGA CCCCGACCCCTCCCCAGCGTGAATG 0.01 0.18 libA screen
  • Please use the same headers as in the example.
  • replicate, condition and ref_time_point are optional in general, but required for screen QC.
  • adapt5 and adapt3 are required for reads that contain primers; otherwise they are optional and can be left blank.
  • vep_anno, library_name and library_type are optional; leave them blank if they are not available.
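
Since the headers must match the example exactly, it can help to check a sample sheet before importing it. The following is a minimal sketch in base R; `check_sample_sheet_headers` is a hypothetical helper, not a MAVEQC function, and the expected headers are simply those of the example above.

```r
# Column names taken from the example sample sheet above.
expected_headers <- c("sample_name", "replicate", "condition", "ref_time_point",
                      "library_independent_count", "library_dependent_count",
                      "valiant_meta", "vep_anno", "adapt5", "adapt3",
                      "per_r1_adaptor", "per_r2_adaptor",
                      "library_name", "library_type")

# Hypothetical helper (not part of MAVEQC): report headers that are missing
# from, or unexpected in, a tab-separated sample sheet.
check_sample_sheet_headers <- function(path) {
  sheet <- read.delim(path, header = TRUE, sep = "\t", check.names = FALSE,
                      colClasses = "character")
  list(missing = setdiff(expected_headers, colnames(sheet)),
       unexpected = setdiff(colnames(sheet), expected_headers))
}

# Demonstration on a one-row sheet written to a temporary file.
demo_path <- tempfile(fileext = ".tsv")
demo_row <- c("sample1", "R1", "D4", "D4", "s1.allcounts.tsv.gz",
              "s1.libcounts.tsv.gz", "meta.csv.gz", "meta_consequences.tsv.gz",
              "CTGACTGGCACCTCTTCCCCCAGGA", "CCCCGACCCCTCCCCAGCGTGAATG",
              "0.21", "0.10", "libA", "screen")
writeLines(c(paste(expected_headers, collapse = "\t"),
             paste(demo_row, collapse = "\t")), demo_path)

result <- check_sample_sheet_headers(demo_path)
print(result)
```

Both elements of `result` are empty when the sheet uses exactly the headers shown in the example.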

library dependent counts -- tsv or tsv.gz

ID NAME SEQUENCE LENGTH COUNT UNIQUE SAMPLE
id1 name1 ACTTTTCT 276 32 1 sample1
id2 name2 ATCTTTCT 275 132 0 sample1
id3 name3 ATTCTTCT 275 2 1 sample1
  • Please use the same headers as in the example.
  • Please make sure the library dependent sequences match those in the VaLiAnT meta file.
  • For details of this format, please refer to pyQUEST.

library independent counts -- tsv or tsv.gz

SEQUENCE LENGTH COUNT
ACTTTTCT 276 32
ATCTTTCT 275 132
ATTCTTCT 275 2
  • Please use the same headers as in the example.
  • For details of this format, please refer to pyQUEST.
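
Both count formats can be checked in the same way before import. The sketch below, in base R, reads a tsv or tsv.gz file and stops if the headers differ from the examples above; `read_count_file` is a hypothetical helper and not part of MAVEQC or pyQUEST.

```r
# Headers copied from the two examples above.
lib_dependent_headers <- c("ID", "NAME", "SEQUENCE", "LENGTH", "COUNT",
                           "UNIQUE", "SAMPLE")
lib_independent_headers <- c("SEQUENCE", "LENGTH", "COUNT")

# Hypothetical helper: read a count file and verify its headers.
# gzfile() reads gzip-compressed as well as uncompressed files.
read_count_file <- function(path, expected) {
  counts <- read.delim(gzfile(path), header = TRUE, sep = "\t",
                       check.names = FALSE, stringsAsFactors = FALSE)
  if (!identical(colnames(counts), expected)) {
    stop("Unexpected headers in ", path, ": ",
         paste(colnames(counts), collapse = ", "))
  }
  counts
}

# Demonstration: write the library independent example to a temporary
# gzip file and read it back.
demo_path <- tempfile(fileext = ".tsv.gz")
con <- gzfile(demo_path, "wt")
writeLines(c("SEQUENCE\tLENGTH\tCOUNT",
             "ACTTTTCT\t276\t32",
             "ATCTTTCT\t275\t132",
             "ATTCTTCT\t275\t2"), con)
close(con)

counts <- read_count_file(demo_path, lib_independent_headers)
print(counts)
```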

valiant meta file

Please use the VaLiAnT output file; refer to VaLiAnT for details.

vep annotation file

Please use a one-to-one mapping file.

(TOP)

Import Data

Import a group of samples from a directory

All files must be in the same directory, including the library dependent counts, library independent counts, VaLiAnT meta CSV, VEP annotation and the sample sheet.

library(MAVEQC)

sge_objs <- import_sge_files("/path/to/input/directory", "sample_sheet.tsv")
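
Because every file named in the sample sheet is expected to sit in the input directory, a quick existence check can catch typos before the import. This is a minimal sketch in base R; `find_missing_inputs` is a hypothetical helper, not a MAVEQC function, and it assumes the file columns are named as in the sample sheet example.

```r
# Hypothetical helper: list files named in the sample sheet that are absent
# from the input directory. Blank entries are ignored.
find_missing_inputs <- function(input_dir, sample_sheet) {
  sheet <- read.delim(file.path(input_dir, sample_sheet), header = TRUE,
                      sep = "\t", check.names = FALSE,
                      colClasses = "character")
  file_cols <- intersect(c("library_independent_count",
                           "library_dependent_count",
                           "valiant_meta", "vep_anno"),
                         colnames(sheet))
  files <- unique(unlist(sheet[file_cols], use.names = FALSE))
  files <- files[!is.na(files) & nzchar(files)]
  files[!file.exists(file.path(input_dir, files))]
}

# Demonstration in a temporary directory with one file deliberately absent.
demo_dir <- file.path(tempdir(), "maveqc_demo_input")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c(paste("sample_name", "library_independent_count",
                   "library_dependent_count", "valiant_meta", "vep_anno",
                   sep = "\t"),
             paste("sample1", "s1.allcounts.tsv.gz", "s1.libcounts.tsv.gz",
                   "meta.csv.gz", "", sep = "\t")),
           file.path(demo_dir, "sample_sheet.tsv"))
invisible(file.create(file.path(demo_dir,
                                c("s1.allcounts.tsv.gz", "meta.csv.gz"))))

missing_inputs <- find_missing_inputs(demo_dir, "sample_sheet.tsv")
print(missing_inputs)
```

An empty result means every referenced file was found.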

(TOP)

Plasmid QC

Test datasets are not available yet; they will be added soon.

QC 1: Sample QC

output_dir <- "/path/to/output/directory"

samqc <- create_sampleqc_object(sge_objs)
samqc <- run_sample_qc(samqc, "plasmid")

qcplot_samqc_all(samqc, qc_type = "plasmid", plot_dir = output_dir)
qcout_samqc_all(samqc, qc_type = "plasmid", out_dir = output_dir)
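
Whether MAVEQC creates the output directory itself is not stated here, so it may be safer to create it before running the steps above. A minimal sketch in base R; the path below is a placeholder under the temporary directory, used only for illustration.

```r
# Placeholder path for illustration; replace with the real output directory.
output_dir <- file.path(tempdir(), "maveqc_output")

# Create the directory (and any missing parents) if it does not exist yet.
if (!dir.exists(output_dir)) {
  dir.create(output_dir, recursive = TRUE)
}

print(dir.exists(output_dir))
```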

(TOP)

Report

This creates an HTML report that combines all the results, including figures and tables. Please make sure you have generated all the figures and tables beforehand; otherwise the report may be incomplete.

create_qc_reports("/path/to/sample/sheet", "plasmid", output_dir)
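
To see what has actually been written to the output directory before building the report, the files can be counted by extension. This is a minimal sketch in base R; `summarise_outputs` is a hypothetical helper, and the file names in the demonstration are made up for illustration, not the names MAVEQC produces.

```r
# Hypothetical helper: count the files in a directory by file extension.
summarise_outputs <- function(output_dir) {
  files <- list.files(output_dir, recursive = TRUE)
  table(tools::file_ext(files))
}

# Demonstration with made-up file names in a temporary directory.
demo_dir <- file.path(tempdir(), "maveqc_demo_output")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
                                c("figure_a.png", "figure_b.png",
                                  "table_a.tsv"))))

output_summary <- summarise_outputs(demo_dir)
print(output_summary)
```

An empty summary suggests that the plotting and output steps have not been run yet.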

(TOP)

Screen QC

QC 1: Sample QC

Reference samples must be assigned. MAVEQC automatically creates reference samples (maveqc_ref_time_point and maveqc_ref_time_point_samples) from the sample sheet using ref_time_point and sample_name.
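
The relationship between ref_time_point and the reference samples can be pictured as follows. This is only an illustration in base R, under the assumption that reference samples are those whose condition equals ref_time_point; the logic MAVEQC uses internally may differ.

```r
# A small data frame mirroring the relevant columns of the sample sheet
# example in the File Format section.
sample_sheet <- data.frame(
  sample_name    = paste0("sample", 1:6),
  replicate      = rep(c("R1", "R2", "R3"), times = 2),
  condition      = rep(c("D4", "D7"), each = 3),
  ref_time_point = "D4",
  stringsAsFactors = FALSE
)

# Assumption: the reference time point is the single value in ref_time_point,
# and reference samples are the rows whose condition matches it.
ref_time_point <- unique(sample_sheet$ref_time_point)
ref_samples <- sample_sheet$sample_name[
  sample_sheet$condition == sample_sheet$ref_time_point
]

print(ref_time_point)
print(ref_samples)
```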

output_dir <- "/path/to/output/directory"

samqc <- create_sampleqc_object(sge_objs)
samqc <- run_sample_qc(samqc, "screen")

qcplot_samqc_all(samqc, qc_type = "screen", plot_dir = output_dir)
qcout_samqc_all(samqc, qc_type = "screen", out_dir = output_dir)

(TOP)

QC 2: Experimental QC

MAVEQC automatically creates the coldata (maveqc_deseq_coldata) from the sample sheet for Screen QC.

coldata example:

sample_name replicate condition
hgsm3_d4_r1 R1 D4
hgsm3_d7_r1 R1 D7
hgsm3_d15_r1 R1 D15
hgsm3_d4_r2 R2 D4
hgsm3_d7_r2 R2 D7
hgsm3_d15_r2 R2 D15
hgsm3_d4_r3 R3 D4
hgsm3_d7_r3 R3 D7
hgsm3_d15_r3 R3 D15

expqc <- create_experimentqc_object(samqc)
expqc <- run_experiment_qc(expqc)

qcplot_expqc_all(expqc, plot_dir = output_dir)
qcout_expqc_all(expqc, out_dir = output_dir)
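
For orientation, a table like the coldata example can be derived from a sample sheet by keeping three columns. This is a sketch in base R of the general idea, not MAVEQC's own code; placing the reference time point first among the condition levels is an assumption made for illustration.

```r
# Rebuild the sample names from the coldata example above.
replicate <- rep(c("R1", "R2", "R3"), each = 3)
condition <- rep(c("D4", "D7", "D15"), times = 3)
sample_sheet <- data.frame(
  sample_name    = paste0("hgsm3_", tolower(condition), "_",
                          tolower(replicate)),
  replicate      = replicate,
  condition      = condition,
  ref_time_point = "D4",
  stringsAsFactors = FALSE
)

# Keep the three coldata columns and convert the design variables to factors.
coldata <- sample_sheet[, c("sample_name", "replicate", "condition")]
coldata$replicate <- factor(coldata$replicate)

# Assumption: the reference time point becomes the first factor level.
ref <- unique(sample_sheet$ref_time_point)
coldata$condition <- factor(coldata$condition,
                            levels = unique(c(ref, sample_sheet$condition)))

print(coldata)
print(levels(coldata$condition))
```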

(TOP)

Report

This creates an HTML report that combines all the results, including figures and tables. Please make sure you have generated all the figures and tables beforehand; otherwise the report may be incomplete.

create_qc_reports("/path/to/sample/sheet", "screen", output_dir)

(TOP)

Others

Pandoc

Pandoc is required to generate the R Markdown report. Please download and install it from https://pandoc.org/installing.html.
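
A quick way to see whether a pandoc executable is visible from R is to look it up on the PATH. This is a minimal sketch in base R; note that some environments bundle Pandoc outside the PATH, so a negative result here does not necessarily mean Pandoc is unavailable.

```r
# Sys.which() returns the full path of an executable, or "" if not found.
pandoc_path <- Sys.which("pandoc")
pandoc_on_path <- unname(nzchar(pandoc_path))

if (pandoc_on_path) {
  message("pandoc found at: ", pandoc_path)
} else {
  message("pandoc was not found on the PATH")
}
```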

(TOP)

apeglm

The version of apeglm must be >= 1.22.1. Lower versions can fail with an optimHess error like the one below.

Error in optimHess(par = init, fn = nbinomFn, gr = nbinomGr, x = x, y = y,  : 
non-finite value supplied by optim
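
The installed version can be compared against this minimum from within R. This is a minimal sketch in base R; `meets_min_version` is a hypothetical helper, not part of MAVEQC or apeglm.

```r
# Hypothetical helper: TRUE if `installed` is at least `required`.
# package_version() compares version components numerically, so "1.100.0"
# is correctly treated as newer than "1.22.1".
meets_min_version <- function(installed, required) {
  package_version(installed) >= package_version(required)
}

if (requireNamespace("apeglm", quietly = TRUE)) {
  installed <- as.character(packageVersion("apeglm"))
  if (meets_min_version(installed, "1.22.1")) {
    message("apeglm ", installed, " meets the minimum version")
  } else {
    message("apeglm ", installed, " is older than 1.22.1; please update it")
  }
} else {
  message("apeglm is not installed")
}
```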

(TOP)

Conda

Installing DESeq2 on a Mac with an M1 chip may fail with an error (Rlog1). Try the command below to fix it.

export PKG_CPPFLAGS="-DHAVE_WORKING_LOG1P"

(TOP)