Dennis Schmitz edited this page Nov 20, 2019 · 14 revisions

Jovian wiki

Jovian description

The pipeline automatically processes raw Illumina NGS data from human clinical matrices (faeces, serum, etc.) into clinically relevant information such as taxonomic classification, viral typing and minority variant identification (quasispecies). Wetlab personnel can start and configure analyses, and interpret the results, via an interactive web-report. This makes metagenomic analyses more accessible and user-friendly, since minimal command-line skills are required.

Features

  • Data quality control (QC) and cleaning.
    • Including library fragment length analysis, useful for sample preparation QC.
  • Removal of human* data (patient privacy). *Any reference can be used for this step, but Jovian is intended for human clinical samples.
  • Assembly of short reads into bigger scaffolds (often full viral genomes).
  • Taxonomic classification:
    • Every nucleic acid-containing biological entity (i.e. not only viruses) is classified down to species level.
    • Lowest Common Ancestor (LCA) analysis is performed to move ambiguous results up to their lowest common ancestor, which makes results more robust.
  • Viral typing:
    • Several viral families and genera can be taxonomically labelled at the sub-species level as described here.
  • Viral scaffolds are cross-referenced against the Virus-Host interaction database and NCBI host database.
  • Scaffolds are annotated in great detail:
    • Depth of coverage.
    • GC content.
    • Open reading frames (ORFs) are predicted.
    • Minority variants (quasispecies) are identified.
  • Importantly, results of all processes listed above are presented via an interactive web-report including an audit trail.
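The LCA step listed above can be illustrated with a minimal Python sketch. It is not Jovian's implementation: it assumes each candidate hit for a scaffold is given as a root-to-leaf lineage, and the function name lowest_common_ancestor and the example lineages are hypothetical. When hits disagree, the classification is moved up to the deepest taxon that all of them share.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest root-to-leaf prefix shared by all lineages.

    Hypothetical helper for illustration only; each lineage is ordered
    from the root (e.g. "Viruses") down to the most specific taxon.
    """
    if not lineages:
        return []
    shared: List[str] = []
    # zip() walks the lineages rank by rank and stops at the shortest one.
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            shared.append(taxa_at_rank[0])
        else:
            break  # first disagreement: everything below is ambiguous
    return shared


# Two ambiguous hits for one scaffold (illustrative lineages).
hits = [
    ["Viruses", "Caliciviridae", "Norovirus", "Norwalk virus"],
    ["Viruses", "Caliciviridae", "Sapovirus", "Sapporo virus"],
]
print(lowest_common_ancestor(hits))  # ['Viruses', 'Caliciviridae']
```

In this example the two hits agree only up to family level, so the scaffold would be reported as Caliciviridae rather than as either species.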

Visualizations

All data are visualized via an interactive web-report, as shown here, which includes:

  • A collation of interactive QC graphs via MultiQC.
  • Taxonomic results are presented on three levels:
    • For an entire (multi sample) run, interactive heatmaps are made for non-phage viruses, phages and bacteria. They are stratified to different taxonomic levels.
    • For a sample level overview, Krona interactive taxonomic piecharts are generated.
    • For more detailed analyses, interactive tables similar to popular spreadsheet applications (e.g. Microsoft Excel) are included:
      • Classified scaffolds
      • Unclassified scaffolds (i.e. "Dark Matter")
  • Virus typing results are presented via interactive spreadsheet-like tables.
  • An interactive scaffold alignment viewer (IGVjs) is included, containing:
    • Detailed alignment information.
    • Depth of coverage graph.
    • GC content graph.
    • Predicted open reading frames (ORFs).
    • Identified minority variants (quasispecies).
  • All SNP metrics are presented via interactive spreadsheet-like tables, allowing detailed analysis.
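As an illustration of what a GC content graph along a scaffold represents, the following Python sketch computes GC content in windows. It is an assumption-laden example, not the method Jovian uses: the function names, window size and step are hypothetical, and ambiguous bases such as N are simply counted as non-GC.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> List[float]:
    """GC content per window along a scaffold (hypothetical helper).

    A sequence shorter than the window yields a single value for the
    whole sequence.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(len(seq) - window + 1, 1)
    return [gc_content(seq[i:i + window]) for i in range(0, last_start, step)]


print(windowed_gc("GGCCAATT", window=4, step=4))  # [1.0, 0.0]
```

Plotting these per-window values against their start positions gives a track comparable in spirit to the GC content graph in the alignment viewer.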

Virus typing

After a Jovian analysis has finished, you can perform virus typing (i.e. sub-species level taxonomic labelling). Start a typing analysis with the command bash jovian -vt [virus keyword], where [virus keyword] is one of:

| Keyword | Taxon used for scaffold selection | Notable virus species |
|---------|-----------------------------------|-----------------------|
| NoV     | Caliciviridae                     | Norovirus GI and GII, Sapovirus |
| EV      | Picornaviridae                    | Enteroviruses (Coxsackie, Polio, Rhino, etc.), Parecho, Aichi, Hepatitis A |
| RVA     | Rotavirus A                       | Rotavirus A |
| HAV     | Hepatovirus A                     | Hepatitis A |
| HEV     | Orthohepevirus A                  | Hepatitis E |
| PV      | Papillomaviridae                  | Human Papillomavirus |
| Flavi   | Flaviviridae                      | Dengue (work in progress) |
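If virus typing is started from a script rather than typed by hand, the keyword can be checked before the command is built. The Python sketch below is only an illustration: the keywords and the command form bash jovian -vt [virus keyword] come from this page, while the dictionary and the function virus_typing_command are hypothetical helpers that are not part of Jovian.

```python
from typing import Dict, List

# Keyword -> taxon used for scaffold selection, copied from the table above.
VIRUS_KEYWORDS: Dict[str, str] = {
    "NoV": "Caliciviridae",
    "EV": "Picornaviridae",
    "RVA": "Rotavirus A",
    "HAV": "Hepatovirus A",
    "HEV": "Orthohepevirus A",
    "PV": "Papillomaviridae",
    "Flavi": "Flaviviridae",
}


def virus_typing_command(keyword: str) -> List[str]:
    """Build the argument list for `bash jovian -vt <keyword>`.

    Hypothetical helper: it rejects unknown keywords and returns the
    command as a list, without executing anything.
    """
    if keyword not in VIRUS_KEYWORDS:
        valid = ", ".join(sorted(VIRUS_KEYWORDS))
        raise ValueError(f"Unknown virus keyword {keyword!r}; expected one of: {valid}")
    return ["bash", "jovian", "-vt", keyword]


print(" ".join(virus_typing_command("NoV")))  # bash jovian -vt NoV
```

The returned list could be passed to subprocess.run from within the Jovian directory; that step is left out here because it depends on the local installation.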

Audit trail

An audit trail, used for clinical reproducibility and logging, is generated and contains:

  • A unique methodological fingerprint of the code, which is accessible via GitHub. This allows the analysis to be reproduced exactly, even retrospectively, by reverting to old versions of the pipeline code.
  • The following information is also logged:
    • Database timestamps
    • (user-specified) Pipeline parameters

However, the audit trail has limitations, since several things are outside Jovian's control:

  • The version of the virus typing-tools
    • Currently we depend on public web-tools hosted by the RIVM. These are developed in close collaboration with, but independently of, Jovian. A versioning system for the virus typing-tools is being worked on; however, this is not trivial and will take some time.
  • Input files and metadata
    • We only save the names and locations of input files at the time the analysis was performed. Long-term storage of the data, and documenting their location over time, is the responsibility of the end-user. Likewise, the end-user is responsible for storing datasets with their correct metadata (e.g. clinical information, database versions, etc.). We recommend using iRODS for this, as described by Nieroda et al. 2019. While we acknowledge that database versions are vital to replicate results, the databases Jovian uses have no official versioning, which is why we include timestamps only.
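To make the logged items concrete, the Python sketch below shows one way database timestamps and user-specified parameters could be collected into a single record. It is not how Jovian writes its audit trail: the function build_audit_record, the use of file modification times as database timestamps, and the parameter name min_contig_length are all assumptions made for illustration.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Dict, Mapping, Sequence


def build_audit_record(
    database_paths: Sequence[str],
    parameters: Mapping[str, object],
) -> Dict[str, object]:
    """Collect database timestamps and pipeline parameters (hypothetical).

    Assumption: a database's timestamp is taken to be the last
    modification time of its file, reported in UTC.
    """
    timestamps = {}
    for path in database_paths:
        mtime = os.path.getmtime(path)
        timestamps[path] = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()
    return {
        "database_timestamps": timestamps,
        "pipeline_parameters": dict(parameters),
    }


if __name__ == "__main__":
    # A temporary file stands in for a real database file.
    with tempfile.NamedTemporaryFile() as fake_db:
        record = build_audit_record([fake_db.name], {"min_contig_length": 500})
        print(json.dumps(record, indent=2))
```

Because the databases have no official versioning, a record like this can only state when a database file was last changed, not which release it corresponds to.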

[Figure: Jovian rulegraph (Jovian_rulegraph.png)]