MoMI-G

Modular Multi-scale Integrated Genome Graph Browser

A genome graph browser for visualization of structural variants (SVs) as a variation graph:

Long-read sequencing allows for more sensitive and accurate discovery of SVs, which requires to filter and validate thousands of candidates. However, no existing genome browser meets the demand for cancer genomics. Because most visualization tools show an only range of a linear sequence, making them unsuitable for large (over megabase) or nested SVs. Also, no existing genome browsers allow users to simultaneously inspect the read alignments that span the reference allele and an alternative allele caused by a heterozygous SV. Here we present MoMI-G, a graph genome browser to visualize SVs on the variation graph, that provides a graph-based view that displays a genome with branches and alignments on them. Users can filter, visualize with genomic annotations, and inspect SVs with read alignments.

Demo

Youtube Movies: Short Demo, Short Demo 2, Long Demo

Demo Page: CHM1 Demo

Citation

Yokoyama, T.T., Sakamoto, Y., Seki, M., Suzuki, Y., Kasahara, M. MoMI-G: modular multi-scale integrated genome graph browser. BMC Bioinformatics 20, 548 (2019) doi:10.1186/s12859-019-3145-2.

Installation

Manual: here.

Dependencies

Node.js
npm or yarn
- Docker (optional)

Summary of set up

$ git clone https://github.com/MoMI-G/MoMI-G
$ cd MoMI-G
$ yarn
$ yarn start

And access to http://localhost:3000/. The demo shows CHM1, a human hydatidiform mole cell line dataset from backend server of MoMI-G that MoMI-G developer serves.

Run on your workstation

If you want to run the backend server on your laptop, modify package.json and run docker container which includes backend server and CHM1 chr21 with simulated reads dataset.

$ sed -e "s/\"target/\"target_/g"  -e "s/\_target/target/g" -i.bak package.json
$ yarn start
$ docker run --init -p 8081:8081 momigteam/momig-backend # Run it on another shell. It takes a little long time -- please wait.

You can start docker container by docker-compose up instead of docker run. The source code of the backend server is here.

Features

MoMI-G

allows visualization of (possibly distant) multiple intervals.
displays SVs that span multiple intervals.
displays SVs at varying scales, i.e., chromosome, gene, and nucleotide scales.
allows users to manually inspect hundreds of SVs.

Usage

Select structural variants

Manual screening for SV candidates is still important because of high false positive rate of SV calling, but it is time-consuming so that SVs should be filtered by certain criteria for manual screening. There are two view components to select SVs from candidates. The first view components is Circos, which aligns chromosomes as a circular layout. Thick arcs are chromosomes, and a line connecting to thick arcs indicates a SV. Because Circos enables us to choose and rotate chromosomes, we can focus on the subset of chromosomes. Circos is a suitable for visualizing the distribution of SVs and inter-chromosomal variations as lines. Second, Feature Table shows chromosome names, coordinates, and strand, enabling us to select SVs. We can sort and filter Feature Table by SV type, coordinates, or gene name. We can select a SV from both components.

Interval Card Deck

After you select SVs using Feature Table or Circos Plot, the listed variants are stacked on Interval Card Deck at the bottom of the window. In Interval Card Deck, intervals are displayed as cards, and the interval of the top (leftmost) card of the deck is shown on SequenceTubeMap. Each card can be dragged, and the order of cards can be changed. If one double-clicks on a card, the card moves to the top of the deck. Also, a card can be locked to avoid unintended modification or disposal, and the gene name can be input with autocompletion for specifying the interval of a card.

Visualize as genome graphs

We integrate SequenceTubeMap into MoMI-G with modifying the original implementation so that it can visualize a variation graph converted from SVs for showing the difference between a reference genome and a personal genome. Because there are many types of paths in variation graph, we categorized them for assigning different design as follows:

Chromosome Path (thick, grayscale): A chromosome path is a chromosome in a reference genome; walking on the path provides us a full nucleotide sequence.
Variant Path (colored): A variant path with a variation name represents a personal genome.
Gene Path (thin, colored): A gene path is for gene annotation. Exons are shown as a path with a darker color, and introns are shown as that with a lighter color.
Annotation Path (thin, grayscale): An annotation path is for user-definable bigBed annotations such as repeats. If you have a GFF3 or BED file, you can easily convert into bigBed file.
Read Alignment (thin, grayscale): Read alignments aligned on the graph or lifted over from the original alignments to a reference genome.

A sequence graph as a bi-directed graph composed of a set of multiple DNA sequences as nodes and the corresponding end-to-end connections as edges has the ability to describe duplications, indels and inversions as loops or branching and merging against a reference genome.

Annotations

Gene annotations retrieved from Ensembl or names of the region described in bigbed format are listed on Annotation Table.

Dataset

The demo includes the genome of the CHM1 dataset. CHM1 is a human hydatidiform mole cell line dataset so that we can assume all SVs are homozygous. They identified insertions, deletions, and inversions against hg19, so we converted from the list of SVs into a variation graph with our custom scripts. Alternatively, you can employ your own dataset with the following procedure.

Adapt your own dataset

If you want to run MoMI-G with your own dataset, use our custom scripts scripts/vcf2xg.sh to generate pcf and xg dataset. It requires VG, ruby, bash, and samtools.

Software	Dataset	Supported SV type
Sniffles	for PacBio/Nanopore	INV, DEL, TRA, DUP, (INS)
10X LongRanger	for 10X	INV, DEL, TRA, DUP
SURVIVOR	for merging SV calls	INV, DEL, TRA, DUP

$ bash vcf2xg.sh test.vcf test_output /bin/vg hg[19|38]

You can use singularity instead.

$ singularity -s run docker://momigteam/momig-tools:latest test.vcf test_output vg hg38

After that, these files are required to be mounted on "static/" folder.

static/
- config.yaml: a configuration file
- *.xg: an index of variation graph, generated by vg
- *.pcf: pair of cooridnate format: required to display variants on Feature Table or Circos Plot.
- *.gam(optional): read alignments on the graph
- *.gam.index(optional): index of gam

$ docker run --init -p 8081:8081 -v `pwd`/static:/vg/static momigteam/momig-backend
$ yarn start

Change log

2017.08.09 Ver 0.1 (alpha)
2017.09.26 Ver 0.2 (deploy CHM1 graph)
2017.12.12 Ver 0.3 (beta)
2017.12.19 Ver 0.4 (add gene annotations)
2018.01.02 Ver 0.5 (add English descriptions)
2018.01.11 Ver 0.6 (fix design)
2018.01.28 Ver 1.0RC
2018.02.10 Ver 1.0
2019.01.06 Ver 1.0 Rev.1

Acknowledgements

This work was supported in part by Information-technology Promotion Agency, Japan (IPA), Exploratory IT Human Resources Project (The MITOU Program) in the fiscal year 2017 and in part by JSPS KAKENHI (Grant Number, 16H06279).

References

Chaisson, M. J. P., Huddleston, J., Dennis, M. Y., Sudmant, P. H., Malig, M., Hormozdiari, F., … Eichler, E. E. (2015). Resolving the complexity of the human genome using single-molecule sequencing. Nature, 517(7536), 608–611. http://doi.org/10.1038/nature13907
Garrison, E., Novak, A., Hickey, G., Eizenga, J., Dawson, E., Jones, W., … Durbin, R. (2017). Sequence variation aware references and read mapping with vg : the variation graph toolkit. bioRxiv, 1–27. http://doi.org/10.1101/234856
Beyer, W. SequenceTubeMap, https://github.com/vgteam/sequencetubemap
Vanderkam, D., Aksoy, B. A., Hodes, I., Perrone, J., & Hammerbacher, J. (2016). pileup.js: a JavaScript library for interactive and in-browser visualization of genomic data. Bioinformatics, 32(March), btw167. http://doi.org/10.1093/bioinformatics/btw167
Paten, B., Novak, A. M., Garrison, E., & Hickey, G. (2017). Superbubbles, Ultrabubbles and Cacti. bioRxiv, 1–13. http://doi.org/10.1101/101493
Harrow, J., Frankish, A., Gonzalez, J. M., Tapanari, E., Diekhans, M., Kokocinski, F., … Hubbard, T. J. (2012). GENCODE: The reference human genome annotation for the ENCODE project. Genome Research, 22(9), 1760–1774. https://doi.org/10.1101/gr.135350.111

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 382 Commits
.github/workflows		.github/workflows
.idea		.idea
docs		docs
public		public
scripts		scripts
src		src
typings		typings
.babelrc		.babelrc
.dockerignore		.dockerignore
.gitignore		.gitignore
DISCLAIMER		DISCLAIMER
Dockerfile		Dockerfile
Dockerfile.backend		Dockerfile.backend
LICENSE		LICENSE
README.ja.md		README.ja.md
README.md		README.md
_config.yml		_config.yml
config-overrides.js		config-overrides.js
docker-compose.yml		docker-compose.yml
frontend.pu		frontend.pu
h2o.conf		h2o.conf
images		images
package.json		package.json
postcss.config.js		postcss.config.js
run_server.sh		run_server.sh
tsconfig.json		tsconfig.json
tsconfig.prod.json		tsconfig.prod.json
tslint.json		tslint.json
webpack.config.js		webpack.config.js
webpack.config.prod.js		webpack.config.prod.js
webpack.config.server.js		webpack.config.server.js
yarn.lock		yarn.lock

License

MoMI-G/MoMI-G

Folders and files

Latest commit

History

Repository files navigation

MoMI-G

Modular Multi-scale Integrated Genome Graph Browser

Demo

Citation

Installation

Dependencies

Summary of set up

Run on your workstation

Features

Usage

Select structural variants

Interval Card Deck

Visualize as genome graphs

Annotations

Dataset

Adapt your own dataset

Change log

Acknowledgements

References

License

About

Topics

Resources

License

Stars

Watchers

Forks

Languages