MetagenomeScope

(Screenshot of MetagenomeScope's standard mode, showing an example assembly graph based on Fig. 2(a) in Nijkamp et al. 2013.)

NOTE: MetagenomeScope is currently being refactored!

Some features that were previously in MetagenomeScope have not been re-implemented yet; this should change soon. Thanks for bearing with me as I work on improving this, and please let me know if you have any questions.

Summary

MetagenomeScope is an interactive visualization tool designed for metagenomic sequence assembly graphs. The tool aims to display a hierarchical layout of the input graph while emphasizing the presence of small-scale details that can correspond to interesting biological features in the data.

To this end, MetagenomeScope highlights certain "structural patterns" of contigs in the graph (repeating the pattern identification hierarchically), splits the graph into its connected components (by default only displaying one connected component at a time), and uses Graphviz' dot tool to hierarchically lay out each connected component of the graph.
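To make the "connected components" step concrete, here is a minimal sketch in plain Python. It is not MetagenomeScope's actual implementation: the function name, the toy graph, and the choice to ignore edge direction (i.e. to compute weakly connected components) are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Group nodes into components, ignoring edge direction (an assumption)."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(sorted(component))
    return components


# Hypothetical toy graph: contigs 1-3 are linked together, as are contigs 4-5.
toy_nodes = ["1", "2", "3", "4", "5"]
toy_edges = [("1", "2"), ("2", "3"), ("4", "5")]
toy_components = connected_components(toy_nodes, toy_edges)
```

Each resulting component could then be laid out and displayed on its own, which is the idea behind showing one connected component at a time.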

MetagenomeScope also contains many other features intended to simplify exploratory analysis of assembly graphs, including tools for scaffold visualization, path finishing, and coloring nodes by biological metadata (e.g. GC content). (As mentioned above, many of these features are not available in the current version yet.)

Quick installation and usage

Probably the easiest way to install MetagenomeScope is using a conda environment:

# Download the YAML file describing the conda packages we'll install
wget https://raw.githubusercontent.com/marbl/MetagenomeScope/main/environment.yml

# Create a new conda environment based on this YAML file
# (by default, it'll be named "mgsc")
conda env create -f environment.yml

# Activate this conda environment
conda activate mgsc

# Install the actual MetagenomeScope software
pip install git+https://github.com/marbl/MetagenomeScope.git

Assuming you are currently in the conda environment we just created, visualizing an assembly graph can be done in one command:

mgsc -i [path to your assembly graph] -o [output directory name]

The output directory will contain an index.html file that can be opened in most modern web browsers. (The file points to other resources within the directory, so please don't move it out of the directory.)

What types of assembly graphs can I use as input?

Currently, MetagenomeScope supports the following filetypes:

| Filetype | Assemblers that output this filetype | Notes |
| --- | --- | --- |
| GFA | (meta)Flye, LJA, more | Both v1 and v2 work, but currently only the raw structure (segments and links) is included |
| FASTG | SPAdes | Expects SPAdes-"dialect" FASTG files: see pyfastg's documentation for details |
| GML | MetaCarvel | Expects MetaCarvel-"dialect" GML files |
| LastGraph | Velvet | Only the raw structure (nodes and arcs) is included |
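As a rough illustration of what "raw structure (segments and links)" means for a GFA file, the sketch below pulls segment names and links out of GFA v1 text. It is only a sketch under stated assumptions: the function name and example records are hypothetical, it handles GFA v1 `S` and `L` lines only (not GFA v2), and it is not the parser MetagenomeScope uses.

```python
def parse_gfa1_structure(lines):
    """Collect segment names and links from GFA v1 lines; skip everything else."""
    segments = []
    links = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 2:
            # S lines: record type, segment name, sequence, optional tags.
            segments.append(fields[1])
        elif fields[0] == "L" and len(fields) >= 5:
            # L lines: record type, from, from-orientation, to, to-orientation, overlap.
            links.append((fields[1], fields[2], fields[3], fields[4]))
    return segments, links


# Hypothetical example records (tab-separated, as in GFA v1).
example_gfa = [
    "H\tVN:Z:1.0",
    "S\tcontig_1\tACGT",
    "S\tcontig_2\tGGCA",
    "L\tcontig_1\t+\tcontig_2\t-\t0M",
]
segments, links = parse_gfa1_structure(example_gfa)
```

Sequences, tags, and other record types are deliberately dropped here, mirroring the note in the table that only segments and links are currently included.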

Code structure

MetagenomeScope is composed of two main components:

1. Preprocessing script

MetagenomeScope's preprocessing script (contained in the metagenomescope/ directory of this repository) is a mostly-Python script that takes as input an assembly graph file and produces a directory containing an HTML visualization of the graph. Once installed, it can be run from the command line using the mgsc command.

Note. By default, connected components containing 8,000 or more nodes or edges will not be laid out. These thresholds are configurable using the --max-node-count / --max-edge-count parameters. This default is intended to save time and effort: hierarchical layout can take a very long time for large and/or complex connected components, so trying to visualize the largest few components of a graph often requires an intractable amount of time and computational resources. Furthermore, very complex components of assembly graphs can be hard to visualize meaningfully.

This isn't always the case (for example, a connected component containing 10,000 nodes all in a straight line will be much easier to lay out and visualize than a connected component with 5,000 nodes and 20,000 edges), but we wanted to be conservative with the defaults.
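The size check described above can be sketched as a small predicate. This is an illustration only, not the tool's code: the function and constant names are hypothetical, and the exact comparison ("8,000 or more" means a component at the threshold is skipped) is taken from the wording of the note rather than from the implementation.

```python
# Defaults as stated in the note above; names are hypothetical.
DEFAULT_MAX_NODE_COUNT = 8000
DEFAULT_MAX_EDGE_COUNT = 8000


def will_be_laid_out(
    node_count,
    edge_count,
    max_node_count=DEFAULT_MAX_NODE_COUNT,
    max_edge_count=DEFAULT_MAX_EDGE_COUNT,
):
    """Return True if a component is under both thresholds (assumed semantics)."""
    return node_count < max_node_count and edge_count < max_edge_count


# The two components from the example above, checked against the defaults:
# 10,000 nodes in a straight line have 9,999 edges between them.
straight_line = will_be_laid_out(10_000, 9_999)
dense_tangle = will_be_laid_out(5_000, 20_000)

# With both thresholds raised, the straight-line component passes the check.
straight_line_raised = will_be_laid_out(
    10_000, 9_999, max_node_count=20_000, max_edge_count=20_000
)
```

Under the defaults both example components are skipped, even though the straight-line one would be cheap to lay out; that is the conservative trade-off the note describes, and raising --max-node-count / --max-edge-count is the way to opt in.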

2. Viewer interface

MetagenomeScope's viewer interface (contained in the metagenomescope/support_files/ directory of this repository) is a client-side web application that visualizes laid-out assembly graphs using Cytoscape.js.

This interface includes various features for interacting with the graph and the identified structural patterns within it.

You should be able to load visualizations created by MetagenomeScope in most modern web browsers (mobile browsers will probably also work, although using a desktop browser is recommended).

Installation notes

Getting Graphviz and PyGraphviz installed -- and getting them to communicate with each other -- can be tricky. I'm looking into ways of making this less painful; for now, if you run into problems, please feel free to contact me and I'll try to help out.

Demos

Some early demos are available online. We'll probably add more of these in the future.

More thorough documentation

Coming soon.

License

MetagenomeScope is licensed under the GNU GPL, version 3.

License information for MetagenomeScope's dependencies is included in the root directory of this repository, in DEPENDENCY_LICENSES.txt. License copies for dependencies distributed/linked with MetagenomeScope -- when not included with their corresponding source code -- are available in the dependency_licenses/ directory.

Acknowledgements

See the acknowledgements page on the wiki for a list of acknowledgements for MetagenomeScope's codebase.

Contact

MetagenomeScope was created by members of the Pop Lab in the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park.

Feel free to email mfedarko (at) ucsd (dot) edu with any questions, suggestions, comments, concerns, etc. regarding the tool. You can also open an issue in this repository, if you'd like.