Anvi'o backend for 'genome view' #1712

meren · 2021-04-12T00:46:06Z

IF THIS NOTE IS HERE, DO NOT MERGE THIS BRANCH. IT IS BROKEN AND WILL RUIN THE ACTIVE DEVELOPMENT BRANCH.

This PR introduces some preliminary backend functionality in anvi'o for 'genome view', a LOOOOONG-awaited anvi'o functionality to interactively study large genomic contexts.

@isaacfink21 and @matthewlawrenceklein are already working on the frontend of genome view using some mock data. The code in this branch will help them test things using real-world data and figure out what they would like the backend to do for them when it comes to 'massaging' the data structures to their liking.

The most critical class here is AggregateGenomes in the anvio/genomedescriptions.py module. For a given set of external and/or internal genomes, this class aggregates all sorts of information, which is then passed to the interactive world through bottle routes.

When there is an anvi'o pan database that includes all genome names found in the internal and/or external genomes files, AggregateGenomes also passes the gene clusters found in that database to the interface so genes can be associated with one another.

The class is simple in its design, and will have room for expansion based on our needs. I hope it makes sense so far.

Testing

The purpose of these examples is to make sure you can play with this code to connect its products with the frontend.

Cartoonishly Simple

Download this file, unpack it, and run this command in the resulting directory:

anvi-display-genomes -e external-genomes.txt --pan-db PAN.db

FYI, this is how I created this file:

anvi-self-test --suite pangenomics -o PAN
mkdir GENOME_VIEW_TEST_FILES
cp PAN/pan_test/0*db PAN/pan_test/external-genomes.txt GENOME_VIEW_TEST_FILES/
cp PAN/pan_test/TEST/TEST-PAN.db GENOME_VIEW_TEST_FILES/PAN.db
tar -zcvf GENOME_VIEW_TEST_FILES.tar.gz GENOME_VIEW_TEST_FILES/

Somewhat Realistic

Run these steps:

# download a fresh copy of the infant gut data
curl -L https://ndownloader.figshare.com/files/26218961 -o INFANT-GUT-TUTORIAL.tar.gz
tar -zxvf INFANT-GUT-TUTORIAL.tar.gz && cd INFANT-GUT-TUTORIAL

# subset E. faecalis genomes (instead of two distinct species) to simplify the problem
head -n 1 additional-files/pangenomics/external-genomes.txt > additional-files/pangenomics/Enterococcus_faecalis.txt
grep Enterococcus_faecalis additional-files/pangenomics/external-genomes.txt >> additional-files/pangenomics/Enterococcus_faecalis.txt

# generate a pangenome for E. faecalis (should take <5 mins)
anvi-gen-genomes-storage -e additional-files/pangenomics/Enterococcus_faecalis.txt -o Enterococcus-GENOMES.db
anvi-pan-genome -g Enterococcus-GENOMES.db --project-name Enterococcus -T 4

Now you can run this to get genome view data generated for 6 genomes that are in the same 'species' WITHOUT the pangenome:

anvi-display-genomes -e additional-files/pangenomics/Enterococcus_faecalis.txt

and WITH the pangenome:

anvi-display-genomes -e additional-files/pangenomics/Enterococcus_faecalis.txt -p Enterococcus/Enterococcus-PAN.db

Upon which this is what you should find in your JavaScript console:

Next steps

Fill in the following two files:

anvio/data/interactive/genomeview.html
anvio/data/interactive/js/genomeview.js

:)

isaacfink21 · 2021-04-14T04:46:45Z

Thanks @meren!

While converting from test data to the new real data from external genomes, I realized it might be beneficial to not only store gene IDs for each gene cluster, but also have a data structure that maps individual gene IDs to gene clusters. For example:

{g01: {0:"GC_00000001", 1:"GC_00000008"}, g02: {0:"GC_00000005", 1:"GC_00000001"}}

This way we wouldn't have to apply a find operation to the gene_associations dataset for each individual gene ID. Let me know if you think this is worth adding, or if it would be better to keep the data simpler :)
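A minimal sketch of the inverse lookup proposed above. The input shape (gene cluster name, then genome name, then a list of gene IDs) is an assumption made for illustration, and `buildGeneToClusterLookup` is a hypothetical helper name; the actual payload produced by the backend may differ.

```javascript
// Hypothetical input shape (an assumption, not the actual anvi'o payload):
// gene cluster name -> genome name -> list of gene IDs in that cluster.
const clusterToGenes = {
  GC_00000001: { g01: [0], g02: [1] },
  GC_00000005: { g02: [0] },
  GC_00000008: { g01: [1] },
};

// Invert the structure once into: genome name -> gene ID -> gene cluster name.
function buildGeneToClusterLookup(clusters) {
  const lookup = {};
  for (const [clusterName, genomes] of Object.entries(clusters)) {
    for (const [genomeName, geneIds] of Object.entries(genomes)) {
      if (!(genomeName in lookup)) lookup[genomeName] = {};
      for (const geneId of geneIds) {
        lookup[genomeName][geneId] = clusterName;
      }
    }
  }
  return lookup;
}

const geneToCluster = buildGeneToClusterLookup(clusterToGenes);

// Direct property access replaces a search over all gene clusters.
console.log(geneToCluster.g01[1]);
```

With the lookup built once after the data arrives, finding the cluster of any gene is a single property access per gene instead of a find over the whole dataset.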

meren · 2021-04-14T09:28:56Z

Absolutely! I will add this ASAP. :)

meren · 2021-04-14T14:08:19Z

This is now done, @isaacfink21. Please note that the new data structure is slightly different.

I hope this helps.

meren · 2021-04-14T14:09:04Z

As you can see from 530fe85 it took that much effort :) anvi'o has everything ready at all times! :p

isaacfink21 · 2021-04-14T16:42:14Z

Thanks! This was much easier than I expected :) I will work on using this to fix gene cluster alignment.

For future reference, I'm also attaching the screencaps from my meeting with @meren and @matthewlawrenceklein that illustrate some of the new features we plan to implement going forward. Some of these include:

3 separate windows for scale, genome labels, and the genomes themselves
Shaded background between genes of the same gene cluster
Toggleable scale "rulers" over each genome
Align by gene cluster when there are 2+ genes in a cluster; align by other properties
Show similarity between genomes as % identity
Show a "graph" on each genome with GC content or other info
Editable gene labels

matthewlawrenceklein · 2021-05-24T15:04:49Z

@meren @isaacfink21 Not sure if this would be a major pain point on the backend or with what Isaac's already written, but would it be possible to change the genomes payload from a nested object to an array of objects? I believe that would make it a lot easier to sort genomes in the display (alphabetically, click + drag, etc).

There's also a very real chance I'm misunderstanding the data, so feel free to correct me : )

meren · 2021-05-24T18:12:32Z

Hey @matthewlawrenceklein, do you think it is possible to 'arrayify' the data when it arrives? or is it a bad idea to do it that way?

matthewlawrenceklein · 2021-05-24T18:26:12Z

@meren totally doable on the front end. My only concern is that we'd want to make that change directly after the fetch req so that all front-end processes use the same array-ified genome dataset. I'll touch base with @isaacfink21 tomorrow and make sure that this change doesn't require a ton of refactoring.

Thinking a little further out - we would want (need?) to save these kinds of front-end manipulations to state, correct?

meren · 2021-05-24T18:31:40Z

> Thinking a little further out - we would want (need?) to save these kinds of front-end manipulations to state, correct?

It is always great to think a little further out when it comes to these kinds of decisions! Thank you :)

Just like the other interactive interfaces, I think we need to have a state framework for genome view, too (so people can 'zoom' to a certain area, order their genomes in a particular way, and then if they would store that state, the same interface could greet them when they restart things the next day :)).

isaacfink21 · 2021-05-24T18:58:25Z

@matthewlawrenceklein Just reviewed the code and it shouldn't be a problem for the genomes object - so far it is always iterated through in order, and I can't think of a situation where direct key-value access would be necessary. I think the array format would be helpful :)
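A minimal sketch of the 'arrayify right after the fetch' idea discussed above. The payload shape (genome name mapped to genome data) and the field name `numGenes` are assumptions for illustration, and `arrayifyGenomes`, `sortGenomesByName`, and `moveGenome` are hypothetical helper names rather than functions from genomeview.js.

```javascript
// Hypothetical payload as it might arrive from the backend:
// genome name -> genome data (the `numGenes` field is illustrative only).
const genomesByName = {
  g02: { numGenes: 3 },
  g01: { numGenes: 5 },
};

// Convert the nested object into an array of objects, keeping the key as `name`.
function arrayifyGenomes(genomes) {
  return Object.entries(genomes).map(([name, data]) => ({ name, ...data }));
}

// Alphabetical ordering becomes a plain array sort (returns a new array).
function sortGenomesByName(genomeArray) {
  return [...genomeArray].sort((a, b) => a.name.localeCompare(b.name));
}

// Reordering after a click + drag becomes moving one element (returns a new array).
function moveGenome(genomeArray, fromIndex, toIndex) {
  const reordered = [...genomeArray];
  const [moved] = reordered.splice(fromIndex, 1);
  reordered.splice(toIndex, 0, moved);
  return reordered;
}

const genomeList = arrayifyGenomes(genomesByName);
const sortedGenomes = sortGenomesByName(genomeList);
console.log(sortedGenomes.map((genome) => genome.name));
```

Doing the conversion once, directly after the fetch request, means every front-end process works from the same array and the display order is simply the array order.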

isaacfink21

Didn't mean to force-push here, sorry about that--I restored the previous commit since there were some recent changes I accidentally removed

meren · 2021-08-27T16:03:07Z

lots of activity here. I'm hoping to continue working on the backend next week.


isaacfink21 · 2023-03-17T21:41:26Z

@matthewlawrenceklein, @mschecht, and I decided in our last meeting to disable genome dragging and proportional scale in v0 and keep this functionality under the feature flag percentScale. Starting with 61394b9, percentScale should always be set to false.

To revisit this in the future, the genome ruler, background shades, and ADLs will need to be made selectable again, and the object:moving event listener reenabled. As it stands, the proportional scale is mostly functional across the genome view interface but has several bugs with displaying the correct viewport upon selecting a region of the scale (scaleFactor is being calculated incorrectly), and bookmarks do not work with a proportional scale.
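A minimal sketch of gating the drag listener behind the `percentScale` flag described above. The `featureFlags` object, the stub canvas, and `registerGenomeDragging` are hypothetical names introduced for illustration; the stub only assumes a canvas object exposing an `on(eventName, handler)` method, and the real wiring in the genome view code may differ.

```javascript
// Feature flag from the discussion above; it stays false for v0.
const featureFlags = { percentScale: false };

// Stand-in for a canvas with an `on(eventName, handler)` method.
// It only records which handlers were registered.
function createStubCanvas() {
  const handlers = {};
  return {
    handlers,
    on(eventName, handler) {
      (handlers[eventName] = handlers[eventName] || []).push(handler);
    },
  };
}

// Register the object:moving listener only when the flag is enabled.
function registerGenomeDragging(canvas, flags, onMoving) {
  if (!flags.percentScale) return false; // v0: dragging stays disabled
  canvas.on('object:moving', onMoving);
  return true;
}

const stubCanvas = createStubCanvas();
const draggingEnabled = registerGenomeDragging(stubCanvas, featureFlags, () => {});
console.log(draggingEnabled, Object.keys(stubCanvas.handlers).length);
```

Keeping the check in one place means revisiting the feature later only requires flipping the flag and fixing the known bugs, rather than hunting for commented-out listeners.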


isaacfink21 force-pushed the genomeview-backend branch from a5867c9 to 111e877 · July 6, 2021 20:17

isaacfink21 reviewed Jul 6, 2021


isaacfink21 added 10 commits March 13, 2023 00:52

load correct viewport frame for proportional scale

95e8c50

port all relevant global variables from main.js into state

eb47bde

update

732db04

fix showAllADLPtsPerLayer()

cf72057

Merge branch 'master' into genomeview-backend

7ea6821

set viewport correctly for proportional scale

e59435c

save current function color db to state

449de21

cleanup

3d2d6ac

trigger redraw immediately after using color pickers

4ad4906

feature flag proportional scale & genome dragging functionality

61394b9

opting to disable all genome dragging and proportional scale functionality for v0. from this point forward, the `percentScale` flag should always be set to false

isaacfink21 added 6 commits March 18, 2023 08:42

fix for background shades

18bc6aa

housekeeping

1706c27

Merge branch 'master' into genomeview-backend

b2ada7d

calculate canvas height correctly

2796bcd

clean

f41c4bd

deepdive and tabular modal color pickers show real color of genes; save colors to state

c548e9d

isaacfink21 and others added 30 commits May 18, 2023 00:30

Merge branch 'genomeview-backend' into genomeview/genome-sliding

8b88b9a

combine mouse:down listeners

a288be0

disable shift + click multiselection

f008bb5

slide genomes on alt+click and drag

abcfdd2

fix alt+click dragging + consolidate keydown listeners into one

58f4dfb

clean

e8b0661

shift+alt drag bug fix

85a992c

clean

f65472a

clamp moveTo to genome bounds

c044eb8

Revert "clamp moveTo to genome bounds"

ce710d2

This reverts commit c044eb8.

search accession AND annotation for alignment

4738af3

load in maxGroupSize

cb0ecc8

update query+alignment UI

152ec31

wrap text for genome name in mouseover panel

7f477fb

calculate genomeMax as end of contig not end of last gene

fb1c66f

keep zoom level the same when gene centering

c4e9688

fix shift+alt bug

6dd1104

Merge pull request #2078 from merenlab/genomeview/genome-sliding

bca7b3b

Genome View: Genome Sliding and Gene Centering

fix alt+drag duplication bug

5211f17

warning message if multiple genes per genome

5dcbe8c

final gv fix: lasso menu

6464fd6

final gv fix: query/center by accession _then_ annotation

44b5a78

final gv fix: set label cosmetics

8938071

final gv fix: allocate colors in table by order of count

f3e0f2f

final gv fix: comma-separated annotations

a18208d

final gv fix: sliding

7f31c24

final gv fix: more specific

6655cba

final gv fix: allow user to select anchor gene for multiple hits during centering

12365a7

final gv fix: option to always center to first hit

8d7e514

final gv fix: cosmetics

78113e1
