Visual clue to the level of zoom on a genome #254

Open
razsultana opened this issue Apr 5, 2024 · 13 comments
Labels
core (Related to the Core package), enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@razsultana

I think that a visual clue to the level of zooming on the genome would be very useful, similar to how UCSC does it here (it shows a segment below the chromosome ideogram and labels it with its size, in this case 5 kb):
[Screenshot (2024-04-05): UCSC Genome Browser scale bar below the chromosome ideogram, labeled 5 kb]

I think that creating an element in the top banner of the genomespy-app (using maybe 1/3 of it, centered) that shows a genomic segment of the largest of 5, 50, 500 bp, 5 kb, 50 kb, 500 kb, 5 Mb, or 50 Mb (or the corresponding 1, 10, 100, ... versions) that fits in that area would accomplish this goal very well.
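The size selection described above can be sketched as a small function. This is a minimal sketch, not GenomeSpy code: the name pickMeasureLength and its parameters are hypothetical, and it assumes the measure may use up to a fixed pixel width (for example a third of the view) and that candidate sizes are 1 or 5 times a power of ten.

```javascript
// Hypothetical helper: pick the largest "nice" length (1 or 5 times a power
// of ten) that fits in the pixel width reserved for the measure.
// domainSpan: number of bases currently visible
// viewWidth:  width of the view in pixels
// maxWidth:   pixels available for the measure (e.g. viewWidth / 3)
function pickMeasureLength(domainSpan, viewWidth, maxWidth) {
  // Largest genomic length that still fits in maxWidth pixels
  const maxBases = (domainSpan * maxWidth) / viewWidth;
  if (!(maxBases >= 1)) {
    return null; // not even a single base fits
  }

  let best = 1;
  // Walk up through 1, 5, 10, 50, 100, ... while candidates still fit
  for (let power = 1; power <= maxBases; power *= 10) {
    for (const multiplier of [1, 5]) {
      const candidate = multiplier * power;
      if (candidate <= maxBases) {
        best = candidate;
      }
    }
  }
  return best;
}
```

For example, a 30 kb view that is 900 px wide, with 300 px reserved for the measure, gives pickMeasureLength(30000, 900, 300) === 10000, i.e. a 10 kb measure.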

When zooming in and out of an area that has no other elements (genes, chromosomal bands, FASTA sequence, etc.), the user only has the coordinates as a clue to the zoom level of the visualisation, and it takes extra effort and calculation to get that sense of scale. A visual clue provides that instantly, without any effort.

@tuner added the enhancement, core, and good first issue labels on Apr 5, 2024
@tuner
Member

tuner commented Apr 5, 2024

Yes, subtracting 123,456,700 from 123,467,800 to figure out the scale is quite burdensome!

The most straightforward way to implement the measure is to provide a new lazy data source similar to the axis ticks source, which would generate a datum that represents a measure centered in the current domain. That could then be used to build the measure by layering some "rule" and "text" marks.

The generated datum could be something like this:

{
  "startChrom": "chr5",
  "startPos": 123465000,
  "endChrom": "chr5",
  "endPos": 123475000,
  "span": "10k",
  "zoomLevel": 123,
  "genomeAssembly": "hg38" // If it's useful to have it visible
}

There are some (minor) issues, however:

  1. In the fully zoomed out view, the measure would span multiple chromosomes, which may not make much sense. On the other hand, maybe it's not a problem.
  2. As the measure's endpoints are expressed as integer genomic coordinates, panning at a very high zoom level (i.e., when single bases are visible) will cause some jitter (or jumping) in the measure.
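The jitter in issue 2 can be illustrated with a small, hypothetical calculation (not GenomeSpy code): the view is panned by fractions of a base, and the measure is centered once with rounded and once with fractional endpoints. The function name and its parameters are assumptions made for illustration.

```javascript
// Hypothetical helper: pixel position of the measure's left end inside the
// view when the measure is centered in the visible domain.
// viewStart, viewSpan and length are in bases; viewWidth is in pixels.
function measureOffsetPx(viewStart, viewSpan, viewWidth, length, round) {
  const center = viewStart + viewSpan / 2;
  let start = center - length / 2;
  if (round) {
    // Integer genomic coordinate, as in the proposed datum
    start = Math.round(start);
  }
  return ((start - viewStart) / viewSpan) * viewWidth;
}

// A 20-base view drawn 800 px wide (40 px per base), panned by 0.3 bases
const viewStarts = [100, 100.3, 100.6];
const rounded = viewStarts.map((s) => measureOffsetPx(s, 20, 800, 10, true));
const fractional = viewStarts.map((s) => measureOffsetPx(s, 20, 800, 10, false));
```

Here the rounded measure lands at about 200, 188 and 216 px, while the fractional one stays at 200 px. Keeping fractional endpoints would avoid the jumps, assuming the rest of the data flow accepts non-integer positions.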

This should be quite straightforward to implement. I could do this in the near future, but now I have to focus on finalizing the revised version of my manuscript. PRs are welcome, of course 🙏.

@tuner
Member

tuner commented Apr 5, 2024

In fact, the datum could be even simpler, as the chromosomes/contigs are not really needed. Linearized coordinates (which GenomeSpy uses internally) could be used instead. This would also make the measure applicable to other scales, such as "index" or "quantitative". Thus, the datum could look like this:

{
  "startPos": 1123465000,
  "endPos": 1123475000,
  "span": "10k",
  "zoomLevel": 123,
  "genomeAssembly": "hg38" // If it's useful to have it visible
}
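A datum like the one above could be derived from the visible linearized domain roughly as follows. This is a sketch under stated assumptions: createMeasureDatum and formatSpan are hypothetical helpers, the measure length is assumed to be chosen elsewhere, and zoomLevel and genomeAssembly are left out.

```javascript
// Hypothetical helper: format a length in bases with an SI-style suffix,
// e.g. 10000 -> "10k", 5000000 -> "5M".
function formatSpan(length) {
  const units = [
    [1e9, "G"],
    [1e6, "M"],
    [1e3, "k"],
  ];
  for (const [factor, suffix] of units) {
    if (length >= factor) {
      return `${length / factor}${suffix}`;
    }
  }
  return String(length);
}

// Hypothetical helper: a measure of the given length centered in the
// visible linearized domain [domainStart, domainEnd].
function createMeasureDatum(domain, length) {
  const [domainStart, domainEnd] = domain;
  const center = (domainStart + domainEnd) / 2;
  return {
    startPos: center - length / 2,
    endPos: center + length / 2,
    span: formatSpan(length),
  };
}
```

For instance, createMeasureDatum([1123460000, 1123480000], 10000) returns { startPos: 1123465000, endPos: 1123475000, span: "10k" }, matching the example datum above.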

@razsultana
Author

I'll give it a go!
I won't let my lack of experience with JavaScript development stop me :)
My PRs will probably be annoying and require a few back-and-forth adjustments, but hopefully it will be a good investment of time on both sides.
Good luck with the paper - I think it deserves a lot of attention to show off the great work that went into this project and to get more users/contributors.

May I ask for a quick word of advice: what tooling (IDE, debugger, linter, etc.) do you use for development?
I've used many over the years for different programming environments, but I'm sure there are options that work best for JavaScript development and I can tell by looking at the code that you have a serious developer's habits and attitude (try to do things "the right way").
At the moment, I am using Visual Studio Code with a few standard extensions: JavaScript Debugger (ms-vscode.js-debug), JavaScript Debugger Companion Extension (ms-vscode.js-debug-companion), Babel JavaScript (mgmcdermott.vscode-language-babel), Prettier ESLint (rvest.vs-code-prettier-eslint), and Trunk Check (trunk.io).
If you have any suggestions for useful extensions (there are so many available!), it would be appreciated.

The part of JavaScript development that scares me the most is the asynchronous calls and promises. I have never had to deal with them, and it looks to me like that is the part that requires the most adjustment coming from "classical" synchronous programming. I'm reading about it at https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Asynchronous, but I need to experiment with real-life scenarios.

@tuner
Member

tuner commented Apr 8, 2024

Awesome, @razsultana!

I sketched a CONTRIBUTING.md file, which should cover most of your concerns. Please let me know if you think something essential is missing.

I'm using plain JavaScript (well, some recent version of ECMAScript) with JSDoc type annotations. Thus, no transpiling (Babel) is needed.

Callbacks and promises may look formidable, but async functions and await make asynchronous programming much easier.
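To illustrate the point, here is the same asynchronous step written with a promise chain and with an async function. fakeFetch is a hypothetical stand-in for any promise-returning call, such as loading a remote file; it is not part of GenomeSpy.

```javascript
// Hypothetical stand-in for a promise-returning call (e.g. a network request)
function fakeFetch(value, delayMs) {
  return new Promise((resolve) => setTimeout(() => resolve(value), delayMs));
}

// Promise chaining: the next step is a callback passed to then()
function loadWithThen() {
  return fakeFetch([1, 2, 3], 10).then((rows) => rows.map((x) => x * 2));
}

// async/await: reads like synchronous code, but still returns a promise
async function loadWithAwait() {
  const rows = await fakeFetch([1, 2, 3], 10);
  return rows.map((x) => x * 2);
}
```

Both functions return a promise that resolves to [2, 4, 6]. With await, errors from the awaited call can also be handled with an ordinary try/catch block.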

@razsultana
Author

This is very useful, thank you!
I started an implementation attempt by following the AxisTicks example, but I'm still trying to wrap my mind around the whole lifecycle of a visualisation: when do the data sources come alive, how do they get updated, etc.
I have also looked at different historical commits from before the dynamic data sources were replaced by the lazy data sources, a more generic approach. Those were probably easier to understand, but not so easy to extend. The current implementation seems very smart, but a bit hard to follow. Understanding the overall picture is probably a good investment of time.
I'm planning on doing a step-by-step debugging of how the different players come alive, using your development debugging advice. Hope to have some working example soon.

@tuner
Member

tuner commented Apr 9, 2024

The initialization process is indeed quite intricate, and I'm not entirely satisfied with its current state. Using a debugger and setting breakpoints is indeed a good idea.

There are quite a few factors that complicate the initialization:

  • Building the view hierarchy is partially asynchronous, as views can be imported from urls.
  • Remote data loading must be parallelized to minimize the loading time.
  • Shader compilation is blocking, but it is done in parallel. The WebGL API doesn't support promises or callbacks, so we have to first initiate the compilation of the shaders, then do something else, and only when there's nothing more to do, use the shaders, which may or may not block, depending on the compilation status, etc.
  • The data flow is hierarchical, like the views.
  • The views may share scales and axes, which must be resolved.
  • Etc.
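The parallelization point can be illustrated with plain promises. In this sketch, makeSource creates hypothetical stand-ins for data sources that only have a load() method. Starting every load before awaiting any of them (as with Promise.all) lets the requests overlap, whereas awaiting them one by one adds up the waiting times. Whether GenomeSpy uses exactly this pattern internally is an assumption.

```javascript
// Hypothetical stand-in for a data source whose load() takes delayMs
function makeSource(name, delayMs) {
  return {
    name,
    load: () =>
      new Promise((resolve) => setTimeout(() => resolve(`${name} loaded`), delayMs)),
  };
}

// map() starts every load immediately, so the waits overlap;
// Promise.all resolves once the slowest load has finished.
async function loadAll(dataSources) {
  return Promise.all(dataSources.map((dataSource) => dataSource.load()));
}

// For comparison: each load starts only after the previous one finishes
async function loadSequentially(dataSources) {
  const results = [];
  for (const dataSource of dataSources) {
    results.push(await dataSource.load());
  }
  return results;
}
```

With three sources that each take 50 ms, loadAll finishes in roughly 50 ms and loadSequentially in roughly 150 ms; both return the results in the order of the input array.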

These are largely initiated at:

async _prepareViewsAndData() {

Particularly the resolution of scales and domains (which behave similarly to Vega-Lite) is quite tricky. Ideally, it should be possible to add and remove views dynamically, but that's not quite compatible with the current design.

When it comes to the implementation, you basically need to implement the onDomainChanged() method, do some calculation based on the scale.domain(), and finally call this.publishData([aDatumRepresentingTheMeasure]);

It may not be necessary to publish new data every time the domain changes. For instance, the axisTickSource publishes data only when the ticks produced by the tickValues function change. Similarly, the singleAxisWindowedSource uses the callIfWindowsChanged() method to initiate data loading only if the visible windows (or bins, or whatever) change.
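Following that advice, a measure source could compute its datum in onDomainChanged() and call publishData() only when the result differs from the previous one. The sketch below does not extend the real GenomeSpy base class: MeasureSourceSketch, the stubbed publishData(), and the rule of fitting the measure into a third of the visible domain are assumptions made so that the logic runs on its own.

```javascript
// Hypothetical measure source; the real base class, scale access and
// publishData() come from GenomeSpy's lazy data source machinery.
class MeasureSourceSketch {
  constructor(scale) {
    this.scale = scale; // anything with a domain() method returning [start, end]
    this.previousKey = null; // serialized form of the last published data
    this.published = []; // records what publishData() would send downstream
  }

  // Stand-in for the real publishData()
  publishData(data) {
    this.published.push(data);
  }

  onDomainChanged() {
    const [start, end] = this.scale.domain();

    // Largest 1x or 5x power of ten that fits in a third of the visible domain
    const maxBases = (end - start) / 3;
    let length = 0;
    for (let p = 1; p <= maxBases; p *= 10) {
      length = 5 * p <= maxBases ? 5 * p : p;
    }

    // Integer endpoints around the rounded center of the view
    const center = Math.round((start + end) / 2);
    const startPos = center - Math.floor(length / 2);
    const data = length > 0 ? [{ startPos, endPos: startPos + length, length }] : [];

    // Publish only if the data actually changed
    const key = JSON.stringify(data);
    if (key !== this.previousKey) {
      this.previousKey = key;
      this.publishData(data);
    }
  }
}
```

Comparing serialized data is the simplest possible change check. With integer endpoints, panning by less than a base at low zoom levels then causes no republishing at all.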

Hope this helps!

@tuner
Member

tuner commented Apr 9, 2024

how do they get updated,

There are basically three ways.

The static case

flow.dataSources.map((dataSource) => dataSource.load())

Named data through the API

The updateNamedData method

updateNamedData(name, data) {

... which is exposed through the API:

updateNamedData: (name: string, data?: any[]) => void;

Lazy data

And then there are the lazy data sources that attach a listener to a ScaleResolution, which manages the scales and has some utility methods for zooming, etc.

this.scaleResolution.addEventListener("domain", fireDomainChanged);
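The listener wiring can be pictured with a minimal observer sketch. ScaleResolutionSketch and LazySourceSketch are hypothetical stand-ins: the real ScaleResolution has many more responsibilities, and the listener signature used here (no arguments) and the dispose() method are assumptions.

```javascript
// Hypothetical stand-in for ScaleResolution: keeps listeners per event type
// and notifies the "domain" listeners whenever the domain changes.
class ScaleResolutionSketch {
  constructor(domain) {
    this.currentDomain = domain;
    this.listeners = new Map();
  }

  addEventListener(type, listener) {
    if (!this.listeners.has(type)) {
      this.listeners.set(type, new Set());
    }
    this.listeners.get(type).add(listener);
  }

  removeEventListener(type, listener) {
    this.listeners.get(type)?.delete(listener);
  }

  getDomain() {
    return this.currentDomain;
  }

  // Simplified zoom/pan: set a new domain and notify the listeners
  setDomain(domain) {
    this.currentDomain = domain;
    for (const listener of this.listeners.get("domain") ?? []) {
      listener();
    }
  }
}

// A lazy source registers its listener once and reacts to every change
class LazySourceSketch {
  constructor(scaleResolution) {
    this.scaleResolution = scaleResolution;
    this.seenDomains = [];
    this.fireDomainChanged = () => this.onDomainChanged();
    this.scaleResolution.addEventListener("domain", this.fireDomainChanged);
  }

  onDomainChanged() {
    this.seenDomains.push(this.scaleResolution.getDomain());
  }

  // Detach the listener, e.g. when the view is removed
  dispose() {
    this.scaleResolution.removeEventListener("domain", this.fireDomainChanged);
  }
}
```

Keeping a reference to the listener function is what makes it possible to remove it again later.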

@razsultana
Copy link
Author

Thanks a lot for the very detailed explanation!
I think it will make my journey through the codebase a lot more focused on the essentials.
The one thing I had figured out was that I need to implement the onDomainChanged() method, but I was missing an understanding of the initialisation path, so when I tried to generate some data for the publishData method, the properties were unknown to the method because they hadn't been initialised.
Now I can see that just having a "genome":{"name":"hg38"} entry in the spec sets in motion a lot of these datasources and I am very eager to see them come alive by stepping through the code in a simple example.
This week is also very hectic for me, so although I would really like to spend all my time on this, I can't :( , but I hope to get a working prototype soon.

@razsultana
Copy link
Author

Hi Kari, just letting you know that I was away for a week camping with the kids (it's school holidays now), so I didn't get to do much, but I'm back now. The development environment that you recommended works like a charm. I have started to dig into the layered/hierarchical view and data source initialisation, and it's indeed complex. I'm climbing the steep part of the learning curve, though, and hope to get some results soon.

@tuner
Copy link
Member

tuner commented Apr 19, 2024

Take your time!

You may find the following function useful:

export async function createAndInitialize(spec, viewClass) {

I'm using that in tests that initialize the view hierarchy and data flow. I have plenty of tests for most of the flow nodes (transforms and eager data sources) but lazy sources lack tests – mostly because I would have to mock network requests and I've been ... lazy!

@razsultana
Copy link
Author

razsultana commented May 2, 2024

I pushed my minimal implementation of AxisMeasureSource to the master branch of my fork of the repository: https://github.com/razsultana/genome-spy
I modified the embed example scaleApi.js to show a measure under the axis, using vconcat in the example spec.

As you correctly anticipated, when showing the measure at very high zoom levels, where the measure is 10 bases or 1 base wide, it is very jittery because of rounding. I actually don't want to show a measure at these levels, as it adds nothing beyond the ticks, the coordinates and possibly the FASTA sequence, but I can't figure out a way to have an "empty" measure track, except to publish nulls for startPos and endPos and "" for spanLabel, which generate "undefined" values for the rules - not clean...

I would also like to have a way to trigger this data source for a particular axis, the same way that ticks=True does it for axisTicksSource. I almost did it, but because the AxisView class (export default class AxisView extends LayerView) returns a layer spec, the measure is overlaid on the axis. I thought about allowing axisView to be a vconcat for generating the axisSpec if the axis param measure==True and leave it a layer otherwise, but it doesn't seem right and I'm not sure I'm not breaking some other assumptions.

Any suggestions for dealing with the above two problems would be appreciated!

@tuner
Member

tuner commented May 2, 2024

I pushed my minimal implementation of AxisMeasureSource to the master on my fork of the repository https://github.com/razsultana/genome-spy I modified the embed example scaleApi.js to show a measure under the axis, using vconcat in the example spec.

Awesome! I have yet to give it a try, but at least the code looks good!

Please make a PR. It's easier for me to comment and propose changes that way.

but I can't figure out a way to have an "empty" measure track, except to publish nulls for startPos and endPos and "" for spanLabel, which generate "undefined" values for the rules - not clean...

Just publish an empty dataset: []
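Publishing either one datum or an empty array could look like this. measureData and its minLength threshold (100 bases here, so that 1- and 10-base measures are hidden) are hypothetical; the point is only that an empty array produces no mark instances and thus no undefined fields.

```javascript
// Hypothetical helper: a single measure datum, or [] when the measure would
// be too short to be useful. minLength is an assumed threshold in bases.
function measureData(domain, length, minLength = 100) {
  if (length < minLength) {
    return []; // empty dataset: nothing is drawn
  }
  const center = Math.round((domain[0] + domain[1]) / 2);
  const startPos = center - Math.floor(length / 2);
  return [{ startPos, endPos: startPos + length, length }];
}
```

Inside a data source, the result would then be passed on as is, e.g. this.publishData(measureData(domain, length)).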

I would also like to have a way to trigger this data source for a particular axis, the same way that ticks=True does it for axisTicksSource.

Hmm. Interesting idea!

Axes are currently somewhat hacky. There are several challenges:

  • GenomeSpy doesn't currently do any bounds calculation for marks. This means that if, let's say, a text mark instance falls outside the view, the view is not expanded to accommodate it. That's a problem with long axis tick labels, as the axis extent (its size) has to be configured explicitly.
  • The range of the x and y channels is locked to [0, 1], which makes it difficult to work with pixel-based offsets. However, there are also the xOffset and yOffset mark properties, which control the positioning in pixels. I'm planning to make them channels: Support xOffset and yOffset as encoding channels #230

Anyway, these may not be a problem when integrating the measure to axis for convenience.

I thought about allowing axisView to be a vconcat for generating the axisSpec if the axis param measure==True and leave it a layer otherwise, but it doesn't seem right and I'm not sure I'm not breaking some other assumptions.

That's not necessarily a problem. You may, however, stumble upon some challenges related to sizing of the views, etc. Feel free to try it out, but please make another branch/PR for it.

And the last thing, please don't put stuff under embed-examples, as this feature has nothing to do with embedding. Instead, the PR should, at least initially, introduce the new data source and provide an example in the documentation. (I'll update the CONTRIBUTING.md with instructions for bootstrapping mkdocs with embedded GenomeSpy.) In addition, you can put some example specs under packages/core/examples/, where it's easy to experiment. The current examples are quite random, so just put it there somewhere 🙂.

@razsultana
Copy link
Author

I submitted a proper PR (sorry it took so long).
I actually managed to get mkdocs working and this is what I needed to do:
(I use conda for package management but homebrew can be used as well for the cairo library)

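pip cache purge
# MkDocs, the Material theme and the plugins used by the docs
pip install mkdocs material mkdocs-material mkdocs-material-extensions --upgrade
pip install mkdocs-git-revision-date-localized-plugin
# The imaging extra depends on the Cairo library (conda or Homebrew)
pip install "mkdocs-material[imaging]"
conda install cairo

# The repository's own Markdown extension, installed in editable mode
cd utils/markdown_extension
pip install --editable .
# Back to the repository root
cd ../..

# Build the docs, then serve the generated site locally
npm run build:docs

cd site
python3 -m http.server --bind 127.0.0.1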

I thought it might be helpful as a starting point for the mkdocs bootstrapping section in CONTRIBUTING.md.
