Standalone libraries to work with GFA #3

Open
lh3 opened this issue Jul 18, 2019 · 4 comments
Labels
discussion General discussions

Comments

@lh3
Owner

lh3 commented Jul 18, 2019

Another discussion thread. It is probably too early to implement libraries now, but it would be good to start thinking about the topic.

Currently, gfatools comes with very preliminary APIs to read rGFA into memory. The memory layout is described in gfa.h. It largely follows the model of string graphs. I quite like this model and will stick with it. However, I guess general devs will feel uncomfortable with this representation. I won't have the bandwidth to implement the more general path model any time soon, either. In addition, it is also preferable to have two independent implementations (e.g. samtools vs picard vs bamtools). I wonder if you (@ekg and @benedictpaten) are interested in implementing a standalone library to work with GFA. You already have in vg a GFA parser, an in-memory model and a serialization format. You can isolate the relevant code and expose stable C and C++ APIs to other devs. I know vg has APIs, but I guess other devs will prefer a more focused, lightweight library that is easier to build.
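For readers who have not met the string-graph model: one common in-memory layout treats each orientation of a segment as a vertex (`v = segment_id << 1 | strand`), stores every link twice (once as given and once as its reverse complement), and sorts arcs by source vertex so the out-arcs of a vertex form a contiguous slice. The C below is a minimal sketch under those assumptions; the names (`sg_t`, `sg_add_link`, `sg_index`, ...) are hypothetical and this is not the actual `gfa.h` API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical string-graph layout, not the gfa.h structs. */
typedef struct {
    uint32_t v, w; /* arc from oriented vertex v to oriented vertex w */
    uint32_t ov;   /* overlap length in bases */
} sg_arc_t;

typedef struct {
    uint32_t n_seg; /* segments; vertices are 0 .. 2*n_seg-1 */
    size_t n_arc;
    sg_arc_t *arc;  /* sorted by v after sg_index() */
    size_t *start;  /* out-arcs of v are arc[start[v] .. start[v+1]) */
} sg_t;

/* Vertex for one orientation of a segment: 0 = forward, 1 = reverse. */
uint32_t sg_vertex(uint32_t seg, int is_reverse) { return seg << 1 | (is_reverse != 0); }

/* Add link v -> w together with its reverse complement (w^1) -> (v^1). */
void sg_add_link(sg_t *g, uint32_t v, uint32_t w, uint32_t ov) {
    sg_arc_t *a = realloc(g->arc, (g->n_arc + 2) * sizeof(sg_arc_t));
    assert(a != NULL);
    g->arc = a;
    g->arc[g->n_arc++] = (sg_arc_t){v, w, ov};
    g->arc[g->n_arc++] = (sg_arc_t){w ^ 1, v ^ 1, ov};
}

static int cmp_arc(const void *a, const void *b) {
    uint32_t x = ((const sg_arc_t *)a)->v, y = ((const sg_arc_t *)b)->v;
    return (x > y) - (x < y);
}

/* Sort arcs by source vertex and build the offset index. */
void sg_index(sg_t *g) {
    size_t n_vtx = (size_t)g->n_seg * 2;
    qsort(g->arc, g->n_arc, sizeof(sg_arc_t), cmp_arc);
    free(g->start);
    g->start = calloc(n_vtx + 1, sizeof(size_t));
    assert(g->start != NULL);
    for (size_t i = 0; i < g->n_arc; ++i) {
        assert(g->arc[i].v < n_vtx);
        ++g->start[g->arc[i].v + 1]; /* count arcs per source vertex */
    }
    for (size_t v = 0; v < n_vtx; ++v) /* prefix sums turn counts into offsets */
        g->start[v + 1] += g->start[v];
}

size_t sg_out_degree(const sg_t *g, uint32_t v) { return g->start[v + 1] - g->start[v]; }
```

For example, a single link `s0+ -> s1+` yields the arcs `0 -> 2` and `3 -> 1`, so both `s0+` and `s1-` end up with one outgoing arc.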

@lh3 lh3 added the discussion General discussions label Jul 18, 2019
@ekg

ekg commented Jul 18, 2019

@lh3, we have developed a library to provide a standard interface to sequence graphs with embedded paths, https://github.com/vgteam/libhandlegraph.

The idea with this interface hierarchy is to expose something based on a few primitive types without needing to implement the data structure using those types. For instance, we often represent graphs using fully succinct data structures, but this means that entities in the graph can't be represented as pointers to nodes or atomic IDs. The handle concept refers to the bidirectional identifier used by a particular implementation to refer to a node (S line) in the graph.
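A minimal sketch of what such a bidirectional identifier can look like when it is just a packed integer: node ID in the high bits, orientation in the lowest bit. This packing is an assumption for illustration. The function names are modeled loosely on libhandlegraph's `get_id`, `get_is_reverse`, `flip` and `forward` methods, but the C functions below are hypothetical, and a real implementation may encode handles however it likes, which is exactly why callers should treat `handle_t` as opaque.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: node_id << 1 | is_reverse. Callers should not
 * rely on this layout; it is private to the implementation. */
typedef uint64_t handle_t;

handle_t make_handle(uint64_t node_id, bool is_reverse) {
    assert(node_id < (UINT64_C(1) << 63)); /* ID must survive the shift */
    return node_id << 1 | (uint64_t)is_reverse;
}

uint64_t get_id(handle_t h) { return h >> 1; }

bool get_is_reverse(handle_t h) { return h & 1; }

/* The same node traversed in the opposite direction. */
handle_t flip(handle_t h) { return h ^ 1; }

/* Orientation-independent form, e.g. for per-node hash keys. */
handle_t forward(handle_t h) { return h & ~UINT64_C(1); }
```

Because flipping is a single XOR, walking a bidirected graph in either direction needs no extra lookups.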

The class hierarchy includes immutable sequence graphs, graphs with paths (VG model), and mutable versions of them. It also exposes a positional index based on the embedded paths.
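The positional index over embedded paths can be illustrated with a prefix-sum array: store the path coordinate at which each step starts, then binary-search it to map a path position back to a step. This is a plain-array sketch with hypothetical names (`path_index_t`, `path_index_step_at`); it is not meant to reflect how xg or odgi implement their indexes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical path index: offset[i] is the 0-based path coordinate where
 * step i begins; offset[n_steps] is the total path length. */
typedef struct {
    size_t n_steps;
    uint64_t *offset;
} path_index_t;

/* step_len[i] is the sequence length of the node visited at step i. */
path_index_t path_index_build(const uint32_t *step_len, size_t n_steps) {
    path_index_t idx = {n_steps, malloc((n_steps + 1) * sizeof(uint64_t))};
    assert(idx.offset != NULL);
    idx.offset[0] = 0;
    for (size_t i = 0; i < n_steps; ++i)
        idx.offset[i + 1] = idx.offset[i] + step_len[i];
    return idx;
}

/* Return the step covering path position pos, i.e. the largest i with
 * offset[i] <= pos. Requires pos < total path length. */
size_t path_index_step_at(const path_index_t *idx, uint64_t pos) {
    assert(idx->n_steps > 0 && pos < idx->offset[idx->n_steps]);
    size_t lo = 0, hi = idx->n_steps; /* invariant: offset[lo] <= pos < offset[hi] */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx->offset[mid] <= pos)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

The offset within the visited node is then `pos - offset[step]`; on a reverse-oriented step it would still have to be mirrored into the node's forward coordinates.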

Two implementations, xg and odgi, read GFA files into a self-index and expose aspects of this API on top of it. We have a study in progress to compare implementations.

It should be easy enough to add a simpler fixed C and C++ interface on top of these. I don't think the semantics become radically different. There is a mismatch in the number of coordinate spaces. There are some semantic mismatches with rGFA, but they can be resolved.
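One way to read "a simpler fixed C interface on top of these" is the opaque-pointer-plus-function-table pattern: client code programs against a small struct of function pointers, and each backend (for instance a thin wrapper around a C++ graph) fills that table in. The sketch below assumes that pattern; every name (`hg_graph_t`, `hg_vtable_t`, the toy backend) is hypothetical and is not an existing libhandlegraph C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t hg_handle_t; /* assumed packing: node_id << 1 | is_reverse */

/* Hypothetical stable C interface: a table each backend fills in. */
typedef struct {
    size_t (*node_count)(const void *impl);
    size_t (*get_length)(const void *impl, hg_handle_t h);
    /* Writes the oriented sequence of h to buf, NUL-terminated. */
    void (*get_sequence)(const void *impl, hg_handle_t h, char *buf);
} hg_vtable_t;

typedef struct {
    const hg_vtable_t *vt;
    const void *impl;
} hg_graph_t;

/* Toy backend: node i holds the forward-strand sequence seq[i]. */
typedef struct {
    size_t n;
    const char **seq;
} toy_graph_t;

static const char *toy_seq(const void *impl, hg_handle_t h) {
    const toy_graph_t *g = impl;
    assert((h >> 1) < g->n); /* node ID must exist */
    return g->seq[h >> 1];
}

static size_t toy_node_count(const void *impl) { return ((const toy_graph_t *)impl)->n; }

static size_t toy_get_length(const void *impl, hg_handle_t h) { return strlen(toy_seq(impl, h)); }

static char complement(char c) {
    switch (c) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default: return 'N';
    }
}

static void toy_get_sequence(const void *impl, hg_handle_t h, char *buf) {
    const char *s = toy_seq(impl, h);
    size_t len = strlen(s);
    for (size_t i = 0; i < len; ++i) /* reverse strand: reverse complement */
        buf[i] = (h & 1) ? complement(s[len - 1 - i]) : s[i];
    buf[len] = '\0';
}

static const hg_vtable_t toy_vtable = {toy_node_count, toy_get_length, toy_get_sequence};

hg_graph_t toy_as_hg(const toy_graph_t *g) {
    hg_graph_t hg = {&toy_vtable, g};
    return hg;
}

/* Client code written against hg_graph_t works with any backend. */
size_t hg_total_length(const hg_graph_t *g) {
    size_t total = 0, n = g->vt->node_count(g->impl);
    for (size_t id = 0; id < n; ++id)
        total += g->vt->get_length(g->impl, (hg_handle_t)id << 1);
    return total;
}
```

The design choice here is that the ABI surface is just the table layout and the handle type, so a backend can change its internals without recompiling clients.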

@lh3
Owner Author

lh3 commented Jul 18, 2019

An important question is about the scope of the library. vg is too large. I think in its current form, libhandlegraph is too small. My preference is to include at least a GFA parser and an in-memory data structure like handle graph. I don't have a strong opinion on serialization, indexing and other stuff.
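As a rough idea of the smallest useful parser scope, here is a sketch that splits a GFA1 line on tabs in place and extracts S (segment) and L (link) records, ignoring optional tags and other record types. All names are hypothetical; a real library would also need tag handling (including rGFA's `SN`/`SO`/`SR`), P lines, streaming input and proper error reporting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { REC_OTHER, REC_SEG, REC_LINK } rec_type_t;

/* Hypothetical record; pointers refer into the (modified) input line. */
typedef struct {
    rec_type_t type;
    const char *name, *seq;          /* S: segment name and sequence */
    const char *from, *to, *overlap; /* L: segment names and overlap CIGAR */
    char from_ori, to_ori;           /* L: '+' or '-' */
} gfa_rec_t;

/* Strip a trailing newline, then split in place at tabs; returns the
 * number of fields stored (at most max_fields). */
static int split_tabs(char *line, char **fields, int max_fields) {
    int n = 0;
    char *p = line;
    line[strcspn(line, "\r\n")] = '\0';
    while (n < max_fields) {
        fields[n++] = p;
        p = strchr(p, '\t');
        if (p == NULL) break;
        *p++ = '\0';
    }
    return n;
}

static int valid_ori(const char *s) { return (s[0] == '+' || s[0] == '-') && s[1] == '\0'; }

/* Parse one GFA1 line; returns 0 on success, -1 on a malformed S/L line. */
int gfa_parse_line(char *line, gfa_rec_t *rec) {
    assert(line != NULL && rec != NULL);
    char *f[8];
    int n = split_tabs(line, f, 8);
    *rec = (gfa_rec_t){.type = REC_OTHER};
    if (strcmp(f[0], "S") == 0) {
        if (n < 3) return -1;
        rec->type = REC_SEG;
        rec->name = f[1];
        rec->seq = f[2];
    } else if (strcmp(f[0], "L") == 0) {
        if (n < 6 || !valid_ori(f[2]) || !valid_ori(f[4])) return -1;
        rec->type = REC_LINK;
        rec->from = f[1];
        rec->from_ori = f[2][0];
        rec->to = f[3];
        rec->to_ori = f[4][0];
        rec->overlap = f[5];
    }
    return 0;
}
```

Splitting in place avoids per-field allocations, at the cost of the caller keeping the line buffer alive while the record is in use.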

Another question is about the terminology. The use of "(sequence) segment" and "link" can be traced back to the discussion on the FASTG format. Richard and I wanted to avoid "vertex", "node", "edge" and "arc" because in the assembly world, people always have different opinions. In a de Bruijn graph, "vertex" and "edge" are interchangeable to some extent, and as a result, a graph simplified from a de Bruijn graph is more often represented in the "edge way", with sequences put on edges instead of nodes. Adopting the GFA terminology will help to avoid such confusions.

@ekg

ekg commented Jul 18, 2019 via email

@bricoletc

How about https://github.com/edawson/gfakluge ? I don't think it supports rGFA, though.
