GAF output #3

Open
ekg opened this issue Dec 14, 2021 · 4 comments

Comments

@ekg

ekg commented Dec 14, 2021

Could you provide GAF output over the nodes of the graph? Would it be possible to do? I could also submit a patch if you can describe how you'd go about setting this up.

It's defined in the minigraph documentation, and GraphAligner and vg both produce it.

It's like PAF but the target is expressed as a walk through nodes in the graph (your GFA input).

The walk is expressed like >1>2>4>6>7>9>10>11>12>13>14>16>17>19

If you have <19<17<16<14<13<12<11<10<9<7<6<4<2<1 then it represents the reverse complement of the above walk.
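As a minimal sketch of the walk notation described above (not code from astarix, minigraph, GraphAligner, or vg), the following parses a walk string into oriented steps and builds its reverse complement. The function names `parse_walk` and `reverse_complement_walk` are made up for illustration, and the sketch assumes segment names never contain `<` or `>`.

```python
import re

# One step of a walk: an orientation character followed by a segment name.
_STEP = re.compile(r"([<>])([^<>]+)")


def parse_walk(walk):
    """Split a walk such as '>1>2<4' into [(name, orientation), ...]."""
    steps = _STEP.findall(walk)
    # Reject strings with leftover characters, e.g. a trailing '<'.
    if "".join(orient + name for orient, name in steps) != walk:
        raise ValueError(f"malformed walk: {walk!r}")
    return [(name, orient) for orient, name in steps]


def reverse_complement_walk(walk):
    """Reverse the step order and flip every orientation."""
    flip = {">": "<", "<": ">"}
    return "".join(flip[orient] + name
                   for name, orient in reversed(parse_walk(walk)))


if __name__ == "__main__":
    print(reverse_complement_walk(">1>2>4>6>7>9>10>11>12>13>14>16>17>19"))
```

Applied to the forward walk above, this yields the `<19<17<...<2<1` form.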

You'll need to create a CIGAR or MD tag to express the base-level alignment, for use downstream, for instance in vg call.
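To make the request concrete, here is a rough sketch of how one GAF record could be assembled, assuming the 12 PAF-like mandatory columns (with the path in place of the target name) and a `cg:Z:` tag for the CIGAR. The helper names `cigar_stats` and `gaf_line` are hypothetical, the sketch only accepts `=`, `X`, `I` and `D` operations, and it hard-codes the strand column to `+` on the assumption that orientation is carried by the walk itself. Which tags a given downstream tool expects should be checked against that tool's documentation.

```python
import re

# Extended CIGAR operations only: match, mismatch, insertion, deletion.
_CIGAR_OP = re.compile(r"(\d+)([=XID])")


def cigar_stats(cigar):
    """Return (residue matches, alignment block length) for a CIGAR."""
    ops = _CIGAR_OP.findall(cigar)
    if not ops or "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"unsupported CIGAR: {cigar!r}")
    matches = sum(int(n) for n, op in ops if op == "=")
    block = sum(int(n) for n, _ in ops)  # matches + mismatches + gaps
    return matches, block


def gaf_line(qname, qlen, qstart, qend, walk, path_len, pstart, pend,
             cigar, mapq=255):
    """Format one tab-separated GAF record (mapq 255 means 'missing')."""
    matches, block = cigar_stats(cigar)
    fields = [qname, qlen, qstart, qend, "+", walk, path_len, pstart, pend,
              matches, block, mapq, "cg:Z:" + cigar]
    return "\t".join(str(f) for f in fields)


if __name__ == "__main__":
    print(gaf_line("read1", 11, 0, 11, ">1>2", 20, 3, 13, "5=1X2I3=1D"))
```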

@pesho-ivanov
Collaborator

Thank you for the feedback @ekg.

Yes, it is possible. Currently, the GFA coordinate information is not propagated to the internal graph representation. GAF is on my todo list and I can prioritize it now. I will update this thread again in the next few days.

@ekg
Author

ekg commented Jan 20, 2022

Is this feasible to do? I'd love to test astarix, and a standard output format like this is critical to do that!

@ChriKub

ChriKub commented May 9, 2022

Is this still on the agenda?
GAF or GAM output would be great to enable the use of the alignments in any downstream analysis.

@pesho-ivanov
Collaborator

Dear @ekg and @ChriKub, please excuse my delays. I was overoptimistic about implementing the GAF output in time, and given my upcoming PhD defense, it may be delayed further.

In case someone wants to implement this sooner, there are several subtleties to be taken care of:

  1. The GFA input coordinates should be carried all the way through in the node_t structure (incl. the reverse-complement nodes).
  2. Any path starts at the trie, whose nodes are abstractions of GFA nodes and so should not hold any GFA information. This means that to reconstruct the GFA coordinates at the beginning of the path, one has to go back in the graph instead of "climbing" the trie.
  3. Reverse-complement nodes can be distinguished by their id.
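A possible shape for the first point, written in Python for brevity rather than as astarix's actual C++ `node_t`: each internal node carries a small record of where it came from in the GFA, and the aligned node sequence is collapsed into a GAF walk plus a start offset. The `GfaOrigin` record, its fields, and `walk_from_origins` are all hypothetical. Trie nodes are modelled as `None` and simply skipped, so the sketch does not perform the "go back in the graph" reconstruction from point 2; it only shows what the carried-along coordinates would make possible once that is solved.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GfaOrigin:
    """Hypothetical per-node record of GFA provenance."""
    segment: str    # GFA segment name
    offset: int     # 0-based offset along the oriented segment
    reverse: bool   # True for a reverse-complement node


def walk_from_origins(origins: List[Optional[GfaOrigin]]) -> Tuple[str, int]:
    """Collapse per-node origins into (GAF walk, start offset on the walk).

    None entries stand for trie nodes, which carry no GFA information.
    """
    steps = []
    start = None
    prev = None
    for cur in origins:
        if cur is None:
            continue
        if start is None:
            start = cur.offset
        # A new step begins when the oriented segment changes or the
        # offset does not continue from the previous node (e.g. a self-loop).
        continues = (prev is not None
                     and cur.segment == prev.segment
                     and cur.reverse == prev.reverse
                     and cur.offset == prev.offset + 1)
        if not continues:
            steps.append(("<" if cur.reverse else ">") + cur.segment)
        prev = cur
    if start is None:
        raise ValueError("no graph nodes in path")
    return "".join(steps), start
```

The returned offset would feed the "start on path" column of a GAF record, and the walk string the path column.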

3 participants