Fix sankey plot visualization for undefined ranks #109

Open
hoelzer opened this issue Jul 14, 2023 · 0 comments

hoelzer (Collaborator) commented Jul 14, 2023

The Sankey plot visualization has difficulties when taxonomic ranks are missing. This can be solved to some extent by introducing "unclassified" ranks based on the parent rank. However, this is currently not working properly for all levels. For example, using the assembly.fasta test file:

https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/nextflow/test/assembly.fasta

produces this Sankey plot:

[Screenshot 2023-07-14 at 16 59 21: Sankey plot produced by the current script]

but the correct plot is:

[Screenshot 2023-07-14 at 17 01 56: expected Sankey plot]

As illustrated, the problem is that the subfamily Guernseyvirinae has neither a family nor an order rank, only a class, _Caudoviricetes_. The current script introduces "Unclassified Caudoviricetes" to fill the order rank, but the family rank is still missing, so the nodes are arranged incorrectly (see the first figure).

I think we can fix this by:

a) introducing multiple "unclassified" (or better: "undefined"!) ranks
b) adding the rank level to the label (because we need unique labels)
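A minimal sketch of (a) and (b) in Python, assuming each contig's lineage arrives as a mapping from rank name to taxon name, with missing ranks absent or `None`. The rank list `RANKS` and the function `fill_undefined_ranks` are hypothetical and not taken from the pipeline's script. Each missing rank is labelled after its nearest defined ancestor plus the rank name, so consecutive placeholders stay distinct.

```python
from typing import Dict, List, Optional

# Hypothetical rank order (assumption); the pipeline's real rank list may differ.
RANKS: List[str] = ["class", "order", "family", "subfamily", "genus"]


def fill_undefined_ranks(lineage: Dict[str, Optional[str]],
                         ranks: List[str] = RANKS) -> List[str]:
    """Return one label per rank between the highest and lowest defined ranks.

    Missing ranks become 'Undefined <nearest defined ancestor> (<Rank>)'.
    Ranks above the first defined rank and below the last defined rank
    are dropped, so only internal gaps are filled.
    """
    defined = [i for i, rank in enumerate(ranks) if lineage.get(rank)]
    if not defined:
        return []
    first, last = defined[0], defined[-1]

    labels: List[str] = []
    parent: Optional[str] = None  # nearest defined ancestor seen so far
    for rank in ranks[first:last + 1]:
        name = lineage.get(rank)
        if name:
            labels.append(name)
            parent = name
        else:
            # The rank suffix keeps placeholder labels unique across levels.
            labels.append(f"Undefined {parent} ({rank.capitalize()})")
    return labels
```

For example, `fill_undefined_ranks({"class": "Caudoviricetes", "subfamily": "Guernseyvirinae", "genus": "Jerseyvirus"})` returns `["Caudoviricetes", "Undefined Caudoviricetes (Order)", "Undefined Caudoviricetes (Family)", "Guernseyvirinae", "Jerseyvirus"]`. Filling only internal gaps is a design choice of this sketch; whether trailing missing ranks (e.g. no genus) should also get placeholders is left open.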

For example, for Jerseyvirus the Sankey plot would then show:

Caudoviricetes --> Undefined Caudoviricetes (Order) --> Undefined Caudoviricetes (Family) --> Guernseyvirinae --> Jerseyvirus
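To see why unique labels matter, here is a hedged sketch that turns already-filled lineages into Sankey nodes and weighted links by counting adjacent label pairs. The function `sankey_links` is hypothetical, and the nodes-plus-`(source_index, target_index, value)` links shape is a generic representation; the pipeline's plotting code may use a different structure.

```python
from collections import Counter
from typing import Dict, List, Tuple


def sankey_links(lineages: List[List[str]]) -> Tuple[List[str], List[Tuple[int, int, int]]]:
    """Turn filled lineages into (nodes, links) for a Sankey plot.

    Each link is (source_index, target_index, count), where count is the
    number of lineages passing through that pair of adjacent labels.
    """
    counts: Counter = Counter()
    for labels in lineages:
        # Pair each label with the next one along the lineage.
        for source, target in zip(labels, labels[1:]):
            counts[(source, target)] += 1

    nodes: List[str] = []
    index: Dict[str, int] = {}
    links: List[Tuple[int, int, int]] = []
    for (source, target), value in counts.items():
        # Assign node indices in first-seen order.
        for label in (source, target):
            if label not in index:
                index[label] = len(nodes)
                nodes.append(label)
        links.append((index[source], index[target], value))
    return nodes, links
```

For the Jerseyvirus path above, each label becomes its own node and the path yields four links (0 to 1, 1 to 2, 2 to 3, 3 to 4). If both placeholders were named just "Undefined Caudoviricetes", they would merge into a single node and the order-to-family step would become a self-loop (source equal to target), which a left-to-right Sankey layout cannot draw sensibly.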
