Scaffold longer than reference genome due to NNNNs #172

Open
vappiah opened this issue Nov 20, 2023 · 3 comments

Comments

vappiah commented Nov 20, 2023

Hi @malonge

I have used RagTag on several datasets, and every time the final scaffold comes out longer than the reference sequence. I found that this is due to the runs of Ns that RagTag introduces. Is this behaviour expected?

Vincent

@shivanshss

Dear Vincent,

I have also seen this behaviour while working with the Tribolium castaneum genome. When I scaffold my draft genome against the published reference, two giant contigs are joined with a gap of ~1 Mbp between them. I believe this behaviour is expected. However, one way to validate it is to draw a one-to-one dot plot between the reference and the query and check whether the introduced gaps make sense (you could use SibeliaZ or nucmer for this). I hope it helps.

PS: I am not the author of this tool. I just used it for a project
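
To confirm that the extra length really comes from inserted Ns, one option is to compare the total and ungapped lengths of the scaffolded FASTA. The following is a minimal sketch using only the Python standard library. The helper names `read_fasta` and `length_summary` are illustrative and not part of RagTag, and the demo records are made up.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_summary(handle):
    """Return total length, number of N bases, and ungapped length."""
    total = 0
    n_bases = 0
    for _, seq in read_fasta(handle):
        total += len(seq)
        n_bases += seq.upper().count("N")
    return {"total": total, "n_bases": n_bases, "ungapped": total - n_bases}


# Made-up records standing in for a scaffolded FASTA file.
demo = io.StringIO(">chr1_RagTag\nACGTNNNN\nNNACGT\n>chr2\nacgn\n")
print(length_summary(demo))
```

For a real file, pass `open("scaffold.fasta")` (a hypothetical path) instead of the `StringIO` object. If the ungapped length of the scaffold is close to the length of the original assembly, the difference from the reference is explained by the gaps alone.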

vappiah commented Nov 30, 2023

Dear @shivanshss,

Thanks for the information. I will draw the dot plot.
For the query sequence, do you mean the one I generated after running RagTag or the assembly FASTA file?

@shivanshss

Dear @vappiah

A dot plot between your query and the reference before using RagTag will tell you whether there is a gap in your query that could have been filled with Ns at the time of scaffolding.

A dot plot between your RagTag output and your original reference will tell you whether the gap position is unusual in any way.
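
To know where to look on the dot plot, it can help to list the coordinates of each run of Ns in a scaffold first. This is a small sketch, assuming the scaffold sequence is already loaded as a Python string; `gap_intervals` is a hypothetical helper, not a RagTag function.

```python
import re


def gap_intervals(seq, min_len=1):
    """Return (start, end, length) for each run of N in seq.

    Coordinates are 0-based and half-open, so seq[start:end] is the gap.
    Runs shorter than min_len are skipped.
    """
    intervals = []
    for match in re.finditer(r"[Nn]+", seq):
        length = match.end() - match.start()
        if length >= min_len:
            intervals.append((match.start(), match.end(), length))
    return intervals


# Made-up sequence with two gaps of different sizes.
print(gap_intervals("ACGTNNNNACnnGT"))
print(gap_intervals("ACGTNNNNACnnGT", min_len=3))
```

The resulting positions can then be compared with the breaks visible in the dot plot of the RagTag output against the reference.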

You may need to do some breakpoint analysis with the original reads used for the assembly to better understand the gap.

Additionally, I would draw a synteny plot between your original query and the reference (this is similar to the dot plot but slightly more informative).

This would be a sanity check just to make sure that something unexpected is not happening. If you find that everything is as expected, then you don't have to worry about the Ns that are introduced at the time of scaffolding.

I would also wait for the author to comment because, as I said earlier, I am not the author of this tool and they would know better.

Hope it helps.

Sincerely,
Shivansh
