missing 8Mb sequences in the assembly #644

Open
xzhoubayer opened this issue Apr 29, 2024 · 5 comments
Comments

@xzhoubayer

We assembled a set of 11 genomes of the same crop species with hifiasm version 0.19.8-r603. One of the 11, lineA, was an outlier in terms of overall assembly size. In particular, 8 Mb at the 5’-end of one of the chromosomes was missing from the assembly (when compared to the others in the collection). Given the size and genome content of the missing sequence, we believe this missing region is not a biological difference but a technical artifact.

A few simple lines of evidence support our hypothesis that the region is incorrectly missing:

When the HiFi reads of lineA were mapped against the assembly of a highly related line ("lineB") using minimap2, we found that the HiFi reads of lineA had an even depth of coverage over every chromosome of lineB, including the 8 Mb region at the 5’-end of chr4 that is missing from the lineA assembly.
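For anyone who wants to reproduce this kind of coverage check, here is a minimal sketch. It assumes the lineA reads were mapped to lineB with minimap2 writing its default PAF output (the original post does not say which output format was used), and the function name `windowed_depth` and the window size are illustrative choices, not anything from hifiasm or minimap2.

```python
from collections import defaultdict

def windowed_depth(paf_lines, window=1_000_000, min_mapq=0):
    """Return {target: [mean depth per window]} from PAF alignment lines.

    Uses the standard PAF columns: 6 = target name, 7 = target length,
    8 = target start, 9 = target end, 12 = mapping quality.
    """
    # target -> window index -> number of aligned target bases
    covered = defaultdict(lambda: defaultdict(int))
    target_len = {}
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # skip malformed or truncated lines
        tname, tlen = f[5], int(f[6])
        tstart, tend, mapq = int(f[7]), int(f[8]), int(f[11])
        if mapq < min_mapq:
            continue
        target_len[tname] = tlen
        # Split the alignment span across the windows it overlaps.
        pos = tstart
        while pos < tend:
            w = pos // window
            w_end = min((w + 1) * window, tend)
            covered[tname][w] += w_end - pos
            pos = w_end
    depth = {}
    for tname, tlen in target_len.items():
        n_windows = (tlen + window - 1) // window
        depth[tname] = [
            covered[tname][w] / min(window, tlen - w * window)
            for w in range(n_windows)
        ]
    return depth
```

Calling `windowed_depth(open("lineA_vs_lineB.paf"))` (hypothetical file name) and looking at the first eight 1 Mb windows of chr4 would show whether depth there matches the rest of the genome.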

Moreover, there are more than 11,000 HiFi reads from the lineA HiFi library that mapped to the 5’-end of lineB, including in the 8 Mb region in question. Critically, none of these HiFi reads from lineA appeared in the assembly graph file (asm.bp.p_ctg.noseq.gfa). We found the same result using an older version of hifiasm (version 0.18.2).
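The read-membership check described above can be sketched as follows. This assumes read names appear in the fifth tab-separated column of the `A` lines in the noseq GFA, which is my understanding of hifiasm's output layout, and that the mapping is available as PAF. The function names and the region coordinates are hypothetical.

```python
def reads_in_gfa(gfa_lines):
    """Collect read names from 'A' lines of a GFA (assumed: name in column 5)."""
    names = set()
    for line in gfa_lines:
        if line.startswith("A\t"):
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 5:
                names.add(fields[4])
    return names

def reads_in_region(paf_lines, target, start, end):
    """Collect names of reads whose PAF alignment overlaps target:start-end."""
    names = set()
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12 or f[5] != target:
            continue
        # Half-open interval overlap test on target coordinates.
        if int(f[7]) < end and int(f[8]) > start:
            names.add(f[0])
    return names

def missing_reads(paf_lines, gfa_lines, target, start, end):
    """Reads mapped to the region that never occur on a GFA 'A' line."""
    return reads_in_region(paf_lines, target, start, end) - reads_in_gfa(gfa_lines)
```

For the case in this issue, one would call something like `missing_reads(paf, gfa, "chr4", 0, 8_000_000)` and compare the size of the result with the number of reads mapped to the region.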

As a troubleshooting measure, we assembled lineA from the same set of HiFi reads using Canu 2.2. In the Canu assembly, the 8 Mb region was recovered and assembled in the correct position.

@chhylp123
Owner

Hi @xzhoubayer, sorry for the late reply. Could you please double check whether these missing contigs can be found in a_ctg.gfa?

@xzhoubayer
Author

> Hi @xzhoubayer, sorry for the late reply. Could you please double check whether these missing contigs can be found in a_ctg.gfa?

I don't see an a_ctg.gfa file. These are the GFA files I have:
asm.bp.hap1.p_ctg.gfa
asm.bp.hap1.p_ctg.noseq.gfa
asm.bp.hap2.p_ctg.gfa
asm.bp.hap2.p_ctg.noseq.gfa
asm.bp.p_ctg.gfa
asm.bp.p_ctg.noseq.gfa
asm.bp.p_utg.gfa
asm.bp.p_utg.noseq.gfa
asm.bp.r_utg.gfa
asm.bp.r_utg.noseq.gfa

@chhylp123
Owner

Is your sample homozygous? If it is, it would be better to run hifiasm with -l0, which disables purge_dups. To get the a_ctg file, please add --primary as well.
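As a rough illustration of the suggested rerun, the sketch below builds a hifiasm command line with `-l0` and `--primary`. The helper `hifiasm_command`, the read file name, the output prefix, and the thread count are all placeholders and not taken from this thread.

```python
import shlex

def hifiasm_command(reads, prefix="asm", threads=32, purge_level=0, primary=True):
    """Build a hifiasm argument list (file name, prefix and threads are placeholders)."""
    cmd = ["hifiasm", "-o", prefix, "-t", str(threads), f"-l{purge_level}"]
    if primary:
        # Request primary and alternate contig output.
        cmd.append("--primary")
    cmd.append(reads)
    return cmd

# Print the command for inspection; pass the list to subprocess.run to execute it.
print(shlex.join(hifiasm_command("lineA.hifi.fq.gz")))
```

Building the command as a list keeps it safe to hand to `subprocess.run` without shell quoting issues.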

@xzhoubayer
Author

It is relatively homozygous. I tried -l 0, with no difference.
I will try --primary and update you.

@xzhoubayer
Author

I tried --primary and -l0, with no improvement. The mapped reads still do not show up in asm.bp.p_ctg.gfa or asm.bp.a_ctg.gfa.
