larger assembly size than kmer estimation genome size #622

Open
leon945945 opened this issue Mar 10, 2024 · 2 comments

Comments

@leon945945

Hi, I estimated the genome size from HiFi data. The estimated genome size is 328 Mb with 1.02% heterozygosity:
[GenomeScope plot]
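For context, a k-mer-based size estimate is essentially the number of non-error k-mers divided by the depth of the homozygous k-mer peak. The sketch below shows only that simplified arithmetic. It assumes a k-mer histogram (depth to number of distinct k-mers) and a hand-picked error cutoff. GenomeScope itself fits a more detailed model that also accounts for heterozygosity and errors, so its numbers will differ.

```python
def estimate_genome_size(histogram: dict[int, int], peak_depth: int, min_depth: int = 5) -> float:
    """Naive haploid genome size estimate from a k-mer histogram.

    histogram: depth -> number of distinct k-mers observed at that depth
    peak_depth: depth of the homozygous k-mer peak
    min_depth: k-mers below this depth are treated as sequencing errors (assumed cutoff)
    """
    # Total k-mer instances, skipping low-depth k-mers assumed to be errors
    total_kmers = sum(depth * count for depth, count in histogram.items() if depth >= min_depth)
    # Each haploid genome position contributes roughly peak_depth k-mer instances
    return total_kmers / peak_depth


# Toy histogram: error spike at depth 1, heterozygous peak at 30, homozygous peak at 60
toy_histogram = {1: 5000, 30: 20, 60: 100}
print(estimate_genome_size(toy_histogram, peak_depth=60))  # 110.0
```

Because the result is divided by the peak depth, any error in placing that peak shifts the estimate proportionally. This is one reason the number is an estimate rather than a target size.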

I assembled a primary assembly and HiC-phased haplotypes with hifiasm. With the default -s 0.55, the primary assembly is 409 Mb and the two phased haplotypes are 389 Mb and 366 Mb. All of them are larger than the estimated genome size.

Then I lowered the parameter to -s 0.3. The primary assembly decreased to 396 Mb, and the two phased haplotypes decreased to 377 Mb and 356 Mb. They are still larger than the estimated size.
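To put these numbers side by side, here is a small calculation of how far each assembly exceeds the 328 Mb k-mer estimate. It uses only the sizes reported above; the labels "hap1" and "hap2" are names chosen for this sketch.

```python
ESTIMATE_MB = 328  # k-mer based estimate from the HiFi data

# Assembly sizes in Mb as reported above, keyed by the hifiasm -s value
assemblies = {
    "-s 0.55": {"primary": 409, "hap1": 389, "hap2": 366},
    "-s 0.3": {"primary": 396, "hap1": 377, "hap2": 356},
}


def excess_percent(size_mb: float, estimate_mb: float = ESTIMATE_MB) -> float:
    """Percent by which an assembly is larger than the k-mer estimate."""
    return 100 * (size_mb - estimate_mb) / estimate_mb


for setting, sizes in assemblies.items():
    for name, size in sizes.items():
        print(f"{setting} {name}: {size} Mb ({excess_percent(size):+.1f}%)")
```

With these inputs the primary assembly goes from about +24.7% to +20.7% over the estimate, and the smaller haplotype goes from about +11.6% to +8.5%.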

Could you please give me some suggestions on how to adjust the assembly size? Thanks.

@olekto

olekto commented Mar 10, 2024

Why do you need to adjust the assembly size? The output from GenomeScope is an estimate, not the ground truth. It is better to evaluate the assemblies with tools such as BUSCO and Merqury. If you then see that you are missing a lot of data (although in a way you have the opposite case), you can try different measures.

In short, it is likely fine.

@leon945945
Author

Thanks for your suggestions, @olekto. I scaffolded the hifiasm contigs into super-scaffolds with 3D-DNA using HiC data at ~100X depth. This resulted in ~310 Mb of super-scaffold sequence (the 9 chromosomes of this species) and ~60-90 Mb of fragment sequences. I thought there were too many fragment sequences, so I wanted to adjust the assembly size.
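One way to quantify the fragment fraction is to sum sequence lengths in the scaffolded FASTA, treating the N longest records as chromosome-scale and everything else as fragments. The sketch below assumes that the 9 chromosome-scale super-scaffolds are simply the 9 longest records, which may not hold for every 3D-DNA output. The function names are hypothetical, written for this example only.

```python
from typing import Iterable, Iterator


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Yield (header, sequence length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:], 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def split_chromosomes_and_fragments(lengths: Iterable[int], n_chromosomes: int) -> tuple[int, int]:
    """Return (bp in the n longest sequences, bp in all remaining sequences)."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:n_chromosomes]), sum(ordered[n_chromosomes:])


# Toy example: three "chromosomes" and two small fragments
toy_fasta = """>chr_a
ACGTACGTAC
>frag_1
ACG
>chr_b
ACGTACGT
>chr_c
ACGTAC
>frag_2
AC
""".splitlines()
lengths = [length for _, length in read_fasta_lengths(toy_fasta)]
print(split_chromosomes_and_fragments(lengths, n_chromosomes=3))  # (24, 5)
```

For a real assembly, the same functions can be run on an open file handle of the scaffolded FASTA with n_chromosomes=9.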

By the way, what are these fragment sequences usually? Do T2T genomes have no fragment sequences when scaffolded with HiC data?

Thanks.
