Homotetraploid, super-large genome, with different parameters, the size of p_utg varies greatly? #632

GLking123 · 2024-04-01T04:25:22Z

Dear author,
Thank you for developing such a milestone piece of software, which has greatly improved the efficiency of assembly.

I am currently assembling the large genome of a plant species, a homotetraploid with a genome size of approximately 55 gigabases (G). I only have HiFi data available. I have tried three assembly strategies, outlined as follows:

1. hifiasm -t 120 -l 0: the generated .p_ctg.gfa file is 55G and the .p_utg.gfa file is 75G.
2. hifiasm --n-hap 2 -t 120 -l 0: the generated .p_ctg.gfa file is 55G and the .p_utg.gfa file is 56G.
3. hifiasm --n-hap 4 -t 120 -l 0: the generated .p_ctg.gfa file is 56G and the .p_utg.gfa file is 76G.

Using flow cytometry, the estimated genome size is approximately 50 G.
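To make the comparison concrete, here is a minimal sketch that expresses each reported p_utg size as a multiple of the flow cytometry estimate. The numbers are the ones quoted above; the function name `size_ratio` and the dictionary labels are made up for illustration. Note that a GFA file size is only a rough proxy for assembled sequence length, since a GFA file also stores link and other records.

```python
# Sizes as reported in this post, in "G" (GFA file sizes, used as a rough
# proxy for total sequence length; this proxy is an assumption).
FLOW_CYTOMETRY_ESTIMATE_G = 50

p_utg_sizes_g = {
    "default (no --n-hap)": 75,
    "--n-hap 2": 56,
    "--n-hap 4": 76,
}


def size_ratio(observed_g: float, expected_g: float) -> float:
    """Return the observed size as a multiple of the expected size."""
    if expected_g <= 0:
        raise ValueError("expected_g must be positive")
    return observed_g / expected_g


for label, size in p_utg_sizes_g.items():
    ratio = size_ratio(size, FLOW_CYTOMETRY_ESTIMATE_G)
    print(f"{label}: p_utg is {ratio:.2f}x the flow cytometry estimate")
```

A ratio close to 1 suggests the unitig graph is near the expected size, while a ratio well above 1 suggests extra sequence has been retained.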

I used HapHiC to scaffold the contigs into chromosomes, but encountered numerous errors. Would using p_utg yield better results?

Currently, the size of the p_utg generated with --n-hap 2 meets expectations. Can this p_utg be used?

What is the difference between running without --n-hap and running with --n-hap 4? Why is the p_utg significantly larger with --n-hap 4 than with --n-hap 2?

The following is the k-mer plot generated by hifiasm:

For the above questions, could you provide some debugging suggestions? Thank you for your valuable time and assistance. I sincerely look forward to your response!

chhylp123 · 2024-04-04T03:08:37Z

--n-hap is used to determine the coverage of heterozygous nodes or contigs. For your sample, hifiasm estimates the homozygous coverage as 26, so the heterozygous coverage thresholds are 26/2 = 13 with --n-hap 2 and 26/4 ≈ 6 with --n-hap 4. Hifiasm keeps any node in the assembly graph whose coverage is above the heterozygous coverage threshold as a real node, rather than treating it as a sequencing error. This is why --n-hap 4 leads to a larger graph. Could you please try --hom-cov 55 together with --n-hap 2? Looking at the k-mer plot, there are only two peaks, so the homozygous coverage should be 55.
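The arithmetic above can be sketched as follows. This is not hifiasm's code: the helper names `het_cov_threshold` and `is_kept` are hypothetical, the floor division is an assumption chosen to reproduce the 13 and 6 quoted above, and the strict "above the threshold" comparison is taken literally from the description.

```python
def het_cov_threshold(hom_cov: int, n_hap: int) -> int:
    """Hypothetical helper: heterozygous coverage as hom_cov divided by n_hap.

    Floor division is an assumption made to match the quoted 26/4 -> 6.
    """
    if n_hap < 1:
        raise ValueError("n_hap must be at least 1")
    return hom_cov // n_hap


def is_kept(node_cov: int, threshold: int) -> bool:
    """Hypothetical rule: a node is kept if its coverage is above the threshold."""
    return node_cov > threshold


# Settings discussed in this thread: (homozygous coverage, --n-hap).
for hom_cov, n_hap in [(26, 2), (26, 4), (55, 2)]:
    threshold = het_cov_threshold(hom_cov, n_hap)
    print(f"hom_cov={hom_cov}, n_hap={n_hap} -> threshold {threshold}")

# An illustrative node with coverage 10 (a made-up value): it is kept with
# --n-hap 4 (threshold 6) but not with --n-hap 2 (threshold 13).
example_cov = 10
print(is_kept(example_cov, het_cov_threshold(26, 4)))
print(is_kept(example_cov, het_cov_threshold(26, 2)))
```

Under these assumptions, lowering the threshold can only retain more nodes, which is consistent with the larger p_utg seen with --n-hap 4.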

GLking123 · 2024-06-02T14:13:54Z

Dear author,
I tried your suggestions, and here are the results:

hifiasm --n-hap 2 -t 120 -l 0 --hom-cov 55: the generated .p_ctg.gfa file is 56G and the .p_utg.gfa file is 66G.

Since my sample is a homotetraploid, which output should I use for the assembly, p_ctg or p_utg?

p_ctg N50: 100 Mb
p_utg N50: 1 Mb

For the above question, could you provide some debugging suggestions? Thank you for your valuable time and assistance. I sincerely look forward to your response!
