how to make the correct genome size estimation for allotetraploid species? #634

Open
RezwanCAAS opened this issue Apr 4, 2024 · 2 comments

Comments

@RezwanCAAS

Hi, I assembled the genome of an allotetraploid species using hifiasm; the assembly is ~3.7 Gb. I then ran the PacBio HiFi reads through Merqury for a k-mer based estimate of the genome size. The attached GenomeScope plot shows a size of 1.5 Gb. I am puzzled by this difference. Can someone guide me on what is wrong here?

My second question: why do the observed peaks extend beyond the model peaks? I would be grateful for any help.

Regards
Rezwan
[Figure: linear_plot, the GenomeScope linear plot]
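For readers who want to reproduce this kind of estimate, the following is a minimal sketch of a k-mer histogram plus GenomeScope 2.0 run. GenomeScope 2.0 reports a haploid genome length and takes a ploidy parameter (`-p`, default 2), so a figure fitted with the default may not be directly comparable to a ~3.7 Gb assembly; whether that explains the difference in this thread is an assumption. The file name `reads.fastq.gz`, the prefix `yellow`, `k=21`, and a ploidy of 4 are illustrative, not values confirmed by the poster. The executable name `genomescope2` depends on how the tool was installed (the GenomeScope 2.0 repository ships it as `genomescope.R`). The function only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch only: reads file, prefix, k, and ploidy are assumptions for illustration.

# Print the commands for a read k-mer histogram and a GenomeScope 2.0 fit.
# Arguments: <reads> <k> <ploidy> <output prefix>
kmer_profile_cmds() {
    local reads="$1" k="$2" ploidy="$3" prefix="$4"
    # Count k-mers in the reads with meryl (bundled with Merqury).
    echo "meryl k=${k} count output ${prefix}.meryl ${reads}"
    # Export the k-mer multiplicity histogram that GenomeScope reads.
    echo "meryl histogram ${prefix}.meryl > ${prefix}.hist"
    # Fit the model with an explicit ploidy (-p) instead of the default of 2.
    echo "genomescope2 -i ${prefix}.hist -o ${prefix}_genomescope -k ${k} -p ${ploidy}"
}

kmer_profile_cmds reads.fastq.gz 21 4 yellow
```

After checking the printed commands, they can be run by piping the output to `bash`.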

@chhylp123
Owner

How did you run hifiasm? A good k-mer plot should look like the one shown here: https://hifiasm.readthedocs.io/en/latest/faq.html#why-does-hifiasm-stuck-or-crash. If only a primary assembly is required, you could try running purge_dups after the hifiasm assembly.
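The purge_dups suggestion can be sketched roughly as follows, based on the steps in the purge_dups README. This is not the maintainer's exact recipe: the assembly and reads file names passed in, the thread count, and the `map-hifi` preset (which needs a reasonably recent minimap2) are assumptions. The coverage cutoffs written by `calcuts` are worth inspecting by hand before purging, especially for a polyploid. The function only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: file names and thread count are assumptions for illustration.

# Print the purge_dups steps for a primary assembly and its HiFi reads.
# Arguments: <primary assembly FASTA> <HiFi reads>
purge_dups_cmds() {
    local asm="$1" reads="$2"
    cat <<EOF
minimap2 -x map-hifi -t 32 ${asm} ${reads} | gzip -c - > reads.paf.gz
pbcstat reads.paf.gz
calcuts PB.stat > cutoffs 2> calcuts.log
split_fa ${asm} > ${asm}.split
minimap2 -x asm5 -DP -t 32 ${asm}.split ${asm}.split | gzip -c - > ${asm}.split.self.paf.gz
purge_dups -2 -T cutoffs -c PB.base.cov ${asm}.split.self.paf.gz > dups.bed 2> purge_dups.log
get_seqs -e dups.bed ${asm}
EOF
}

purge_dups_cmds yellow_assembly.hic.p_ctg.fasta reads.fastq.gz
```

The steps are, in order: map reads to the assembly, compute read-depth statistics (`pbcstat` writes `PB.stat` and `PB.base.cov`), derive cutoffs, split the assembly and self-align it, flag duplicated regions, and extract the purged sequences.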

@RezwanCAAS
Author

RezwanCAAS commented Apr 16, 2024

Sorry for the late reply; I was traveling and busy with a field experiment. I used these commands for the assembly:

#1st command

module load hifiasm/0.19.8
hifiasm -o yellow_assembly -t 32 --hom-cov 63 \
 --h1 yellow_1.fastq.gz \
 --h2 yellow_2.fastq.gz \
 reads_cell_*

output

-rw-r--r-- 1 tariqr ibex-c2141 44943554304 Mar  2 02:38 yellow_assembly.ec.bin
-rw-r--r-- 1 tariqr ibex-c2141  3020953966 Mar 25 17:29 yellow_assembly.hic.hap1.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  3083618349 Mar  2 10:52 yellow_assembly.hic.hap1.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    16185036 Mar  2 10:52 yellow_assembly.hic.hap1.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    62763143 Mar  2 10:52 yellow_assembly.hic.hap1.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  3603444541 Mar 25 17:30 yellow_assembly.hic.hap2.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  3680868301 Mar  2 10:53 yellow_assembly.hic.hap2.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    16712429 Mar  2 10:54 yellow_assembly.hic.hap2.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    77494725 Mar  2 10:53 yellow_assembly.hic.hap2.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  3358681400 Mar  2 10:04 yellow_assembly.hic.lk.bin
-rw-r--r-- 1 tariqr ibex-c2141  3728413366 Mar 25 17:31 yellow_assembly.hic.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  3807131425 Mar  2 04:21 yellow_assembly.hic.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    16786239 Mar  2 04:22 yellow_assembly.hic.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    78785721 Mar  2 04:21 yellow_assembly.hic.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  7065869776 Mar  2 04:16 yellow_assembly.hic.p_utg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    36327989 Mar  2 04:18 yellow_assembly.hic.p_utg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141   141553288 Mar  2 04:17 yellow_assembly.hic.p_utg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  8681089843 Mar  2 04:12 yellow_assembly.hic.r_utg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    47969038 Mar  2 04:14 yellow_assembly.hic.r_utg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141   156694833 Mar  2 04:13 yellow_assembly.hic.r_utg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141 50678500976 Mar  2 06:18 yellow_assembly.hic.tlb.bin
-rw-r--r-- 1 tariqr ibex-c2141 29932238864 Mar  2 03:49 yellow_assembly.ovlp.reverse.bin
-rw-r--r-- 1 tariqr ibex-c2141 20184090104 Mar  2 03:02 yellow_assembly.ovlp.source.bin

#2nd command

module load hifiasm/0.19.8

hifiasm -o yellow_assembly -t 32 -s 0.30 -D 10 \
 --h1 yellow_1.fastq.gz \
 --h2 yellow_2.fastq.gz \
 reads_cell_*

output

-rw-r--r-- 1 tariqr ibex-c2141  4039031425 Feb 29 12:02 yellow_assembly.hic.hap1.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  4124692575 Feb 28 07:17 yellow_assembly.hic.hap1.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    19638440 Feb 28 07:18 yellow_assembly.hic.hap1.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    85772645 Feb 28 07:18 yellow_assembly.hic.hap1.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  2589525069 Feb 29 12:03 yellow_assembly.hic.hap2.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  2643948816 Feb 28 07:18 yellow_assembly.hic.hap2.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    13358259 Feb 28 07:19 yellow_assembly.hic.hap2.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    54482262 Feb 28 07:18 yellow_assembly.hic.hap2.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  3348883208 Feb 28 06:28 yellow_assembly.hic.lk.bin
-rw-r--r-- 1 tariqr ibex-c2141  3619840841 Feb 29 12:01 yellow_assembly.hic.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  3696066233 Feb 28 01:27 yellow_assembly.hic.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    16596961 Feb 28 01:27 yellow_assembly.hic.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    76289876 Feb 28 01:27 yellow_assembly.hic.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  7100699338 Feb 28 01:23 yellow_assembly.hic.p_utg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    36971704 Feb 28 01:24 yellow_assembly.hic.p_utg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141   141880425 Feb 28 01:23 yellow_assembly.hic.p_utg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  8644503429 Feb 28 01:20 yellow_assembly.hic.r_utg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    48118375 Feb 28 01:22 yellow_assembly.hic.r_utg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141   156440319 Feb 28 01:21 yellow_assembly.hic.r_utg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141 50669605160 Feb 28 03:46 yellow_assembly.hic.tlb.bin
-rw-r--r-- 1 tariqr ibex-c2141 35754767954 Feb 28 00:57 yellow_assembly.ovlp.reverse.bin
-rw-r--r-- 1 tariqr ibex-c2141 20290235360 Feb 28 00:40 yellow_assembly.ovlp.source.bin

I used the output of the 1st command for the k-mer analysis with Merqury. Please check the result and let me know if you have any suggestions. I should also add that the parents of this polyploid species have high homology.
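The listings above give file sizes in bytes, which only approximate the assembled length because FASTA headers and newlines are included. A minimal sketch for counting the bases directly and comparing them with a k-mer based estimate is shown below. The function names are hypothetical helpers, and the example call uses the rounded figures quoted in this thread (3.7 Gb and 1.5 Gb) rather than measured values.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; inputs are uncompressed FASTA.

# Sum the sequence lengths in a FASTA file, skipping header lines.
fasta_total_bases() {
    awk '!/^>/ { gsub(/[[:space:]]/, ""); total += length($0) }
         END   { printf "%.0f\n", total }' "$1"
}

# Ratio of assembled bases to a k-mer based size estimate (both in bp).
size_ratio() {
    awk -v asm="$1" -v est="$2" 'BEGIN { printf "%.2f\n", asm / est }'
}

# Rounded figures from this thread: ~3.7 Gb assembly vs 1.5 Gb estimate.
size_ratio 3700000000 1500000000
```

For a real file, `fasta_total_bases yellow_assembly.hic.p_ctg.fasta` gives the first argument to `size_ratio`.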
