How to improve allotetraploid scaffolding in haphic? #22

Open
RezwanCAAS opened this issue Apr 17, 2024 · 13 comments
@RezwanCAAS

Hi, I used HapHiC for an allotetraploid genome (2n = 4x = 44). It produces the 22 groups listed below (lengths in bp). Their sizes vary enormously, which can also be seen in the Juicebox plot. Could you suggest how to improve this?

group1 335147534
group2 265712237
group3 259566223
group4 256679759
group5 252275747
group6 223506148
group7 216891361
group8 204149503
group9 178470265
group10 169460411
group11 165539612
group12 151015244
group13 150193252
group14 124757396
group15 106023119
group16 103438627
group17 102100592
group18 94194491
group19 71377479
group20 51412018
group21 41958356
group22 41183465
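To put numbers on how uneven a grouping like this is, the listing can be summarised with a few lines of awk. This is only a sketch and not part of HapHiC: the function name `group_stats` is hypothetical, and it assumes each input line holds a group name followed by a length in bp, as in the listing above. It reports the number of groups, their total length, the mean group length and the max/min ratio.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HapHiC): summarise a "group length" listing.
# Input on stdin: one group per line, name then length in bp, whitespace-separated.
group_stats() {
  awk '
    NF >= 2 {
      gsub(/[^0-9]/, "", $2)   # drop stray characters such as backticks
      len = $2 + 0
      total += len; n++
      if (n == 1 || len > max) max = len
      if (n == 1 || len < min) min = len
    }
    END {
      if (n == 0) { print "no groups found"; exit 1 }
      ratio = (min > 0) ? max / min : 0
      # %.0f rather than %d so multi-gigabase totals print correctly in any awk
      printf "groups=%d total=%.0f mean=%.0f max/min=%.2f\n", n, total, total / n, ratio
    }'
}
```

For example, after saving the listing to a file such as `groups.txt` (a placeholder name), run `group_stats < groups.txt`.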

[Screenshot, 2024-04-17: Juicebox Hi-C contact map of the 22 groups]
@zengxiaofei
Owner

Hi @RezwanCAAS,

According to your heatmap, it seems that the homologous chromosomes were incorrectly clustered. How did you assemble the genome and which assembly did you use for scaffolding (e.g., p_utg, p_ctg, or hap*.p_ctg)?

Best,
Xiaofei

@RezwanCAAS
Author

Hi @zengxiaofei, I used hifiasm with the following command and got these outputs:

```shell
module load hifiasm/0.19.8
hifiasm -o yellow_assembly -t 32 --hom-cov 63 \
 --h1 yellow_1.fastq.gz \
 --h2 yellow_2.fastq.gz \
 reads_cell_*
```

Output:

-rw-r--r-- 1 tariqr ibex-c2141 44943554304 Mar  2 02:38 yellow_assembly.ec.bin
-rw-r--r-- 1 tariqr ibex-c2141  3020953966 Mar 25 17:29 yellow_assembly.hic.hap1.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  3083618349 Mar  2 10:52 yellow_assembly.hic.hap1.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    16185036 Mar  2 10:52 yellow_assembly.hic.hap1.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    62763143 Mar  2 10:52 yellow_assembly.hic.hap1.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  3603444541 Mar 25 17:30 yellow_assembly.hic.hap2.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  3680868301 Mar  2 10:53 yellow_assembly.hic.hap2.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    16712429 Mar  2 10:54 yellow_assembly.hic.hap2.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    77494725 Mar  2 10:53 yellow_assembly.hic.hap2.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  3358681400 Mar  2 10:04 yellow_assembly.hic.lk.bin
-rw-r--r-- 1 tariqr ibex-c2141  3728413366 Mar 25 17:31 yellow_assembly.hic.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  3807131425 Mar  2 04:21 yellow_assembly.hic.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    16786239 Mar  2 04:22 yellow_assembly.hic.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    78785721 Mar  2 04:21 yellow_assembly.hic.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  7065869776 Mar  2 04:16 yellow_assembly.hic.p_utg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    36327989 Mar  2 04:18 yellow_assembly.hic.p_utg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141   141553288 Mar  2 04:17 yellow_assembly.hic.p_utg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  8681089843 Mar  2 04:12 yellow_assembly.hic.r_utg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    47969038 Mar  2 04:14 yellow_assembly.hic.r_utg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141   156694833 Mar  2 04:13 yellow_assembly.hic.r_utg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141 50678500976 Mar  2 06:18 yellow_assembly.hic.tlb.bin
-rw-r--r-- 1 tariqr ibex-c2141 29932238864 Mar  2 03:49 yellow_assembly.ovlp.reverse.bin
-rw-r--r-- 1 tariqr ibex-c2141 20184090104 Mar  2 03:02 yellow_assembly.ovlp.source.bin
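hifiasm writes its contigs as GFA. The .fasta files in this listing are dated later than the matching .gfa files, so they were presumably converted afterwards. The hifiasm README gives an awk one-liner for this that keeps only the segment (S) records; below it is wrapped in a small function. The name `gfa_to_fasta` is hypothetical, and this is a sketch rather than an official hifiasm tool.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the segment (S) records of a GFA file as FASTA.
# GFA S lines are tab-separated: S <name> <sequence> [tags...]
gfa_to_fasta() {
  awk -F'\t' '$1 == "S" { print ">" $2; print $3 }' "$1"
}
```

For example: `gfa_to_fasta yellow_assembly.hic.hap1.p_ctg.gfa > yellow_assembly.hic.hap1.p_ctg.fasta`.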

Later, I used the yellow_assembly.hic.p_ctg.fasta file for scaffolding with HapHiC. Could you suggest some ways to improve the scaffolding?

@zengxiaofei
Owner

zengxiaofei commented Apr 17, 2024

Hi @RezwanCAAS,

You need to concatenate the contigs in the hap*.p_ctg files for scaffolding, rather than using the p_ctg file, because the contigs in p_ctg are not phased. Additionally, the nchrs parameter should be set to 44.
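A minimal sketch of the concatenation step, assuming the hap1 and hap2 FASTA files from the listing above. `merge_haps` is a hypothetical helper, not part of HapHiC or hifiasm. It stops with an error if a contig name occurs more than once across the inputs, since duplicate names would confuse read mapping and scaffolding. hifiasm normally gives hap1 and hap2 contigs different prefixes (h1tg…, h2tg…), so the check is expected to pass.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate haplotype FASTA files into one file and
# fail if any contig name is duplicated across the inputs.
# Usage: merge_haps OUT.fa IN1.fa IN2.fa [...]
merge_haps() {
  local out=$1
  shift
  cat "$@" > "$out" || return 1
  local dups
  # Header lines start with '>'; compare only the first word (the contig name).
  dups=$(awk '/^>/ { print $1 }' "$out" | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "duplicate contig names in $out:" >&2
    echo "$dups" >&2
    return 1
  fi
}
```

For example: `merge_haps yellow_assembly.hic.hap12.p_ctg.fasta yellow_assembly.hic.hap1.p_ctg.fasta yellow_assembly.hic.hap2.p_ctg.fasta`. The Hi-C reads then have to be mapped and filtered against this merged FASTA before scaffolding. HapHiC is run with nchrs set to 44, following the `haphic pipeline <asm.fa> <HiC.filtered.bam> <nchrs>` form described in the HapHiC README. Here the merged filename and `HiC.filtered.bam` are placeholders.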

Best,
Xiaofei

@RezwanCAAS
Author

@zengxiaofei Thank you so much for your help. I will let you know once I have the results.

@RezwanCAAS
Author

RezwanCAAS commented Apr 21, 2024

Hi @zengxiaofei, following your suggestions above, I now have the output below in the form of 44 groups (lengths in bp). The groups are shown in the Hi-C plot. What do you suggest? How can I improve it?


group1  268313361
group2  229185811
group3  213565163
group4  211333994
group5  192355810
group6  178114398
group7  175672271
group8  173070932
group9  167268982
group10 166994227
group11 165224190
group12 163976005
group13 160510958
group14 156551379
group15 154125454
group16 151485858
group17 149388751
group18 149068743
group19 147517207
group20 146026689
group21 145559499
group22 143005682
group23 141123267
group24 137886426
group25 137438397
group26 135923374
group27 130012929
group28 129423837
group29 128388296
group30 127998997
group31 124813519
group32 124052846
group33 118674522
group34 117104620
group35 112908057
group36 108358901
group37 101293320
group38 97290716
group39 90849898
group40 85567886
group41 74996220
group42 72339106
group43 44300534
group44 41067405

[Screenshot, 2024-04-21: Hi-C contact map of the 44 groups]

@zengxiaofei
Owner

Hi @RezwanCAAS,

It seems that the heatmap is clear enough. You can manually adjust it in Juicebox after importing the .assembly file with the "balanced" normalization.

Best,
Xiaofei

@RezwanCAAS
Author

RezwanCAAS commented Apr 22, 2024

@zengxiaofei Thank you for your great help. After curating the assembly in Juicebox, I will post the final figure here as a reference for other users.

@zengxiaofei
Owner

@RezwanCAAS Thanks for your sharing!

@RezwanCAAS
Author

@zengxiaofei Please check this plot with 44 chromosomes. How does it look?

contact_map.pdf

@RezwanCAAS
Author

One more question: why are the signals circled in red not in contact with the main scaffolds? I tried to curate them, but it didn't work. Are they artifacts of homologous regions?
[Image IMG_1921: contact map with the signals in question circled in red]

@zengxiaofei
Owner

Sorry for the delay; I'm quite busy these days. Your contact heatmap shows that there are still many errors in contig assignment, ordering, and orientation. The Hi-C signals you highlighted with red circles are signals between homologous chromosomes. They mainly derive from assembly errors (which is normal for haplotype-phased assemblies). You could check out our manuscript for more information about collapsed contigs, chimeric contigs, and switch errors.

You could also have a look at the heatmaps we generated when testing real cases, especially the haplotype-phased assemblies. These figures are in the Supplementary Information, and I believe they will help you curate your assembly.

@zengxiaofei
Owner

Here are two examples of S. spontaneum Np-X (sugarcane) and C. sinensis Tieguanyin (tea plant):

[Image: contact heatmap of S. spontaneum Np-X (sugarcane)]

[Image: contact heatmap of C. sinensis Tieguanyin (tea plant)]

@RezwanCAAS
Author

@zengxiaofei Thank you so much for your great support. The examples you shared will help me improve my genome plot accordingly.
