
pore-C data #5

Open
socialhang opened this issue Dec 18, 2023 · 25 comments
Labels
enhancement New feature or request

Comments

@socialhang

Hello, @zengxiaofei

Can HapHiC take Pore-C data as input? If not, what adjustments do I need to make?

Thank you in advance for your response.

@zengxiaofei
Owner

Hi @socialhang,

I have a plan to support Pore-C data, but it is not a high priority at the moment.
I haven't conducted any tests on Pore-C data yet, so I am unable to provide any useful information about it at this time.

Best regards,
Xiaofei

@zengxiaofei added the enhancement label Dec 18, 2023
@zengxiaofei
Owner

zengxiaofei commented Dec 18, 2023

Hi @socialhang,

I have conducted an experiment on the test data of https://github.com/epi2me-labs/wf-pore-c. You can try the following steps with your own FASTA and FASTQ files:

(1) Execute the wf-pore-c pipeline to align Pore-C data and create a mock paired-end BAM file:

source ~/software/miniconda3/bin/activate nextflow

# modify --fastq, --ref, --cutter, and --chunk_size
nextflow run epi2me-labs/wf-pore-c \
             --fastq /work/software/genome/wf-pore-c-265a1a4/test_data/porec_test.concatemers.fastq --chunk_size 100 \
             --ref /work/software/genome/wf-pore-c-265a1a4/test_data/porec_test.fasta \
             --cutter NlaIII \
             --paired_end_minimum_distance 100 --paired_end_maximum_distance 200 --hi_c --mcool --paired_end

conda deactivate

Then, you will obtain a paired-end BAM file named null.ns.bam in the output/paired_end directory.

(2) Rename the Hi-C reads in null.ns.bam using a custom script rename_reads_for_bam.py and filter Hi-C reads:

python3 rename_reads_for_bam.py null.ns.bam | samtools view - -@ 14 -S -h -b -F 3340 -o HiC.bam
/path/to/HapHiC/utils/filter_bam.py HiC.bam 1 --threads 14 | samtools view -b -@ 14 -o HiC.filtered.bam

Finally, you can use HiC.filtered.bam for the HapHiC scaffolding pipeline.

Here is the custom script:
rename_reads_for_bam.py.gz
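The attached rename_reads_for_bam.py is the authoritative implementation, which I have not inspected. Purely as a hypothetical sketch of the kind of renaming involved — assuming (a) the mock paired-end records carry a numeric per-monomer suffix on the read name, e.g. "read1:0" / "read1:1", and (b) downstream filtering expects both mates of a pair to share one QNAME — it might look like:

```python
# Hypothetical sketch only; the real logic lives in rename_reads_for_bam.py.
# Operates on SAM-format text lines (e.g. from `samtools view -h`).

def rename_qname(qname):
    """Strip an assumed ':<index>' suffix so mates share a read name."""
    base, sep, suffix = qname.rpartition(':')
    return base if sep and suffix.isdigit() else qname

def rename_sam(lines):
    """Rewrite the QNAME (first column) in a stream of SAM text lines."""
    for line in lines:
        if line.startswith('@'):          # header lines pass through
            yield line
        else:
            fields = line.rstrip('\n').split('\t')
            fields[0] = rename_qname(fields[0])
            yield '\t'.join(fields) + '\n'

# Usage idea: samtools view -h null.ns.bam | <this script> | samtools view -b ...
```

The suffix convention here is an assumption for illustration; the attached script should be used in practice.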

Please be aware that this method has not been fully tested. I welcome your feedback on any attempts you make. This will be of great help. Once this method is validated, I will integrate it into HapHiC.

Best regards,
Xiaofei

@socialhang
Author

Thanks a lot!

I will try today!

@socialhang
Author

hello! @zengxiaofei

I'm back! I'm sorry, but the wf-pore-c pipeline could not run on my server. Could I use falign instead of wf-pore-c to produce the BAM file?

@zengxiaofei
Owner

Hi @socialhang,

You could try it. It seems that falign can produce pairwise SAM files compatible with SALSA2 via the sam2salsa2 command in u4falign.

Best,
Xiaofei

@socialhang
Author

OK!

@socialhang
Author

socialhang commented Dec 24, 2023

Hello! @zengxiaofei

I produced a pairwise BAM file using falign, but some errors happened.

Traceback (most recent call last):
  File "/root/software/HapHiC_1.0.2/scripts/HapHiC_pipeline.py", line 483, in <module>
    main()
  File "/root/software/HapHiC_1.0.2/scripts/HapHiC_pipeline.py", line 464, in main
    inflation = haphic_cluster(args)
  File "/root/software/HapHiC_1.0.2/scripts/HapHiC_pipeline.py", line 323, in haphic_cluster
    HapHiC_cluster.run(args, log_file=LOG_FILE)
  File "/root/software/HapHiC_1.0.2/scripts/HapHiC_cluster.py", line 2361, in run
    flank_link_matrix, frag_index_dict = dict_to_matrix(
  File "/root/software/HapHiC_1.0.2/scripts/HapHiC_cluster.py", line 283, in dict_to_matrix
    shape = len(frag_set)
TypeError: object of type 'NoneType' has no len()
Traceback (most recent call last):
  File "/root/software/HapHiC_1.0.2/haphic", line 96, in <module>
    subprocess.run(commands, check=True)
  File "/root/miniconda3/envs/haphic/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/root/software/HapHiC_1.0.2/scripts/HapHiC_pipeline.py', 'hifiasm_hic_ul_p_utg.fa', 'small.1.bam', '32', '--remove_allelic_links', '4', '--Nx', '100', '--gfa', 'hifiasm.asm.hic.p_utg.gfa', '--min_RE_sites', '1', '--min_links', '1', '--min_link_density', '0', '--threads', '40', '--max_ctg_len', '400000']' returned non-zero exit status 1.

I think it's because of my BAM format.
small.1.bam.gz

Can you help me point out the errors in my bam file?

Thanks a lot!

@zengxiaofei
Owner

Hi @socialhang,

Could you please paste or upload 01.cluster/HapHiC_cluster.log?

Best,
Xiaofei

@socialhang
Author

OK, just a minute.

@socialhang
Author

@zengxiaofei
Owner

zengxiaofei commented Dec 25, 2023

Is the small.1.bam that you uploaded the same one you used for HapHiC scaffolding, or just a portion of it?

@socialhang
Author

Is the small.1.bam that you uploaded the same one you used for HapHiC scaffolding, or just a portion of it?

Since my BAM file is 400 GB in size, small.1.bam is a small subset I extracted from it. I have also run HapHiC on this subset, and the error it reports is exactly the same as with the full dataset.

@zengxiaofei
Owner

Please use this script (modify_bam_for_falign.py.gz) to modify the BAM file:

python3 modify_bam_for_falign.py small.1.bam | samtools view -b -o new.bam

@socialhang
Author

Please use this script (modify_bam_for_falign.py.gz) to modify the BAM file:

python3 modify_bam_for_falign.py small.1.bam | samtools view -b -o new.bam

OK! Thanks! I will try!

@socialhang
Author

socialhang commented Dec 25, 2023

Some errors happened:

2023-12-25 09:02:22 <HapHiC_pipeline.py> [main] Pipeline started, HapHiC version: 1.0.2 (update: 2023.12.16)
2023-12-25 09:02:22 <HapHiC_pipeline.py> [main] Python version: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0]
2023-12-25 09:02:22 <HapHiC_pipeline.py> [haphic_cluster] Step1: Execute preprocessing and Markov clustering for contigs...
2023-12-25 09:02:22 <HapHiC_cluster.py> [run] Program started, HapHiC version: 1.0.2 (update: 2023.12.16)
2023-12-25 09:02:22 <HapHiC_cluster.py> [run] Python version: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0]
2023-12-25 09:02:22 <HapHiC_cluster.py> [parse_fasta] Parsing input FASTA file...
2023-12-25 09:02:34 <HapHiC_cluster.py> [parse_gfa] Parsing input gfa file(s)...
2023-12-25 09:02:38 <HapHiC_cluster.py> [stat_fragments] Making some statistics of fragments (contigs / bins)
2023-12-25 09:02:38 <HapHiC_cluster.py> [stat_fragments] bin_size is calculated to be 2000000 bp
2023-12-25 09:02:41 <HapHiC_cluster.py> [parse_bam] Parsing input BAM file...
Traceback (most recent call last):
  File "/root/software/HapHiC_1.0.2/scripts/HapHiC_pipeline.py", line 483, in <module>
    main()
  File "/root/software/HapHiC_1.0.2/scripts/HapHiC_pipeline.py", line 464, in main
    inflation = haphic_cluster(args)
  File "/root/software/HapHiC_1.0.2/scripts/HapHiC_pipeline.py", line 323, in haphic_cluster
    HapHiC_cluster.run(args, log_file=LOG_FILE)
  File "/root/software/HapHiC_1.0.2/scripts/HapHiC_cluster.py", line 2304, in run
    full_link_dict, flank_link_dict, HT_link_dict, clm_dict, frag_link_dict, ctg_coord_dict, ctg_pair_to_frag = parse_bam(
  File "/root/software/HapHiC_1.0.2/scripts/HapHiC_cluster.py", line 1481, in parse_bam
    frag_len_i, frag_len_j = frag_len_dict[frag_i], frag_len_dict[frag_j]
KeyError: 'utg000028l_bin4'
Traceback (most recent call last):
  File "/root/software/HapHiC_1.0.2/haphic", line 96, in <module>
    subprocess.run(commands, check=True)
  File "/root/miniconda3/envs/haphic/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/root/software/HapHiC_1.0.2/scripts/HapHiC_pipeline.py', 'hifiasm_hic_ul_p_utg.fa', 'ul9.HiC.filter.correct0.bam', '32', '--remove_allelic_links', '4', '--Nx', '100', '--gfa', 'hifiasm.asm.hic.p_utg.gfa', '--min_RE_sites', '1', '--min_links', '1', '--min_link_density', '0', '--threads', '40', '--max_ctg_len', '400000']' returned non-zero exit status 1.

@zengxiaofei

@socialhang
Author

@zengxiaofei
Owner

zengxiaofei commented Dec 25, 2023

It seems that a position in the BAM file (the fourth column, pos, or the eighth column, mpos) exceeds the length of the corresponding contig.

The error message shows that the bin_size was calculated to be 2 Mb, and the error occurred when the mapping position of a read in the BAM file fell within the range [6,000,001, 8,000,000] (bin ID utg000028l_bin4). However, based on the BAM file you uploaded, the contig utg000028l is only 4,798,928 bp long.
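The arithmetic behind this diagnosis can be sketched as follows. Assuming bins are numbered from 1 (consistent with the range quoted above, where bin4 covers positions 6,000,001 to 8,000,000 at a 2 Mb bin size), a 4,798,928 bp contig only has bins 1 to 3, so an out-of-range mapping position produces a bin ID that is missing from frag_len_dict and triggers the KeyError:

```python
# Sketch of the bin bookkeeping described above (1-based bin numbering
# is an assumption inferred from the quoted range for utg000028l_bin4).
BIN_SIZE = 2_000_000  # from the log: "bin_size is calculated to be 2000000 bp"

def bin_id(ctg, pos, bin_size=BIN_SIZE):
    """Map a 1-based mapping position to its bin ID."""
    return f"{ctg}_bin{(pos - 1) // bin_size + 1}"

def n_bins(ctg_len, bin_size=BIN_SIZE):
    """Number of bins a contig actually has (ceiling division)."""
    return -(-ctg_len // bin_size)

ctg_len = 4_798_928                      # length of utg000028l
print(n_bins(ctg_len))                   # 3 -> only bin1..bin3 exist
print(bin_id("utg000028l", 6_500_000))   # utg000028l_bin4 (no such key)
```

Any record whose pos or mpos exceeds the contig length therefore needs to be fixed or dropped before HapHiC parses the BAM.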

@socialhang
Author

Hi @zengxiaofei,

I've found the problem and updated the script.
correct_sam.py.zip
Now the process runs normally, but homologous chromosomes are clustered together in the results; I think the Hi-C signal between the different haplotypes has not been filtered out cleanly. How can I adjust the parameters?

[image: Hi-C contact heatmap of the clustering result]

@zengxiaofei
Owner

Could you please provide a full view of the Hi-C contact heatmap that includes scaffold boundaries (called superscaffold in Juicebox, and surrounded by blue outlines)?

@socialhang
Author

Could you please provide a full view of the Hi-C contact heatmap that includes scaffold boundaries (called superscaffold in Juicebox, and surrounded by blue outlines)?

Could you give me an email address? I will send you the Hi-C heatmap privately. @zengxiaofei

@zengxiaofei
Owner

You could send email to xiaofei_zeng#whu.edu.cn (replace # with @), or contact me on WeChat: zeng_xiaofei.

@LikeLucasLab

LikeLucasLab commented Apr 25, 2024

@socialhang or @zengxiaofei, have you managed to adjust the parameters to improve your contact map and distinguish between the different haplotypes?

@zengxiaofei
Owner

zengxiaofei commented May 7, 2024

Hi @LikeLucasLab,

The primary challenge when working with Pore-C data lies in achieving accurate mapping. We conducted some tests using falign and adapted HapHiC to be compatible with this aligner through a Python script. Although Pore-C data performed comparably to NGS-based Hi-C data when scaffolding haplotype-collapsed assemblies, it exhibited much lower mapping accuracy when distinguishing haplotypes, resulting in much stronger Hi-C signals between haplotigs. We tried different MAPQ filtering criteria, but the results did not improve significantly.

There may be some adjustments to HapHiC for Pore-C data in future updates. When using the current version of HapHiC to scaffold haplotype-phased assemblies, please always add the --remove_allelic_links parameter to remove Hi-C links between allelic contigs. A lower --nwindows or --concordance_ratio_cutoff value could also potentially improve the scaffolding results.

@csxie-666

Hi @zengxiaofei,
Is it possible to improve mapping accuracy by correcting Pore-C reads with herro (https://github.com/lbcb-sci/herro)?

@zengxiaofei
Owner

zengxiaofei commented May 28, 2024

Hi @csxie-666,

As an update to this issue, I currently recommend using minimap2 rather than falign for Pore-C read mapping. HapHiC now supports the BAM files output by minimap2 via a Python script.

Hi @zengxiaofei Is it possible to improve mapping accuracy by correcting Pore-C reads with herro (https://github.com/lbcb-sci/herro)?

According to the author's description, herro was trained exclusively on human data (lbcb-sci/herro#23), so I'm not sure whether it generalizes to other species and ploidy levels. I'm trying out another haplotype-aware ONT read correction tool called DeChat on ONT ultra-long reads; it has been validated on simulated autopolyploids. However, I encountered difficulties running the program (see LuoGroup2023/DeChat#3).

It is worth noting that Pore-C data may significantly differ from ONT whole-genome sequencing data in the context of read correction. This is because Pore-C reads consist of sequences originating from different loci, resulting in lower coverage between overlapping Pore-C reads compared to ONT whole-genome sequencing reads.

Best,
Xiaofei
