NextPolish2

Telomere-to-telomere (T2T) genome assembly has emerged as a hotspot in genomics. A T2T genome is typically assembled from both high-accuracy PacBio HiFi long reads and Oxford Nanopore Technologies (ONT) ultra-long reads. Although assemblies built from HiFi long reads are of considerably higher quality, they still contain a handful of errors in regions where HiFi reads also struggle, such as homopolymers and low-complexity microsatellites. In addition, gap filling is usually done with ONT ultra-long reads, which themselves contain a certain amount of errors. Current T2T assemblies therefore still need improvement in consensus accuracy. NextPolish2 can fix these errors (SNVs/indels) in a high-quality assembly. Using its built-in phasing module, it corrects only the erroneous bases while preserving the original haplotype consistency, so it does not overcorrect even in regions with complex repeat elements. In some cases it can also reduce switch errors in heterozygous regions. NextPolish2 is not an upgraded version of NextPolish but a complement to it, aimed at extremely high-quality genome assemblies.

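Prepare the HiFi mapping file (winnowmap or minimap2).

# count 15-mers in the assembly and list the most repetitive ones for winnowmap
meryl count k=15 output merylDB asm.fa.gz
meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt
# map HiFi reads to the assembly and sort the alignments by coordinate
winnowmap -t 5 -W repetitive_k15.txt -ax map-pb asm.fa.gz hifi.fasta.gz | samtools sort -o hifi.map.sort.bam -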

# or mapping using minimap2
# minimap2 -ax map-hifi -t 5 asm.fa.gz hifi.fasta.gz|samtools sort -o hifi.map.sort.bam -

# indexing
samtools index hifi.map.sort.bam
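
Before polishing, it can be worth confirming that the BAM really is coordinate-sorted, since indexing expects that. The following is a minimal sketch, assuming the header is read with samtools view -H; the helper name is_coordinate_sorted is hypothetical and not part of NextPolish2 or samtools.

```shell
# Hypothetical helper: reads a SAM/BAM header on stdin and succeeds only if
# the @HD line declares coordinate sort order (SO:coordinate).
is_coordinate_sorted() {
    grep -q '^@HD.*SO:coordinate'
}

# Example (requires samtools and the BAM produced above):
# samtools view -H hifi.map.sort.bam | is_coordinate_sorted \
#     || echo "hifi.map.sort.bam is not coordinate-sorted" >&2
```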

Prepare k-mer dataset files (yak). Here we produce only 21-mer and 31-mer datasets; you can produce additional datasets with other k-mer sizes.

# produce a 21-mer dataset, remove -b 37 if you want to count singletons
./yak/yak count -o k21.yak -k 21 -b 37 <(zcat sr.R*.fastq.gz) <(zcat sr.R*.fastq.gz)

# produce a 31-mer dataset, remove -b 37 if you want to count singletons
./yak/yak count -o k31.yak -k 31 -b 37 <(zcat sr.R*.fastq.gz) <(zcat sr.R*.fastq.gz)

Run NextPolish2.

./target/release/nextPolish2 -t 5 hifi.map.sort.bam asm.fa.gz k21.yak k31.yak > asm.np2.fa

# or try with -r
# ./target/release/nextPolish2 -r -t 5 hifi.map.sort.bam asm.fa.gz k21.yak k31.yak > asm.np2.fa
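
A quick way to sanity-check the polished output is to compare basic FASTA statistics before and after polishing. This is only a sketch: the helper name fasta_stats is hypothetical, and it assumes that a polished assembly should keep the same number of sequences and a very similar total length, which is an expectation for SNV/indel-level polishing rather than something stated above.

```shell
# Hypothetical helper: reads FASTA on stdin and prints
# "<number of sequences><TAB><total bases>".
fasta_stats() {
    awk '/^>/ { n++; next }            # header line: count one sequence
         { bases += length($0) }       # sequence line: add its length
         END { printf "%d\t%.0f\n", n, bases }'
}

# Example (file names as used in the commands above):
# zcat asm.fa.gz | fasta_stats
# fasta_stats < asm.np2.fa
```

A large difference in total bases or a changed sequence count would be a reason to inspect the run before using the result.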

Optional: If your genome was assembled via trio binning, you can discard reads whose haplotype differs from that of the reference before mapping; see here for an example.

More options

Use ./target/release/nextPolish2 -h to see options.

Getting help

Help

Feel free to raise an issue at the issue page.

Note: Please ask questions on the issue page first. They are also helpful to other users.

Contact

For additional help, please send an email to huj_at_grandomics_dot_com.

Citation

Jiang Hu, Zhuo Wang, Fan Liang, Shan-Lin Liu, Kai Ye, De-Peng Wang, NextPolish2: A Repeat-aware Polishing Tool for Genomes Assembled Using HiFi Long Reads, Genomics, Proteomics & Bioinformatics, 2024, qzad009, https://doi.org/10.1093/gpbjnl/qzad009

License

NextPolish2 is freely available only for academic and other non-commercial use.

Limitations

NextPolish2 can only correct regions to which HiFi reads are mapped. For regions without mapped HiFi reads (usually caused by a high error rate), you can try adjusting the mapping parameters.
The performance of NextPolish2 relies heavily on the quality of short reads.
NextPolish2 can only fix some structural misassemblies.
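
To see which regions fall under the first limitation, one option is to list the stretches of the assembly with no HiFi coverage. The sketch below assumes per-base depth comes from samtools depth -a (three columns: sequence name, 1-based position, depth); the helper name zero_depth_to_bed is hypothetical and not part of NextPolish2.

```shell
# Hypothetical helper: reads "name<TAB>pos<TAB>depth" lines on stdin and
# prints runs of consecutive zero-depth positions as BED intervals
# (0-based start, end-exclusive).
zero_depth_to_bed() {
    awk 'BEGIN { OFS = "\t" }
    {
        if ($3 == 0) {
            if (in_run && $1 == chrom && $2 == run_end + 1) {
                run_end = $2                      # extend the current run
            } else {
                if (in_run) print chrom, run_start - 1, run_end
                chrom = $1; run_start = $2; run_end = $2; in_run = 1
            }
        } else if (in_run) {
            print chrom, run_start - 1, run_end   # covered base closes the run
            in_run = 0
        }
    }
    END { if (in_run) print chrom, run_start - 1, run_end }'
}

# Example (requires samtools and the BAM produced in the mapping step):
# samtools depth -a hifi.map.sort.bam | zero_depth_to_bed > no_hifi_coverage.bed
```

Intervals reported here are candidates for remapping with adjusted parameters; they will not be changed by polishing as long as no HiFi reads map to them.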

Benchmarking

Source	Software	QV	Switch error rate (‱)
A. thaliana (simulated data, primary contigs)	Hifiasm (primary)	47.67	1.99
A. thaliana (simulated data, primary contigs)	NextPolish2	65.42	0.35
A. thaliana (Col-XJTU, primary contigs)	Hifiasm (primary)	58.03	-
A. thaliana (Col-XJTU, primary contigs)	NextPolish2	64.26	-
H. sapiens (HG002, primary contigs)	Hifiasm (primary)	60.25	0.15
H. sapiens (HG002, primary contigs)	NextPolish2	62.87	0.14
H. sapiens (HG002, paternal contigs)	Hifiasm (trio)	59.77	0.21
H. sapiens (HG002, paternal contigs)	NextPolish2	63.49	0.20
H. sapiens (HG002, maternal contigs)	Hifiasm (trio)	59.78	0.33
H. sapiens (HG002, maternal contigs)	NextPolish2	63.29	0.30
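
To interpret the QV column, it may help to convert QV into a per-base error probability. The sketch below assumes QV is the usual Phred-scaled consensus quality, QV = -10 * log10(error probability); the table does not state the definition explicitly, and the helper name qv_to_error_rate is hypothetical.

```shell
# Hypothetical helper: converts a Phred-scaled QV into the implied
# per-base error probability, printed in scientific notation.
qv_to_error_rate() {
    awk -v qv="$1" 'BEGIN { printf "%.3e\n", 10 ^ (-qv / 10) }'
}

# Example: QV 60 corresponds to about one error per million bases.
qv_to_error_rate 60
```

Under this assumption, every 10-point gain in QV corresponds to a tenfold drop in the implied error rate.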

Star

You can track updates by clicking the Star button in the upper-right corner of the GitHub page.
