Jagged kmer coverage profiles with gzipped FASTA #46

Open
warrenlr opened this issue Dec 17, 2020 · 3 comments

warrenlr commented Dec 17, 2020

We discovered inconsistencies in kmer histograms between uncompressed and gzip-compressed FASTA input files* on two experimental ONT datasets. In independent runs testing different k values (16, 18, 20, 22, 25), two gzipped ONT FASTA read files (NA19240 [PRJEB29523] and NA12878 [SRR10965087]) yielded jagged, uninterpretable kmer profiles. The problem is exacerbated at higher k values. Issue observed with ntcard v1.1.1, v1.2.1 and v1.2.2.

NA12878 ONT FASTA
[Image: HG12878_FASTAlog10]

NA12878 ONT FASTA GZIPPED
[Image: HG12878_GZFASTA_log10]

====

NA19240 ONT FASTA
[Image: NA19240log10FASTAuncompressed]

NA19240 ONT FASTA GZIPPED
[Image: NA19240log10FASTAcompressed]

*We have only observed this with FASTA files, not FASTQ files, and only when using experimental nanopore data.

@hmohamadi
Collaborator

Might be due to streaming in compressed multi-line/single-line FASTA records. Can you give this a try with ntCard v1.1.1?
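
One way to probe the multi-line/single-line hypothesis independently of ntCard is to count kmers exactly on a small subset of reads and check that the plain and gzipped files give the same histogram once line wrapping is normalized. The sketch below is only an illustration, not ntCard's method: it uses exact counting of forward-strand kmers (no canonicalization, no estimation), and the helper names `open_maybe_gzip`, `read_fasta` and `kmer_histogram` are hypothetical.

```python
import gzip
from collections import Counter

ACGT = frozenset("ACGT")


def open_maybe_gzip(path):
    """Open a file as text, decompressing if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fasta(path):
    """Yield one uppercase sequence per record, joining wrapped sequence lines."""
    chunks = []
    with open_maybe_gzip(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if it had any sequence.
                if chunks:
                    yield "".join(chunks).upper()
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def kmer_histogram(path, k):
    """Return {multiplicity: number of distinct kmers seen that many times}.

    Assumption: forward-strand kmers only; kmers containing non-ACGT
    characters are skipped. Exact counting keeps every kmer in memory,
    so this is meant for small subsets of reads.
    """
    counts = Counter()
    for seq in read_fasta(path):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if ACGT.issuperset(kmer):
                counts[kmer] += 1
    return Counter(counts.values())
```

For example, with hypothetical file names, `kmer_histogram("subset.fa", 20) == kmer_histogram("subset.fa.gz", 20)` should hold if both files contain the same records. If it does, any difference in the ntCard output would point at how the compressed stream is read rather than at the data.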

@warrenlr
Author

yes, "Issue observed with ntcard v1.1.1, v1.2.1 and v1.2.2"

@hmohamadi
Collaborator

thanks. will investigate this.
