Fix exception when scanning small fastq files #28

zwets · 2021-10-19T23:49:49Z

Scanning a file with fewer than 100 reads yields a StopIteration exception, due to an exhausted iterator in next(all_lines).
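For readers unfamiliar with this failure mode, here is a minimal sketch of it. It is not the kneaddata code: `take_headers_unsafe` and `take_headers_safe` are hypothetical names, and `records` stands in for whatever iterator the real code calls `next()` on. The point is only that a fixed number of `next()` calls fails on a short input, while `next(iterator, default)` does not.

```python
def take_headers_unsafe(records, count=100):
    """Mimic the failing pattern: call next() a fixed number of times."""
    headers = []
    for _ in range(count):
        # Raises StopIteration as soon as the iterator has no more items.
        headers.append(next(records))
    return headers


def take_headers_safe(records, count=100):
    """Same idea, but stop quietly when the input is shorter than count."""
    headers = []
    for _ in range(count):
        header = next(records, None)  # a default replaces the exception
        if header is None:
            break
        headers.append(header)
    return headers
```

With a three-read input, the first function raises `StopIteration` and the second returns the three headers it found.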

zwets · 2021-10-20T08:36:23Z

Actually, looking over the code for check_sequence_identifier_format(file) again, that whole block of code doesn't do what it is supposed to do, namely check the first 100 and last 100 reads for the 'new format'. Instead it returns true if and only if the last read in the file has the new format (but does so in a very roundabout way).

There is also no reason to collect the 2x100 read headers. All the code needs to do is scan headers until it encounters a "new format" header - which will most likely happen on the first read already. I'll be happy to submit a patch.
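A rough sketch of the "scan until the first new-format header" idea follows. Everything here is an assumption made for illustration: the `NEW_FORMAT` pattern treats "new format" as a header with a space-separated field such as `1:N:0:1`, which may not be the exact test kneaddata applies, and `has_new_format_header` is a hypothetical name. Records are assumed to be four lines each.

```python
import re
from itertools import islice

# Assumption: a "new format" header looks like "@name 1:N:0:1".
# This is an illustrative pattern, not necessarily kneaddata's own check.
NEW_FORMAT = re.compile(r"^@\S+ [12]:[YN]:\d+")


def headers(lines):
    """Return every fourth line, i.e. the header of each 4-line record."""
    return islice(lines, 0, None, 4)


def has_new_format_header(lines):
    """Return True at the first matching header; nothing is collected."""
    return any(NEW_FORMAT.match(line) for line in headers(lines))
```

Because `any()` stops at the first match, a file whose first read already has the new format is decided after reading a single line.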

zwets · 2021-10-20T09:10:50Z

Given that checking just the final read has apparently always worked (or downstream bugs would have shown up), I'd suggest replacing the whole block by just checking the first and last read in the file. That would also eliminate the brittle tail -100 system call.
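As a sketch of how the first and last header could be read without shelling out to `tail`, the function below seeks to the end of the file and reads backwards in growing chunks. It is not the code from the PR. `first_and_last_headers` and `tail_bytes` are hypothetical names, and the sketch assumes an uncompressed FASTQ file with exactly four lines per record.

```python
import os


def first_and_last_headers(path, tail_bytes=64 * 1024):
    """Return (first_header, last_header) of an uncompressed FASTQ file."""
    with open(path, "rb") as handle:
        first = handle.readline().rstrip(b"\r\n")
        size = handle.seek(0, os.SEEK_END)
        while True:
            start = max(0, size - tail_bytes)
            handle.seek(start)
            lines = handle.read().splitlines()
            # Ignore blank lines at the very end of the file.
            while lines and not lines[-1].strip():
                lines.pop()
            # When starting mid-file the first line may be partial, so
            # require one spare line before trusting the last four.
            if start == 0 or len(lines) >= 5:
                break
            tail_bytes *= 2
    if not first or len(lines) < 4:
        raise ValueError("file does not contain a complete FASTQ record")
    # The header is the fourth line from the end; counting lines avoids
    # mistaking a quality string that starts with "@" for a header.
    return first.decode(), lines[-4].decode()
```

A file with a single read simply returns the same header twice, so small inputs need no special case.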

zwets · 2021-10-20T14:30:17Z

> Given that checking just the final read has apparently always worked (or downstream bugs would have shown up), I'd suggest replacing the whole block by just checking the first and last read in the file.

I have just added a second commit to the PR that does just that: check the first and last read.

If you'd prefer to check more reads at the head of the file, then that should be doable with two extra lines of code, but remember that the old code never even checked the first read, so it's probably not worth the bother.
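For illustration only, checking more reads at the head of the file could look roughly like this. `any_header_matches` and its `predicate` argument are hypothetical and not part of the PR; `itertools.islice` caps the scan at `max_reads` headers and ends early on short files instead of raising.

```python
from itertools import islice


def any_header_matches(lines, predicate, max_reads=100):
    """Test the headers of at most max_reads leading 4-line records."""
    # Take lines 0, 4, 8, ... and stop after max_reads records or at
    # end of input, whichever comes first.
    head = islice(lines, 0, 4 * max_reads, 4)
    return any(predicate(header) for header in head)
```

Here `predicate` is whatever test decides that a header has the new format.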

wbazant · 2022-04-19T09:17:53Z

I can't see clearly what your PR does, but based on the issue description I might have fixed this too. This is what I thought the fix was when I ran into it: rdemko2332@379a7c8

wbazant · 2022-04-19T09:23:16Z

Right, actually the latest commit is about this too! 23a3834

@ljmciver sorry to open a conversation here rather than on your forum but I hope it works well enough!

You may not need to kill the run if the file is too short - I was able to run kneaddata successfully on files with fewer than 100 lines, with rdemko2332@379a7c8. I think I found and fixed this when I was doing some testing for the memory issue I once opened.

Fix exception when scanning small fastq files

b4a845a

Fix scan logic, scan first and last header

a607d47
