Finder on PacBio data #79

WietseHR · 2023-07-13T09:24:14Z

Hello,
I am currently trying to run finder on three whole genome samples:

1. Sequenced with Illumina HiSeq X Ten
2. Sequenced with Illumina NovaSeq 6000
3. Sequenced with PacBio SMRT

Samples 1 and 2 are running fine at the moment, but sample 3 generates the following error with STAR:

EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
@SRR12124361.1
GCGTCGGATAAGCCTGTCATAAGTCATAAATTACACAATACACATCAGCCATTTTGGAAGACCCGATGATTGGTTTGTTTGACCATACCATCTTCATCGCGGAAGATCTCCATCATCGCATGTCCCAACCAAAATTCCGATCCTCCGGCAACCTCGTGTAGCCCCCTCTTGGAATAAAACCTAGTTACAGGAGAAGCGGCCGGCATGGTCCATTTCCGATCAAAGCTCACCGCTCTCACATGGACGGGAATATCGCAGTGTTCCGGTTTGCCTGTATATAGCTTCTGTTATGTAGCGGTAACTGTGAGGGAAATGTCGCATGACGATATAACGAAAGCTTACCTTGCCTTACGCGAAGGGGTAGTGTGCGAGACTGTGAAGGTAGGCTGACGTGGACTACGCCAAGTAGCCATCGATAGCGACAGCCCATGTATATAGGTATAAACTAAGCCATATTACTATATCCAATCTCGCGTTGAACATCTTGGTGAGCGAAATGAGTCTTCCGCCGTACATAATGGGATGTCAGCGAGAGTCATCTGTGCGAGAGCACAGGGTAAAATCTCCAAGCCAAATAGGAATACATTTTGTTACAGGGATCAGACGTCGTCCTTCACTTCGGGGGGACAAAACCAGTCCTGTGAGGCAAA
SOLUTION: fix your fastq file

Jul 12 09:43:06 ...... FATAL ERROR, exiting
Segmentation fault (core dumped)

When I check this read ID in the FASTQ file, the quality string and the sequence have the same length: 1979.
I think it has something to do with the long reads from PacBio sequencing (the sequence shown in the error is only a small part of the original sequence).
Is there a workaround to make finder work with long-read data?
Thanks in advance!
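
The check described above (comparing sequence length with quality-string length for each read) can be sketched in Python as follows. This is only a minimal illustration, not part of finder or STAR: the function names `read_fastq` and `summarize_fastq` are hypothetical, and the code assumes an uncompressed FASTQ file with exactly four lines per record.

```python
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) tuples.

    Assumes the simple four-line FASTQ layout: header, sequence,
    '+' separator, quality string.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        # Drop the leading '@' and anything after the first whitespace.
        read_id = header.strip()[1:].split()[0]
        yield read_id, seq, qual


def summarize_fastq(handle: TextIO) -> Tuple[List[str], int]:
    """Return (IDs with mismatched seq/qual lengths, longest read length)."""
    mismatched: List[str] = []
    longest = 0
    for read_id, seq, qual in read_fastq(handle):
        if len(seq) != len(qual):
            mismatched.append(read_id)
        longest = max(longest, len(seq))
    return mismatched, longest
```

Calling `summarize_fastq(open("sample3.fastq"))` (file name hypothetical) returns the list of read IDs whose lengths disagree, plus the longest read length. An empty list together with a very large maximum length would be consistent with the suspicion that read length, rather than a malformed record, triggers the error.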


sagnikbanerjee15 · 2023-07-16T23:47:13Z

Hello @WietseHR,

Thank you for your patience. I appreciate your interest in finder. The current version of finder cannot handle long reads. We currently use STAR for alignment, and STAR is designed to work only with short reads. We are designing a new version of finder that will be able to work with long reads.

Thank you,
Sagnik

RacheliHadjez · 2023-09-26T08:08:32Z

Hi,
I would also like to use finder for PacBio data. Is it still the case that it won't work for long reads?

Thank you,
Rachel

Maxim-Karpov · 2024-05-07T09:48:51Z

@RacheliHadjez @WietseHR

It is possible to tweak the code to run STARlong, a modified version of STAR designed for aligning long reads. However, its performance in this use case is not the best among the available open-source aligners, as per: https://academic.oup.com/bioinformatics/article/34/5/748/4562330. Nonetheless, it should work.

sagnikbanerjee15 self-assigned this Jul 16, 2023

sagnikbanerjee15 added the enhancement New feature or request label Jul 16, 2023
