Finder on PacBio data #79

Open
WietseHR opened this issue Jul 13, 2023 · 3 comments
@WietseHR

Hello,
I am currently trying to run finder on three whole genome samples:

  1. Sequenced with Illumina HiSeq X Ten
  2. Sequenced with Illumina NovaSeq 6000
  3. Sequenced with PacBio SMRT

Samples 1 and 2 are running fine so far, but sample 3 fails with the following error from STAR:

EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
@SRR12124361.1
GCGTCGGATAAGCCTGTCATAAGTCATAAATTACACAATACACATCAGCCATTTTGGAAGACCCGATGATTGGTTTGTTTGACCATACCATCTTCATCGCGGAAGATCTCCATCATCGCATGTCCCAACCAAAATTCCGATCCTCCGGCAACCTCGTGTAGCCCCCTCTTGGAATAAAACCTAGTTACAGGAGAAGCGGCCGGCATGGTCCATTTCCGATCAAAGCTCACCGCTCTCACATGGACGGGAATATCGCAGTGTTCCGGTTTGCCTGTATATAGCTTCTGTTATGTAGCGGTAACTGTGAGGGAAATGTCGCATGACGATATAACGAAAGCTTACCTTGCCTTACGCGAAGGGGTAGTGTGCGAGACTGTGAAGGTAGGCTGACGTGGACTACGCCAAGTAGCCATCGATAGCGACAGCCCATGTATATAGGTATAAACTAAGCCATATTACTATATCCAATCTCGCGTTGAACATCTTGGTGAGCGAAATGAGTCTTCCGCCGTACATAATGGGATGTCAGCGAGAGTCATCTGTGCGAGAGCACAGGGTAAAATCTCCAAGCCAAATAGGAATACATTTTGTTACAGGGATCAGACGTCGTCCTTCACTTCGGGGGGACAAAACCAGTCCTGTGAGGCAAA
SOLUTION: fix your fastq file

Jul 12 09:43:06 ...... FATAL ERROR, exiting
Segmentation fault (core dumped)

When I look up this read ID in the FASTQ file, the quality string and the sequence have the same length: 1979.
I think the problem is related to the long reads from PacBio sequencing: the sequence printed in the error is only a small part of the original read.
Is there a workaround that lets Finder work with long-read data?
Thanks in advance!
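
For anyone who wants to repeat the length check described above on a whole file, here is a minimal Python sketch. It assumes a plain four-line-per-record FASTQ (no wrapped sequences), and the names `check_fastq_lengths` and `summarize` are hypothetical helpers written for illustration, not part of finder or STAR. It reports read IDs whose sequence and quality lengths differ, plus the longest read seen.

```python
import io


def check_fastq_lengths(handle):
    """Yield (read_id, seq_len, qual_len) for each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        # Read ID is the first whitespace-delimited token after '@'
        read_id = header.strip()[1:].split()[0]
        yield read_id, len(seq), len(qual)


def summarize(handle):
    """Return (list of mismatched read IDs, length of the longest read)."""
    mismatched = []
    longest = 0
    for read_id, seq_len, qual_len in check_fastq_lengths(handle):
        if seq_len != qual_len:
            mismatched.append(read_id)
        longest = max(longest, seq_len)
    return mismatched, longest


# Tiny in-memory example; r2 has 8 bases but only 4 quality values.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIII\n")
mismatched, longest = summarize(example)
print(mismatched, longest)
```

For a real file, pass an open handle instead of the in-memory example, e.g. `with open("sample3.fastq") as fh: summarize(fh)` (the file name here is a placeholder). If no mismatches are reported, the FASTQ itself is consistent and the error is more likely caused by how the aligner handles very long reads.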

@sagnikbanerjee15 sagnikbanerjee15 self-assigned this Jul 16, 2023
@sagnikbanerjee15 sagnikbanerjee15 added the enhancement New feature or request label Jul 16, 2023
@sagnikbanerjee15
Owner

Hello @WietseHR,

Thank you for your patience, and I appreciate your interest in finder. The current version of finder cannot handle long reads: it uses STAR for alignment, which is designed to work only with short reads. We are designing a new version of finder that will be able to work with long reads.

Thank you,
Sagnik

@RacheliHadjez

Hi,
I also wanted to use finder for PacBio data. Is it still the case that it won't work for long reads?

Thank you,
Rachel

@Maxim-Karpov

Maxim-Karpov commented May 7, 2024

@RacheliHadjez @WietseHR

It is possible to tweak the code to run STARlong, a modified version of STAR designed for aligning long reads. However, its performance in this use case is not the best among the available open-source aligners, as per: https://academic.oup.com/bioinformatics/article/34/5/748/4562330. Nonetheless, it should work.
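
As a rough illustration of the kind of tweak meant here, the Python sketch below builds an alignment command and swaps the `STAR` executable for `STARlong` when long reads are requested. This is not finder's actual code: the function `build_align_command` and all paths are hypothetical, and only a few standard STAR options (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--outFileNamePrefix`) are shown. Long-read alignment will likely need additional parameter tuning that is not covered here.

```python
def build_align_command(reads, genome_dir, out_prefix, threads=4, long_reads=False):
    """Return the aligner command as an argument list.

    Hypothetical helper: picks STARlong instead of STAR for long reads.
    """
    executable = "STARlong" if long_reads else "STAR"
    return [
        executable,
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", reads,
        "--outFileNamePrefix", out_prefix,
    ]


# Placeholder paths, for illustration only.
cmd = build_align_command(
    "sample3.fastq", "star_index", "sample3_", threads=8, long_reads=True
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the paths point at a real genome index and FASTQ file.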
