get_io_reads() fails with Move table discordant with signal on some reads #158

Closed
drivenbyentropy opened this issue Jan 22, 2024 · 7 comments

@drivenbyentropy

Hi @marcus1487,

I am running into an issue that I was hoping you could provide some insight into:

I have a dataset from a P2 sequencer that was basecalled and aligned using dorado v0.5.1. When working with these data using remora v3.1.0, a small portion of reads (about 1 in 5000) fails when calling get_io_reads with the following error:

import pysam
import pod5
from remora import io

# prepare the sequence data
print("Loading Pod5 and BAM files")
can_pod5_fh = pod5.Reader("read.pod5")
can_bam_fh = pysam.AlignmentFile("read.bam")

bam_read = [read for read in can_bam_fh.fetch() if io.read_is_primary(read)][0]
io_read = io.get_io_reads([bam_read], can_pod5_fh, reverse_signal=False)[0]

RemoraError                               Traceback (most recent call last)
File [...]/remora/io.py:756, in get_io_reads(bam_reads, pod5_dr, reverse_signal, missing_ok)
    755 try:
--> 756     io_read = Read.from_pod5_and_alignment(
    757         pod5_read_record=pod5_reads[get_parent_id(bam_read)],
    758         alignment_record=bam_read,
    759         reverse_signal=reverse_signal,
    760     )
    761 except Exception:

File [...]/remora/io.py:1992, in Read.from_pod5_and_alignment(cls, pod5_read_record, alignment_record, reverse_signal)
   1986 read = Read(
   1987     read_id=str(pod5_read_record.read_id),
   1988     dacs=dacs,
   1989     shift_dacs_to_pa=pod5_read_record.calibration.offset,
   1990     scale_dacs_to_pa=pod5_read_record.calibration.scale,
   1991 )
-> 1992 read.add_alignment(alignment_record, reverse_signal=reverse_signal)
   1993 return read

File [...]/remora/io.py:1912, in Read.add_alignment(self, alignment_record, parse_ref_align, reverse_signal)
   1911 try:
-> 1912     self.query_to_signal, self.mv_table, self.stride = parse_move_tag(
   1913         tags["mv"],
   1914         sig_len=self.dacs.size,
   1915         seq_len=len(self.seq),
   1916         reverse_signal=reverse_signal,
   1917     )
   1918 except KeyError:

File [...]/remora/io.py:401, in parse_move_tag(mv_tag, sig_len, seq_len, check, reverse_signal)
    397     LOGGER.debug(
    398         f"Move table (len: {mv_table.size}) discordant with "
    399         f"signal (sig len // stride: {sig_len // stride})"
    400     )
--> 401     raise RemoraError("Move table discordant with signal")
    402 return query_to_signal, mv_table, stride

RemoraError: Move table discordant with signal

During handling of the above exception, another exception occurred:

RemoraError                               Traceback (most recent call last)
Cell In[22], line 11
      8 can_bam_fh = pysam.AlignmentFile("read.bam")
     10 bam_read = [ read for read in can_bam_fh.fetch() if io.read_is_primary(read)][0]
---> 11 io_read = io.get_io_reads([bam_read], can_pod5_fh, reverse_signal=False)[0]

File [...]/remora/io.py:765, in get_io_reads(bam_reads, pod5_dr, reverse_signal, missing_ok)
    763             continue
    764         else:
--> 765             raise RemoraError("BAM record not found in POD5")
    766     io_reads.append(io_read)
    767 return io_reads

RemoraError: BAM record not found in POD5

I have attached the corresponding read to replicate the issue. I am curious what could cause these cases, whether this is something that can be fixed programmatically, or whether these reads should simply be filtered out.
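For what it's worth, the get_io_reads signature shown in the traceback above suggests a missing_ok argument that skips reads failing the POD5 lookup instead of raising. A minimal sketch of that workaround, reusing the handles from the snippet above (the flag is inferred from the traceback, not from documentation I have verified):

from remora import io

# missing_ok=True (inferred from the get_io_reads signature in the traceback
# above) should drop the offending reads instead of raising
bam_reads = [read for read in can_bam_fh.fetch() if io.read_is_primary(read)]
io_reads = io.get_io_reads(bam_reads, can_pod5_fh, reverse_signal=False, missing_ok=True)
print(f"kept {len(io_reads)} of {len(bam_reads)} primary alignments")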

In the only other issue raising this problem, you hinted that it could be related to read splitting. In this instance, however, the read does not stem from a split read:

bam_read.get_tag("pi")
KeyError: "tag 'pi' not present"
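For completeness, the length check that parse_move_tag performs can be reproduced outside of remora. The following is only a rough diagnostic sketch, assuming the usual dorado mv tag layout (first element is the stride, the remaining elements are the per-sample moves); it is not remora's own code:

import numpy as np
import pod5
import pysam

pod5_reads = {str(r.read_id): r for r in pod5.Reader("read.pod5").reads()}

with pysam.AlignmentFile("read.bam", check_sq=False) as bam_fh:
    for rec in bam_fh.fetch(until_eof=True):
        if rec.is_secondary or rec.is_supplementary or not rec.has_tag("mv"):
            continue
        mv = np.asarray(rec.get_tag("mv"))
        stride, moves = int(mv[0]), mv[1:]
        # split reads would need the parent POD5 ID from the pi tag instead
        sig_len = pod5_reads[rec.query_name].signal.size
        if moves.size != sig_len // stride:
            print(rec.query_name, moves.size, sig_len // stride)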

Any insight into this matter would be greatly appreciated.

Thank you in advance!

@MSajek commented Feb 14, 2024

Hi. I have also worked a lot with move tables. Check whether you have duplicated read IDs in the POD5 files you use for basecalling, or in the basecalled BAM; in my experience they can cause this kind of issue.
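A quick way to run that check, sketched with the pod5 and pysam APIs (file names are placeholders):

from collections import Counter

import pod5
import pysam

# count read IDs in the POD5 used for basecalling
with pod5.Reader("reads.pod5") as pod5_fh:
    pod5_counts = Counter(str(read.read_id) for read in pod5_fh.reads())

# count query names of primary records in the basecalled BAM
with pysam.AlignmentFile("calls.bam", check_sq=False) as bam_fh:
    bam_counts = Counter(
        rec.query_name
        for rec in bam_fh.fetch(until_eof=True)
        if not (rec.is_secondary or rec.is_supplementary)
    )

print("duplicated POD5 IDs:", [i for i, n in pod5_counts.items() if n > 1])
print("duplicated BAM IDs:", [i for i, n in bam_counts.items() if n > 1])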

@marcus1487 (Collaborator)

I can reproduce the bug with the BAM file you sent, but I'm not able to replicate the issue with dorado 0.5.3 and without reference mapping. It is possible that the issue is due to a bug in dorado concerning adapter trimming (related to the splitting bug). Could you upgrade to dorado 0.5.3 and let me know if this resolves your issue?

marcus1487 reopened this Mar 7, 2024
@jorisbalc commented Apr 25, 2024

Hi Marcus, I'm getting a similar issue with the newest dorado version (0.6.1) while trying to run remora analyze plot ref_region as written in the README:

remora analyze plot ref_region --pod5-and-bam can_pod5/barcode02.pod5 can_pod5/calls_fast.bam --pod5-and-bam mod_pod5/barcode02.pod5 mod_pod5/calls_fast.bam --ref-regions analyze_region_signal.bed --refine-kmer-level-table /home/v313/kmer_models/dna_r10.4.1_e8.2_400bps/9mer_levels_v1.txt --refine-rough-rescale --log-filename log.txt

Getting a similar output:

Traceback (most recent call last):
  File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 756, in get_io_reads
    io_read = Read.from_pod5_and_alignment(
  File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 1992, in from_pod5_and_alignment
    read.add_alignment(alignment_record, reverse_signal=reverse_signal)
  File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 1912, in add_alignment
    self.query_to_signal, self.mv_table, self.stride = parse_move_tag(
  File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 401, in parse_move_tag
    raise RemoraError("Move table discordant with signal")
remora.RemoraError: Move table discordant with signal

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/v313/.local/bin/remora", line 8, in <module>
    sys.exit(run())
  File "/home/v313/.local/lib/python3.8/site-packages/remora/main.py", line 71, in run
    cmd_func(args)
  File "/home/v313/.local/lib/python3.8/site-packages/remora/parsers.py", line 1874, in run_plot_ref_region
    io.plot_signal_at_ref_region(
  File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 1521, in plot_signal_at_ref_region
    samples_read_ref_regs, reg_bam_reads = get_reads_reference_regions(
  File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 787, in get_reads_reference_regions
    io_reads = get_io_reads(sample_bam_reads, pod5_dr, reverse_signal)
  File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 765, in get_io_reads
    raise RemoraError("BAM record not found in POD5")
remora.RemoraError: BAM record not found in POD5

I'm unsure about the first error. However, I did check the BAMs and POD5s, and there do seem to be some read IDs that are present in the BAM but not in the POD5, and vice versa; I am not sure how this happens. Trying this with unmapped reads gives the same outcome. It might be a read-splitting issue, but I'm not able to resolve it.
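For reference, a sketch of that comparison (placeholder file names; the pi tag, where present, gives the parent POD5 read ID for split reads):

import pod5
import pysam

with pod5.Reader("barcode02.pod5") as pod5_fh:
    pod5_ids = {str(read.read_id) for read in pod5_fh.reads()}

bam_ids = set()
with pysam.AlignmentFile("calls_fast.bam", check_sq=False) as bam_fh:
    for rec in bam_fh.fetch(until_eof=True):
        if rec.is_secondary or rec.is_supplementary:
            continue
        bam_ids.add(rec.get_tag("pi") if rec.has_tag("pi") else rec.query_name)

print("in BAM but not POD5:", len(bam_ids - pod5_ids))
print("in POD5 but not BAM:", len(pod5_ids - bam_ids))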

@marcus1487 (Collaborator)

This should be resolved with the v3.2 release from yesterday. Please reopen this issue if you run into further problems.

@drivenbyentropy (Author)

@marcus1487 Could you please link to the changes in the code for this issue and, if you have the time, elaborate on what caused this and how it was solved? I have been trying to debug multiple instances of this issue over time and I am curious what changes were made to the logic.

Thanks!

@marcus1487 (Collaborator)

The changes are buried in this rather large merge request. Specifically, note the changes in io.py regarding the ts, sp, and ns tags.

db7ff94
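For anyone curious about the mechanics, my reading of this (purely as an illustration of the idea, not of remora's actual implementation) is that the move table describes the trimmed signal, so the length check has to be made against the trimmed slice rather than the full signal. Under the assumed tag semantics of ts as samples trimmed from the start and ns as the end of the basecalled interval, that would look roughly like:

def basecalled_signal_slice(alignment_record, signal):
    # Assumed tag semantics (not verified against the dorado spec):
    # ts = samples trimmed from the start, ns = end of the basecalled interval
    ts = alignment_record.get_tag("ts") if alignment_record.has_tag("ts") else 0
    ns = alignment_record.get_tag("ns") if alignment_record.has_tag("ns") else len(signal)
    return signal[ts:ns]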

@drivenbyentropy (Author)

I can confirm that this issue is solved with v3.2. On a dataset of 60000 reads mapping onto a plasmid, prior versions failed on 50000 of them; now all are processed correctly. Thank you very much for this fix, @marcus1487!
