Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query to ref coord mapping #120

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

chilampoon
Copy link

targeting spliced read issue #118 (comment)


return knots


def get_ref_coords(ops, lens):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dumb version of this function is:

def get_ref_coords(ops, lens):
    ref_coords = []
    s = 1

    for op, l in zip(ops, lens):
        if op in [3]:
            s += l
        elif op in [1,4,5]:
            continue
        elif op in [0,2,7,8]:
            # mapped
            pos_list = list(range(s, s + l))
            ref_coords.extend(pos_list)
            s += l
        
    return ref_coords

src/remora/data_chunks.py Outdated Show resolved Hide resolved
for op, start, l in zip(ops, starts, lens)
if REFCOORD_OPS[op]
]
ref_coords = np.concatenate(ranges) + 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel like there is extra logic here. I think the fix here might be as simple as completely dropping the skip cigar ops. Do you get the same results if you switch the REF_OPS skip code value to False without any other code changes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the same thing I think you're right - just changing the intron operation to False in REF_OPS can work as well... got the same results for genomic reads and didn't raise errors for transcriptomic reads

@chilampoon
Copy link
Author

I put the same reference coordinate function to io.py now because it's a way to get a ref coordinate to query mapping with the exact same length as read.ref_to_signal. The get_reference_positions() from pysam doesn't include deletion bases, whereas its get_reference_sequence() includes deletions...
I tested the gapless reads without introns with updated code and got the same outputs, seems like there're many other issues for intronic reads, I'll stick to gapless read analysis for a while

@marcus1487
Copy link
Collaborator

I'm not sure I understand the issue. The get_reference_positions function is never used in Remora. Why is support for this function required in Remora?

I'll stick to gapless read analysis for a while

Are you saying that I should close this PR? Or would you like for this PR to be considered still?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants