Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibly not calculating s array in ksw_extd2_see() correctly #21

Open
mimaric opened this issue Jul 25, 2021 · 2 comments
Open

Possibly not calculating s array in ksw_extd2_see() correctly #21

mimaric opened this issue Jul 25, 2021 · 2 comments

Comments

@mimaric
Copy link

mimaric commented Jul 25, 2021

In ksw_extd2_see() if valid elements of an antidiagonal have target indices in range [st0, en0] all elements in [st=st0/16*16, en=(en0+16)/16*16-1] are going to be processed, i.e. all elements within 16-element vectors that antidiagonal spans.

When calculating s array the loop starts from st0, skipping elements in [st, st0):

for (t = st0; t <= en0; t += 16) {

With this approach elements in [st, st0) are actually processed with s-values from previous antidiagonals.
What is the reasoning behind this decision? Elements in [st, st0) are not part of the antidiagonal, but as off[r] = st, off_end[r] = en; they are considered valid elements during backtracking, whereas elements behind en whose s-values might also get calculated in this case are not.

Is this the intended behavior or should the loop on line 158 better read for (t = st; t <= en; t += 16)?

@lh3
Copy link
Owner

lh3 commented Jul 29, 2021

You are probably right. Have you tried that? One thing I haven't thought through is whether this would lead to out-of-bound access to qr[].

@mimaric
Copy link
Author

mimaric commented Sep 2, 2021

I went into more details on this.
All line numbers are based on this file versions: https://github.com/lh3/ksw2/blob/4e0a1ccba8c6ccc87e0342c9712531bde783bf90/ksw2_extd2_sse.c

On L99 memory buffer is allocated with (tlen_ * 8 + qlen_ + 1) * 16B, where tlen_ = (tlen + 15) / 16 and qlen_ = (qlen + 15) / 16, i.e. multiplying them with 16 gives the multiple of 16 large enough to fit all elements.
u, v, x, y, x2, y2, s and sf each take tlen_ * 16B. Address of u is 16B aligned (L100), meaning that up to 15B can be "wasted", leaving qlen_ * 16B + 1B to qr.

On L120 qr is filled by reversed queries, such that qr[0] = query[qlen-1], qr[1] = query[qlen-2] ... qr[qlen-1] = query[0].
On L126 qrr is set to point to qr + (qlen - 1 - r), i.e. query[0] for antidiagonal 0, query[1] for antidiagonal 1 and so on.

Query values are loaded on L161 in blocks of 16 starting from qrr + t. This means that for antidiagonal 0 query[0, -1 ... -15] are loaded. For antidiagonal 1 that is query[1, 0 ... -14]. In this part I am acutally surprised that there is no seqfault because for e.g. qlen==32 qr would only have 32 elements allocated, but for antidiagonal 0 the program would be accessing qr[32, 33 ... 47]. I assume that the reason is that the allocation on L99 is always 16B aligned so there is no shift on L100 and qr is effectively left with (qlen_ + 1) * 16B. That's why I would suggest allocating (tlen_ * 8 + qlen_ + 2) * 16B instead of (tlen_ * 8 + qlen_ + 1) * 16B on L99.
As both the current and the proposed solutions start with r == 0 and st == 0 this cannot cause any additional out-of-bound accesses on the right side of the buffer.

On the other side, the last element to be read by the current implementaion is going to be query[qlen-1], i.e. qr[0] (for sufficiently large tlen, otherwise it is query[<qlen-1], i.e. qr[>0]).
As st0 >= st in the proposed implementaion it could happen that the accessed element is qr[<0] (qrr + st = qrr + st0 - n instead of qrr + st), which does lead to out-of-bound accesses. However, beacause all these arrays are part of one big allocation this would mean reading (junk) data from sf which is allocated before qr (L102). This would obviously lead to wrong results, but those results would not affect the final result. The reason for this is that the query index of the "st" element is either the last valid query coordiante or it is n places from the last valid query index, in which case those n elements are going to have valid query values and any value behind the last valid query index does not influence the the output, including cigar string which starts at max query index and can then only get smaller.

Does this make sense? If you want we can have a more detailed chat about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants