Possibly not calculating s array in ksw_extd2_see() correctly #21

mimaric · 2021-07-25T21:57:02Z

In ksw_extd2_see() if valid elements of an antidiagonal have target indices in range [st0, en0] all elements in [st=st0/16*16, en=(en0+16)/16*16-1] are going to be processed, i.e. all elements within 16-element vectors that antidiagonal spans.

When calculating s array the loop starts from st0, skipping elements in [st, st0):

ksw2/ksw2_extd2_sse.c

Line 158 in 4e0a1cc

for (t = st0; t <= en0; t += 16) {

With this approach elements in [st, st0) are actually processed with s-values from previous antidiagonals.
What is the reasoning behind this decision? Elements in [st, st0) are not part of the antidiagonal, but as off[r] = st, off_end[r] = en; they are considered valid elements during backtracking, whereas elements behind en whose s-values might also get calculated in this case are not.

Is this the intended behavior or should the loop on line 158 better read for (t = st; t <= en; t += 16)?

The text was updated successfully, but these errors were encountered:

lh3 · 2021-07-29T23:19:39Z

You are probably right. Have you tried that? One thing I haven't thought through is whether this would lead to out-of-bound access to qr[].

mimaric · 2021-09-02T09:00:42Z

I went into more details on this.
All line numbers are based on this file versions: https://github.com/lh3/ksw2/blob/4e0a1ccba8c6ccc87e0342c9712531bde783bf90/ksw2_extd2_sse.c

On L99 memory buffer is allocated with (tlen_ * 8 + qlen_ + 1) * 16B, where tlen_ = (tlen + 15) / 16 and qlen_ = (qlen + 15) / 16, i.e. multiplying them with 16 gives the multiple of 16 large enough to fit all elements.
u, v, x, y, x2, y2, s and sf each take tlen_ * 16B. Address of u is 16B aligned (L100), meaning that up to 15B can be "wasted", leaving qlen_ * 16B + 1B to qr.

On L120 qr is filled by reversed queries, such that qr[0] = query[qlen-1], qr[1] = query[qlen-2] ... qr[qlen-1] = query[0].
On L126 qrr is set to point to qr + (qlen - 1 - r), i.e. query[0] for antidiagonal 0, query[1] for antidiagonal 1 and so on.

Query values are loaded on L161 in blocks of 16 starting from qrr + t. This means that for antidiagonal 0 query[0, -1 ... -15] are loaded. For antidiagonal 1 that is query[1, 0 ... -14]. In this part I am acutally surprised that there is no seqfault because for e.g. qlen==32 qr would only have 32 elements allocated, but for antidiagonal 0 the program would be accessing qr[32, 33 ... 47]. I assume that the reason is that the allocation on L99 is always 16B aligned so there is no shift on L100 and qr is effectively left with (qlen_ + 1) * 16B. That's why I would suggest allocating (tlen_ * 8 + qlen_ + 2) * 16B instead of (tlen_ * 8 + qlen_ + 1) * 16B on L99.
As both the current and the proposed solutions start with r == 0 and st == 0 this cannot cause any additional out-of-bound accesses on the right side of the buffer.

On the other side, the last element to be read by the current implementaion is going to be query[qlen-1], i.e. qr[0] (for sufficiently large tlen, otherwise it is query[<qlen-1], i.e. qr[>0]).
As st0 >= st in the proposed implementaion it could happen that the accessed element is qr[<0] (qrr + st = qrr + st0 - n instead of qrr + st), which does lead to out-of-bound accesses. However, beacause all these arrays are part of one big allocation this would mean reading (junk) data from sf which is allocated before qr (L102). This would obviously lead to wrong results, but those results would not affect the final result. The reason for this is that the query index of the "st" element is either the last valid query coordiante or it is n places from the last valid query index, in which case those n elements are going to have valid query values and any value behind the last valid query index does not influence the the output, including cigar string which starts at max query index and can then only get smaller.

Does this make sense? If you want we can have a more detailed chat about it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possibly not calculating s array in ksw_extd2_see() correctly #21

Possibly not calculating s array in ksw_extd2_see() correctly #21

mimaric commented Jul 25, 2021

lh3 commented Jul 29, 2021

mimaric commented Sep 2, 2021

Possibly not calculating s array in ksw_extd2_see() correctly #21

Possibly not calculating s array in ksw_extd2_see() correctly #21

Comments

mimaric commented Jul 25, 2021

lh3 commented Jul 29, 2021

mimaric commented Sep 2, 2021