Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Optimizer with negation incorrect output #4395

Open
keithchew opened this issue Jan 27, 2024 · 14 comments
Open

[BUG] Optimizer with negation incorrect output #4395

keithchew opened this issue Jan 27, 2024 · 14 comments
Assignees

Comments

@keithchew
Copy link
Contributor

Testing on v2.8.11

Not sure why I did not catch this in my earlier tests, but here it is...

Test case:

hset namespace:test1 address test1 totalSold 1000000
hset namespace:test2 address test2 totalSold 1000000

Queries:

ft.profile testidx search query "(-@address:{test1}) @totalSold:[1000000 1000000]" SORTBY totalSold asc WITHCOUNT LIMIT 0 5 RETURN 2 totalSold address

ft.profile testidx search query "(-@address:{test1}) @totalSold:[1000000 1000000]" SORTBY totalSold asc WITHOUTCOUNT LIMIT 0 5 RETURN 2 totalSold address

Non optimized query above is correct:

1) 1) "1"
   2) "namespace:test2"
   3) 1) "totalSold"
      2) "1000000"
      3) "address"
      4) "test2"
2) 1) 1) "Total profile time"
      2) "0"
   2) 1) "Parsing time"
      2) "0"
   3) 1) "Pipeline creation time"
      2) "0"
   4) 1) "Warning"
   5) 1) "Iterators profile"
      2) 1) "Type"
         2) "INTERSECT"
         3) "Time"
         4) "0"
         5) "Counter"
         6) "1"
         7) "Child iterators"
         8) 1) "Type"
            2) "NUMERIC"
            3) "Term"
            4) "86 - 1e+06"
            5) "Time"
            6) "0"
            7) "Counter"
            8) "4"
            9) "Size"
            10) "167"
         9) 1) "Type"
            2) "NOT"
            3) "Time"
            4) "0"
            5) "Counter"
            6) "4"
            7) "Child iterator"
            8) 1) "Type"
               2) "TAG"
               3) "Term"
               4) "test1"
               5) "Time"
               6) "0"
               7) "Counter"
               8) "3"
               9) "Size"
               10) "5"
   6) 1) "Result processors profile"
      2) 1) "Type"
         2) "Index"
         3) "Time"
         4) "0"
         5) "Counter"
         6) "1"
      3) 1) "Type"
         2) "Sorter"
         3) "Time"
         4) "0"
         5) "Counter"
         6) "1"
      4) 1) "Type"
         2) "Loader"
         3) "Time"
         4) "0"
         5) "Counter"
         6) "1"

The optimized version is incorrect:


1) 1) "2"
   2) "namespace:test1"
   3) 1) "totalSold"
      2) "1000000"
      3) "address"
      4) "test1"
   4) "namespace:test2"
   5) 1) "totalSold"
      2) "1000000"
      3) "address"
      4) "test2"
2) 1) 1) "Total profile time"
      2) "0"
   2) 1) "Parsing time"
      2) "0"
   3) 1) "Pipeline creation time"
      2) "0"
   4) 1) "Warning"
   5) 1) "Iterators profile"
      2) 1) "Type"
         2) "OPTIMIZER"
         3) "Time"
         4) "0"
         5) "Counter"
         6) "2"
         7) "Optimizer mode"
         8) "Query partial range"
         9) "Child iterator"
         10) 1) "Type"
            2) "NOT"
            3) "Time"
            4) "0"
            5) "Counter"
            6) "8"
            7) "Child iterator"
            8) 1) "Type"
               2) "TAG"
               3) "Term"
               4) "test1"
               5) "Time"
               6) "0"
               7) "Counter"
               8) "4"
               9) "Size"
               10) "5"
   6) 1) "Result processors profile"
      2) 1) "Type"
         2) "Index"
         3) "Time"
         4) "0"
         5) "Counter"
         6) "2"
      3) 1) "Type"
         2) "Sorter"
         3) "Time"
         4) "0"
         5) "Counter"
         6) "2"
      4) 1) "Type"
         2) "Loader"
         3) "Time"
         4) "0"
         5) "Counter"
         6) "2"

I still need to dig a bit deeper to see why we are getting the error above, but looks like the optimizer does not like negations.

@keithchew
Copy link
Contributor Author

Hmm, my apologies, I actually have 170K more entries in the dataset which has totalSold below 100000. I will prepare a more concrete test case, please hold.

@keithchew
Copy link
Contributor Author

keithchew commented Jan 27, 2024

OK, I have a better test case below:

In bash, create 200,000 records:

for i in {1..200000}
do
redis-cli hset namespace:test$i address test$i totalSold $i
done

Create index:

FT.CREATE testidx ON hash PREFIX 1 'namespace:' SCHEMA address TAG totalSold numeric SORTABLE

Then, execute the queries:

ft.profile testidx search query "(-@address:{test200000}) @totalSold:[200000 200000]" SORTBY totalSold asc WITHCOUNT LIMIT 0 5 RETURN 2 totalSold address
ft.profile testidx search query "(-@address:{test200000}) @totalSold:[200000 200000]" SORTBY totalSold asc WITHOUTCOUNT LIMIT 0 5 RETURN 2 totalSold address

First one will return 0 items (since address is negated), but the second optimized query will return the record.

@keithchew
Copy link
Contributor Author

If it helps, the query:

ft.profile testidx search query "(-@address:{test200000}) @totalSold:[199999 200000]" SORTBY totalSold asc WITHOUTCOUNT LIMIT 0 5 RETURN 2 totalSold address

returns record 199999 and excluded 200000 as expected. Perhaps this bug is only triggered when the optimizer has to SkipTo the a record which is part of the negation, in which case it accepts the record incorrectly as a result.

@keithchew
Copy link
Contributor Author

Hmm, not all the time. If I drop and re-create the index, [199999 200000] still includes record 200000, so it is dependent on where the document sits in the numeric index. Will keep testing and post here if I find anything else useful to fix the bug.

@keithchew
Copy link
Contributor Author

keithchew commented Jan 27, 2024

OK, I have dug deeper into the code, and NOT iterator's NI_SkipTo() function returns INDEXREAD_NOTFOUND if it is not a match. But the loop in OPT_Read() does not actually check the results, and treats it as good if the docid's match, ie:

...
      if (childRes->docId == numericRes->docId) {
...

As a quick test, I updated the logic to be:

...
      if (childRes->docId == numericRes->docId && rc1 == INDEXREAD_OK) {
...

and this fixes the issue. Will keep testing to see this breaks other test cases which passed perviously.

@keithchew
Copy link
Contributor Author

keithchew commented Jan 27, 2024

The fix above is not quite right. NOT iterators return INDEXREAD_NOTFOUND when the next record does not match, but for other iterators, it points to the next valid record when INDEXREAD_NOTFOUND is returned.

To cater for this, the fix should be:

      bool readOk = child->type == NOT_ITERATOR ? rc1 == INDEXREAD_OK : true;
      if (childRes->docId == numericRes->docId && readOk) {

I am not sure what happens if the iterator is an intersect or union and the NOT iterator is further down. Will need to keep testing.

@keithchew
Copy link
Contributor Author

Tested with NOT clause within a UNION (below), and it breaks the fix above... So, not too sure what is a general robust way to handle NOTs in the query:

ft.profile testidx search query "((-@address:{test200000}) | @address:{test199999}) @totalSold:[199999 200000]" SORTBY totalSold asc WITHOUTCOUNT LIMIT 0 5 RETURN 2 totalSold address

Will keep playing around with it, but if anyone has some pointers to this bug, please let me know.

@keithchew
Copy link
Contributor Author

keithchew commented Jan 28, 2024

I have abandoned the above fix, and found a simpler one, but with greater risks as it is further down at the iterator level. In index.c, we can see the NotInterator's skip to function NI_SkipTo():

  // If the child docId is the one we are looking for, it's an anti match!
  if (childId == docId) {
    nc->base.current->docId = docId;
    nc->lastDocId = docId;
    *hit = nc->base.current;
    return INDEXREAD_NOTFOUND;
  }

Since the the Optimizer ignores INDEXREAD_NOTFOUND and INDEXREAD_OK and only reads the docId, I updated the above to:

  // If the child docId is the one we are looking for, it's an anti match!
  if (childId == docId) {
    nc->base.current->docId = docId + 1; <---- Added + 1 here
    nc->lastDocId = docId + 1; <---- Added + 1 here
    *hit = nc->base.current;
    return INDEXREAD_NOTFOUND;
  }

Surprisingly, this has passed all the Optimizer test cases that failed before! I would like to ask if there is test suite in RediSearch which I can run to see the effect of the change above? And if so, how do I run it?

@meiravgri
Copy link
Collaborator

Hi @keithchew
Thank you for bringing this to our attention! I have confirmed that there is indeed a bug.
We are aware of it and are looking into potential solutions.
As a temporary workaround, I recommend using the non-optimized version.

@keithchew
Copy link
Contributor Author

keithchew commented Jan 28, 2024

Hi @meiravgri

Thank you for your confirmation of the bug. I have a few updates. The docId + 1 fix above broke some test cases in redisearch's "make pytest", so that fix is not good, I thought it was too deep in anyway.

I am currently testing the original fix above, but applying it in a few more places. In OPT_Read(), UI_SkipToHigh(), UI_SkipTo() and II_SkipTo(), I am checking if the type is NOT and only setting the minId on INDEXREAD_OK, ie:

      bool readOk = it->type == NOT_ITERATOR ? rc == INDEXREAD_OK : true;
      if (res && readOk) {
        it->minId = res->docId;
      }

So far, it seems to have passed all the test cases, but I am not comfortable with it as it touches quite a number of critical places in the code. Please advise here when a PR is ready, so I can help test the official fix and remove my changes.

Thanks again.

@keithchew
Copy link
Contributor Author

keithchew commented Feb 5, 2024

Hi @meiravgri

Just wanted to note here another test case (can probably be classified as another bug since it is further down at the index level). In index.c for NI_SkipTo(), it says:

  // If the child is ahead of the skipto id, it means the child doesn't have this id.
  // So we are okay!
  if (childId > docId || !IITER_HAS_NEXT(nc->child)) {
    goto ok;
  }

This logic might apply to other iterators, but for the NOT iterator, because it can either be INDEXREAD_OK or INDEXREAD_NOTFOUND if the childId is > dockId (you can verify this by by reading the code further down the snipper above). Here is a case where the Optimizer's OPT_Read() is requesting these SkipTos to the NOT iterator:

10:M 04 Feb 2024 23:33:03.842 # <module> rc1 [2] SkipTo [2169]
10:M 04 Feb 2024 23:33:03.842 # <module> rc1 [2] SkipTo [2175]
10:M 04 Feb 2024 23:33:03.842 # <module> rc1 [1] SkipTo [2169]

You can see above the first call's rc value is INDEXREAD_NOTFOUND but the third call with the same skip to docId returns INDEXREAD_OK, which is incorrect.

@keithchew
Copy link
Contributor Author

keithchew commented Feb 6, 2024

Hi @meiravgri

Do you have any ideas/suggestions on how to solve this. I had another good look at the code, and this appears to be non-trivial to fix. Without the fix, it is too dangerous to use, eg we can have a query like "-@type:{critical}" to purge non critical records, but the optimizer can return critical records that will be accidentally wiped out! And without using the optimizer, there can be a huge performance hit (up to 10x to 100x) depending on the dataset.

Since I have had a good look into the code, let me know if you need me to try out some ideas/suggestions, or even contribute to the fix.

@meiravgri
Copy link
Collaborator

@keithchew fixing this bug is on our roadmap.
If you have specific suggestions or ideas, please feel free to open a PR with your proposed fix.
Thanks!

Copy link

github-actions bot commented Apr 8, 2024

This issue is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Apr 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants