Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

anti-select uses full scan #7449

Open
swingbit opened this issue Jan 26, 2024 · 3 comments
Open

anti-select uses full scan #7449

swingbit opened this issue Jan 26, 2024 · 3 comments

Comments

@swingbit
Copy link

swingbit commented Jan 26, 2024

Describe the bug
On a column that has a persistent hash, anti-select (and anti-join) resorts to full scan.

To Reproduce
Create a column with unique values

\t clock
set optimizer='sequential_pipe';

create table t as select cast(uuid() as string) as u from generate_series(0,10000000);
analyze sys.t; -- this creates a hash table

Test a point select :

trace select * from t where u = 'a0a3139d-3742-4ae3-a757-ed28386e4f05' limit 1;

The point select uses the hash table:

+---+
| u |
+===+
+---+
0 tuples
clk: 1.654 ms
+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| usec | statement                                                                                                                                                                       |
+======+=================================================================================================================================================================================+
|    2 |     X_1=0@0:void := querylog.define("trace select * from t where u = 'a0a3139d-3742-4ae3-a757-ed28386e4f05' limit 1;":str, "sequential_pipe":str, 22:int);                      |
|    0 |     X_4=0:int := sql.mvc();                                                                                                                                                     |
|    7 |     C_5=[10000000]:bat[:oid] := sql.tid(X_4=0:int, "sys":str, "t":str);                                                                                                         |
|    7 |     X_8=[10000000]:bat[:str] := sql.bind(X_4=0:int, "sys":str, "t":str, "u":str, 0:int);                                                                                        |
|    5 |     (X_11=[0]:bat[:oid], X_12=[0]:bat[:str]) := sql.bind(X_4=0:int, "sys":str, "t":str, "u":str, 2:int);                                                                        |
|   20 |     C_80=[0]:bat[:oid] := algebra.thetaselect(X_8=[10000000]:bat[:str], C_5=[10000000]:bat[:oid], "a0a3139d-3742-4ae3-a757-ed28386e4f05":str, "==":str); # hashselect on parent |
|    5 |     C_81=[0]:bat[:oid] := algebra.thetaselect(X_12=[0]:bat[:str], nil:bat[:oid], "a0a3139d-3742-4ae3-a757-ed28386e4f05":str, "==":str); # select: trivially empty               |
|    0 |     C_17=[0]:bat[:oid] := sql.subdelta(C_80=[0]:bat[:oid], C_5=[10000000]:bat[:oid], X_11=[0]:bat[:oid], C_81=[0]:bat[:oid]);                                                   |
|    7 |     X_19=[0]:bat[:str] := sql.projectdelta(C_17=[0]:bat[:oid], X_8=[10000000]:bat[:str], X_11=[0]:bat[:oid], X_12=[0]:bat[:str]);                                               |
|    3 |     C_27=[0]:bat[:oid] := algebra.subslice(X_19=[0]:bat[:str], 0:lng, 0:lng);                                                                                                   |
|    5 |     X_28=[0]:bat[:str] := algebra.projection(C_27=[0]:bat[:oid], X_19=[0]:bat[:str]);                                                                                           |
|    4 |     X_30=[1]:bat[:str] := bat.pack("sys.t":str);                                                                                                                                |
|    4 |     X_31=[1]:bat[:str] := bat.pack("u":str);                                                                                                                                    |
|    4 |     X_32=[1]:bat[:str] := bat.pack("clob":str);                                                                                                                                 |
|    3 |     X_33=[1]:bat[:int] := bat.pack(0:int);                                                                                                                                      |
|   19 |     X_29=1:int := sql.resultSet(X_30=[1]:bat[:str], X_31=[1]:bat[:str], X_32=[1]:bat[:str], X_33=[1]:bat[:int], X_33=[1]:bat[:int], X_28=[0]:bat[:str]);                        |
+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
16 tuples
clk: 1.723 ms

Test a point anti-select:

trace select * from t where u <> 'a0a3139d-3742-4ae3-a757-ed28386e4f05' limit 1;

The point anti-select uses a full scan:

+--------------------------------------+
| u                                    |
+======================================+
| 2c383fbe-eece-4021-8a41-c0290424f7c3 |
+--------------------------------------+
1 tuple
clk: 534.385 ms
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| usec   | statement                                                                                                                                                                               |
+========+=========================================================================================================================================================================================+
|      2 |     X_1=0@0:void := querylog.define("trace select * from t where u <> 'a0a3139d-3742-4ae3-a757-ed28386e4f05' limit 1;":str, "sequential_pipe":str, 22:int);                             |
|      0 |     X_4=0:int := sql.mvc();                                                                                                                                                             |
|      5 |     C_5=[10000000]:bat[:oid] := sql.tid(X_4=0:int, "sys":str, "t":str);                                                                                                                 |
|      6 |     X_8=[10000000]:bat[:str] := sql.bind(X_4=0:int, "sys":str, "t":str, "u":str, 0:int);                                                                                                |
|      3 |     (X_11=[0]:bat[:oid], X_12=[0]:bat[:str]) := sql.bind(X_4=0:int, "sys":str, "t":str, "u":str, 2:int);                                                                                |
| 532823 |     C_80=[10000000]:bat[:oid] := algebra.thetaselect(X_8=[10000000]:bat[:str], C_5=[10000000]:bat[:oid], "a0a3139d-3742-4ae3-a757-ed28386e4f05":str, "!=":str); # select: fullscan anti |
|     15 |     C_81=[0]:bat[:oid] := algebra.thetaselect(X_12=[0]:bat[:str], nil:bat[:oid], "a0a3139d-3742-4ae3-a757-ed28386e4f05":str, "!=":str); # select: trivially empty                       |
|      2 |     C_17=[10000000]:bat[:oid] := sql.subdelta(C_80=[10000000]:bat[:oid], C_5=[10000000]:bat[:oid], X_11=[0]:bat[:oid], C_81=[0]:bat[:oid]);                                             |
|     10 |     X_19=[10000000]:bat[:str] := sql.projectdelta(C_17=[10000000]:bat[:oid], X_8=[10000000]:bat[:str], X_11=[0]:bat[:oid], X_12=[0]:bat[:str]);                                         |
|      3 |     C_27=[1]:bat[:oid] := algebra.subslice(X_19=[10000000]:bat[:str], 0:lng, 0:lng);                                                                                                    |
|      6 |     X_28=[1]:bat[:str] := algebra.projection(C_27=[1]:bat[:oid], X_19=[10000000]:bat[:str]);                                                                                            |
|      5 |     X_30=[1]:bat[:str] := bat.pack("sys.t":str);                                                                                                                                        |
|      4 |     X_31=[1]:bat[:str] := bat.pack("u":str);                                                                                                                                            |
|      4 |     X_32=[1]:bat[:str] := bat.pack("clob":str);                                                                                                                                         |
|      4 |     X_33=[1]:bat[:int] := bat.pack(0:int);                                                                                                                                              |
|     32 |     X_29=3:int := sql.resultSet(X_30=[1]:bat[:str], X_31=[1]:bat[:str], X_32=[1]:bat[:str], X_33=[1]:bat[:int], X_33=[1]:bat[:int], X_28=[1]:bat[:str]);                                |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
16 tuples
clk: 534.527 ms

Expected behavior
Use the existing hash.

Software versions

  • MonetDB version number v11.49.2 (hg id: fc35cd120e)
  • OS and version: Fedora 39
  • self-installed and compiled
@sjoerdmullender
Copy link
Member

Initial thought is, why should it use a hash? The result of the select itself is (almost) the complete column, so we basically have to go through it anyway.
But thinking a little more about it, you're right in that this can be done more efficiently using the hash than by doing a scan. We do have to take NULLs into account, so perhaps two hash lookups are needed.
I'll look into this.

@swingbit
Copy link
Author

swingbit commented Jan 26, 2024

My reasoning (ignoring NULLs) was: use a normal point select (with the hash) to find the value, and then you just need to return the original input except that one. It's true that you still need to go through all tuples, but without accessing the heap for all of them.

@njnes
Copy link
Contributor

njnes commented Jan 26, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants