Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Windowing] Rewrite window function implementation to use real SQLite windows #169

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

ohaibbq
Copy link
Contributor

@ohaibbq ohaibbq commented Feb 15, 2024

Hi @goccy @totem3, I have an exciting PR here.

I implemented the ability to register user-defined window functions in the SQLite driver at mattn/go-sqlite3#1220.

This allows us to remove the exponential complexity of formatted queries using window functions, as well as the bulk of go-zetasqlite's custom windowing. Previously, the formatter took the query currently being processed, turned it into a new SELECT * FROM (...) clause as found here:

return fmt.Sprintf(
"SELECT %s FROM (SELECT *, ROW_NUMBER() OVER() AS `row_id` %s) %s",
strings.Join(columns, ","),
formattedInput,
orderBy,
), nil

For one of our more complex queries, the formatted query goes from ~278 SELECT statements, with ~150 nested SELECT .. FROM (SELECT ...) queries, down to 78 SELECT statements with only ~50 nested SELECT queries.

This amount of nesting is still too high for SQLite to handle (#153), but the 3.46.0 release of SQLite will allow the parser to dynamically grow its stack size in order to support this.

Previously, the go-zetasqlite functions needed to implement their own sorting and partitioning of values in the windows. This caused various issues (see #139 and #101). Now SQLite will handle ordering and partitioning for us via zetasqlite_collate.

The SQLite window functions do not allow the use of DISTINCT or IGNORE NULLS, so we still implement those via the zetasqlite_window_option_distinct() and zetasqlite_window_option_ignore_nulls() options respectively.

Closes #161

@@ -63,3 +63,5 @@ require (
google.golang.org/grpc v1.54.0 // indirect
google.golang.org/protobuf v1.30.0 // indirect
)

replace github.com/mattn/go-sqlite3 => github.com/ohaibbq/go-sqlite3 v0.0.0-20240211011509-f8d4d3382d11
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be replaced with upstream once my PR makes it into the new release.

@codecov-commenter
Copy link

codecov-commenter commented Feb 16, 2024

Codecov Report

Merging #169 (bc44157) into main (961ce06) will decrease coverage by 0.25%.
The diff coverage is 64.70%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #169      +/-   ##
==========================================
- Coverage   46.39%   46.14%   -0.25%     
==========================================
  Files          47       47              
  Lines       17828    16884     -944     
==========================================
- Hits         8271     7791     -480     
+ Misses       8111     7684     -427     
+ Partials     1446     1409      -37     

@goccy
Copy link
Owner

goccy commented Mar 10, 2024

First, Thank you for your contribution.

I think this PR is great, but I can't merge forked go-sqlite3. I would like to reflect this after go-sqlite3 side's PR is merged.

@goccy goccy added the reviewed label Mar 10, 2024
@ohaibbq
Copy link
Contributor Author

ohaibbq commented Mar 15, 2024

I'll need to cherry-pick Recidiviz#33 into this branch before we merge.

It looks like the go-sqlite3 PR is getting close to being merged.

@ohaibbq
Copy link
Contributor Author

ohaibbq commented Apr 3, 2024

We will also need to cherry-pick Recidiviz#42 in order to fix #200

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Windowing] Complex queries utilizing window functions become too slow to reasonably execute
3 participants