This directory holds the source code for a compiler translating SQL view definitions into DBSP circuits. The SQL compiler is based on the Apache Calcite compiler infrastructure https://calcite.apache.org/
The code has been tested on Windows 10/11 and Linux.
You need Java 19 or later to build the compiler, and the maven (mvn) Java build program. Maven will take care of installing all required Java dependencies.
The code generated by the compiler is Rust. To run it you need a working installation of Rust (we recommend using rustup to install: https://www.rust-lang.org/tools/install). The compiler uses the current version of the DBSP library sources from github (https://github.com/feldera/feldera).
If you want to generate images of the query plans you need to also install graphviz as described here https://graphviz.org/download/.
To build the compiler run:
mvn -DskipTests package
If you get Rust compilation errors you should try to make sure you have the latest version of the Rust libraries and toolchain:
$ rustup update
$ cd temp
$ cargo update
(temp
is the directory where the tests write the generated Rust code.)
The DBSP runtime is optimized for performing incremental view maintenance. In consequence, DBSP programs in SQL are expressed as VIEWS, or standing queries. A view is essentially a computation that describes a relation that is computed from other tables or views.
For example, the following query defines a view:
CREATE VIEW V AS SELECT T.COL1 FROM T WHERE T.age > 18
In order to interpret this query the compiler needs to have been given a definition of table (or view) T. The table T should be defined using a SQL Data Definition Language (DDL) statement, e.g.:
CREATE TABLE T
(
name VARCHAR,
age INT,
present BOOLEAN
)
The compiler must be given the table definition first, and then the view definition. The compiler generates a Rust library which implements the query as a function: given the input data, it produces the output data.
The compiler can (optionally) generate a library which will
incrementally maintain the view V
when presented with changes to
table T
:
table changes
V
tables -----> SQL-to-DBSP compiler ------> DBSP circuit
views V
view changes
See the documentation in docs/contributors/compiler.md
in this repo,
or online.
A presentation about the internals of the compiler implementation.
Compilation proceeds in several stages:
- SQL statements are parsed using the calcite SQL parser
generating an IR representation using the Calcite
SqlNode
data types - the SQL IR tree is validated, optimized, and converted to the Calcite
RelNode
representation - The result of this stage is a
CalciteProgram
data structure, which packages together all the views that are being compiled (multiple views can be maintained simultaneously) CalciteToDBSPCompiler
converts aCalciteProgram
data structure into aDBSPCircuit
data structure.- The
CircuitOptimizer
class can optimize the generated circuit, optionally converting it into an incremental circuit, which is expected to compute only on changes. - The
circuit
can be serialized as Rust using theToRustString
visitor.
Unit tests are written using JUnit and test pointwise parts of the
compiler. They can be executed using mvn test
. They also run as a
side effect of mvn package
(omitting -DskipTests
).
The unit tests are implemented as individual Java files under
sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/
.
Each unit test uses the SQL to DBSP compiler to generate Rust code under
sql-to-dbsp-compiler/temp
and compiles and runs that code.
If one of the unit tests fails, you may re-populate the Rust code for
it into temp
by telling mvn
to run only that particular test. If
an error message indicates that
org.dbsp.sqlCompiler.compiler.postgres.PostgresTimestampTests
failed, for example, you may rerun just that test, first with mvn test
and then again with cargo test
, via:
$ mvn test -Dtest=PostgresTimestampTests
$ cd ../temp
$ cargo test
One of the means of testing the compiler is using sqllogictests: https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki.
The sqllogictests includes more than 5 million tests. It takes weeks to run all of them. Most of the time is spent compiling Rust. We hope to speed that up at some point.
To start running these tests:
$ ./run-tests.sh
We have implemented a general-purpose testing framework in Java for running SqlLogicTest programs. The framework parses SqlLogicTest files and creates an internal representation of these files. The files are executed by "test executors".
The tests are run by a standalone executable (the executable is
invoked by the run-tests.sh
script). The executable supports the
following command-line arguments:
slt [options] files_or_directories_with_tests
Executes the SQL Logic Tests using a SQL execution engine
See https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki
Options:
-h Show this help message and exit
-x Stop at the first encountered query error
-n Do not execute, just parse the test files
-e executor Executor to use
-b filename Load a list of buggy commands to skip from this file
-v Increase verbosity (can be repeated)
-inc Incremental validation
-u username Postgres user name
-p password Postgres password
Registered executors:
hybrid
dbsp
hsql
psql
none
We have multiple executors. Some executors are inherited from the hydromatic project.
This executor is part of the base hydromatic package; it does not really run any tests. But it can still be used by the test loading mechanism to check that we correctly parse all SQL logic test files.
The model of SQLLogicTest has to be adapted for testing using DBSP.
Since DBSP is not a database, but a streaming system, some SQL
statements are ignored (e.g., CREATE INDEX
) and some other
cannot be supported (e.g., CREATE UNIQUE INDEX
).
- The DDL statements to create tables are compiled into definitions of circuit inputs.
- The DML INSERT statements are converted into input-generating functions. In the absence of a database we cannot really execute statements that are supposed to fail, so we ignore such statements.
- Some DML statements like DELETE based on a WHERE clause cannot be compiled at all
- The queries are converted into DDL VIEW create statements, which are compiled into circuits.
- The validation rules in SQLLogicTest are compiled into Rust functions that compare the outputs of queries.
So a SqlLogicTest script is turned into multiple DBSP tests, each of which creates a circuit, feeds it one input, reads the output, and validates it, executing exactly one transaction. When testing incremental circuits the test code feeds multiple inputs and only checks the final output.
This executor is a combination of the DBSP executor and the JDBC executor, using a real database to store data in tables, but using DBSP as a query engine. It should be able to execute all SqlLogicTest queries that are supported by the underlying database.
The 'inc' column shows tests for incremental circuits, the other columns show tests for non-incremental (standard) implementations.
The matrix represents the current test results; the numbers indicate the passing/failing tests in each category. The failing test cases are detailed below.
Test group | DBSP | JDBC_DBSP | JDBC_DBSP inc |
---|---|---|---|
random/select | 1,120,329/0 | 1,120,329/0 | 1,120,329/0 |
random/groupby | 118,757/0 | 118,757/0 | 118,757/0 |
random/expr | 1,317,682/0 | 1,198,926/0 | 1,198,926/0 |
random/aggregates | 1,172,825/2 | 1,172,825/0 | 1,172,825/0 |
select1 | 1,000/0 | 1,000/0 | 1,000/0 |
select2 | 1,000/0 | 1,000/0 | 1,000/0 |
select3 | 3,320/0 | 3,320/0 | 3,320/0 |
select4 | 2,832/0 | 2,832/0 | 2,832/0 |
select5 | 732/0 | 732/0 | 732/0 |
index/delete | N/A | 40,525/0 | 40,525/0 |
index/in | N/A | 130,065/0 | 130,065/0 |
index/commute | N/A | 507,514/0 | 507,514/0 |
index/between | N/A | 121,811/0 | 121,811/0 |
index/orderby_nosort | N/A | 490,986/0 | 490,986/0 |
index/view | N/A | 53,490/0 | 53,490/0 |
index/random | N/A | 188,449/0 | 188,449/0 |
index/orderby | N/A | 310,630/0 | 310,630/0 |
evidence | N/A | 153/25 |
The "index" tests cannot be executed with the DBSPExecutor
since it
does not support the "unique index" SQL statement.
We are manually converting Postgres tests https://github.com/postgres/postgres/blob/master/src/test/regress/expected to run on DBSP, since the Postgres SQL syntax is too different from the Calcite syntax (especially in the functions supported). In some cases the semantics of Postgres SQL is different from Calcite; for example, in Postgres the days of the week are numbered from 0, where 0 is Sunday, but in Calcite the days of the week are numbered from 1, where 1 is Sunday. In such cases we have to change the tests.