Proof of concept of CDC on libsql-server #1388

athoscouto · 2024-05-13T14:31:25Z

Uses SQLite's update hooks to allow users to listen for database changes.
Listening is pull-based and built on server-sent events (SSE): a client opens an HTTP request and reads changes from the response stream.

This aims to offer a simple (in both implementation and API) and efficient (stateless, with no need to query the table or store multiple versions of data) solution.
Users who want stricter guarantees should be able to combine this with other features to achieve their goals.
For instance, to get at-least-once delivery you can create a trigger that appends to an event table and listen for inserts into it.
To recover from crashes, clients can store the last processed entry somewhere (e.g. another libsql-server table).
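
A minimal sketch of the trigger-plus-checkpoint idea described above, using Python's built-in sqlite3 module against an in-memory database as a stand-in for libsql-server. The table names (users, user_events, checkpoints), the trigger name, and the drain helper are all hypothetical and not part of this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
-- Hypothetical event table: its id is a monotonically increasing cursor.
CREATE TABLE user_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    action TEXT NOT NULL
);
-- Hypothetical checkpoint table: last event id each consumer finished.
CREATE TABLE checkpoints (consumer TEXT PRIMARY KEY, last_event_id INTEGER NOT NULL);
-- The trigger runs inside the writing transaction, so an event row is
-- committed or rolled back together with the insert into users.
CREATE TRIGGER users_after_insert AFTER INSERT ON users BEGIN
    INSERT INTO user_events (user_id, action) VALUES (NEW.id, 'insert');
END;
""")


def drain(conn, consumer, handle):
    """Process every event after the stored checkpoint; return how many ran."""
    row = conn.execute(
        "SELECT last_event_id FROM checkpoints WHERE consumer = ?", (consumer,)
    ).fetchone()
    last_id = row[0] if row else 0
    events = conn.execute(
        "SELECT id, user_id, action FROM user_events WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    for event_id, user_id, action in events:
        # Handle first, checkpoint second: a crash in between redelivers the
        # event on restart, which is what makes this at-least-once.
        handle(user_id, action)
        conn.execute(
            "INSERT OR REPLACE INTO checkpoints (consumer, last_event_id) VALUES (?, ?)",
            (consumer, event_id),
        )
        conn.commit()
    return len(events)


conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
conn.execute("INSERT INTO users (email) VALUES ('b@example.com')")
conn.commit()

seen = []
processed = drain(conn, "mailer", lambda user_id, action: seen.append((user_id, action)))
```

In this arrangement a client would call drain whenever it is notified of an insert into user_events, and once on startup to catch up on anything it missed while disconnected. Handlers need to be idempotent, since an event can be delivered more than once.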

Quirks to document and/or address:

This only works for rowid tables, which excludes WITHOUT ROWID tables and any virtual tables.
This does not notify users of whole-table deletions. From the SQLite documentation for sqlite3_update_hook:

In the current implementation, the update hook is not invoked when conflicting rows are deleted because of an ON CONFLICT REPLACE clause. Nor is the update hook invoked when rows are deleted using the truncate optimization. The exceptions defined in this paragraph might change in a future release of SQLite.

Updates are broadcast when the statement runs, not when the transaction commits.

Before merging this we should discuss the ideal API.
My plan is to use server-sent events to notify listeners of any changes.
This is partially implemented; I still need to add the proper headers (such as Content-Type: text/event-stream) to make it fully compliant.
Sending single-line JSON objects to notify changes seems like a good streaming strategy: it allows us to expand the API in the future and add arbitrary data.

Example of request and response:

➜  libsql git:(main) ✗ curl http://localhost:6060/beta/listen -H "Host: main.turso" -v
* Host localhost:6060 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:6060...
* Connected to localhost (::1) port 6060
> GET /beta/listen HTTP/1.1
> Host: main.turso
> User-Agent: curl/8.5.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< access-control-allow-origin: *
< vary: origin
< vary: access-control-request-method
< vary: access-control-request-headers
< transfer-encoding: chunked
< date: Sun, 12 May 2024 17:54:07 GMT
< 
{"db":"main","table":"t3","rowid":1}
{"db":"main","table":"t3","rowid":1}
{"db":"main","table":"t3","rowid":1}
{"db":"main","table":"t3","rowid":1}
{"db":"main","table":"t3","rowid":1}
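
Because the response is chunked, a client cannot assume that each network read contains exactly one JSON object. A minimal sketch of a line-buffering parser for the format shown in the sample response follows; iter_events is a hypothetical helper name, and the chunks argument stands in for whatever byte stream the HTTP client exposes.

```python
import json


def iter_events(chunks):
    """Yield one dict per complete JSON line from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A chunk may end mid-object, so only parse up to the last newline
        # and keep the remainder for the next chunk.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if line:
                yield json.loads(line)
```

Keeping one object per line is what makes this parsing trivial, and unknown keys added later are simply carried along in the resulting dict.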

glommer · 2024-05-14T12:50:23Z

My main question is whether the restrictions leave this useful enough. My intuition is that the rowid restriction is OK, but the deletion one will make the API much less useful. @notrab very curious to hear your thoughts.

notrab · 2024-05-14T13:49:48Z

I really like the direction of this! Super work guys.

One thing that concerns me is triggering the events when statements are run rather than when they are committed.

This would mean anyone tailing the event stream could end up with events that aren't a true representation of the database state. If someone is tailing new users created to send an email, configure some third party or whatever, they're going to have a bad time.

Perhaps we could handle that internally by only emitting on commit, or, at worst, trigger another event from the rollback hook?
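
To illustrate the commit/rollback idea, here is a rough sketch of how change events could be held back until a commit marker arrives. The "action" field and its "commit" and "rollback" values are assumptions made for this example, not the format produced by this PR, and the sketch assumes the events of one transaction are not interleaved with those of another.

```python
def committed_changes(events):
    """Yield change events only after the commit marker that follows them."""
    pending = []
    for event in events:
        action = event.get("action")  # hypothetical field name
        if action == "commit":
            # The transaction is durable: release everything buffered so far.
            yield from pending
            pending = []
        elif action == "rollback":
            # The transaction was abandoned: its changes never happened.
            pending = []
        else:
            pending.append(event)
```

The same buffering could live on the server or in a client library; either way the listener only ever sees changes that were actually committed.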

The rowid thing is also a valid concern. It would be useful to be notified of full-table operations that run without a WHERE clause, but it's probably not a deal breaker.

As for the API design... It looks good so far but we could potentially send the delta if the user requests it with a header or something...

MarinPostma · 2024-05-14T13:56:49Z

@notrab I don't know if that's a big issue. Since we don't send any content, it's just a false positive: one would read the row and realise it hasn't changed, which I don't think is a big deal.
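
A small sketch of that "read the row and compare" check, using Python's built-in sqlite3 module as a stand-in for querying the server. The row_changed helper and the cache dict are hypothetical; the event keys match the sample response in the PR description.

```python
import sqlite3


def row_changed(conn, cache, event):
    """Re-read the notified row and report whether it differs from the cache."""
    table, rowid = event["table"], event["rowid"]
    # Identifiers cannot be bound as parameters, so quote the table name.
    quoted = '"' + table.replace('"', '""') + '"'
    row = conn.execute(
        f"SELECT * FROM {quoted} WHERE rowid = ?", (rowid,)
    ).fetchone()
    key = (table, rowid)
    if cache.get(key) == row:
        # False positive: for example, the write was rolled back.
        return False
    cache[key] = row  # None records that the row no longer exists
    return True
```

This costs one extra read per notification and requires the listener to keep its own copy of the rows it cares about.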

notrab · 2024-05-14T13:58:19Z

@MarinPostma yeah, so I guess I wrote my comment on the assumption that the data was sent. I think that's where this becomes really valuable to users.

If we don't, we should at least include the rowid, table name, database name, and what occurred (create/update/delete)... which could still have the problem I was describing.

libsql-server/src/broadcaster.rs

libsql-server/src/http/user/listen.rs

glommer · 2024-05-17T14:27:40Z

Got curious: would a JavaScript user consume this via fetch?
How do they develop against this if they are using a file API?

notrab · 2024-05-23T11:15:56Z

@glommer @athoscouto if we're using server-sent events, we could hook into the EventSource API to give developers a CLI to listen for and forward events, as well as language-specific libraries to attach event handlers.

I'd love to contribute the following when I return from vacation:

1. CLI

npx libsql-listen libsql://my-db-org.turso.io --token [TOKEN]

This would then forward events to a PORT. Perhaps 3000 by default since that's a popular one for JS developers. Users could override this by passing a custom --forward-to flag or something.

This is useful because it can simulate the "webhook" experience by creating a proxy between your local development environment and the event stream.

2. Node Client

Something as basic as this could work for our MVP:

import { libSQLClient } from "@libsql/events"

const libsql = new libSQLClient({
  url: 'libsql://my-db-org.turso.io',
  authToken: '...'
})

libsql.on('SOMETHING', () => {
  // ...
})

I haven't looked at the type of events yet, but we could format those events nicely in the handler.

Creating this is one (max two) days' work.

Lmkwyt

athoscouto added 2 commits May 13, 2024 11:17

Add UpdateBroadcaster to connection make and register update hook

6642d57

Add user /beta/listen endpoint that allow clients to subscribe to update hooks

ca3e82c

athoscouto requested review from haaawk, LucioFranco and MarinPostma May 13, 2024 14:31

cargo fmt

8667580

athoscouto added 6 commits May 14, 2024 11:35

Also notify action type

432fa4c

Add per table notification

bdf9c81

Allow clients to listen to a specific action type

2dc99ef

Handle listener errors

403d6d4

Replace hack to skip sending action when user specified it

6c03075

Stop panicking when we're unable to write listen response

4c5f57c

MarinPostma reviewed May 15, 2024

athoscouto and others added 6 commits May 15, 2024 16:01

Refactor to use BroadcastStream

f46f51b

Drop broadcaster sender when last listener is dropped

f95f7a3

namespaces broadcaster

21d09ae

Use global UpdateBroadcaster and add handle to access it from namespaces

1cbc872

Implement notifications for commit and rollbacks

c18d155

Allow listening multiple hook types - including commit and rollback

069bcda

MarinPostma and others added 4 commits May 17, 2024 16:57

action query params

3ca37be

Optimize cases where there are no listeners globally and in the current namespace

a17243d

consume full broadcast channel

6d04317

Remove unnecessary optimizations

1a18330

Aggregate changes before streaming them to clients

1c86aed

abdelhameedhamdy mentioned this pull request May 28, 2024

Libsql support OP-Engineering/op-sqlite#95

Closed
