Proof of concept of CDC on libsql-server #1388

Draft: wants to merge 20 commits into main

athoscouto (Contributor) commented May 13, 2024

Uses SQLite's update hooks to allow users to listen for database changes.
Listening is pull-based: clients open a long-lived HTTP connection and receive changes as server-sent events (SSE).

This aims to offer a simple (in both implementation and API) and efficient (stateless: no need to query the table or store multiple versions of the data) solution.
Users who want stricter guarantees should be able to combine this with other features to achieve their goals.
For instance, to get at-least-once delivery, you can create a trigger that appends to an event table and listen for inserts into it.
To recover from crashes, clients can store the last processed entry somewhere (e.g. another libsql-server table).
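The recovery recipe above can be sketched from the client side. This is a minimal TypeScript sketch, not code from this PR: it assumes the `{db, table, rowid}` lines shown in the example response below, an append-only event table filled by a trigger (the table name `events` and the `readRow` callback are hypothetical), and a checkpoint kept in memory where a real client would persist it.

```typescript
// Hypothetical shape of one line of the listen stream (matches the example output).
interface ChangeEvent {
  db: string;
  table: string;
  rowid: number;
}

type Row = Record<string, unknown>;

// Consumes inserts into an append-only event table and remembers the last rowid
// it processed, so a restarted client can skip entries it has already handled.
// Assumes the event table uses INTEGER PRIMARY KEY AUTOINCREMENT, so committed
// rowids only grow.
class CheckpointedConsumer {
  private readonly table: string;
  private checkpoint: number;
  private readonly readRow: (rowid: number) => Row | undefined;
  private readonly onRow: (row: Row) => void;

  constructor(
    table: string,
    checkpoint: number, // last processed rowid, loaded from durable storage
    readRow: (rowid: number) => Row | undefined,
    onRow: (row: Row) => void,
  ) {
    this.table = table;
    this.checkpoint = checkpoint;
    this.readRow = readRow;
    this.onRow = onRow;
  }

  handle(event: ChangeEvent): void {
    if (event.table !== this.table) return; // not the event table
    if (event.rowid <= this.checkpoint) return; // already handled before a restart
    const row = this.readRow(event.rowid);
    // Notifications fire when a statement runs, not when it commits, so the row
    // may be missing (e.g. a rolled-back insert): do not advance the checkpoint.
    if (row === undefined) return;
    this.onRow(row);
    this.checkpoint = event.rowid; // a real client would persist this durably
  }

  get lastProcessed(): number {
    return this.checkpoint;
  }
}
```

Because notifications are sent when a statement runs, a client may read the row before it is committed and find nothing. A catch-up query for rowids above the checkpoint, run on reconnect or periodically, would close that gap; it is omitted here.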


Quirks to document and/or address:

  • This will only work for rowid tables, which excludes virtual tables (as well as WITHOUT ROWID tables).
  • This will not notify users of whole-table deletions. From SQLite's documentation for sqlite3_update_hook():

In the current implementation, the update hook is not invoked when conflicting rows are deleted because of an ON CONFLICT REPLACE clause. Nor is the update hook invoked when rows are deleted using the truncate optimization. The exceptions defined in this paragraph might change in a future release of SQLite.

  • Updates are broadcast when the statement runs, not when the transaction commits.

Before merging this, we should discuss the ideal API.
My plan is to use server-sent events to notify listeners of any changes.
This is partially implemented; I still need to add the proper headers (e.g. Content-Type: text/event-stream) to make it complete.
Sending single-line JSON objects to notify changes seems like a good streaming strategy: it lets us expand the API in the future and add arbitrary data.
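For reference, this is roughly what one notification could look like once framed as a server-sent event. A small TypeScript sketch, assuming the current `{db, table, rowid}` payload; `toSseMessage`, `SSE_HEADERS`, and the optional `id` argument are illustrative, not the server's code. The `id:` field is standard SSE: a reconnecting EventSource sends the last id it saw back in the `Last-Event-ID` header, which could pair with the crash-recovery idea above.

```typescript
// Payload shape used by the prototype's output.
interface ChangeEvent {
  db: string;
  table: string;
  rowid: number;
}

// Headers an SSE endpoint typically sends. EventSource clients reject
// responses whose Content-Type is not text/event-stream.
const SSE_HEADERS: Readonly<Record<string, string>> = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
};

// Frames one event as an SSE message. JSON.stringify escapes newlines inside
// strings, so the payload always fits on a single `data:` line.
function toSseMessage(event: ChangeEvent, id?: string): string {
  let message = "";
  if (id !== undefined) message += `id: ${id}\n`;
  message += `data: ${JSON.stringify(event)}\n`;
  return message + "\n"; // a blank line ends the message
}
```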

Example of request and response:

```
➜  libsql git:(main) ✗ curl http://localhost:6060/beta/listen -H "Host: main.turso" -v
* Host localhost:6060 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:6060...
* Connected to localhost (::1) port 6060
> GET /beta/listen HTTP/1.1
> Host: main.turso
> User-Agent: curl/8.5.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< access-control-allow-origin: *
< vary: origin
< vary: access-control-request-method
< vary: access-control-request-headers
< transfer-encoding: chunked
< date: Sun, 12 May 2024 17:54:07 GMT
< 
{"db":"main","table":"t3","rowid":1}
{"db":"main","table":"t3","rowid":1}
{"db":"main","table":"t3","rowid":1}
{"db":"main","table":"t3","rowid":1}
{"db":"main","table":"t3","rowid":1}
```
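Because the response uses chunked transfer encoding, a chunk boundary can fall in the middle of one of these lines. A minimal TypeScript sketch of client-side line buffering, assuming the bytes have already been decoded to text (e.g. with `TextDecoder` in streaming mode); `LineBuffer` is an illustrative name.

```typescript
interface ChangeEvent {
  db: string;
  table: string;
  rowid: number;
}

// Accumulates decoded text and returns the events on every complete line.
// The trailing partial line stays in `pending` until its newline arrives.
class LineBuffer {
  private pending = "";

  push(text: string): ChangeEvent[] {
    this.pending += text;
    const lines = this.pending.split("\n");
    // The last element is whatever follows the final newline: "" or a partial line.
    this.pending = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as ChangeEvent);
  }
}
```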

glommer (Contributor) commented May 14, 2024

My main question is whether the restrictions make this useful enough. My intuition is that the rowid restriction is OK, but the deletion one will make the API not that useful. @notrab, very curious to hear your thoughts.

notrab (Member) commented May 14, 2024

I really like the direction of this! Super work guys.

One thing that concerns me is that events are triggered when statements run, not when they are committed.

This would mean anyone tailing the event stream could end up with events that aren't a true representation of the database state. If someone is tailing new users created to send an email, configure some third party or whatever, they're going to have a bad time.

Perhaps we handle that internally by only emitting on commit, or, in the worst case, trigger another event from the rollback hook?
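SQLite does expose `sqlite3_commit_hook()` and `sqlite3_rollback_hook()` next to `sqlite3_update_hook()`, so the commit-only option could buffer notifications per transaction. A minimal TypeScript sketch of that buffering; the `TransactionBuffer` class and the idea of wiring its methods to the three hooks are assumptions, not libsql-server code.

```typescript
interface ChangeEvent {
  db: string;
  table: string;
  rowid: number;
}

// Holds notifications raised inside an open transaction and publishes them
// only once it commits. Intended (hypothetically) to be driven by the update,
// commit and rollback hooks respectively.
class TransactionBuffer {
  private pending: ChangeEvent[] = [];
  private readonly publish: (events: ChangeEvent[]) => void;

  constructor(publish: (events: ChangeEvent[]) => void) {
    this.publish = publish;
  }

  onUpdate(event: ChangeEvent): void {
    this.pending.push(event);
  }

  onCommit(): void {
    const events = this.pending;
    this.pending = [];
    if (events.length > 0) this.publish(events);
  }

  onRollback(): void {
    this.pending = []; // rolled-back changes are never broadcast
  }
}
```

Only committed changes reach `publish`, at the cost of delaying notifications until the end of each transaction.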

The rowid thing is also a valid concern. It would be useful to listen to whole-table operations without a WHERE clause, but it's probably not a deal breaker.

As for the API design, it looks good so far, but we could potentially send the delta if the user requests it via a header or something similar.

MarinPostma (Collaborator) commented:

@notrab I don't know if that's a big issue. Since we don't send any content, it's just a false positive: one would read the row and realize it hasn't changed, which I don't think is a big deal.
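The false-positive handling described here can live in the client. A minimal TypeScript sketch, assuming a hypothetical `readRow` callback that fetches the current row (for example with a `SELECT ... WHERE rowid = ?`) and returns `undefined` when it does not exist; rows are compared through their JSON serialization, and `ChangeDetector` is an illustrative name.

```typescript
interface ChangeEvent {
  db: string;
  table: string;
  rowid: number;
}

type Row = Record<string, unknown>;

type Outcome =
  | { kind: "changed"; row: Row }
  | { kind: "deleted" }
  | { kind: "unchanged" };

// Turns raw notifications into real changes by re-reading the row and
// comparing it with the last version this client saw.
class ChangeDetector {
  private readonly lastSeen = new Map<string, string>();
  private readonly readRow: (event: ChangeEvent) => Row | undefined;

  constructor(readRow: (event: ChangeEvent) => Row | undefined) {
    this.readRow = readRow;
  }

  check(event: ChangeEvent): Outcome {
    const key = `${event.db}/${event.table}/${event.rowid}`;
    const previous = this.lastSeen.get(key);
    const row = this.readRow(event);
    if (row === undefined) {
      // Absent and never seen by this client (e.g. a rolled-back insert).
      if (previous === undefined) return { kind: "unchanged" };
      this.lastSeen.delete(key);
      return { kind: "deleted" };
    }
    const snapshot = JSON.stringify(row);
    if (snapshot === previous) return { kind: "unchanged" }; // false positive
    this.lastSeen.set(key, snapshot);
    return { kind: "changed", row };
  }
}
```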

notrab (Member) commented May 14, 2024

@MarinPostma yeah, I guess I wrote my comment assuming the data was sent. I think that's where this becomes really valuable to users.

If we don't, we should at least include the rowid, table name, database name, and what occurred (create/update/delete), which could still have the problem I was describing.
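SQLite's update hook already receives the operation (`SQLITE_INSERT`, `SQLITE_UPDATE` or `SQLITE_DELETE`) together with the database name, table name and rowid, so this richer payload is cheap to produce. A TypeScript sketch of what it and a defensive client-side parser could look like; the `op` and `row` fields are hypothetical, with `row` standing in for the opt-in delta mentioned earlier.

```typescript
type Operation = "insert" | "update" | "delete";

// Hypothetical extended notification.
interface ChangeEvent {
  db: string;
  table: string;
  rowid: number;
  op: Operation;
  row?: Record<string, unknown>; // new column values, only if the client opted in
}

// Parses one line of the stream and rejects lines missing required fields.
function parseEvent(line: string): ChangeEvent {
  const parsed: unknown = JSON.parse(line);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("event must be a JSON object");
  }
  const { db, table, rowid, op, row } = parsed as Record<string, unknown>;
  if (typeof db !== "string" || typeof table !== "string" || typeof rowid !== "number") {
    throw new Error("event is missing db, table or rowid");
  }
  if (op !== "insert" && op !== "update" && op !== "delete") {
    throw new Error(`unknown op: ${String(op)}`);
  }
  const event: ChangeEvent = { db, table, rowid, op };
  if (row !== undefined) {
    if (typeof row !== "object" || row === null) throw new Error("row must be an object");
    event.row = row as Record<string, unknown>;
  }
  return event;
}
```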

Review threads on libsql-server/src/broadcaster.rs (2) and libsql-server/src/http/user/listen.rs (3), marked outdated and resolved.

glommer (Contributor) commented May 17, 2024

Got curious: would a JavaScript user consume this via fetch?
How do they develop against this if they are using a file API?
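In principle, yes: fetch exposes the response body as a stream. A minimal TypeScript sketch, assuming a runtime with global `fetch` and `ReadableStream` (browsers, Node 18+) and the newline-delimited output shown earlier; the bearer-token header and the function names are assumptions. Note that EventSource cannot send custom headers and fetch cannot override `Host`, so the `-H "Host: main.turso"` trick from the curl example would need a hostname that already routes to the right database.

```typescript
interface ChangeEvent {
  db: string;
  table: string;
  rowid: number;
}

// Yields events from a streaming body of newline-delimited JSON.
// TextDecoder's `stream: true` keeps multi-byte characters intact across chunk
// boundaries; `pending` holds a line that has not received its newline yet.
async function* readEvents(body: ReadableStream<Uint8Array>): AsyncGenerator<ChangeEvent> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let pending = "";
  for (;;) {
    const { value, done } = await reader.read();
    pending += done ? decoder.decode() : decoder.decode(value, { stream: true });
    const lines = pending.split("\n");
    // Keep the unfinished tail unless the stream has ended.
    pending = done ? "" : lines.pop() ?? "";
    for (const line of lines) {
      if (line.trim() !== "") yield JSON.parse(line) as ChangeEvent;
    }
    if (done) return;
  }
}

// Opens the listen endpoint; the bearer-token header is an assumption.
async function listen(url: string, token?: string): Promise<AsyncGenerator<ChangeEvent>> {
  const headers: Record<string, string> = {};
  if (token !== undefined) headers["Authorization"] = `Bearer ${token}`;
  const response = await fetch(url, { headers });
  if (!response.ok || response.body === null) {
    throw new Error(`listen request failed with status ${response.status}`);
  }
  return readEvents(response.body);
}
```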

notrab (Member) commented May 23, 2024

@glommer @athoscouto if we're using server-sent events, we could hook into the EventSource API to give developers a CLI that listens for and forwards events, as well as dedicated libraries for attaching event handlers.

I'd love to contribute the following when I return from vacation:

1. CLI

```
npx libsql-listen libsql://my-db-org.turso.io --token [TOKEN]
```

This would then forward events to a local port, perhaps 3000 by default since that's a popular choice for JS developers. Users could override this by passing a custom --forward-to flag or something similar.

This is useful because it can simulate the "webhook" experience by creating a proxy between your local development environment and the event stream.
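The forwarding step could be as small as re-posting each event line as a JSON request to the local server. A TypeScript sketch for the proposed CLI; `toWebhookRequest`, `forward`, and the default URL built from the suggested port 3000 are illustrative, and the flag that overrides the target was still undecided.

```typescript
// Turns one event line into the POST a local dev server would receive,
// mimicking a webhook delivery. The default target is a placeholder.
function toWebhookRequest(line: string, forwardTo = "http://localhost:3000/"): Request {
  return new Request(forwardTo, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: line,
  });
}

// Forwards every line from the stream and reports non-2xx responses.
async function forward(lines: AsyncIterable<string>, forwardTo?: string): Promise<void> {
  for await (const line of lines) {
    const response = await fetch(toWebhookRequest(line, forwardTo));
    if (!response.ok) console.error(`forward failed with status ${response.status}`);
  }
}
```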

2. Node Client

Something as basic as this could work for our MVP:

```js
// Proposed MVP API; package, class and event names are placeholders.
import { libSQLClient } from "@libsql/events"

const libsql = new libSQLClient({
  url: 'libsql://my-db-org.turso.io',
  authToken: '...'
})

libsql.on('SOMETHING', () => {
  // ...
})
```

I haven't looked at the type of events yet, but we could format those events nicely in the handler.
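One way the `.on(...)` handler could receive nicely formatted events is to parse each line once and re-emit it under a few names. A sketch using Node's built-in `EventEmitter`; the event names `change` and `table:<name>` are hypothetical placeholders for the `'SOMETHING'` in the snippet above, and `ChangeEmitter` is not part of any published package.

```typescript
import { EventEmitter } from "node:events";

interface ChangeEvent {
  db: string;
  table: string;
  rowid: number;
}

// Parses one stream line and emits it twice: under the generic "change" name
// and under a per-table name, so handlers can subscribe narrowly.
class ChangeEmitter extends EventEmitter {
  dispatch(line: string): void {
    const event = JSON.parse(line) as ChangeEvent;
    this.emit("change", event);
    this.emit(`table:${event.table}`, event);
  }
}
```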

Creating this is one (at most two) days' work.

Let me know what you think.


4 participants