
Database changes: support higher level changes (arbitrary queries) and optionally include initial data #303

Open
Prinzhorn opened this issue Oct 8, 2022 · 4 comments
Labels
enhancement New feature or request

Comments


Prinzhorn commented Oct 8, 2022

Feature request

Is your feature request related to a problem? Please describe.

I just started exploring Supabase today and browsed through the open feature requests here. This is somewhat related to #282, which exposes a fundamental issue in how Realtime currently works (or is supposed to be used): there is a disconnect between fetching the current data and then listening to changes. These are two separate processes, which leaves room for race conditions and "losing" data. In addition, it requires manually merging the data and resolving conflicts. I understand that Realtime is very new; right now it feels like it offers very low-level functionality that, in the future, nobody should be directly exposed to by default (unless they opt in for some reason).
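To make the race concrete, here is a minimal synchronous simulation of the pattern that avoids it (all names — makeFakeDb, syncedCopy — are hypothetical stand-ins, not Supabase APIs): subscribe first, buffer incoming events, then fetch the snapshot and replay the buffer, so nothing that lands between the two steps is lost.

```javascript
// Hypothetical stand-in for a database with a snapshot query and a change feed.
function makeFakeDb() {
  const rows = new Map();
  const listeners = [];
  return {
    rows,
    subscribe(fn) { listeners.push(fn); },
    write(row) {
      rows.set(row.id, row);
      listeners.forEach((fn) => fn(row));
    },
    fetchAll() { return [...rows.values()]; },
  };
}

// Safe pattern: subscribe first, buffer events, then fetch and replay the
// buffer. Fetching first and subscribing afterwards would open a window in
// which changes are silently missed.
function syncedCopy(db) {
  const local = new Map();
  let buffer = [];
  db.subscribe((row) => {
    if (buffer) buffer.push(row); // snapshot not loaded yet: hold the event
    else local.set(row.id, row);  // snapshot loaded: apply directly
  });
  for (const row of db.fetchAll()) local.set(row.id, row);
  for (const row of buffer) local.set(row.id, row); // replay anything missed
  buffer = null;
  return local;
}
```

In a real client the fetch is asynchronous, which is exactly where the window opens; the buffer-and-replay step is what closes it.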

Describe the solution you'd like

I think this feature request is summarized well by the RethinkDB changefeeds documentation: https://rethinkdb.com/docs/changefeeds/javascript/

I don't know how feasible this is with PostgreSQL. But an API that eliminates the aforementioned issues would:

  1. Allow watching for changes on arbitrary queries, including joins (instead of manually juggling INSERT and DELETE events)
  2. Optionally include the current data for the query in addition to future changes (includeInitial in RethinkDB)

I assume you have plans like this for the long run; maybe I just missed them and this feature request is redundant. If technically possible, this should be the end goal: developers shouldn't have to manually worry about low-level change events like this.

Describe alternatives you've considered

With the current architecture, I actually don't know how I could guarantee that I have the latest data and receive all future changes without either:

  1. Having a window in which I miss some recent updates
  2. Writing de-duplication logic so that change events don't hand me data I also fetched manually (e.g. treating INSERT and UPDATE identically). This should be solved once (on Realtime's end) and not client-side in every app.

And all of this while providing a good user experience with no long wait times.
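The de-duplication in point 2 can at least be reduced to a single rule: treat every event as an upsert keyed by primary key. A minimal sketch (the event shapes here are illustrative, not Realtime's actual payload format):

```javascript
// Apply a raw change event to a local copy so that INSERT and UPDATE are
// indistinguishable: both become "set row by id". A row fetched manually and
// then re-delivered by an INSERT event overwrites itself instead of
// duplicating. Event shapes are illustrative, not Realtime's actual payloads.
function applyEvent(local, event) {
  switch (event.type) {
    case 'INSERT':
    case 'UPDATE':
      local.set(event.row.id, event.row); // upsert: same path for both
      break;
    case 'DELETE':
      local.delete(event.row.id);
      break;
  }
  return local;
}
```

This is the kind of logic every app currently has to re-implement client-side.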

@Prinzhorn Prinzhorn added the enhancement New feature or request label Oct 8, 2022

Prinzhorn commented Oct 9, 2022

I wanted to elaborate some more on where I'm coming from, also because I saw #301.

In the past years I've worked with both RethinkDB (Horizon) and Meteor. While both of these systems have their own shortcomings and cannot be compared to Supabase wholesale, the comparison is still warranted because they solve the same problem Realtime tries to solve. And from a developer-experience point of view they are miles ahead (and that was almost ten years ago). In a recent Meteor project with Svelte I can do this:

<script>
  export let filter;

  import { Contacts } from '/imports/db/Contacts.js';

  // `$m:` is a Tracker-aware reactive statement (from the zodern:melte
  // compiler), so this re-runs whenever the results or `filter` change.
  $m: contacts = Contacts.find(filter).fetch();
</script>

{#each contacts as contact (contact._id)}
  <div>{contact.name}</div>
{/each}

This gives me all Contacts that match filter, now and in the future. The list is magically guaranteed to always be up to date, and it also updates when the filter changes.

For RethinkDB you could build the same abstraction, except it would even support joins over multiple tables while still returning results for now and for the future.

Even if you tried to abstract the current Realtime API into something similarly easy to use (albeit limited to a single table), it would be impossible, because querying data and using Realtime are two separate systems with no ordering guarantees between them. You'd also need to manually update the local data from raw INSERT, UPDATE and DELETE events.

Apart from using Meteor and RethinkDB in the past, I've also spent part of the last two years building a realtime changefeed on top of SQLite (for a data-processing desktop application). SQLite and Supabase cannot be compared directly (SQLite is offline and single-user), but that's not the point. The point is that from a developer's perspective this is as powerful as RethinkDB: I can write arbitrary joins and get all the data plus all future updates.

In short, it works like this:

  1. Triggers call user-defined functions that collect the IDs of all changed rows into a set (on your end this part is solved by PostgreSQL's replication functionality).
  2. These IDs are then sent to the next system in batches (see the linked issue above). The batch window can be as little as 30ms or as much as 1000ms, depending on how "live" that particular part of the app needs to be. This vastly improves performance, and it also means all changes inside a single transaction are guaranteed to land in the same batch (this is trivial for SQLite because everything is synchronous).
  3. The IDs are used to essentially append AND id IN (...) to the original join (queries need to follow a certain pattern and always include an ID, usually of the "main" table).
  4. I then compare the number of results with the number of changed IDs. If they differ, I know some rows need to be deleted (because the data was actually deleted, or changed in a way that made a WHERE clause become false).

The end result is an object with the following structure that is sent over the WebSocket:

{
    changed: [],
    removedIds: []
}

Both inserts and updates are treated identically: both produce a full row in changed. It's up to the consumer to update its local copy accordingly. When the WebSocket initially opens, the client receives all matching rows in changed and simply uses them as the local copy. removedIds are simply deleted from the local copy.
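The batch-diff step and the client-side apply step described above can be sketched roughly like this (a simplified model, not the actual implementation; runQuery stands in for executing the original join restricted to `AND id IN (...)`):

```javascript
// Server side: re-run the original query restricted to the changed IDs.
// IDs that no longer produce a row (deleted, or a WHERE became false)
// become removedIds; everything else is shipped as full rows in changed.
function diffBatch(changedIds, runQuery) {
  const changed = runQuery(changedIds); // rows still matching the query
  const matchedIds = new Set(changed.map((row) => row.id));
  const removedIds = changedIds.filter((id) => !matchedIds.has(id));
  return { changed, removedIds };
}

// Client side: fold a { changed, removedIds } message into the local copy.
// Inserts and updates take the same path; removals are plain deletes.
function applyBatch(local, { changed, removedIds }) {
  for (const row of changed) local.set(row.id, row);
  for (const id of removedIds) local.delete(id);
  return local;
}
```

On the initial connection, diffBatch over all matching IDs yields the full snapshot in changed, so the very first message bootstraps the local copy through the exact same code path.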

Why am I writing all this? Because when I first found out about Supabase and read that it has "realtime subscriptions", I got excited that it might finally fill the hole RethinkDB left, while using the PostgreSQL everyone loves. But right now I'm disappointed that Realtime (as of today) feels like a proof of concept. I'm hoping this can change in the future?

It would be amazing if it could do something similar to what I'm doing with SQLite. I understand that the docs say:

The usual solution is to send an ID and then fetch the record, but that's heavy on the database

but they don't provide any benchmarks. I would argue that querying data by primary key is anything but heavy, and it's still far better than polling for the full dataset every second or so when 99% of it didn't even change. It also depends entirely on the use case. I get that Supabase's current USP/catchphrase is "Build in a weekend. Scale to millions.", but you have to understand that the overwhelming majority of projects never need to scale to millions; they still want a reliable system that is fun to work with. So maybe some version of "send an ID and then fetch the record" could be implemented, leaving it to developers to judge whether it works for their use case?


Prinzhorn commented Oct 9, 2022

I'm just throwing ideas in here: what if "arbitrary joins" worked via (materialized) views? Then it wouldn't actually operate on arbitrary SQL; instead, developers would decide up front what shape of data they want to work with in realtime, and to Realtime a view would just be another "table". Another benefit is that a view can contain just a subset of your data, making it even less "heavy on the database". I think it's much more realistic to get this working with single tables than with arbitrary SQL; vanilla Meteor also only works with single collections. But by using PostgreSQL and views we'd still get all the benefits of PostgreSQL without writing duplication logic inside the application (i.e. Meteor sometimes forces you to de-normalize just so you have all the data you need in a single collection).
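The appeal of the view idea is that the join is computed once into a single keyed result set, and change detection then only ever has to diff that one derived "table". A toy model of that (table names and shapes are made up for illustration; in PostgreSQL this role would be played by a materialized view):

```javascript
// Compute a "view": a join of contacts and companies flattened into one
// keyed result set, the way a materialized view would store it.
function materialize(contacts, companies) {
  const view = new Map();
  for (const contact of contacts.values()) {
    const company = companies.get(contact.companyId);
    if (company) {
      view.set(contact.id, { id: contact.id, name: contact.name, company: company.name });
    }
  }
  return view;
}

// Diff two snapshots of the view exactly like a plain single table:
// no join-awareness needed on the realtime side.
function diffView(oldView, newView) {
  const changed = [];
  const removedIds = [];
  for (const [id, row] of newView) {
    const prev = oldView.get(id);
    if (!prev || JSON.stringify(prev) !== JSON.stringify(row)) changed.push(row);
  }
  for (const id of oldView.keys()) {
    if (!newView.has(id)) removedIds.push(id);
  }
  return { changed, removedIds };
}
```

Whether a row disappears because a contact was deleted or because its company no longer matches, the view diff reports it the same way, which is exactly the property arbitrary-join changefeeds need.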

w3b6x9 pushed a commit that referenced this issue Nov 4, 2022

chasers commented Dec 14, 2022

Thanks for this @Prinzhorn.

I actually think this could be implemented pretty easily. I've already built this functionality in Logflare for a beta feature called Logflare Endpoints: you define an arbitrary SQL query and create an API endpoint from your log events. Supabase uses this for the charts on the main project page, for instance.

On the server it proactively keeps those results fresh, so you get ~100ms response times. It's configurable, so you can set them to update on whatever interval you like. And they stop polling after a configurable idle time, so when the results aren't accessed for, say, 5 minutes, the polling process dies.

I actually think this will scale pretty well. And it can scale horizontally on Realtime as long as the database can handle the load (and you could add read replicas there if needed).

And we have "Extensions" now in Realtime so we could build it as an Extension to test out this concept on production.

@Lan-Hekary

Hello,
Any news on this feature request?
