Realtime: Self Hosting - Docker Swarm mode #645

Open
2 tasks done
lwjameson opened this issue Aug 22, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@lwjameson

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

We have an open source project targeted at colleges and universities, and our client requires that it be suitable for self-hosting. Although the docker-compose version works, we do not feel it is a viable production platform, and we do not think a Kubernetes-based solution is achievable for many institutions. I have been working on a Docker Swarm implementation and am very close; the last remaining issue is Realtime.

The realtime service is driven by the following compose yaml file:

# Starts the client service for a Supabase install

version: "3.8"

services:
  realtime-dev:
    image: supabase/realtime:v2.10.1
    healthcheck:
      test:
        [
          "CMD",
          "bash",
          "-c",
          "printf \\0 > /dev/tcp/localhost/4000"
        ]
      timeout: 5s
      interval: 5s
      retries: 3
    ulimits:
      nofile:
        soft: 100000
        hard: 200000
    networks:
      - supabase
    environment:
      - PORT=4000
      - DB_HOST=supabase-db_db
      - DB_PORT=5432
      - DB_USER=supabase_admin
      - DB_PASSWORD=<redacted>
      - DB_NAME=postgres
      - DB_ENC_KEY=supabaserealtime
      - API_JWT_SECRET=<redacted>
      - FLY_ALLOC_ID=fly123
      - FLY_APP_NAME=realtime-dev
      - SECRET_KEY_BASE=UpNVntn3cDxHJpq99YMc1T1AQgQpc8kfYTuRgBiYa15BLrx8etQoXz3gZv1/u2oq
      - ERL_AFLAGS=-proto_dist inet_tcp
      - ENABLE_TAILSCALE=false
      - DNS_NODES=''
      - DB_AFTER_CONNECT_QUERY=SET search_path TO _realtime
    command: >
      sh -c "/app/bin/migrate && /app/bin/realtime eval 'Realtime.Release.seeds(Realtime.Repo)' && /app/bin/server"
networks:
  supabase:
    name: supabase-test
    external: true
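
As a side note on the healthcheck in the compose file above: `printf \\0 > /dev/tcp/localhost/4000` only confirms that something accepts TCP connections on port 4000. It says nothing about whether a WebSocket upgrade will succeed. A minimal Python sketch of the same kind of probe is below; the function name `port_accepts_connections` is hypothetical and is not part of Realtime or Docker.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This mirrors the bash /dev/tcp healthcheck: it checks reachability of the
    port only, not HTTP or WebSocket behaviour.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling `port_accepts_connections("localhost", 4000)` from inside the container would be roughly equivalent to the healthcheck, so a healthy container status does not rule out the 426 responses shown later.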

My kong.yml file is:

<snip>
  ## Secure Realtime routes
  - name: realtime-v1
    _comment: 'Realtime: /realtime/v1/* -> ws://realtime:4000/socket/*'
    url: http://realtime-dev:4000/socket/
    routes:
      - name: realtime-v1-all
        strip_path: true
        paths:
          - /realtime/v1/
    plugins:
      - name: cors
      - name: key-auth
        config:
          hide_credentials: false
      - name: acl
        config:
          hide_groups_header: true
          allow:
            - admin
            - anon

<snip>

When our client (which works against hosted Supabase) connects via wss, we see the following output in the Realtime logs:

20:36:05.994 [debug] QUERY OK source="tenants" db=0.4ms idle=1099.6ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["realtime-dev"]
20:36:05.994 [debug] QUERY OK source="extensions" db=0.3ms idle=1100.2ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["realtime-dev"]
20:36:36.530 [debug] QUERY OK source="tenants" db=0.5ms idle=1635.7ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["realtime-dev"]
20:36:36.530 [debug] QUERY OK source="extensions" db=0.4ms idle=636.4ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["realtime-dev"]
20:37:07.750 [debug] QUERY OK source="tenants" db=0.5ms idle=1856.1ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["realtime-dev"]
20:37:07.751 [debug] QUERY OK source="extensions" db=0.4ms idle=1856.9ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["realtime-dev"]

The output in the Kong logs:

10.0.0.2 - - [22/Aug/2023:20:36:57 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 426 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [22/Aug/2023:20:37:01 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 426 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [22/Aug/2023:20:37:07 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 426 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [22/Aug/2023:20:37:13 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 426 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
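
Two fields in the Kong log lines above stand out: the request arrives as `HTTP/1.0`, and the status is 426 (Upgrade Required). A WebSocket handshake needs HTTP/1.1 or later, so one possible reading is that a proxy in front of Kong is forwarding the request without the upgrade headers. That is an assumption, not something the logs prove. The sketch below pulls those fields out of an access log line; the names `parse_access_log_line` and `looks_like_dropped_upgrade` are hypothetical helpers, and the heuristic is only a hint for where to look.

```python
import re
from typing import NamedTuple, Optional

# Matches the quoted request and the status code in a common-log-format line.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/(?P<version>\d\.\d)" (?P<status>\d{3}) '
)


class AccessLogEntry(NamedTuple):
    method: str
    path: str
    http_version: str
    status: int


def parse_access_log_line(line: str) -> Optional[AccessLogEntry]:
    """Extract method, path, HTTP version and status, or None if no match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return AccessLogEntry(
        match["method"], match["path"], match["version"], int(match["status"])
    )


def looks_like_dropped_upgrade(entry: AccessLogEntry) -> bool:
    """Heuristic: a /websocket request that arrived as HTTP/1.0 and got a 426."""
    path_without_query = entry.path.split("?", 1)[0]
    return (
        path_without_query.endswith("/websocket")
        and entry.http_version == "1.0"
        and entry.status == 426
    )
```

If the heuristic fires, the hop immediately in front of Kong is the first place to check for `proxy_http_version 1.1` and the `Upgrade`/`Connection` headers.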

To Reproduce

Requires a Docker Swarm setup.

Expected behavior

I expect the connection to be upgraded to a WebSocket, but the upgrade appears to be failing (Kong logs a 426 for each attempt).

System information

  • OS: Amazon Linux 2023
  • Docker: Docker version 20.10.23, build 7155243

Additional context

Thank you for any help you can provide here. I have been struggling with this all day.

@lwjameson lwjameson added the bug Something isn't working label Aug 22, 2023
@lwjameson
Author

OK. I have made it a little farther but am still having issues.

The docker-compose.yml file uses:

container_name: realtime-dev.supabase-realtime

This is there to address Realtime's use of the subdomain to determine the tenant, as discussed here.

container_name is ignored in Swarm mode, and I could not find a way to create a Docker network name that Realtime can use.

To get around this I have updated my NGINX config as follows:

    server {
        server_name realtime-dev.<domain>.org;
        location / {
            proxy_pass http://127.0.0.1:4000;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_read_timeout 86400;
        }
    }

and my kong.yml:

  - name: realtime-v1
    _comment: 'Realtime: /realtime/v1/* -> ws://realtime:4000/socket/*'
    url: http://realtime-dev.<domain>.org/socket/
    routes:
      - name: realtime-v1-all
        strip_path: true
        paths:
          - /realtime/v1/
    plugins:
      - name: cors
      - name: key-auth
        config:
          hide_credentials: false
      - name: acl
        config:
          hide_groups_header: true
          allow:
            - admin
            - anon

It started looking for a tenant named '127' (sigh), which I added to the system via the instructions here. I will figure this out later...

I am now reaching the Realtime service and getting these logs:

18:17:14.784 [debug] QUERY OK source="tenants" db=0.5ms idle=1417.3ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["127"]
18:17:14.785 [debug] QUERY OK source="extensions" db=0.3ms idle=1418.1ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["127"]
18:17:47.379 [debug] QUERY OK source="tenants" db=0.5ms idle=1012.2ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["127"]
18:17:47.380 [debug] QUERY OK source="extensions" db=0.6ms idle=1013.0ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["127"]
18:18:19.384 [debug] QUERY OK source="tenants" db=0.6ms idle=16.5ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["127"]
18:18:19.384 [debug] QUERY OK source="extensions" db=0.3ms idle=17.4ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["127"]

This results in a 400 error from Kong:

10.0.0.2 - - [23/Aug/2023:18:19:10 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 400 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [23/Aug/2023:18:19:14 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 400 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [23/Aug/2023:18:19:20 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 400 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [23/Aug/2023:18:19:26 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 400 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [23/Aug/2023:18:19:31 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 400 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"

I am not sure what I am doing wrong in terms of the proxy pass to get a 400.

Again, any help would be greatly appreciated.

@filipecabaco
Contributor

Be aware that we use the subdomain to determine which tenant is being accessed, so I'm not sure whether that NGINX config removes that information.
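
To make the subdomain point concrete, here is a minimal sketch of how a tenant ID could be derived from the `Host` header. It assumes the tenant is simply the first dot-separated label of the host name. This is not Realtime's actual code, and `tenant_from_host` is a hypothetical name. By default NGINX sets the upstream `Host` header to the `proxy_pass` target (here `127.0.0.1:4000`) unless `proxy_set_header Host ...` says otherwise, which would be consistent with the lookups for tenant '127' in the logs above.

```python
def tenant_from_host(host_header: str) -> str:
    """Return the first dot-separated label of a Host header, ignoring any port.

    Assumption for illustration only: the tenant ID is the leftmost label.
    """
    hostname = host_header.split(":", 1)[0]
    return hostname.split(".", 1)[0]
```

Under this assumption, `tenant_from_host("realtime-dev.supabase-realtime:4000")` gives `realtime-dev`, while `tenant_from_host("127.0.0.1:4000")` gives `127`. Forwarding the original host name from NGINX (for example with `proxy_set_header Host $host;`) may therefore be worth trying before adding extra tenants.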

@menasheh

@lwjameson did you ever scale past 1 node?

@lwjameson
Author

@menasheh Yes, I did, but the client changed strategies and went with Kubernetes instead. Docker Swarm was very tricky, but it does work.

@menasheh

@lwjameson How did you get the elixir nodes to connect?

@lwjameson
Author

It has been a little bit, but I believe I got it to work by using a separate stack for realtime with the service named 'supabase-realtime'. Then when creating the stack I named it 'realtime-dev', and then updated the kong.yml file route for realtime to:

url: http://realtime-dev.supabase-realtime:4000/socket/

@ConProgramming

@menasheh if you figured this out, I would love to hear how; I'm running into something similar on AWS.
