supabase functions serve runs out of memory and crashes with basic usage #212

Open
meyer9 opened this issue Oct 16, 2023 · 8 comments

meyer9 commented Oct 16, 2023

Describe the bug
The edge-runtime does not terminate the worker for a few minutes after starting. This causes a pretty severe memory leak since a new worker is created for each request and never terminated: https://github.com/supabase/cli/blob/a0c8644deeef5a72f99687cc897eadc3dce256f1/internal/functions/serve/templates/main.ts#L135

This makes it effectively impossible to use supabase functions serve for local development.

To Reproduce

  1. Start worker runtime
  2. Send ~1000 requests within a few mins
  3. Notice extremely high memory usage on the edge-runtime and crash.
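For anyone who wants to script the steps above, here is a minimal sketch of a load generator. The URL and function name are placeholders I made up (adjust them to whatever your local `supabase functions serve` prints), and `sendBurst` and `Sender` are hypothetical helpers, not part of any Supabase tooling.

```typescript
// Hypothetical load generator for reproducing the memory growth locally.
// ASSUMPTION: this URL and function name are placeholders, not from the issue.
const TARGET_URL = "http://localhost:54321/functions/v1/hello-world";

// A sender resolves to the HTTP status code of one request.
type Sender = (url: string) => Promise<number>;

// Default sender: performs a real request and drains the response body.
const fetchSender: Sender = async (url) => {
  const res = await fetch(url);
  await res.arrayBuffer();
  return res.status;
};

// Sends `total` requests, at most `concurrency` at a time, and tallies the
// observed status codes. A status of -1 marks a request that threw.
async function sendBurst(
  total: number,
  concurrency: number,
  send: Sender = fetchSender,
  url: string = TARGET_URL,
): Promise<Map<number, number>> {
  const tally = new Map<number, number>();
  let started = 0;

  const runner = async (): Promise<void> => {
    // The check and increment happen synchronously, so runners never overshoot.
    while (started < total) {
      started++;
      let status: number;
      try {
        status = await send(url);
      } catch {
        status = -1;
      }
      tally.set(status, (tally.get(status) ?? 0) + 1);
    }
  };

  const runners = Array.from({ length: Math.min(concurrency, total) }, () => runner());
  await Promise.all(runners);
  return tally;
}
```

Calling `sendBurst(1000, 20)` while watching the container's memory usage should approximate step 2. Failed requests are counted rather than thrown so the run continues even after the runtime starts struggling.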

Expected behavior
I expect worker runtimes to be cleaned up once their request completes if a new worker runtime is started for each request. Even better would be to reuse a single worker runtime and refresh it only when files change, similar to deno run --watch.
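To make the "reuse and refresh on change" idea concrete, here is a rough sketch. It does not use the edge-runtime API: `WorkerLike`, `ReloadingWorkerCache`, and the `version` string (think file mtime or content hash) are all hypothetical names chosen for illustration.

```typescript
// Hypothetical stand-in for a user worker; NOT the edge-runtime API.
interface WorkerLike {
  id: number;
  terminate(): void;
}

interface CacheEntry {
  version: string;
  worker: WorkerLike;
}

// Keeps one worker per service path and replaces it only when the caller
// reports a new source version (for example a file mtime or content hash).
class ReloadingWorkerCache {
  private entries = new Map<string, CacheEntry>();
  private spawn: (servicePath: string) => WorkerLike;

  constructor(spawn: (servicePath: string) => WorkerLike) {
    this.spawn = spawn;
  }

  get(servicePath: string, version: string): WorkerLike {
    const cached = this.entries.get(servicePath);
    if (cached !== undefined && cached.version === version) {
      return cached.worker; // sources unchanged: reuse the live worker
    }
    if (cached !== undefined) {
      cached.worker.terminate(); // sources changed: release the stale worker
    }
    const worker = this.spawn(servicePath);
    this.entries.set(servicePath, { version, worker });
    return worker;
  }
}
```

With this shape, memory stays bounded by the number of distinct functions rather than the number of requests, while a file change still produces a fresh worker on the next request.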

Screenshots
[Two images attached to the original issue]

Desktop (please complete the following information):

  • OS: macOS 14.0 (13-inch, M2, 2022)
  • Version of supabase-js: beta (1.102.2)
  • Version of Node.js: v20.5.0

Additional context
Disabling forceCreate solves the crashing issue, but breaks auto-reload.

I filed an issue on the edge-runtime here since I'm not sure which should be fixed to resolve the crashing problem. #192

@bombillazo

We've noticed this too. We have to restart supabase functions serve periodically whenever the edge functions runtime dies.

sweatybridge transferred this issue from supabase/cli Nov 9, 2023
@sweatybridge

Transferring to edge-runtime repo since it likely requires changes to the container.

@nyannyacha
Collaborator

Hey! @meyer9

I think the root cause of the memory leak is in the Deno code base.
I've already raised this in their repository, and they said they would rework the problematic parts over the next few weeks 😋

denoland/deno_core#386 (comment)

nyannyacha added a commit to nyannyacha/edge-runtime that referenced this issue Dec 27, 2023
This policy forces the supervisor to terminate the isolation immediately if the
request is complete.

Using this policy with development will make sense because it terminates the
isolation immediately if the request is complete, so developers will not have
to restart runtime.

This commit solves cases such as supabase#192 and supabase#212
nyannyacha added a commit to nyannyacha/edge-runtime that referenced this issue Jan 1, 2024
I had to use the cargo patch to fix the memory leakage problem because the root
cause of the memory leak belonged to `deno_core`.

Eventually, these changes should be tracked at `deno_core`; so until fixing this
problem upstream, we have to use the patch.

It could be the substantial solution for supabase#212 and
supabase#192 (on the assumption that I found all memory leakage
places of `JsRuntime` 😋 For reference, Valgrind no longer reported definite
memory leakage after this patch)
@jeremyisatrecharm

Until then, are there any suggestions on how to do some sort of hacky reboot-serve every n requests without the server responses failing?
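One possible shape for such a workaround, as a sketch only: put a small gate in front of the server that counts requests, holds new ones once the limit is hit, waits for in-flight ones to finish, and only then restarts. `RecyclingGate` and the `restart` callback are hypothetical; how you actually restart the serve process is left to the callback.

```typescript
// Hypothetical callback that restarts whatever is serving the functions.
type Restart = () => Promise<void>;

// Counts requests and triggers a restart after `limit` of them. New requests
// are held (not rejected) while the restart runs, and the restart only starts
// once every in-flight request has finished.
class RecyclingGate {
  private served = 0;
  private inFlight = 0;
  private restarting: Promise<void> | null = null;
  private idleWaiters: Array<() => void> = [];
  private limit: number;
  private restart: Restart;

  constructor(limit: number, restart: Restart) {
    this.limit = limit;
    this.restart = restart;
  }

  async run<T>(handler: () => Promise<T>): Promise<T> {
    // Hold this request while a restart is in progress.
    while (this.restarting !== null) {
      await this.restarting;
    }
    this.inFlight++;
    this.served++;
    try {
      return await handler();
    } finally {
      this.inFlight--;
      if (this.inFlight === 0) {
        // Wake a pending restart that was waiting for requests to drain.
        this.idleWaiters.splice(0).forEach((wake) => wake());
      }
      if (this.served >= this.limit && this.restarting === null) {
        this.restarting = this.recycle();
      }
    }
  }

  private async recycle(): Promise<void> {
    if (this.inFlight > 0) {
      await new Promise<void>((resolve) => this.idleWaiters.push(resolve));
    }
    try {
      // Deferred by one tick so state is reset only after `restarting` is assigned.
      await Promise.resolve().then(() => this.restart());
    } catch (err) {
      console.error("restart failed:", err);
    } finally {
      this.served = 0;
      this.restarting = null;
    }
  }
}
```

Each incoming request would go through `gate.run(() => forward(req))`, where `forward` is whatever proxies to the local runtime. Requests arriving during a restart see extra latency instead of an error, which is the trade-off this sketch makes.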

@nyannyacha
Collaborator

Hi! @jeremyisatrecharm 😋

Yeah, the Deno team seems to be taking longer than I expected to fix the memory leak. It may not be a priority for them.

So, I've already written some commits in my fork to fix the memory leak. However, since these changes modify the upstream directly, it may be necessary to talk with the Supabase team about whether to accept them.

@laktek is back from holiday just in time, so I'd like to take the time to discuss this 😁

nyannyacha added a commit to nyannyacha/edge-runtime that referenced this issue Jan 8, 2024
I had to use the cargo patch to fix the memory leakage problem because the root
cause of the memory leak belonged to `deno_core`.

Eventually, these changes should be tracked at `deno_core`; so until fixing this
problem upstream, we have to use the patch.

It could be the substantial solution for supabase#212 and
supabase#192 (on the assumption that I found all memory leakage
places of `JsRuntime` 😋 For reference, Valgrind no longer reported definite
memory leakage after this patch)

(cherry picked from commit bc631b4)
nyannyacha added a commit to nyannyacha/edge-runtime that referenced this issue Jan 8, 2024
This policy forces the supervisor to terminate the isolation immediately if the
request is complete.

Using this policy with development will make sense because it terminates the
isolation immediately if the request is complete, so developers will not have
to restart runtime.

This commit solves cases such as supabase#192 and supabase#212

(cherry picked from commit 0b1ddd0)
nyannyacha added a commit to nyannyacha/edge-runtime that referenced this issue Jan 10, 2024
This policy forces the supervisor to terminate the isolation immediately if the
request is complete.

Using this policy with development will make sense because it terminates the
isolation immediately if the request is complete, so developers will not have
to restart runtime.

This commit solves cases such as supabase#192 and supabase#212

(cherry picked from commit 0b1ddd0)
nyannyacha added a commit to nyannyacha/edge-runtime that referenced this issue Jan 17, 2024
I had to use the cargo patch to fix the memory leakage problem because the root
cause of the memory leak belonged to `deno_core`.

Eventually, these changes should be tracked at `deno_core`; so until fixing this
problem upstream, we have to use the patch.

It could be the substantial solution for supabase#212 and
supabase#192 (on the assumption that I found all memory leakage
places of `JsRuntime` 😋 For reference, Valgrind no longer reported definite
memory leakage after this patch)

(cherry picked from commit bc631b4)

# Conflicts:
#	Cargo.lock
#	Cargo.toml
nyannyacha added a commit to nyannyacha/edge-runtime that referenced this issue Jan 20, 2024
I had to use the cargo patch to fix the memory leakage problem because the root
cause of the memory leak belonged to `deno_core`.

Eventually, these changes should be tracked at `deno_core`; so until fixing this
problem upstream, we have to use the patch.

It could be the substantial solution for supabase#212 and
supabase#192 (on the assumption that I found all memory leakage
places of `JsRuntime` 😋 For reference, Valgrind no longer reported definite
memory leakage after this patch)

(cherry picked from commit bc631b4)
@JTInfinite

I'm running into this issue as well (at least I think it is the same issue). I can't run any function that attempts to work with any sort of embeddings. Each invocation fails with:

CPU time hard limit reached. isolate: bd382b3a-43a9-41de-87dc-2aa3ec7b8524
ReferenceError: Status is not defined
    at Server.<anonymous> (file:///home/deno/main/index.ts:164:13)
    at eventLoopTick (ext:core/01_core.js:64:7)
    at async #respond (https://deno.land/std@0.182.0/http/server.ts:220:18)
failed to send request to user worker: request has been cancelled by supervisor
user worker failed to respond: request has been cancelled by supervisor
WorkerRequestCancelled: request has been cancelled by supervisor
    at async Promise.allSettled (index 1)
    at async UserWorker.fetch (ext:sb_user_workers/user_workers.js:70:21)
    at async Server.<anonymous> (file:///home/deno/main/index.ts:146:12)
    at async #respond (https://deno.land/std@0.182.0/http/server.ts:220:18) {
  name: "WorkerRequestCancelled"
}
ReferenceError: Status is not defined
    at Server.<anonymous> (file:///home/deno/main/index.ts:164:13)
    at eventLoopTick (ext:core/01_core.js:64:7)
    at async #respond (https://deno.land/std@0.182.0/http/server.ts:220:18)

@prvind-panday

"...these changes modify the upstream directly, it may be necessary to talk with the supabase team about whether to accept..."

I am also facing the same issue when I run my Supabase edge function. It ran perfectly fine for a few seconds, and then I got the following response in Postman:

{ "message": "The upstream server is timing out" }

See the attached image for the Postman response:
[Image: Screenshot 2024-04-03 183206]

In the console I see the following output:

CPU time hard limit reached. isolate: 3cbe8cb8-d7de-4bc8-8eca-7273473cf2dc
failed to send request to user worker: request has been cancelled by supervisor
user worker failed to respond: request has been cancelled by supervisor
WorkerRequestCancelled: request has been cancelled by supervisor
    at async Promise.allSettled (index 1)
    at async UserWorker.fetch (ext:sb_user_workers/user_workers.js:70:21)
    at async Server.<anonymous> (file:///home/deno/main/index.ts:146:12)
    at async #respond (https://deno.land/std@0.182.0/http/server.ts:220:18) {
  name: "WorkerRequestCancelled"
}
ReferenceError: Status is not defined
    at Server.<anonymous> (file:///home/deno/main/index.ts:164:13)
    at eventLoopTick (ext:core/01_core.js:64:7)
    at async #respond (https://deno.land/std@0.182.0/http/server.ts:220:18)
serving the request with /home/deno/functions/parse_foca_geojson

Has anyone found a solution to this? Is this issue related to Supabase or to Docker itself?

@AntonOfTheWoods

There seem to be a few issues mentioned here but the project documentation is completely absent (except for the examples), so this ticket appears to be accumulating a lot of cruft...

If you are getting errors like:

CPU time hard limit reached...

Then make sure you are passing sufficiently large limits to your worker. See https://github.com/supabase/edge-runtime/blob/main/examples/main/index.ts#L98 for an example. The defaults appear to be set extremely low (1000ms or something) so they are easy to hit if you are doing anything serious. Have a look at all the options. Increasing these made all my issues go away.
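As a rough illustration of what "passing larger limits" can look like, the sketch below builds the options object separately so it can be checked on its own. The option names are the ones I recall from the linked example and may differ between edge-runtime versions, so verify them against the version you run. The values and the `buildWorkerOptions` helper are my own assumptions, not recommended defaults.

```typescript
// ASSUMPTION: option names follow the linked examples/main/index.ts as I
// recall it; check them against your edge-runtime version before relying on them.
interface UserWorkerLimits {
  memoryLimitMb: number;
  workerTimeoutMs: number;
  cpuTimeSoftLimitMs: number;
  cpuTimeHardLimitMs: number;
}

// Illustrative values only, chosen to be well above a ~1000ms CPU budget.
const generousLimits: UserWorkerLimits = {
  memoryLimitMb: 512,
  workerTimeoutMs: 5 * 60 * 1000,
  cpuTimeSoftLimitMs: 10000,
  cpuTimeHardLimitMs: 20000,
};

// Hypothetical helper: merges per-function overrides into the base limits
// and rejects a soft CPU limit that exceeds the hard one.
function buildWorkerOptions(
  servicePath: string,
  overrides: Partial<UserWorkerLimits> = {},
): UserWorkerLimits & { servicePath: string } {
  const limits = { ...generousLimits, ...overrides };
  if (limits.cpuTimeSoftLimitMs > limits.cpuTimeHardLimitMs) {
    throw new Error("cpuTimeSoftLimitMs must not exceed cpuTimeHardLimitMs");
  }
  return { servicePath, ...limits };
}

// Inside the main service, the result would be passed along these lines:
//   const worker = await EdgeRuntime.userWorkers.create(buildWorkerOptions(servicePath));
//   return await worker.fetch(req);
```

CPU-heavy functions (embeddings, large GeoJSON parsing) are the ones most likely to need a per-function override, for example `buildWorkerOptions(path, { cpuTimeHardLimitMs: 60000 })`.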
