Parallel dispatch calls freeze when Lua resolver yields #524

Open
hishamhm opened this issue Mar 26, 2024 · 7 comments · May be fixed by #546
Labels
area/lua-bridge (Area: Lua bridge), bug (Something isn't working)

Comments

@hishamhm
Collaborator

Problem summary: If you dispatch two parallel HTTP calls, and both call into Lua for DNS resolution, and both yield to wait for the DNS response, only the second one gets a response. The first one waits forever.

Background and some observations of the behavior

This was first observed with Kong Gateway, but it is also reproducible in this module's test suite (see the test cases in PR #523).

Using two different hostnames allows us to see in the logs that the second DNS request is the one that gets a response.

Yielding affects the behavior:

  • If you issue a request that triggers two parallel calls for the same hostname, the request hangs because the first DNS request yields and never resumes. If you Ctrl-C the request and try again, the overall request succeeds: the hostname was cached by the previous request's successful call, so both calls now succeed.
  • If you do the same with a request that triggers two parallel calls for two different hostnames, the first request behaves the same way (it hangs because the first DNS request yields and never resumes). But the second request hangs again: it tries the first hostname, which never resolved, and yields; then it tries the second hostname, which is cached, and succeeds; yet the overall request is still waiting on the first hostname.

In short, it seems that triggering the Lua resolver a second time discards the pending DNS resolution coroutine.

Logs

Enabling "debug_mail" logs on the Kong Gateway showed some interesting bits which may provide some clues. Here's what I've observed:

First, we create a new Lua thread to resolve gobolinux.org; all is normal so far:

2024/03/22 16:25:45 [debug] 90985#0: *2 lua creating new thread
2024/03/22 16:25:45 [debug] 90985#0: lua ref lua thread 00005A6A986C62C0 (ref 649)
2024/03/22 16:25:45 [debug] 90985#0: *2 code cache lookup (key='wasm_lua_resolver_chunkdd7007ea3c1caf68390921fc5ddb939c', ref=0)
2024/03/22 16:25:45 [debug] 90985#0: *2 code cache miss (key='wasm_lua_resolver_chunkdd7007ea3c1caf68390921fc5ddb939c', ref=0)
2024/03/22 16:25:45 [debug] 90985#0: *2 wasm running lua thread (lctx: 00005A6A96928E80, L: 00005A6A95FFCD30, co: 00005A6A986C62C0)
2024/03/22 16:25:45 [debug] 90985#0: *2 lua reset ctx
2024/03/22 16:25:45 [debug] 90985#0: *2 http lua finalize threads
2024/03/22 16:25:45 [debug] 90985#0: *2 lua run thread, top:0 c:1
2024/03/22 16:25:45 [debug] 90985#0: *2 [lua] [string "wasm_lua_resolver_chunk"]:13: wasm lua resolver thread
2024/03/22 16:25:45 [debug] 90985#0: *2 [lua] [string "wasm_lua_resolver_chunk"]:20: wasm lua resolver using existing dns_client
2024/03/22 16:25:45 [debug] 90985#0: *2 [lua] [string "wasm_lua_resolver_chunk"]:42: wasm lua resolving "gobolinux.org"

This causes the DNS client to send the UDP message for DNS resolution (epoll fd 34). It does not get an immediate response, so it yields:

2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket network address given directly
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket resolve retval handler
2024/03/22 16:25:45 [debug] 90985#0: *2 UDP socket 34
2024/03/22 16:25:45 [debug] 90985#0: *2 connect to 127.0.0.53:53, fd:34 #646
2024/03/22 16:25:45 [debug] 90985#0: *2 epoll add event: fd:34 op:1 ev:80002001
2024/03/22 16:25:45 [debug] 90985#0: *2 add cleanup: 00005A6A964C8870
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket connect: 0
2024/03/22 16:25:45 [debug] 90985#0: *2 [lua] client.lua:737: lookup(): WILL QUERY gobolinux.org
2024/03/22 16:25:45 [debug] 90985#0: *2 send: fd:34 31 of 31
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket calling receive() method
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket read timeout: 2000
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket receive buffer size: 4096
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket read data: waiting: 0
2024/03/22 16:25:45 [debug] 90985#0: *2 recv: fd:34 -1 of 4096
2024/03/22 16:25:45 [debug] 90985#0: *2 recv() not ready (11: Resource temporarily unavailable)
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp recv returned -2
2024/03/22 16:25:45 [debug] 90985#0: *2 event timer add: 34: 2000:10471706
2024/03/22 16:25:45 [debug] 90985#0: *2 lua resume returned 1
2024/03/22 16:25:45 [debug] 90985#0: *2 lua thread yielded

(we then get this, which I don't know whether it's at all related)

2024/03/22 16:25:45 [debug] 90985#0: posted event 00005A6A9868DF30
2024/03/22 16:25:45 [debug] 90985#0: *2 delete posted event 00005A6A9868DF30

What follows immediately is the next dispatch, which needs to resolve example.com:

2024/03/22 16:25:45 [debug] 90985#0: *2 proxy_wasm http dispatch connecting...
2024/03/22 16:25:45 [debug] 90985#0: *2 wasm tcp socket resolving: example.com
2024/03/22 16:25:45 [debug] 90985#0: *2 wasm tcp socket using default resolver
2024/03/22 16:25:45 [debug] 90985#0: *2 wasm tcp socket resolving...
2024/03/22 16:25:45 [debug] 90985#0: *2 lua creating new thread
2024/03/22 16:25:45 [debug] 90985#0: lua ref lua thread 00005A6A9678DC20 (ref 650)
2024/03/22 16:25:45 [debug] 90985#0: *2 code cache lookup (key='wasm_lua_resolver_chunkdd7007ea3c1caf68390921fc5ddb939c', ref=0)
2024/03/22 16:25:45 [debug] 90985#0: *2 code cache miss (key='wasm_lua_resolver_chunkdd7007ea3c1caf68390921fc5ddb939c', ref=0)
2024/03/22 16:25:45 [debug] 90985#0: *2 wasm running lua thread (lctx: 00005A6A97EE2D90, L: 00005A6A95FFCD30, co: 00005A6A9678DC20)

But unlike the dispatch above, where the three lines `lua reset ctx`, `http lua finalize threads`, and `lua run thread` appeared next to each other, this time we get:

2024/03/22 16:25:45 [debug] 90985#0: *2 lua reset ctx
2024/03/22 16:25:45 [debug] 90985#0: *2 http lua finalize threads
2024/03/22 16:25:45 [debug] 90985#0: *2 lua finalize socket
2024/03/22 16:25:45 [debug] 90985#0: *2 lua close socket connection
2024/03/22 16:25:45 [debug] 90985#0: *2 event timer del: 34: 10471706
2024/03/22 16:25:45 [debug] 90985#0: *2 reusable connection: 0
2024/03/22 16:25:45 [debug] 90985#0: *2 lua deleting light thread 00005A6A986C62C0 (ref 649)
2024/03/22 16:25:45 [debug] 90985#0: *2 lua caching unused lua thread 00005A6A986C62C0 (ref 649)
2024/03/22 16:25:45 [debug] 90985#0: *2 lua run thread, top:0 c:1

Looks like it deleted the coroutine ending in -62C0, which was the one from the gobolinux.org DNS resolution above!?

The query for example.com then proceeds in a similar fashion to the previous one... (even on the same fd:34)

2024/03/22 16:25:45 [debug] 90985#0: *2 [lua] [string "wasm_lua_resolver_chunk"]:13: wasm lua resolver thread
2024/03/22 16:25:45 [debug] 90985#0: *2 [lua] [string "wasm_lua_resolver_chunk"]:20: wasm lua resolver using existing dns_client
2024/03/22 16:25:45 [debug] 90985#0: *2 [lua] [string "wasm_lua_resolver_chunk"]:42: wasm lua resolving "example.com"
2024/03/22 16:25:45 [debug] 90985#0: *2 [lua] [string "wasm_lua_resolver_chunk"]:44: has individual_toip: function: 0x5a6a962ef8f0
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket network address given directly
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket resolve retval handler
2024/03/22 16:25:45 [debug] 90985#0: *2 UDP socket 34
2024/03/22 16:25:45 [debug] 90985#0: *2 connect to 127.0.0.53:53, fd:34 #647
2024/03/22 16:25:45 [debug] 90985#0: *2 epoll add event: fd:34 op:1 ev:80002001
2024/03/22 16:25:45 [debug] 90985#0: *2 add cleanup: 00005A6A96281D30
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket connect: 0
2024/03/22 16:25:45 [debug] 90985#0: *2 [lua] client.lua:737: lookup(): WILL QUERY example.com
2024/03/22 16:25:45 [debug] 90985#0: *2 send: fd:34 29 of 29

...but then, after a while, we do get data coming in on fd:34, and a query result for example.com:

2024/03/22 16:25:45 [debug] 90985#0: epoll timer: 837
2024/03/22 16:25:45 [debug] 90985#0: epoll: fd:34 ev:0001 d:00005A6A9888E191
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket handler for "/anything?", wev 0
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket read handler
2024/03/22 16:25:45 [debug] 90985#0: *2 event timer del: 34: 10471706
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket read data: waiting: 1
2024/03/22 16:25:45 [debug] 90985#0: *2 recv: fd:34 85 of 4096
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp recv returned 85
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket waking up the current request
2024/03/22 16:25:45 [debug] 90985#0: *2 wasm rctx reused: 00005A6A964161C0 (r: 00005A6A97B8D430, main: 1)
2024/03/22 16:25:45 [debug] 90985#0: *2 wasm wev handler "/anything?" - timeout: 0, ready: 1 (main: 1, count: 1, resp_finalized: 0, state: 2)
2024/03/22 16:25:45 [debug] 90985#0: *2 wasm resuming lua thread (lctx: 00005A6A97EE2D90, L: 00005A6A95FFCD30, co: 00005A6A9678DC20)
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp operation done, resuming lua thread
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket calling prepare retvals handler 00005A6A943F1684, u:00005A6A98CEEED0
2024/03/22 16:25:45 [debug] 90985#0: *2 lua udp socket receive return value handler
2024/03/22 16:25:45 [debug] 90985#0: *2 lua run thread, top:0 c:1
2024/03/22 16:25:45 [debug] 90985#0: *2 [lua] client.lua:739: lookup(): QUERY GOT RESULT
@hishamhm
Collaborator Author

Interesting: running #523 on CI with Valgrind produced the following after the test cases timed out:

https://github.com/Kong/ngx_wasm_module/actions/runs/8438836263/job/23111996715

 ==46021== 556 (48 direct, 508 indirect) bytes in 1 blocks are definitely lost in loss record 146 of 173
==46021==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==46021==    by 0x1E464A: ngx_alloc (ngx_alloc.c:22)
==46021==    by 0x1ACD49: ngx_create_pool (ngx_palloc.c:21)
==46021==    by 0x3ACBA9: ngx_wasm_lua_thread_new (ngx_wasm_lua.c:128)
==46021==    by 0x3AE14A: ngx_wasm_lua_resolver_resolve (ngx_wasm_lua_resolver.c:145)
==46021==    by 0x391B02: ngx_wasm_socket_tcp_connect (ngx_wasm_socket_tcp.c:406)
==46021==    by 0x3B7A0F: ngx_http_proxy_wasm_dispatch_resume_handler (ngx_http_proxy_wasm_dispatch.c:734)
==46021==    by 0x3B8628: ngx_http_proxy_wasm_dispatch_handler (ngx_http_proxy_wasm_dispatch.c:467)
==46021==    by 0x1DEB89: ngx_event_process_posted (ngx_event_posted.c:35)
==46021==    by 0x1DE086: ngx_process_events_and_timers (ngx_event.c:273)
==46021==    by 0x1ED637: ngx_single_process_cycle (ngx_process_cycle.c:323)
==46021==    by 0x1A8038: main (nginx.c:384)
==46021== 
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   fun:ngx_alloc
   fun:ngx_create_pool
   fun:ngx_wasm_lua_thread_new
   fun:ngx_wasm_lua_resolver_resolve
   fun:ngx_wasm_socket_tcp_connect
   fun:ngx_http_proxy_wasm_dispatch_resume_handler
   fun:ngx_http_proxy_wasm_dispatch_handler
   fun:ngx_event_process_posted
   fun:ngx_process_events_and_timers
   fun:ngx_single_process_cycle
   fun:main
}

...so it seems that an lctx is indeed being lost along the way: per the stack above, the leaked pool is the one created for the Lua thread in ngx_wasm_lua_thread_new 🤔

@thibaultcha thibaultcha added the bug and area/lua-bridge labels Mar 26, 2024
@thibaultcha
Member

thibaultcha commented Mar 27, 2024

Thanks for the test case! Long story short: yes, as shown by the Valgrind report, it appears our Lua bridge cannot handle more than one "Lua thread" (lctx) at a time.

At first I enabled some of the DDEBUG macros and ran the test as-is. I annotated the error.log output here, which shows what is currently going on. "posted event" followed by "delete posted event" is not of concern, since it's part of ngx_event_process_posted. The logs showed the issue was a call to ngx_http_lua_reset_ctx in the Lua bridge.

I then commented out the lua_reset_ctx call, which prevented the waiting file descriptor for call 1 from being prematurely closed, but also showed that call 1 was never resumed, even after the Lua thread of call 2 had yielded and resumed all the way through. It then became apparent that rctx only supports a single lctx object, which is problematic since the http module resumes yielded Lua threads via its single lctx reference.

Maybe we should make lctx a single, long-lived Lua thread that schedules smaller coroutines, which I hope we can do with the exposed OpenResty facilities. We could also make rctx store a queue of Lua threads instead of a single one, and resume all threads until it is empty (see the sketch below). This could also be the underlying cause of the behavior we've seen with the Gateway's DNS semaphore not being released, although a single Lua thread is supposed to be resumed all the way through, so I am not sure.
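
For illustration, here is a minimal sketch of that queue idea against nginx's ngx_queue API; every type, field, and function name below is a hypothetical placeholder, not the module's (or any branch's) actual code:

```c
/*
 * Hypothetical sketch of "rctx stores a queue of Lua threads": names are
 * made up for illustration; thread_resume() stands in for whatever the
 * Lua bridge would use to resume a yielded thread.
 */
#include <ngx_config.h>
#include <ngx_core.h>

typedef struct {
    ngx_queue_t  q;             /* link into req_ctx->lctx_queue */
    /* ... L, co, pool, etc. */
} lua_thread_ctx_t;

typedef struct {
    ngx_queue_t  lctx_queue;    /* all pending Lua threads of the request,
                                   initialized with ngx_queue_init() */
} req_ctx_t;

static void
thread_resume(lua_thread_ctx_t *lctx)
{
    /* hypothetical: run/resume the thread via the Lua bridge */
    (void) lctx;
}

/* on thread creation: enqueue instead of overwriting a single rctx->lctx */
static void
enqueue_lctx(req_ctx_t *rctx, lua_thread_ctx_t *lctx)
{
    ngx_queue_insert_tail(&rctx->lctx_queue, &lctx->q);
}

/* in the wev handler: resume every yielded thread until the queue is
   empty; a thread that yields again would re-enqueue itself */
static void
resume_lctxs(req_ctx_t *rctx)
{
    ngx_queue_t       *node;
    lua_thread_ctx_t  *lctx;

    while (!ngx_queue_empty(&rctx->lctx_queue)) {
        node = ngx_queue_head(&rctx->lctx_queue);
        ngx_queue_remove(node);

        lctx = ngx_queue_data(node, lua_thread_ctx_t, q);
        thread_resume(lctx);
    }
}
```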

@hishamhm
Collaborator Author

Thanks for taking a look at it!

I commented the error.log output here, which shows what is currently going on.

Good, so I think I was looking at the right thing: the `# fd:12 closed (!!)` annotation marks the same block as "Looks like it deleted the coroutine ending in -62C0" in my notes above.

We could also make rctx store a queue of Lua threads instead of a single one, and resume all threads until its empty.

Something along those lines was the first thing that came to my mind when I saw the Valgrind output. I'll give that a try!

@hishamhm
Collaborator Author

We could also make rctx store a queue of Lua threads instead of a single one, and resume all threads until its empty.

Something along those lines was the first thing that came to my mind when I saw the Valgrind output. I'll give that a try!

I did try this approach: https://github.com/Kong/ngx_wasm_module/tree/wip/lctxs-queue

I managed to collect all lctxs and resume them, but it doesn't quite work, because the socket data does not arrive correctly.

What is happening is:

  • when our Lua chunk issues a DNS request, resty/dns/resolver.lua yields at line 871 (buf, err = sock:receive(4096))
  • when the response from the DNS server arrives, this triggers the ngx_http_wasm_wev_handler, in which I go through my queue, waking up the yielded lctxs.
  • what happens then is that sock:receive() returns 0 bytes (which then confuses resty.dns.resolver and produces a "truncated" response error)

Apparently, the socket data connection is only plugged correctly into the most-recently-created lctx? I don't fully understand how data is supposed to flow there.

Maybe we should make lctx a single, long-lived Lua thread that schedules smaller co-routines?

If there is no way to map incoming data to the correct lctx at the "wev_handler", then that would be the way to go. But then, OpenResty does have its own socket scheduler, right? (i.e. it implements the equivalents of both luasocket and copas). Not sure how we would be able to put another scheduling layer on top of that. Lua coroutine schedulers are not composable out-of-the-box...

@thibaultcha
Member

thibaultcha commented Mar 28, 2024

At first, I noticed that in ngx_http_lua_socket_udp, the response length is stored in u->received, which is set to 111 after ngx_udp_recv, but later, in ngx_http_lua_socket_udp_receive_retval_handler, it ends up being 0 (which bubbles up to sock:receive() returning ""). The value of u->received changed because u changed in between the recv event and the udp_receive_retval_handler:

  2024/03/27 22:25:20 [debug] 1695661#0: *1 lua udp socket read handler
  2024/03/27 22:25:20 [debug] 1695661#0: *1 event timer del: 12: 155591122
  2024/03/27 22:25:20 [debug] 1695661#0: *1 lua udp socket read data: waiting: 1
  2024/03/27 22:25:20 [debug] 1695661#0: *1 recv: fd:12 111 of 4096
  2024/03/27 22:25:20 [debug] 1695661#0: *1 lua udp recv returned 111
1. ---> u is correct lua *** ngx_http_lua_socket_udp_read: u->received: 111 (u: 0x7f5a8569d9c0) at ../ngx_lua-0.10.26/src/ngx_http_lua_socket_udp.c line 1158.
  2024/03/27 22:25:20 [debug] 1695661#0: *1 lua udp socket waking up the current request
  2024/03/27 22:25:20 [debug] 1695661#0: *1 rewrite phase: 3
  ngx_http_wasm_rewrite_handler: enter at /home/code/ngx_wasm_module/src/http/ngx_http_wasm_module.c line 672.
  2024/03/27 22:25:20 [debug] 1695661#0: *1 wasm rctx reused: 000055CFADA90BB8 (r: 000055CFADA8FE40, main: 1)
  2024/03/27 22:25:20 [debug] 1695661#0: *1 any lua yielded? 1
  ngx_http_wasm_rewrite_handler: wasm lua forcing content wev handler at /home/code/ngx_wasm_module/src/http/ngx_http_wasm_module.c line 685.
  ngx_http_wasm_wev_handler: enter at /home/code/ngx_wasm_module/src/http/ngx_http_wasm_module.c line 994.
  2024/03/27 22:25:20 [debug] 1695661#0: *1 wasm rctx reused: 000055CFADA90BB8 (r: 000055CFADA8FE40, main: 1)
  2024/03/27 22:25:20 [debug] 1695661#0: *1 wasm wev handler "/t?" - timeout: 0, ready: 1 (main: 1, count: 1, resp_finalized: 0, state: 0)
  ngx_http_wasm_wev_handler: will resume lua thread (lctx: 0x55cfad7d9eb0) at /home/code/ngx_wasm_module/src/http/ngx_http_wasm_module.c line 1025.
  2024/03/27 22:25:20 [debug] 1695661#0: *1 wasm resuming lua thread (lctx: 000055CFAD7D9EB0, L: 00007F5C8681A380, co: 00007F5C867E3DA8)
  ngx_http_wasm_set_resume_handler: resume_handler = ngx_http_core_run_phases (r: 0x55cfada8fe40, rctx: 0x55cfada90bb8) at /home/code/ngx_wasm_module/src/http/ngx_http_wasm_module.c line 1119.
  2024/03/27 22:25:20 [debug] 1695661#0: *1 lua udp operation done, resuming lua thread
  2024/03/27 22:25:20 [debug] 1695661#0: *1 lua udp socket calling prepare retvals handler 000055CFAC3A0954, u:00007F5C867D5EF0
  2024/03/27 22:25:20 [debug] 1695661#0: *1 lua udp socket receive return value handler
  lua *** ngx_http_lua_socket_udp_receive_retval_handler: buffer: <CB>Q<81><80> at ../ngx_lua-0.10.26/src/ngx_http_lua_socket_udp.c line 1031.
2. ---> u should be 0x7f5a8569d9c0 lua *** ngx_http_lua_socket_udp_receive_retval_handler: u->received: 0 (u: 0x7f5c867d5ef0) at ../ngx_lua-0.10.26/src/ngx_http_lua_socket_udp.c line 1032.
  lua *** dumpstack: top: 1 at ../ngx_lua-0.10.26/src/ngx_http_lua_util.h line 738.
3. ---> we get "" on top of the stack with luaL_pushlstring(L, buf, 0) lua *** dumpstack: 1    string  "" at ../ngx_lua-0.10.26/src/ngx_http_lua_util.h line 747.
  2024/03/27 22:25:20 [debug] 1695661#0: *1 lua run thread, top:9 c:1
  lua *** ngx_http_lua_run_thread: ctx: 0x55cfad91ecd0, co: 0x7f5c867e9708, co status: 0, co is_wrap: 0 at ../ngx_lua-0.10.26/src/ngx_http_lua_util.c line 1150.
  lua *** ngx_http_lua_run_thread: 0x7f5c867e9708: saved co top: 0, nrets: 1, true top: 1 at ../ngx_lua-0.10.26/src/ngx_http_lua_util.c line 1162.
  lua *** ngx_http_lua_run_thread: co top elem: string at ../ngx_lua-0.10.26/src/ngx_http_lua_util.c line 1170.
  2024/03/27 22:25:20 [notice] 1695661#0: *1 [lua] resolver.lua:633: parse_response(): buf:  len: 0, client: 127.0.0.1, server: localhost, request: "GET /t HTTP/1.1", host: "localhost"
  2024/03/27 22:25:20 [debug] 1695661#0: *1 [lua] resolver.lua:797: _tcp_query(): query the TCP server due to reply truncation

Since u is stored in coctx->data, and since the coctx of all our Lua threads comes from &ctx->entry_co_ctx, I believe that we are currently scheduling multiple udp_socks in the same coroutine context at the same time, oops. As you said, it is not plugged in correctly. Now I believe that we are not initializing our coroutines correctly. We should look closer at the facilities that create coroutines in OpenResty, like ngx_http_lua_uthread.c, ngx_http_lua_coroutine.c, and ngx_http_lua_coroutine_create_helper + ngx_http_lua_post_thread. If we instantiate our own coroutines + coroutine contexts, and attach them to ctx->cur_co_ctx (or &ctx->entry_co_ctx), we should get closer to what we are trying to do...
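
Here is a standalone toy model (plain C, deliberately not the module's or ngx_lua's real code) of that clobbering: both calls share the single entry_co_ctx, so the second socket's u overwrites the first's, and the retval handler later reads received == 0 from the wrong u:

```c
/* toy model of the coctx->data collision; names only mirror ngx_http_lua */
#include <stdio.h>

typedef struct { size_t received; } udp_upstream_t;  /* stands in for "u" */
typedef struct { void *data; } co_ctx_t;             /* stands in for "coctx" */

int
main(void)
{
    co_ctx_t        entry_co_ctx = { 0 };  /* the one coctx shared by all
                                              of our wasm Lua threads */
    udp_upstream_t  u1 = { 0 }, u2 = { 0 };

    entry_co_ctx.data = &u1;  /* call 1 yields waiting on its DNS socket */
    entry_co_ctx.data = &u2;  /* call 2 yields: the reference to u1 is lost */

    u1.received = 111;        /* call 1's response arrives (recv: 111 bytes) */

    /* the receive retval handler reads whatever coctx->data points at now */
    udp_upstream_t *u = entry_co_ctx.data;
    printf("u->received: %zu\n", u->received);  /* prints 0, not 111 */

    return 0;
}
```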

@hishamhm
Collaborator Author

Now I believe that we are not initializing our coroutines correctly.

Yep, that's what I feared.

We should look closer at facilities that create coroutines in OpenResty like ngx_http_lua_uthread.c, ngx_http_lua_coroutine.c, and ngx_http_lua_coroutine_create_helper + ngx_http_lua_post_thread.

This is going to be quite the journey. I've always been uncomfortable with that part of the codebase, not knowing what the various ctx-this-and-that variables exactly mean; it's about time I built a proper understanding of those bits. I'm going to deep-dive into this, since it's a blocker for the other stuff I was doing. Thanks for the pointers!

@thibaultcha
Member

I think we will figure it out; I made good progress this morning. So far, both calls resume with the proper coctx locally (and the proper socket + u); I just need to tie up some loose ends.

thibaultcha added a commit that referenced this issue May 6, 2024
thibaultcha added a commit that referenced this issue May 10, 2024
thibaultcha added a commit that referenced this issue May 15, 2024
@thibaultcha thibaultcha linked a pull request May 15, 2024 that will close this issue
thibaultcha added a commit that referenced this issue May 16, 2024
Major refactor of the Lua bridge to support multiple concurrent yielding
Lua threads. The old implementation would break down when scheduling
more than one yielding Lua thread at a time.

The new implementation "tricks" OpenResty by scheduling uthreads via C
and passing these threads to the OpenResty runloop as if they were
created from Lua (via `ngx.thread`). Because all uthreads must resume
their "parent thread" when finished (as per OpenResty's implementation),
we schedule a stub "entry thread" whenever we are trying to use the Lua
bridge. This entry thread itself does nothing and is collected at
request pool cleanup.

List of significant changes for this refactor:

- **Breaking:** the `proxy_wasm.start()` FFI function is **removed**.
  Only `proxy_wasm.attach()` is now necessary, and the filter chain is
  only resumed once the ngx_http_wasm_module `rewrite` or `access`
  phases are entered. Prior, `proxy_wasm.start()` would resume the
  filter chain during the ngx_http_lua_module phase handlers, which was
  incompatible with Lua threads yielding.
- The `ngx.semaphore` API can be used in the Lua bridge. The default Lua
  resolver now has synchronization enabled.
- In ngx_wasm_socket_tcp, the `sock->env` member is now a pointer to the
  request's `env` instead of a copy so as to manipulate the `env->state`
  control variable.
- The `wasm_call` directive can now yield, which allows for sanity
  testing of the Lua bridge yielding functionality.
- A new `rctx->resume_handler` pointer holds the resume entry point back
  from yielding facilities into `ngx_http_core_run_phases`. For now,
  only the Lua bridge uses it, but other yielding facilities should be
  refactored to use it so as to factorize our resuming code.

Fix #524