Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashes on connection reset when handling requests asynchronously #1894

Closed
ahupowerdns opened this issue Nov 7, 2018 · 8 comments
Closed

Comments

@ahupowerdns
Copy link
Contributor

ahupowerdns commented Nov 7, 2018

First, thank you for H2O, it is great! We use the library for 'dnsdist-doh'. This report is about 2.2.5.

As described in #142, dnsdist-doh operates in asynchronous mode. An http2 request is received and then handed over for DNS processing. The return is sent over a pipe, which is hooked to h2o_socket_read_start. If data is received from there, it contains a pointer to the original 'req'. We use that req then to send the DNS answer.

This code is in: https://github.com/ahupowerdns/pdns/blob/dnsdist-doh/pdns/dnsdistdist/doh.cc

The problem now is that sometimes by the time 'on_dnsdist' receives data from the pipe, the HTTP2 connection has been reset already. As noted by Valgrind:

==11799== Invalid write of size 4
==11799==    at 0x6490BF: finalostream_send (stream.c:341)
==11799==    by 0x639760: h2o_send_inline (request.c:513)
==11799==    by 0x61EF70: on_dnsdist(st_h2o_socket_t*, char const*) (doh.cc:538)
==11799==    by 0x62F267: read_on_ready (evloop.c.h:235)

And this is because that data was free'd here:

==11799==    at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11799==    by 0x643DFA: handle_rst_stream_frame (connection.c:797)
==11799==    by 0x641A9C: expect_default (connection.c:837)
==11799==    by 0x643AD0: parse_input (connection.c:873)
==11799==    by 0x643AD0: on_read (connection.c:901)
==11799==    by 0x62F267: read_on_ready (evloop.c.h:235)

After it originally was allocated here:

==11799==  Block was alloc'd at
==11799==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11799==    by 0x64920D: h2o_mem_alloc (memory.h:328)
==11799==    by 0x64920D: h2o_http2_stream_open (stream.c:37)
==11799==    by 0x644AFC: handle_headers_frame (connection.c:596)
==11799==    by 0x641A9C: expect_default (connection.c:837)

My interpretation of this is that H2O closed my connection while I was still handling the request.

My question is, how do I deal with this situation? Can I put a callback somewhere? Or can I convince H2O to keep my request alive as long as I still "own" a request?

Any help would be very welcome!

@ahupowerdns
Copy link
Contributor Author

I suspect my problem is that I use h2o_send_inline & that i should be using a generator that gets stop() called on it if the connection goes away, and that I should register the generator before returning control to the h2o event loop. Please let me know if I am thinking in the right direction!

@kazuho
Copy link
Member

kazuho commented Nov 7, 2018

i should be using a generator that gets stop() called on it if the connection goes away, and that I should register the generator before returning control to the h2o event loop.

Yes! Yes! Yes! Thank you for searching for the right answer yourself!

@ahupowerdns
Copy link
Contributor Author

I think this fixed it! This is the uptime of doh.powerdns.org.
Is there h2o library documentation where I could contribute a description of the restrictions on using the _inline functions when working asynchronously?

image

@kazuho
Copy link
Member

kazuho commented Nov 7, 2018

Fantastic! Good to know that dnsdist-doh is running well!

Unfortunately we do not have a good point for documentation, though I think it would be a good idea to add the warning to the doc-comment for h2o_send_inline in include/h2o.h.

@volyrique
Copy link
Contributor

It might also be worth mentioning that h2o_send_inline() creates a copy of its input as a performance consideration. It also means that the function does not take ownership of the parameter, so the latter can be deallocated immediately after returning to the caller.

ahupowerdns added a commit to ahupowerdns/h2o that referenced this issue Nov 8, 2018
…pecifically relevant for asynchronous operations as described in h2o#142.

This closes h2o#1894.
@volyrique
Copy link
Contributor

I am a little bit confused by the documentation added in #1896 because the latter seems to contradict what has been said in #142. I am using the libh2o event loop to manage database connections, so my workflow looks like this:

  1. A HTTP request is received by the server.
  2. The h2o_handler_t::on_req callback does not do anything except starting an asynchronous database query (using h2o_socket_read_start() and h2o_socket_notify_write() if necessary) and returning 0 - in particular, it doesn't call h2o_start_response() or h2o_send_inline() (or something similar).
  3. Control goes back to the libh2o event loop.
  4. The database query result arrives and is handled by a callback. The latter calls either h2o_send_inline() or h2o_start_response() followed by a series of h2o_send() invocations.

According to the comments in #142 that is correct because at point 3 there isn't enough information to generate the HTTP response headers (for instance, the response body size depends on the database query result, which hasn't been received yet), so it is too early to register the generator.

I have looked through the reverse proxy implementation, and it seems to be doing a more or less similar thing - as far as I can tell, the first time it uses h2o_start_response() is in the on_head() function, which, if I have understood correctly, is called after the origin server returns the first portion of the response to the proxy (which corresponds to point 4 in my workflow). On the other hand, according to my reading of #1896 I need to set the h2o_generator_t::stop callback and to call h2o_start_response() at point 2 before returning 0, so that I avoid a potential access to deallocated memory. So, what is the right approach, or am I missing something?

@kazuho
Copy link
Member

kazuho commented Nov 9, 2018

@volyrique Thank you for raising the question. I now realize that I have caused confusion.

Let me explain this a different way.

h2o_req_t is discarded when the client closes the underlying connection or abruptly discards a HTTP/2 stream. Asynchronous applications need to detect that and avoid using h2o_req_t afterwards.

There are two ways to achieve that.

One is to register the generator to h2o_req_t to receive the stop callback of the generator being called. The downside as you correctly point out is that the generator is registered by calling h2o_start_response: a function that can be called only after the status code and the response headers become ready.

The other way is to allocate a memory with a dispose callback from the memory pool of the request1. The proxy handler uses that technique (see lib/core/proxy.c line 553 of version 2.2.5); it's on_generator_dispose is called when the h2o_req_t is discarded.

@ahupowerdns I hope this comment better clarifies how H2O works. I hope that your changes is not based on the incorrect assumption that the stop callback will be invoked even before you call h2o_start_response to bind the generator to the response. And finally, thank you for submitting #1896; I will make tweaks to the PR to reflect what I have written here and will merge it.

[1] The function used for allocating memory with a dispose callback is called h2o_mem_alloc_shared, because it is one of the capabilities of a "shared" (i.e. ref-counted) memory allocation.

@ahupowerdns
Copy link
Contributor Author

dnsdist-doh now uses h2o_mem_alloc_shared which is 1) easier and 2) actually better since it allows us to store exactly which request got terminated. Using the 'stop' method, you do get a signal which 'req' needs to stop, but h2o turns out to quickly reuse that same req ptr in practice, so you need to be very careful you cancel the right outstanding work. Using h2o_mem_alloc_shared you can add a pointer to your own state. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants