Crashes on connection reset when handling requests asynchronously #1894

ahupowerdns · 2018-11-07T05:41:58Z

First, thank you for H2O, it is great! We use the library for 'dnsdist-doh'. This report is about 2.2.5.

As described in #142, dnsdist-doh operates in asynchronous mode. An http2 request is received and then handed over for DNS processing. The return is sent over a pipe, which is hooked to h2o_socket_read_start. If data is received from there, it contains a pointer to the original 'req'. We use that req then to send the DNS answer.

This code is in: https://github.com/ahupowerdns/pdns/blob/dnsdist-doh/pdns/dnsdistdist/doh.cc

The problem now is that sometimes by the time 'on_dnsdist' receives data from the pipe, the HTTP2 connection has been reset already. As noted by Valgrind:

==11799== Invalid write of size 4
==11799==    at 0x6490BF: finalostream_send (stream.c:341)
==11799==    by 0x639760: h2o_send_inline (request.c:513)
==11799==    by 0x61EF70: on_dnsdist(st_h2o_socket_t*, char const*) (doh.cc:538)
==11799==    by 0x62F267: read_on_ready (evloop.c.h:235)

And this is because that data was free'd here:

==11799==    at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11799==    by 0x643DFA: handle_rst_stream_frame (connection.c:797)
==11799==    by 0x641A9C: expect_default (connection.c:837)
==11799==    by 0x643AD0: parse_input (connection.c:873)
==11799==    by 0x643AD0: on_read (connection.c:901)
==11799==    by 0x62F267: read_on_ready (evloop.c.h:235)

After it originally was allocated here:

==11799==  Block was alloc'd at
==11799==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11799==    by 0x64920D: h2o_mem_alloc (memory.h:328)
==11799==    by 0x64920D: h2o_http2_stream_open (stream.c:37)
==11799==    by 0x644AFC: handle_headers_frame (connection.c:596)
==11799==    by 0x641A9C: expect_default (connection.c:837)

My interpretation of this is that H2O closed my connection while I was still handling the request.

My question is, how do I deal with this situation? Can I put a callback somewhere? Or can I convince H2O to keep my request alive as long as I still "own" a request?

Any help would be very welcome!

The text was updated successfully, but these errors were encountered:

ahupowerdns · 2018-11-07T10:13:20Z

I suspect my problem is that I use h2o_send_inline & that i should be using a generator that gets stop() called on it if the connection goes away, and that I should register the generator before returning control to the h2o event loop. Please let me know if I am thinking in the right direction!

kazuho · 2018-11-07T11:47:48Z

i should be using a generator that gets stop() called on it if the connection goes away, and that I should register the generator before returning control to the h2o event loop.

Yes! Yes! Yes! Thank you for searching for the right answer yourself!

ahupowerdns · 2018-11-07T13:36:57Z

I think this fixed it! This is the uptime of doh.powerdns.org.
Is there h2o library documentation where I could contribute a description of the restrictions on using the _inline functions when working asynchronously?

kazuho · 2018-11-07T18:34:11Z

Fantastic! Good to know that dnsdist-doh is running well!

Unfortunately we do not have a good point for documentation, though I think it would be a good idea to add the warning to the doc-comment for h2o_send_inline in include/h2o.h.

volyrique · 2018-11-07T20:29:59Z

It might also be worth mentioning that h2o_send_inline() creates a copy of its input as a performance consideration. It also means that the function does not take ownership of the parameter, so the latter can be deallocated immediately after returning to the caller.

…pecifically relevant for asynchronous operations as described in h2o#142. This closes h2o#1894.

volyrique · 2018-11-09T00:47:54Z

I am a little bit confused by the documentation added in #1896 because the latter seems to contradict what has been said in #142. I am using the libh2o event loop to manage database connections, so my workflow looks like this:

A HTTP request is received by the server.
The h2o_handler_t::on_req callback does not do anything except starting an asynchronous database query (using h2o_socket_read_start() and h2o_socket_notify_write() if necessary) and returning 0 - in particular, it doesn't call h2o_start_response() or h2o_send_inline() (or something similar).
Control goes back to the libh2o event loop.
The database query result arrives and is handled by a callback. The latter calls either h2o_send_inline() or h2o_start_response() followed by a series of h2o_send() invocations.

According to the comments in #142 that is correct because at point 3 there isn't enough information to generate the HTTP response headers (for instance, the response body size depends on the database query result, which hasn't been received yet), so it is too early to register the generator.

I have looked through the reverse proxy implementation, and it seems to be doing a more or less similar thing - as far as I can tell, the first time it uses h2o_start_response() is in the on_head() function, which, if I have understood correctly, is called after the origin server returns the first portion of the response to the proxy (which corresponds to point 4 in my workflow). On the other hand, according to my reading of #1896 I need to set the h2o_generator_t::stop callback and to call h2o_start_response() at point 2 before returning 0, so that I avoid a potential access to deallocated memory. So, what is the right approach, or am I missing something?

kazuho · 2018-11-09T02:44:33Z

@volyrique Thank you for raising the question. I now realize that I have caused confusion.

Let me explain this a different way.

h2o_req_t is discarded when the client closes the underlying connection or abruptly discards a HTTP/2 stream. Asynchronous applications need to detect that and avoid using h2o_req_t afterwards.

There are two ways to achieve that.

One is to register the generator to h2o_req_t to receive the stop callback of the generator being called. The downside as you correctly point out is that the generator is registered by calling h2o_start_response: a function that can be called only after the status code and the response headers become ready.

The other way is to allocate a memory with a dispose callback from the memory pool of the request¹. The proxy handler uses that technique (see lib/core/proxy.c line 553 of version 2.2.5); it's on_generator_dispose is called when the h2o_req_t is discarded.

@ahupowerdns I hope this comment better clarifies how H2O works. I hope that your changes is not based on the incorrect assumption that the stop callback will be invoked even before you call h2o_start_response to bind the generator to the response. And finally, thank you for submitting #1896; I will make tweaks to the PR to reflect what I have written here and will merge it.

[1] The function used for allocating memory with a dispose callback is called h2o_mem_alloc_shared, because it is one of the capabilities of a "shared" (i.e. ref-counted) memory allocation.

ahupowerdns · 2018-11-09T22:24:23Z

dnsdist-doh now uses h2o_mem_alloc_shared which is 1) easier and 2) actually better since it allows us to store exactly which request got terminated. Using the 'stop' method, you do get a signal which 'req' needs to stop, but h2o turns out to quickly reuse that same req ptr in practice, so you need to be very careful you cancel the right outstanding work. Using h2o_mem_alloc_shared you can add a pointer to your own state. Thanks!

ahupowerdns added a commit to ahupowerdns/h2o that referenced this issue Nov 8, 2018

document limitations on h2o_send_inline and functions that call it, s…

7081125

…pecifically relevant for asynchronous operations as described in h2o#142. This closes h2o#1894.

ahupowerdns mentioned this issue Nov 8, 2018

document limitations on h2o_send_inline and functions that call it #1896

Closed

kazuho closed this as completed in 82b8b68 Nov 19, 2018

volyrique mentioned this issue Nov 29, 2018

H2O: Prevent usage of discarded HTTP request structures TechEmpower/FrameworkBenchmarks#4226

Merged

volyrique mentioned this issue Mar 3, 2019

Store request details for later & return handle to event loop #1988

Closed

iamthebot mentioned this issue Feb 22, 2020

libh2o async calls inside handlers. Defer response? #2265

Closed

charlescao92 mentioned this issue Mar 14, 2022

use h2o as httpserver, handle request but client close suddenly #2965

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Crashes on connection reset when handling requests asynchronously #1894

Crashes on connection reset when handling requests asynchronously #1894

ahupowerdns commented Nov 7, 2018 •

edited

ahupowerdns commented Nov 7, 2018

kazuho commented Nov 7, 2018

ahupowerdns commented Nov 7, 2018

kazuho commented Nov 7, 2018

volyrique commented Nov 7, 2018

volyrique commented Nov 9, 2018

kazuho commented Nov 9, 2018 •

edited

ahupowerdns commented Nov 9, 2018

Crashes on connection reset when handling requests asynchronously #1894

Crashes on connection reset when handling requests asynchronously #1894

Comments

ahupowerdns commented Nov 7, 2018 • edited

ahupowerdns commented Nov 7, 2018

kazuho commented Nov 7, 2018

ahupowerdns commented Nov 7, 2018

kazuho commented Nov 7, 2018

volyrique commented Nov 7, 2018

volyrique commented Nov 9, 2018

kazuho commented Nov 9, 2018 • edited

ahupowerdns commented Nov 9, 2018

ahupowerdns commented Nov 7, 2018 •

edited

kazuho commented Nov 9, 2018 •

edited