Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unhandled KeyError with HTTP/2 #197

Closed
merlinz01 opened this issue Feb 26, 2024 · 11 comments
Closed

Unhandled KeyError with HTTP/2 #197

merlinz01 opened this issue Feb 26, 2024 · 11 comments

Comments

@merlinz01
Copy link
Contributor

I am using hypercorn 0.16.0 and quart 0.19.4 on Debian 11 and getting this error periodically:

Task exception was never retrieved
future: <Task finished name='Task-459' coro=<worker_serve.<locals>._server_callback() done, defined at /home/prism/prism_console/.env/lib/python3.11/site-packages/hypercorn/asyncio/run.py:103> exception=ExceptionGroup('unhandled errors in a TaskGroup', [KeyError(3)])>
  + Exception Group Traceback (most recent call last):
  |   File "/home/prism/prism_console/.env/lib/python3.11/site-packages/hypercorn/asyncio/run.py", line 109, in _server_callback
  |     await TCPServer(app, loop, config, context, reader, writer)
  |   File "/home/prism/prism_console/.env/lib/python3.11/site-packages/hypercorn/asyncio/tcp_server.py", line 56, in run
  |     async with TaskGroup(self.loop) as task_group:
  |   File "/home/prism/prism_console/.env/lib/python3.11/site-packages/hypercorn/asyncio/task_group.py", line 74, in __aexit__
  |     await self._task_group.__aexit__(exc_type, exc_value, tb)
  |   File "/usr/lib/python3.11/asyncio/taskgroups.py", line 133, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/prism/prism_console/.env/lib/python3.11/site-packages/hypercorn/asyncio/tcp_server.py", line 70, in run
    |     await self._read_data()
    |   File "/home/prism/prism_console/.env/lib/python3.11/site-packages/hypercorn/asyncio/tcp_server.py", line 105, in _read_data
    |     await self.protocol.handle(RawData(data))
    |   File "/home/prism/prism_console/.env/lib/python3.11/site-packages/hypercorn/protocol/__init__.py", line 62, in handle
    |     return await self.protocol.handle(event)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/prism/prism_console/.env/lib/python3.11/site-packages/hypercorn/protocol/h2.py", line 189, in handle
    |     await self._handle_events(events)
    |   File "/home/prism/prism_console/.env/lib/python3.11/site-packages/hypercorn/protocol/h2.py", line 259, in _handle_events
    |     await self.streams[event.stream_id].handle(EndBody(stream_id=event.stream_id))
    |           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
    | KeyError: 3
    +------------------------------------

I don't need HTTP/2 for this site, so I would also be OK with just disabling it if someone can tell me how.

The error doesn't seem to affect the operation of the site on the client end.

@rroque6428
Copy link

I just faced the same problem today. Apparently there is already a potential solution (Issue #50), and yes no troubles to end users just the above annoying error.

@merlinz01
Copy link
Contributor Author

As far as I can tell, #50 is a different error. I see this comment is the same error as I am getting, an unhandled IndexError on line 259 of protocol/h2.py.

@rroque6428
Copy link

You are right. There are two messages errors there. Ours is a different one. It looks like is always present at ANY TLS HTTP/2 request even the simplest ones (e.g. several GETs to same endpoint). The call itself is honoured, but then sometimes there is an ASYNCIO "Task exception was never retrieved... future" thing. Maybe related to this stackoverflow question. I tried to debug, but I think we need an asyncio expert to explain what is happening (and when), for the async way of programming is new to me. The exception is raised during create_stream execution (def _create_stream in "hypercorn/protocol/h2.py"). Maybe a zoombie task or an exception not properly catched inside it... I don't know for sure. But, it is without a doubt related to a closed connection.

@merlinz01
Copy link
Contributor Author

Here is a boiled-down demonstration of what I have figured out:

import asyncio
import traceback

class Stream:
    def do_whatever(self):
        pass

class StreamClosed:
    stream_id: int

class StreamEnded:
    stream_id: int

class H2Protocol:
    def __init__(self):
        self.streams = {}
        
    async def stream_send(self, event):
        if isinstance(event, StreamClosed):
            await self._close_stream(event.stream_id)
        elif isinstance(event, StreamEnded):
            self.streams[event.stream_id].do_whatever()

    async def _close_stream(self, stream_id):
        self.streams.pop(stream_id)


async def run_task_group():
    # Hypercorn runs a TaskGroup in hypercorn.asyncio.tcp_server
    async with asyncio.TaskGroup() as task_group:
        # There is a H2Protocol object that handles events for a dict of streams
        h = H2Protocol()

        # When a connection is made, a stream gets added to the dict
        h.streams[3] = Stream()

        # When the stream is closed, a StreamClosed event is sent to the protocol.
        # This event causes the stream to be destroyed and forgotten.
        event_1 = StreamClosed()
        event_1.stream_id = 3
        task_group.create_task(h.stream_send(event_1))

        # Then some data is received that causes a StreamEnded event to be also sent.
        # The event looks for the stream_id and raises a KeyError because the stream is not in the dict anymore.
        event_2 = StreamEnded()
        event_2.stream_id = 3
        task_group.create_task(h.stream_send(event_2))


loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(run_task_group())
except ExceptionGroup as grp:
    traceback.print_exc()

This example prints a similar traceback to what I am getting.

@rroque6428
Copy link

rroque6428 commented Mar 1, 2024

Great job. As far as I could see not all requests really create a stream object inside self.streams, so I decided to check if the ID is inside it (code below). However, it would be great to check if such a simple solution cover all cases.

venv/lib/python3.10/site-packages/hypercorn/protocol/h2.py

    async def _handle_events(self, events: List[h2.events.Event]) -> None:
        for event in events:
            if isinstance(event, h2.events.RequestReceived):
                if self.context.terminated.is_set():
...
            elif isinstance(event, h2.events.StreamEnded):
+++             if event.stream_id in self.streams:
                    await self.streams[event.stream_id].handle(EndBody(stream_id=event.stream_id))
+++             else:
+++                 print(f">>> WEIRD! {event.stream_id}")  ## for debug purposes

@merlinz01
Copy link
Contributor Author

I found I can produce this error by closing Firefox and reopening it and revisiting my server.

I suspect that part of the reason why I am getting this error is that my HTTP traffic goes through a proxy server which might not handle HTTP/2 perfectly. I tested running my exact same server on my local PC, and also visiting the site from an unproxied device, and I'm not getting the error.

Still, the error should at least be caught since it turns up in real life in more than one situation.

@hanzhenbjm
Copy link


2024-05-08T15:09:40.308442232+08:00 Traceback (most recent call last):

2024-05-08T15:09:40.308450229+08:00   File "/usr/local/python3.12.0/lib/python3.12/site-packages/hypercorn/asyncio/run.py", line 109, in _server_callback

2024-05-08T15:09:40.308457530+08:00     await TCPServer(app, loop, config, context, reader, writer)

2024-05-08T15:09:40.308465026+08:00   File "/usr/local/python3.12.0/lib/python3.12/site-packages/hypercorn/asyncio/tcp_server.py", line 74, in run

2024-05-08T15:09:40.308472848+08:00     await self._close()

2024-05-08T15:09:40.308483455+08:00   File "/usr/local/python3.12.0/lib/python3.12/site-packages/hypercorn/asyncio/tcp_server.py", line 117, in _close

2024-05-08T15:09:40.308493708+08:00     await self.writer.wait_closed()

2024-05-08T15:09:40.308505402+08:00   File "/usr/local/python3.12.0/lib/python3.12/asyncio/streams.py", line 349, in wait_closed

2024-05-08T15:09:40.308532036+08:00     await self._protocol._get_close_waiter(self)

2024-05-08T15:09:40.308544232+08:00 TimeoutError: SSL shutdown timed out

NAME="CentOS Linux"
VERSION="7 (Core)"
Is this the same problem? http2 is also covered

app, config = create_app()

def create_app():

    app = FastAPI(lifespan=app_lifespan)
    config = Config()
    config.bind = ["0.0.0.0:6001"]
    config.certfile = "cert.pem"
    config.keyfile = "key.pem"
    return app, config

def _exception_handler(lp, context):
    exception = context.get("exception")
    if isinstance(exception, ssl.SSLError):
        pass
    else:
        lp.default_exception_handler(context)


if __name__ == '__main__':
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.set_exception_handler(_exception_handler)
    loop.run_until_complete(serve(app, config))

@merlinz01
Copy link
Contributor Author

That seems to be a different error. Are you using Quart?

@pgjones Any input on this issue?

@nabheet
Copy link

nabheet commented May 13, 2024

Hello! I finally was able to step through the code and here is what I have so far:

  1. All discussion is limited to hypercorn/protocol/h2.py module
  2. somehow the _handle_events function on line 237 (in my version) is called with 5 events in the following order - RemoteSettingsChanged, WindowUpdated, RequestReceived, StreamEnded and PriorityUpdated.
  3. When the loop in the function reaches RequestReceived after processing them one at a time, it ends up calling _close_stream which removes (pops) the stream from the dictionary of streams self.streams. This happens in the stream_send function on line 197 when it gets the StreamClosed event. You can see that the _close_stream function is called.
  4. Then the next event to process is StreamEnded which looks for the stream id in self.streams and it is not there.
  5. Exception thrown.

So, IMHO, basically the StreamClosed event is processed before the StreamEnded event.

I hope I am on the correct track. I don't really know how best to resolve this issue. But I can submit a PR with a hacky proposal. However, I would love to fix it for the next release if someone could advise on the best approach.

Thank you!!!!

EDIT 1:

I just read the entire thread and looks like most of the research was already done. Sorry! I didn't understand what was happening until I stepped through the code myself. Now I have duplicated the effort.

However, I did try something similar to the solution proposed above.

elif isinstance(event, h2.events.StreamEnded) and event.stream_id in self.streams:
  await self.streams[event.stream_id].handle(EndBody(stream_id=event.stream_id))

This cause the streams dict to grow unbounded. So obviously a bad idea.

Then I tried pretty much exactly the proposal above:

elif isinstance(event, h2.events.StreamEnded):
  if event.stream_id in self.streams:
    await self.streams[event.stream_id].handle(EndBody(stream_id=event.stream_id))

And I believe this seems to work, at least in my testing ... I would give this solution a 👍 !!!!!!

EDIT 2:

Just wanted to add that our app based on FastAPI and I can reproduce this error on every request locally on my laptop. This only happens on SSL. In my testing I am using chrome which is not using the clear text protocol but the other one (binary?). And I used a self-signed cert locally. Very easy to reproduce.

@pgjones
Copy link
Owner

pgjones commented May 27, 2024

Does 24aacf4 fix this?

@merlinz01
Copy link
Contributor Author

Looks like it should. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants