Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Events Streams Blocking Console When Start Events Not Sent #2950

Closed
htdvisser opened this issue Jul 21, 2020 · 14 comments
Closed

Events Streams Blocking Console When Start Events Not Sent #2950

htdvisser opened this issue Jul 21, 2020 · 14 comments
Assignees
Labels
c/shared This is shared between components dependencies Pull requests that update a dependency file

Comments

@htdvisser
Copy link
Contributor

Summary

Since #2565 the events streams in the console block after clicking around in the console.

Steps to Reproduce

  1. Go to https://tti.staging1.cloud.thethings.industries/console/gateways
  2. In a single tab, open a gateway and use the breadcrumbs to go back
  3. Repeat for each gateway

What do you see now?

At some point the console keeps loading. This seems to be caused by "too many open connections" because /api/v3/events requests aren't closed.

What do you want to see instead?

The console should stay responsive.

Environment

Build based on the current master branch 3df51cd (don't pay attention to the 3.8.5 that is shown by the console)

How do you propose to implement this?

I suspect this is caused by the fact that headers are only written right before the first message on the stream. If there's no message, there's no header and the browser keeps waiting.

So to investigate, I suggest to:

  • Check if we write the header correctly
  • Check if gRPC gateway writes and flushes the header correctly
  • Check if the Echo/Mux stuff doesn't interfere

How do you propose to test this?

Try the reproduction steps

@htdvisser htdvisser added the bug Something isn't working label Jul 21, 2020
@htdvisser htdvisser added this to the July 2020 milestone Jul 21, 2020
@rvolosatovs
Copy link
Contributor

On Firefox console is loading forever if there are 6 open tabs - if e.g. one of the working tabs is refreshed - one of the loading tabs unblocks, but that seems to fail sometimes.

In any case, I tried sending the start/end messages, but that did not change anything - it still hangs after there are 6 tabs.

Note: all tabs are at data view of a gateway (2 different ones)

@htdvisser
Copy link
Contributor Author

@kschiffer any ideas?

@kschiffer
Copy link
Member

kschiffer commented Jul 22, 2020

On Firefox console is loading forever if there are 6 open tabs - if e.g. one of the working tabs is refreshed - one of the loading tabs unblocks, but that seems to fail sometimes.

That is expected and a limitation unsing event streams while not using HTTP/2:

When not used over HTTP/2, SSE suffers from a limitation to the maximum number of open connections, which can be specially painful when opening various tabs as the limit is per browser and set to a very low number (6). The issue has been marked as "Won't fix" in Chrome and Firefox. This limit is per browser + domain, so that means that you can open 6 SSE connections across all of the tabs to www.example1.com and another 6 SSE connections to www.example2.com. (from Stackoverflow). When using HTTP/2, the maximum number of simultaneous HTTP streams is negotiated between the server and the client (defaults to 100).

I have trouble reproducing the original issue since I only have access to two connected gateways on the staging environment.

@kschiffer
Copy link
Member

kschiffer commented Jul 22, 2020

I suspect this is caused by the fact that headers are only written right before the first message on the stream. If there's no message, there's no header and the browser keeps waiting.

I can confirm this. The requests will stall if they don't get a response header. After six such connections being open, all subsequent XHRs will stall as well.

Before #2565 this was not a problem since the event stream would always sent a "start stream message" immediately, which resulted in the headers being sent.

As such, I don’t see it as a frontend issue. The only thing that would make sense to change on the frontend is making sure that stalled stream connections are killed after a certain time.

@rvolosatovs
Copy link
Contributor

So the issue happens when event streams do not send any data. The XHR performing the event stream connection will only resolve after the first message has been received, otherwise, it will hang in the pending state. Once 6 of those connections are opened all other connections will hang as well, since the maximum amount of concurrent TCP connections has been reached. So it appears like the XHR expects some kind of acknowledgment from the server that the stream is established.

Before #2565 this was not a problem since the event stream would always sent a "start stream message".

I'm currently trying to find out if and how this can be fixed on the frontend side.

In my experience sending "initial message" does not help with this issue. Please try yourself with the following diff:

diff --git a/pkg/events/grpc/grpc.go b/pkg/events/grpc/grpc.go
index 03f229ca0..5a305b5ac 100644
--- a/pkg/events/grpc/grpc.go
+++ b/pkg/events/grpc/grpc.go
@@ -119,6 +119,10 @@ func (srv *EventsServer) Stream(req *ttnpb.StreamEventsRequest, stream ttnpb.Eve
 	if err := stream.SendHeader(metadata.MD{}); err != nil {
 		return err
 	}
+	if err := stream.Send(&ttnpb.Event{}); err != nil {
+		return err
+	}
+	defer stream.Send(&ttnpb.Event{})
 
 	for {
 		select {

Here's a video:
https://youtu.be/0Ir0lakV-Mc

@htdvisser
Copy link
Contributor Author

I do see a difference.

On master WITHOUT the diff:

htdvisser % curl -v 'http://localhost:1885/api/v3/events' --compressed -H 'Authorization: Bearer MFRWG.DVTINRZNJDORITWD64NUJRIYVIXWCKJ3Q6VYLQY.E4K63GZ3WWTZGCIFMD7XLN7A7ACA35YCQSMFCBVCTEOMATCYGG6Q' -H 'Accept: text/event-stream' -H 'Content-Type: application/json' --data-raw '{"identifiers":[{"application_ids":{"application_id":"admin-app"}}]}'
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 1885 (#0)
> POST /api/v3/events HTTP/1.1
> Host: localhost:1885
> User-Agent: curl/7.64.1
> Accept-Encoding: deflate, gzip
> Authorization: Bearer MFRWG.DVTINRZNJDORITWD64NUJRIYVIXWCKJ3Q6VYLQY.E4K63GZ3WWTZGCIFMD7XLN7A7ACA35YCQSMFCBVCTEOMATCYGG6Q
> Accept: text/event-stream
> Content-Type: application/json
> Content-Length: 68
> 
* upload completely sent off: 68 out of 68 bytes
^C

On master WITH the diff.

htdvisser % curl -v 'http://localhost:1885/api/v3/events' --compressed -H 'Authorization: Bearer MFRWG.DVTINRZNJDORITWD64NUJRIYVIXWCKJ3Q6VYLQY.E4K63GZ3WWTZGCIFMD7XLN7A7ACA35YCQSMFCBVCTEOMATCYGG6Q' -H 'Accept: text/event-stream' -H 'Content-Type: application/json' --data-raw '{"identifiers":[{"application_ids":{"application_id":"admin-app"}}]}'
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 1885 (#0)
> POST /api/v3/events HTTP/1.1
> Host: localhost:1885
> User-Agent: curl/7.64.1
> Accept-Encoding: deflate, gzip
> Authorization: Bearer MFRWG.DVTINRZNJDORITWD64NUJRIYVIXWCKJ3Q6VYLQY.E4K63GZ3WWTZGCIFMD7XLN7A7ACA35YCQSMFCBVCTEOMATCYGG6Q
> Accept: text/event-stream
> Content-Type: application/json
> Content-Length: 68
> 
* upload completely sent off: 68 out of 68 bytes
< HTTP/1.1 200 OK
< Content-Type: text/event-stream
< Referrer-Policy: strict-origin-when-cross-origin
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-Request-Id: 01EDTX9RGKRGM02YZVWNB3RXJP
< X-Xss-Protection: 1; mode=block
< Date: Wed, 22 Jul 2020 09:22:32 GMT
< Transfer-Encoding: chunked
< 
{"result":{"time":"0001-01-01T00:00:00Z"}}
^C

@kschiffer
Copy link
Member

kschiffer commented Jul 22, 2020

We have to differentiate this issue:

  1. Whenever there are six TCP connections concurrently open per domain and browser, any subsequent request will hang. This is a browser limitation and IMO ways to deal with this should be discussed in another issue. This is the issue I see in your video @rvolosatovs
  2. When a stream endpoint does not send a response header immediately, the request will stall (until the header is eventually sent on the first message). This means that although using just one tab, the console will accumulate pending stream connections when repeatedly navigating to pages that open stream connections. After six such connections being open, the console will hang, for the same reason as in (1).

To mitigate 2, the response header needs to be sent immediately.

One thing to note here as far as the console goes, we should look into also cancelling pending requests. Currently the requests can only be cancelled once the stream connection has been established.

@rvolosatovs
Copy link
Contributor

@kschiffer can you give steps to reproduce the issue in 2. ?
I'm constantly refreshing a single gateway data tab and experience no hangs with latest master (without patch)

@kschiffer
Copy link
Member

kschiffer commented Jul 22, 2020

The easiest way is to create a completely fresh and unconnected gateway since it will not send any messages. Then navigating to its overview page and back to the gateway list six times. 

This is not a problem specific to gateway events but all event streams by the way.

@kschiffer kschiffer removed their assignment Jul 27, 2020
@kschiffer
Copy link
Member

Unassigning me since this is not an issue originating from the Console.

@htdvisser
Copy link
Contributor Author

Closed by #2989

@rvolosatovs rvolosatovs changed the title Events Streams Blocking Console Events Streams Blocking Console When Start Events Not Sent Jul 30, 2020
@rvolosatovs rvolosatovs removed prio/high blocking release This is blocking a release labels Jul 30, 2020
@rvolosatovs
Copy link
Contributor

A temporary solution was introduced in #2989 - a more robust solution is yet to be found.

@rvolosatovs rvolosatovs reopened this Jul 30, 2020
@rvolosatovs rvolosatovs modified the milestones: July 2020, Next Up Jul 30, 2020
@htdvisser htdvisser added c/shared This is shared between components dependencies Pull requests that update a dependency file and removed bug Something isn't working labels Aug 24, 2020
@htdvisser
Copy link
Contributor Author

So this is either something in our dependencies (grpc-gateway not properly flushing headers) or in our shared code (maybe some grpc or http middleware not flushing the headers).

Removing bug label because it's currently not something that impacts our users.

@johanstokking
Copy link
Member

Please open a new issue if there's still something that needs to be addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/shared This is shared between components dependencies Pull requests that update a dependency file
Projects
None yet
Development

No branches or pull requests

4 participants