Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streams sometimes de-synced from Glimesh.tv and janus-ftl-plugin #112

Open
clone1018 opened this issue Mar 19, 2021 · 7 comments
Open

Streams sometimes de-synced from Glimesh.tv and janus-ftl-plugin #112

clone1018 opened this issue Mar 19, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@clone1018
Copy link
Member

We're seeing an increasing number of users who are reporting that their Channel is not appearing as "LIVE" on their channel page, and that their channel does not show up in channel list. Whenever the users report it I query the API to see their current status and sure enough their Channel.status = offline and their Channel.Stream (which should be an object) is null instead. This means that Glimesh.tv thinks the stream is offline even though janus is still relaying video.

It's worth noting we have a mechanism on Glimesh.tv where if the channel is offline, and janus-ftl-plugin sends a logStreamMetadata() or uploadStreamThumbnail() it'll send an error back to janus-ftl-plugin of Stream has ended. The plugin should handle that by disconnecting the users video stream.

I don't think it's a safe bet to entirely rule out the Glimesh.tv repo from this bug, but I am currently convinced it lives inside this repo.

@clone1018 clone1018 added the bug Something isn't working label Mar 19, 2021
@clone1018
Copy link
Member Author

I've confirmed it's possible to have this issue occur both:

  1. When starting your stream
  2. Randomly in the middle of your stream

@clone1018
Copy link
Member Author

I'm tracing a channel I just watched it happened to.

It looks like the Stream.ended_at = '2021-03-19 13:28:17'. The user is streaming to do-nyc3-ingest2.kjfk.live.glimesh.tv.

Here's the surrounding log, which don't look odd...

Mar 19 13:27:26 do-nyc3-ingest2.kjfk.live.glimesh.tv janus[776558]: [2021-03-19 13:27:26.809] [info] End Stream Relay request from Orchestrator: Channel 5496, Stream 60093, Target: do-nyc3-edge10.kjfk.live.glimesh.tv
Mar 19 13:27:26 do-nyc3-ingest2.kjfk.live.glimesh.tv janus[776558]: [2021-03-19 13:27:26.809] [info] Stopping relay for channel 5496 / stream 60093 -> do-nyc3-edge10.kjfk.live.glimesh.tv...
Mar 19 13:28:34 do-nyc3-ingest2.kjfk.live.glimesh.tv janus[776558]: [2021-03-19 13:28:34.763] [info] Start Stream Relay request from Orchestrator: Channel 5496, Stream 60093, Target do-nyc3-edge4.kjfk.live.glimesh.tv

@clone1018
Copy link
Member Author

Last bit of info I noticed on do-nyc3-ingest1 (which the streamer is not connected to).

[2021-03-19 13:51:12.104] [warning] Glimesh service connection HTTP request failed with error 4
[2021-03-19 13:51:12.104] [warning] Attempt 1 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...

I wonder if we still have funky http client stuff going on.

@haydenmc
Copy link
Member

Grabbed a dump of a repro instance on a production server, and found that the metadata thread was hung up on an HTTP request. Partial backtrace here:

Thread 8 (Thread 0x7fd6a7fff700 (LWP 758584)):
#0  __libc_read (nbytes=460, buf=0x7fd69ccb2571, fd=18) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __libc_read (fd=18, buf=0x7fd69ccb2571, nbytes=460) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007fd6c4a7f389 in  () at /lib/x86_64-linux-gnu/libcrypto.so.1.1
#3  0x00007fd6c4a7a62e in  () at /lib/x86_64-linux-gnu/libcrypto.so.1.1
#4  0x00007fd6c4a79484 in  () at /lib/x86_64-linux-gnu/libcrypto.so.1.1
#5  0x00007fd6c4a79a57 in BIO_read () at /lib/x86_64-linux-gnu/libcrypto.so.1.1
#6  0x00007fd6c4d00b91 in  () at /lib/x86_64-linux-gnu/libssl.so.1.1
#7  0x00007fd6c4d055b6 in  () at /lib/x86_64-linux-gnu/libssl.so.1.1
#8  0x00007fd6c4d026d0 in  () at /lib/x86_64-linux-gnu/libssl.so.1.1
#9  0x00007fd6c4d09c45 in  () at /lib/x86_64-linux-gnu/libssl.so.1.1
#10 0x00007fd6c4d14a1f in  () at /lib/x86_64-linux-gnu/libssl.so.1.1
#11 0x00007fd6c4d14b27 in SSL_read () at /lib/x86_64-linux-gnu/libssl.so.1.1
#12 0x00007fd6c1e27991 in httplib::detail::SSLSocketStream::read(char*, unsigned long) (this=0x7fd6a7ffd920, ptr=0x7fd6a7ffc7b7 "", size=1) at ../vendor/cpp-httplib/httplib.h:6532
#13 0x00007fd6c1e2f60c in httplib::detail::stream_line_reader::getline() (this=0x7fd6a7ffc8e0) at ../vendor/cpp-httplib/httplib.h:1736
#14 0x00007fd6c1e55563 in httplib::ClientImpl::read_response_line(httplib::Stream&, httplib::Request const&, httplib::Response&) (strm=..., req=..., res=..., this=0x7fd69c1c6cd0) at ../vendor/cpp-httplib/httplib.h:5258
...

This look like a known issue in cpp-httplib, tracked here: yhirose/cpp-httplib#873

My current working theory as to what happens:

1.) Streamer connects to ingest, an HTTP request is made to the Glimesh API and the stream is started successfully.
2.) The metadata reporting thread is hung up per the issue above, so the ingest is never able to deliver metadata updates to the Glimesh API for this stream.
3.) The Glimesh API assumes the stream is now stale, and marks it offline.

The biggest bummer is that so long as the metadata thread is stuck, this will happen to every stream on that ingest.

@clone1018
Copy link
Member Author

We do seem to still have this bug, or a slight variation, where certain streams stay online instead of going offline even after the video content has stopped. I checked out database and we are still getting a metadata stream from them.

@clone1018
Copy link
Member Author

Confirmed this morning that the exact original bug is still occurring. A user was still transmitting data packers, orchestrator still had knowledge of them, and edges properly distributed the video. The user was appearing as offline on Glimesh.tv, with no metadata being sent to the server.

@haydenmc
Copy link
Member

haydenmc commented Apr 1, 2021

Some offline investigations revealed remaining issues may be on the API side. Will keep this issue open to track until we can confirm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants