Edge server can permanently lose connectivity to Origin after Origin server restart #1530

Open
d-uzlov opened this issue Feb 20, 2024 · 2 comments
Labels: bug (Confirmed as bug)

Comments

@d-uzlov
Contributor

d-uzlov commented Feb 20, 2024

Describe the bug

The Origin server doesn't check that its entry in OriginMapStore remains valid.

When I restart the OME Origin server, the Edge sometimes fails to reconnect to the new Origin server, and it doesn't reconnect even if I wait several minutes.
I tracked this down to Redis no longer containing an entry for the stream, and reproduced it without an Edge server.

To Reproduce

  1. Set up Redis
  2. Enable OriginMapStore in the Origin server config:
<Server version="8">
    <VirtualHosts>
        <VirtualHost>
            <!-- Name, Host, Applications and other VirtualHost settings omitted -->
            <OriginMapStore>
                <RedisServer>
                    <Host>${env:REDIS_SERVICE:redis}:${env:REDIS_PORT:6379}</Host>
                    <Auth>${env:REDIS_PASS}</Auth>
                </RedisServer>
                <OriginHostName>${env:OME_HOST_IP}</OriginHostName>
            </OriginMapStore>
        </VirtualHost>
    </VirtualHosts>
</Server>
  3. Start the OME Origin server
  4. Start the stream in OBS with automatic reconnect enabled
  5. Restart the OME Origin server (this may matter: I kill the Docker container forcefully after a 1 s timeout, because it doesn't seem to respond to SIGTERM)
  6. Check the list of Redis keys: redis-cli -a "$redis_pass" keys "*"

You may need to restart several times: for me it reproduces on roughly 1 in 3 restarts.
You may also need to check Redis several times, because for me there was some delay between the restart and the Redis entry disappearing.
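Because the entry disappears only after a delay, the repeated Redis check can be automated. This is a minimal sketch, not part of OME: the helper name `wait_for_key_to_disappear` and its parameters are hypothetical, and the key-listing function is passed in so the helper works with any Redis client (or with a stub).

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_key_to_disappear(
    list_keys: Callable[[], Iterable[str]],
    key: str,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll the store until `key` is no longer listed.

    Returns the number of seconds it took for the key to vanish,
    or None if it was still present when the timeout expired.
    """
    start = clock()
    while True:
        elapsed = clock() - start
        if key not in set(list_keys()):
            return elapsed
        if elapsed >= timeout_s:
            return None
        sleep(interval_s)
```

For example, with the redis-py client, `list_keys` could be `lambda: r.keys("*")`, where `r = redis.Redis(host=..., port=6379, password=..., decode_responses=True)`, and `key` would be `"tc/danil"` as in the transcript below.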

Expected behavior

The Origin server updates OriginMapStore so that it holds a valid entry for the reconnected stream.
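One way an Origin could keep its entry valid is sketched below, under stated assumptions: entries are written set-if-absent with an expiry (like Redis `SET key value NX EX seconds`), and the owning Origin periodically refreshes them. `FakeTtlStore`, `maintain_entries`, the owner strings and the TTL are hypothetical and do not describe OME's actual code.

```python
from typing import Dict, Optional, Tuple


class FakeTtlStore:
    """In-memory stand-in for Redis SET ... NX EX, GET and EXPIRE."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def _purge(self, now: float) -> None:
        expired = [k for k, (_, exp) in self._data.items() if exp <= now]
        for k in expired:
            del self._data[k]

    def set_nx_ex(self, key: str, value: str, ttl: float, now: float) -> bool:
        self._purge(now)
        if key in self._data:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def get(self, key: str, now: float) -> Optional[str]:
        self._purge(now)
        entry = self._data.get(key)
        return entry[0] if entry else None

    def expire(self, key: str, ttl: float, now: float) -> bool:
        self._purge(now)
        if key not in self._data:
            return False
        value, _ = self._data[key]
        self._data[key] = (value, now + ttl)
        return True


def maintain_entries(store: FakeTtlStore, local_streams: Dict[str, str], ttl: float, now: float) -> None:
    """Hypothetical periodic task, run well within `ttl`, for every stream this Origin serves."""
    for key, me in local_streams.items():
        current = store.get(key, now)
        if current == me:
            store.expire(key, ttl, now)  # we own the entry: extend its lifetime
        elif current is None:
            store.set_nx_ex(key, me, ttl, now)  # stale entry has expired: claim it
        # otherwise another Origin owns the key; leave it and check again next tick
```

The point of the design is that a failed first registration is not final: the task keeps retrying, so once the dead process's entry expires, the new Origin claims it. Against a real Redis, the get-then-refresh step would need to be atomic (for example a Lua script run with `EVAL`) so that an Origin never extends an entry it no longer owns.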

Actual behavior

The Redis entry somehow gets deleted, either by Redis itself or by the Origin server.

It seems the Origin server tries to register the stream in OriginMapStore only once, when the stream starts, and doesn't update or maintain the entry while the stream is running.

Logs

[2024-02-20 04:21:23.451] I [OutboundWorker:32] MediaRouter | mediarouter_application.cpp:477  | [#default#tc/danil(1818817424)] Stream has been prepared
[Stream Info]
id(1818817424), msid(0), output(danil), SourceType(Transcoder), RepresentationType(Source), Created Time (Tue Feb 20 04:21:22 2024) UUID(72382f59-0bdf-49ae-97d6-e7c401cf3943/default/#default#tc/danil/o)
	>> Origin Stream Info
	id(100), output(danil), SourceType(Rtmp), Created Time (Tue Feb 20 04:21:22 2024)

	Video Track #0: Public Name(Video_0) Variant Name(bypass_video) Bitrate(25.00Mb) Codec(1,H264,Passthrough:0) BSF(AVCC) Resolution(2560x1440) Framerate(60.00) KeyInterval(60/frame) SkipFrames(0) BFrames(0) timebase(1/1000)
	Video Track #1: Public Name(Video_0) Variant Name(720p) Bitrate(2.50Mb) Codec(1,H264,default:0) BSF(H264_ANNEXB) Resolution(1280x720) Framerate(60.00) KeyInterval(60/frame) SkipFrames(0) BFrames(0) timebase(1/90000)
	Video Track #2: Public Name(Video_0) Variant Name(1080p) Bitrate(8.00Mb) Codec(1,H264,default:0) BSF(H264_ANNEXB) Resolution(1920x1080) Framerate(60.00) KeyInterval(60/frame) SkipFrames(0) BFrames(0) timebase(1/90000)
	Video Track #3: Public Name(Video_0) Variant Name(image_0) Bitrate(1.00Mb) Codec(9,JPEG,default:0) BSF(JPEG) Resolution(854x480) Framerate(1.00) KeyInterval(0/frame) SkipFrames(0) BFrames(0) timebase(1/90000)
	Audio Track #4: Public Name(Audio_1) Variant Name(aac) Bitrate(128.00Kb) Codec(6,AAC,Passthrough) BSF(AAC_RAW) Samplerate(48.0K) Format(fltp, 32) Channel(stereo, 2) timebase(1/1000)
	Audio Track #5: Public Name(Audio_1) Variant Name(opus) Bitrate(128.00Kb) Codec(8,OPUS,default) BSF(OPUS) Samplerate(48.0K) Format(s16, 16) Channel(stereo, 2) timebase(1/48000)
	Data  Track #6: Public Name(Data_2) Variant Name(Data) Codec(0,Unknown,Passthrough) BSF(ID3v2) timebase(1/1000)
[2024-02-20 04:21:23.452] I [OutboundWorker:32] WebRTC Publisher | rtc_stream.cpp:163  | RtcStream(#default#tc/danil) - Ignore unsupported codec(JPEG)
[2024-02-20 04:21:23.452] I [OutboundWorker:32] WebRTC Publisher | rtc_stream.cpp:163  | RtcStream(#default#tc/danil) - Ignore unsupported codec(AAC)
[2024-02-20 04:21:23.452] I [OutboundWorker:32] WebRTC Publisher | rtc_stream.cpp:200  | WebRTC Stream has been created : danil/1818817424
Rtx(false) Ulpfec(false) JitterBuffer(false) PlayoutDelay(false min:0 max: 0)
[2024-02-20 04:21:23.452] I [OutboundWorker:32] Publisher | stream.cpp:212  | WebRTC Publisher Application application has started [danil(1818817424)] stream (MSID : 0)
[2024-02-20 04:21:23.452] I [OutboundWorker:32] LLHLS Publisher | llhls_stream.cpp:118  | LLHlsStream(#default#tc/danil) - Ignore unsupported codec(JPEG)
[2024-02-20 04:21:23.452] I [OutboundWorker:32] LLHLS Publisher | llhls_stream.cpp:118  | LLHlsStream(#default#tc/danil) - Ignore unsupported codec(OPUS)
[2024-02-20 04:21:23.453] I [OutboundWorker:32] LLHLS Publisher | llhls_stream.cpp:213  | LLHlsStream has been created : danil/1818817424
OriginMode(true) Chunk Duration(0.50) Segment Duration(6) Segment Count(10) DRM(none)
[2024-02-20 04:21:23.453] I [OutboundWorker:32] Publisher | stream.cpp:212  | LLHLS Publisher Application application has started [danil(1818817424)] stream (MSID : 0)
[2024-02-20 04:21:23.455] E [OutboundWorker:32] OriginMapClient | origin_map_client.cpp:71   | <tc/danil> stream is already registered.
[2024-02-20 04:21:23.455] W [OutboundWorker:32] OVT | ovt_stream.cpp:46   | Failed to register stream to origin map store : #default#tc/danil
[2024-02-20 04:21:23.456] I [OutboundWorker:32] Publisher | stream.cpp:212  | ThumbnailPublisher Application application has started [danil(1818817424)] stream (MSID : 0)
[2024-02-20 04:21:23.456] W [OutboundWorker:32] Publisher | application.cpp:287  | OVTPublisher Application could not start [danil] stream.
[2024-02-20 04:27:24.387] I [SPRTMP-t1935:20] RTMPProvider | rtmp_provider.cpp:216  | The RTMP client has disconnected: [#default#tc/danil], remote: <ClientSocket: 0x7f9b33e01010, #18, Disconnected, TCP, Nonblocking, 10.0.1.1:50064>
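In the log above, the new Origin process reports `<tc/danil> stream is already registered.` followed by `OVTPublisher Application could not start [danil] stream.`, which is consistent with the old process's entry still being present at that moment. The toy model below illustrates the suspected race. It assumes, without confirming it in OME's source, that the entry is written set-if-absent with an expiry (like Redis `SET key value NX EX seconds`), that only the owning process refreshes it, and that a new process registers once and never retries. `register_once`, `simulate_restart`, the owner names and the 10 s TTL are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Toy model of the store: key -> (owner, expires_at)
Store = Dict[str, Tuple[str, float]]
KEY = "tc/danil"


def register_once(store: Store, key: str, owner: str, ttl: float, now: float) -> bool:
    """Set-if-absent with expiry; returns False if a live entry already exists."""
    entry = store.get(key)
    if entry is not None and entry[1] > now:
        return False
    store[key] = (owner, now + ttl)
    return True


def owner_at(store: Store, key: str, now: float) -> Optional[str]:
    """Owner of the key at time `now`, or None if the entry has expired."""
    entry = store.get(key)
    return entry[0] if entry is not None and entry[1] > now else None


def simulate_restart(ttl: float, old_last_write: float, new_start: float, check_at: float) -> Tuple[bool, Optional[str]]:
    """The old process last wrote its entry at `old_last_write`, then was killed.
    The new process registers exactly once at `new_start` and never retries."""
    store: Store = {}
    register_once(store, KEY, "old-origin", ttl, now=old_last_write)
    registered = register_once(store, KEY, "new-origin", ttl, now=new_start)
    return registered, owner_at(store, KEY, check_at)
```

Under these assumptions, a restart that finishes before the old entry expires fails to register, the entry keeps pointing at the dead process for a few seconds, and then the key disappears for good, which matches the Redis transcript below. A restart that finishes after expiry registers normally, so the outcome would depend on restart timing.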

Server:

  • OS: Debian 12
  • OvenMediaEngine Version: docker.io/airensoft/ovenmediaengine:0.16.4

Additional context

Here is an example of how I check for the issue with an OME deployment in Kubernetes.
The OriginMapStore entry stays in Redis for a few seconds after the Origin server is restarted, but then it's gone.

$ kl -n ome get pod
NAME                              READY   STATUS    RESTARTS   AGE
ome-origin-cpu-5d774bff5f-64dl6   1/1     Running   0          8s
redis-7557f5946c-678nl            1/1     Running   0          20m
danil@danil-main:/mnt/c/Users/danil/Documents/k8s-public-copy
$ kl -n ome exec deployments/redis -- redis-cli -a "$redis_pass" keys "*"
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
tc/danil
danil@danil-main:/mnt/c/Users/danil/Documents/k8s-public-copy
$ kl -n ome delete pod ome-origin-cpu-5d774bff5f-64dl6 
pod "ome-origin-cpu-5d774bff5f-64dl6" deleted
danil@danil-main:/mnt/c/Users/danil/Documents/k8s-public-copy
$ kl -n ome exec deployments/redis -- redis-cli -a "$redis_pass" keys "*"
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
tc/danil
danil@danil-main:/mnt/c/Users/danil/Documents/k8s-public-copy
$ kl -n ome exec deployments/redis -- redis-cli -a "$redis_pass" keys "*"
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
tc/danil
danil@danil-main:/mnt/c/Users/danil/Documents/k8s-public-copy
$ kl -n ome exec deployments/redis -- redis-cli -a "$redis_pass" keys "*"
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.

danil@danil-main:/mnt/c/Users/danil/Documents/k8s-public-copy
$ kl -n ome exec deployments/redis -- redis-cli -a "$redis_pass" keys "*"
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.

danil@danil-main:/mnt/c/Users/danil/Documents/k8s-public-copy
$ kl -n ome exec deployments/redis -- redis-cli -a "$redis_pass" keys "*"
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.

d-uzlov changed the title Edge server can permanently lose connectivity after Origin server restart to Edge server can permanently lose connectivity to Origin after Origin server restart Feb 20, 2024
getroot self-assigned this Feb 21, 2024
getroot added the bug (Confirmed as bug) label Feb 21, 2024
@getroot
Sponsor Member

getroot commented Feb 21, 2024

Yes, given the scenario, I think this issue can recur. I need to think about how to solve it while also resolving conflicts between multiple servers. Thanks for reporting the bug.


stale bot commented Apr 21, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Apr 21, 2024
stale bot closed this as completed May 2, 2024
getroot removed the stale label May 2, 2024
getroot reopened this May 2, 2024