Possible explanation for difference in regularity of packets received from a small latency vs large latency buffer? #2911
Comments
The only way to see what is happening here is to grab the time at the moment of scheduling a packet (let's say the automatic timestamp pickup is in use) and the corresponding time on the receiver side when the packet is given up to the application (there's the PLAYING phrase in the debug log). These could then be put into a spreadsheet and the time differences between subsequent packets compared. The differences between the same packets (which can be identified by sequence numbers) should be very much the same on both sides. If there are any visible discrepancies, the logs around this value should be found and checked for what components were used to calculate it (these details should be in the debug logs). Anyway, the whole idea of TSBPD and the live mode is that the time difference between two subsequent packets should be exactly the same on the sender (taking the application send time) and the receiver (taking the time when the receiving application gets the packet). If it's not, then there's something wrong. If you could help a bit more, you might also try the same with some older version, like the last from the 1.3 line.
Thanks! I'm busy investigating this at the moment in a similar manner to how you describe, although using the PTS of my custom payload representing audio/video packets to identify them. When my latency is 100ms, SRT seems to be occasionally giving up packets to the application that have arrived more than 100ms later than expected, rather than dropping them. The typical pattern is a long gap (>100ms) and then a quick succession of packets as it catches up. Is there some element of tolerance here? Or is it possible that, say, I'm not reading packets fast enough from SRT, and that this could lead to it giving up packets late that ought to have been dropped? The fact that this issue reduces and goes away as I increase latency suggests these are packets that are meant to have been dropped but for some reason aren't being. I still need to verify the timings on the server, so I'm still figuring this out.
So this is the high-level overview. As you can see, the intervals between calls to SRT send on the server are fairly constant, which I would expect from the encoder output. Yet the intervals between the player receives are far more variable. This particular sample doesn't break the player, but the variations can be more extreme, such that the player is starved of packets and then gets a load in one go and has to adjust to this (=stutter). Interestingly (or not), I get the same pattern as below whether the server is Windows or Linux (Jetson ARM), and whether the player is tvOS or macOS. The next stage is to investigate the SRT logs, I suppose.

Intervals between server sends:

```
*** VIDEO DTS 23433519574 INT 20366us ***
```

Intervals between player receives:

```
*** VIDEO DTS 23433519574 INT 24765us ***
```
Yeah. Just note that it is crucial to set the latency to a value that doesn't conflict with the current RTT. The RTT measurement is visible in the logs, and it is also reported in the stats. SRT only tries to add a delay on a received data packet if it has arrived EARLIER than its play time; when a packet arrives later than this time, it's still delivered. SRT doesn't drop packets that have arrived. It drops only those packets that haven't arrived by the time the NEXT packet is ready to play. Note that RTT should give you an idea of how big the STT (single trip time) is, or in general the "sending delay" from one machine to another. This document explains the details. The latency could just as well be set to 0, because the delay of the first packet is included in the formula. But if you set it to 0, you must somehow be certain that every next packet will be delayed in the sending process over the network exactly as much as the very first packet, or at least not more than that (and obviously that you don't lose packets, as with latency 0 every lost packet will be dropped). It's then not exactly the overall value of RTT that causes the problem, but its variance. The recommendation to set the latency as a function of the measured RTT is not made because of the value of RTT itself, but because the higher the RTT, the more it can potentially vary.
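As a rough illustration of the scheduling idea described above (a deliberate simplification of TSBPD's actual computation, which also tracks clock drift; the function name is made up):

```cpp
#include <cstdint>

// Simplified sketch of TSBPD play-time scheduling: the receiver anchors a
// time base when the first packet arrives, then schedules every packet at
// base + its sender timestamp + the configured latency. With latency 0, a
// packet is playable as soon as it is no more delayed than the first one was.
uint64_t play_time_us(uint64_t time_base_us,    // set at first packet arrival
                      uint32_t pkt_timestamp_us, // sender timestamp in packet
                      uint64_t latency_us)       // configured receiver latency
{
    return time_base_us + pkt_timestamp_us + latency_us;
}
```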
Hmm, thanks for the explanation. In this case this is over the LAN, so the latency of 100ms shouldn't be an issue. I'm working on producing some (heavy) logs.
Well, I have been trying this out in the LAN too (that's my first-hand environment for testing), and I can only tell you that the variance of the packet delivery time can be surprisingly high. I learned it the hard way when developing the backup-type groups: I counted on the ACK, which is sent every 10ms, being delayed by at most 20ms, or maybe 50ms (it's a LAN, after all). After the first tests I learned that I couldn't have been more wrong.
A slight tangent: the way my application works is that it initially opens 4 x SRT connections to the server, each of which is intended for a different profile/latency. This is to facilitate rapid switching between them. Only one is active at a time; the other 3 remain idle. As an experiment, in an attempt to strip things down to the minimum, I just disabled this and made it open only one connection. Take a look at the difference. I suppose this could be overhead from the additional threads and context switching (although this is on a pretty powerful M3 Pro MacBook Pro), or is there some sort of contention between SRT connections that I'm overlooking?

1 x ACTIVE SRT CONNECTION, 3 x IDLE:

```
*** VIDEO DTS 64243639662 INT 25330us ***
```

1 x ACTIVE SRT CONNECTION:

```
*** VIDEO DTS 64383595620 INT 20196us ***
```
So this player-side log has some examples of a few very high "belated" PLAYING PACKET lines. By the way, I cannot enable heavy logging on the server, as it slows everything down too much (it's relatively low-powered, Jetson ARM). However, I know that I am submitting packets regularly to SRT on the server. For example:
Yeah, so this can explain things. But then, please find the line that shows this same packet's times at the moment of reception. This is the log containing the RECEIVED phrase, and it should show the ETS value. The value is in a different clock space than the log clock, but it can at least show you whether the packet arrived after its expected arrival time or not.
This is the log:
(And it might be that you can restore this OTS; I'm just not sure whether it's the right time that should be displayed.)
So I'm seeing two things.
Yeah, and you should check all other logs that mention the packet with the same sequence number. This should shed some light on when exactly this packet arrived over the network and what caused it to be so delayed. NOTE: AFAIK the current code still uses the old manner of ACK-eclipsing, that is, packets stay unavailable to the reader, even if they are ready to play, if the ACK position hasn't moved past them. I changed that behavior in the #2527 PR. It's highly experimental, and Max has already found that it suffers from performance problems (so there's still a lot to be fixed there), but you might try it out to see whether it does a better job.
So, taking the packet that was 305ms late, sequence number 90071056, it appears in the log in the places below. Is the bottom line basically that a range of packets just didn't arrive, and then all arrived in quick succession?
Ok, most of the lines that appear in this log match the sequence number of the packet because it was occasionally the sequence of the first lost packet, and hence the first "no packet" cell, and so the ACK number, as well as the first sequence in the receiver buffer. The packet was physically received around the moment reported by the "INCOMING PACKET" log. The distance between that log report and the "PLAYING PACKET" one is about 100ms. This means that if it was already 305ms belated at this last report, it must have arrived a good 205ms after its play time (which could have happened in theory, as it was retransmitted). It was fortunate not to be dropped only because the first readable packet in the buffer (sequence %90071080) likely had a later play time than this; otherwise this packet, and all up to %90071079, would already have been dropped to allow that one to play. And that's still on top of the delay required to properly switch threads so that TSBPD picks up the play time of the next packet to play. The unusual thing I can see here is why the TSBPD thread wasn't woken up earlier anyway. It might be that the receiver thread was somehow treated as high priority while getting updates from the system with newly received packets, or was even kept frozen for too long, as the system decided to collect and burst freshly received packets, which came in as retransmitted and therefore in a burst. You may still try out the version in the PR I showed you above. Its performance is poor for now, but it is designed to pass updates from newly received packets directly to TSBPD and avoid most of the spurious wakeups of that thread.
Ok, thank you for that, it all makes sense. I think it might be the state of my ethernet wiring; I just connected the Apple TV to the router using Powerline (G.hn) technology instead, and it seems much more solid so far. I still need to work around the problems I'm having with multiple SRT connections, though. Are there global locks or anything else that might mean separate SRT connections affect one another, or is it likely to be just a problem of having too many threads and context switches?
Well, that depends on many conditions, such as:
Locks are taken for various shared objects inside, but the only global lock happens when dispatching a socket by its ID, which can be somewhat more contended when you are processing a connection or configuring a socket at that moment. However, don't underestimate that ACK-signoff problem either. I don't know exactly how much it disturbs things, but preferring ACK state over play time is at least a significant theoretical possibility.
Multiple SRT connections within one application, and no other SRT connections in other applications. Connections connect out on the default auto-assigned port. Connections are made once at the start, then held, with one receiving data from the server and the others doing nothing (the server not sending anything, the player just sitting on a poll). What do you mean by the "ACK-signoff problem"?
Then this ACK-signoff problem is the only thing that comes to my mind. That's why you should really try it.
Ah, ok, you mean the PR; so avoiding spurious wake-ups might help then.
Yeah. I don't know if it will help much, but if you have a problem like that it's worth a try. |
The other thing I’m pondering is the way a complete frame packet, likely comprising multiple SRT packets, is sent and received; specifically, the timings of each packet. Really, one wants to consider a frame packet as a discrete entity that is sent at time X and is available to play at time Y. In reality, each SRT packet will have a timestamp of X, X+something, X+something more, etc. On receive, the complete assembled packet will only be available when all SRT packets have been received, thus at time X+whatever. I’m curious as to what would happen if all sub-packets were given the same timestamp X and whether that would result in less fragmented sleep/wake/sleep on the player side. Obviously that would mean sending some SRT packets at a time that has already passed, so I’m not sure of the effect of that and whether it’s a good or bad idea. Alternatively, one could only pass a completed video packet to the player when the first SRT packet of the following video frame packet has been received, thus aligning to packet boundaries, but that wouldn't help with any excessive sleep/wake/sleep activity, only with "smoothing" packet delivery to the player.
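The "same timestamp X for all sub-packets" idea could be sketched like this (purely illustrative; `SubPacket` and `split_frame` are made-up names, and 1316 bytes is SRT's usual live-mode payload size):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct SubPacket
{
    uint64_t srctime_us;            // source timestamp carried by the packet
    std::vector<uint8_t> payload;
};

// Split one complete frame into payload-sized sub-packets that all carry the
// SAME source timestamp X, instead of X, X+something, X+something more.
std::vector<SubPacket> split_frame(const std::vector<uint8_t>& frame,
                                   uint64_t frame_time_us, size_t mtu = 1316)
{
    std::vector<SubPacket> out;
    for (size_t off = 0; off < frame.size(); off += mtu)
    {
        SubPacket p;
        p.srctime_us = frame_time_us; // same X for every fragment
        const size_t end = std::min(off + mtu, frame.size());
        p.payload.assign(frame.begin() + off, frame.begin() + end);
        out.push_back(std::move(p));
    }
    return out;
}
```

On the receiving side, all fragments of the frame would then share one play time, so the last fragment is not held back any longer than the first.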
I tried the above and it didn't make any noticeable difference. Here's something interesting, though; I'm testing on iOS today, with just a single SRT connection. I get these intervals between video packets:

```
STREAM_CONNECTION: Video packet interval is 23039us
```

So some variance, but not too bad. But if I replace this line in SRT...
...with this instead...
I get these results:

```
STREAM_CONNECTION: Video packet interval is 19312us
```

which show noticeably less variance. Obviously there are CPU considerations to running a tight loop like that, but it's interesting to see that there might be potential for a more timely release of packets in SRT in some situations, perhaps something along the lines of the USE_BUSY_WAITING that is already an option on the server.
I found that simply looping and sleeping for half the remaining time produces the same improved regularity of packet release, while preserving the same CPU usage.
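The "sleep for half the remaining time" loop described above could be sketched as follows (a minimal illustration, not SRT's actual code; `wait_until_half_sleep` is a hypothetical helper name):

```cpp
#include <chrono>
#include <thread>

// Instead of one sleep for the whole interval (which can oversleep badly),
// repeatedly sleep for half of what remains, so each successive oversleep
// shrinks as the deadline approaches.
void wait_until_half_sleep(std::chrono::steady_clock::time_point deadline)
{
    using namespace std::chrono;
    for (;;)
    {
        const auto remaining = deadline - steady_clock::now();
        if (remaining <= steady_clock::duration::zero())
            return; // deadline reached (or already passed)
        std::this_thread::sleep_for(remaining / 2);
    }
}
```

The last few iterations degrade into very short sleeps; as noted later in the thread, this shape helps on Apple platforms but would not by itself overcome a much coarser timer granularity.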
Has any analysis been done on this before? i.e. whether in fact packets are released in a timely manner? |
Well, sounds reasonable. @maxsharabayko ? |
This is where the Source Time feature (
This is in a way OS- and hardware-related, and stream-bitrate-related. See the tests below for the BUSY_WAITING feature in PR #678. See also #936. Waiting for half the time would not be a universal solution.
@oviano You can measure timer accuracy on your playback device using |
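The name of the measurement tool is cut off in the comment above; as a stand-in, a minimal sketch that estimates sleep oversleep (one proxy for timer accuracy) might look like this:

```cpp
#include <chrono>
#include <thread>

// Estimate, in microseconds, how much the OS oversleeps a requested 1ms
// sleep, averaged over a few runs. Larger values indicate a coarser timer;
// sleep_for never returns early, so the result is non-negative.
long long average_oversleep_us(int runs)
{
    using namespace std::chrono;
    const auto requested = milliseconds(1);
    long long total = 0;
    for (int i = 0; i < runs; ++i)
    {
        const auto t0 = steady_clock::now();
        std::this_thread::sleep_for(requested);
        const auto slept = duration_cast<microseconds>(steady_clock::now() - t0);
        total += slept.count() - 1000; // oversleep relative to the 1ms request
    }
    return total / runs;
}
```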
Thanks for that. Actually, my first thought was to try the same/similar code used for USE_BUSY_WAITING on the send side for this too, so I more or less copied and pasted that code, but somewhat counterintuitively it did not have an effect; packets were still delivered irregularly, on macOS at least. I’ll continue to investigate given the info in your post.
By the way, that chart shows exactly why I saw a benefit from the crude "sleep in a loop for half the time remaining" approach on Apple platforms: it reduces the oversleep each time, because as it gets closer to the wake time it sleeps for shorter and shorter durations and thus oversleeps by less each time.
It might be that we could add such a feature to configure the live sleep time with a configurable number of time slices and the length of the first slice. Especially since sleep inaccuracy is not an easy problem to solve, and moreover, it's more of a problem on Windows. Very short sleeps in particular are hard to achieve; probably even a spin lock with yield would yield (XD) a better result. It might not be the most important thing to do in general, but it would allow further development and research.
The thing I can't quite figure out at the moment is why the equivalent of the USE_BUSY_WAITING code, when placed here, does not have the desired effect of smoothing the release of packets and is less effective (but uses more CPU) than my rudimentary "iteratively sleep for half of what is remaining" hack. I'm testing on macOS right now, so the "td_threshold" is 1ms, but I think there might be situations where it is still oversleeping. Say we want it to sleep for 8ms: the USE_BUSY_WAITING path would tell it to sleep for 7ms and then spin for the last 1ms, but I think that 7ms sleep still oversleeps by enough to make a difference. There may also be other factors at play; the unit tests were probably run in isolation, but when the machine is under load, with lots of other threads actually doing other stuff, then maybe we don't get the same results in reality as in the unit tests. The answer might be a combination of both concepts: instead of sleeping for the 7ms above, we would sleep for 3.5ms and loop again, but still run the tight loop when we get to the last 1ms.
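The hybrid described here could be sketched like this (an illustration only; `wait_until_hybrid` is a hypothetical helper, and the 1ms threshold mirrors the `td_threshold` mentioned above rather than any actual SRT constant):

```cpp
#include <chrono>
#include <thread>

// Hybrid wait: iteratively sleep for half the remaining time while far from
// the deadline, then spin (with yield) for the last stretch, so the final
// wake-up is not at the mercy of the OS sleep granularity.
void wait_until_hybrid(std::chrono::steady_clock::time_point deadline)
{
    using namespace std::chrono;
    const auto spin_threshold = milliseconds(1); // assumed, like td_threshold
    for (;;)
    {
        const auto remaining = deadline - steady_clock::now();
        if (remaining <= steady_clock::duration::zero())
            return;
        if (remaining > spin_threshold)
            std::this_thread::sleep_for(remaining / 2); // shrink oversleep
        else
            std::this_thread::yield();                  // spin out the last 1ms
    }
}
```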
Not sure if this helps, but note that the condition for the decoding to run smoothly is:
Of course, SRT can't do anything to ensure this, because it's content-agnostic, but there are several things that could be done, I think. First, it might be possible for SRT to be switched during transmission to an "aggressive delivery" mode (which shall not be enabled until at least one I-frame is delivered). This "aggressive delivery" would take only 8/10 of the sleep time to sleep, and then, if the remaining time is evaluated as less than some "delta" (the value should probably be correlated with the bitrate somehow), the packet would be delivered to the application even if it is too early (that is, you sleep for 7ms and then simply deliver the packet without spin-waiting for the last 1ms). There's also something the sender application can do: the timeslots assigned to subsequent fragments of the MPEG-TS stream can be adjusted to the absolute accuracy required by the distance between the last packets of two subsequent I-frames, with the time explicitly specified in the sending call, without relying on the library to do it for you.
Thanks for your thoughts! Actually, it's not using MPEG-TS; it's a custom/slimmed-down container format of my own, essentially just sending the raw video and audio streams with as little overhead as possible. I-frames are only sent once at the start of the stream, and again on request in the event of a discontinuity in the video stream.

I've gone through quite a few variations, but I had ended up with around a 100ms buffer after SRT specifically to smooth out the packets coming out of SRT so they could be played smoothly by my player. Then I found that, due to drift and so forth, this buffer could change size. So I ended up adding all sorts of complicated code to track this buffer size and adjust playback speed etc. to keep it under control. This itself required additional audio buffering (for time-stretching the audio samples), and it also wasn't 100% reliable and could sometimes break under extreme network conditions.

Since my project is about remote-controlling a remote source, latency is important: when the player requests a channel change or navigates the remote source's UI, it needs to be responsive. All this added latency was making it feel sluggish. The 100ms jitter buffer plus the extra 100ms needed for time-stretching the audio samples meant another 200ms on top of the SRT buffer and any other latency (the small time taken to initiate a control request, and also the server's end-to-end capture-to-encode latency, although that is only around 20ms).

So lately I've gone back to basics; I shouldn't really need another buffer after the SRT one, which is 100ms+ after all. That should be my only real buffer, apart from a small audio buffer, plus the 2-3 video frames I maintain just so I can nicely lock playback to the refresh rate of the device. So my challenge was to see if I could get the audio and video packets delivered from SRT in as rock-solid a manner as possible so I can feed them directly into the playback engine.
I've more or less succeeded with two key changes:
Both of these have *dramatically improved the consistency of the output, as shown in some of my posts further up.

You mention releasing packets early, but actually I wouldn't want that, because if that happens with a few video packets in a row, for example, then my player is going to overflow its small video frame buffer (I mentioned 2-3 frames), and then when packets revert to being on time, it risks stutter. What I want is for complete video frames to be released as consistently as possible with respect to one another. Adjustments due to drift would be ok; it would maybe jump a frame occasionally to "re-align".

In terms of scheduling the packet sending on the server, I assume you mean delaying sending a complete frame according to its PTS/DTS. I have experimented with this before, but again you end up having to maintain a small buffer on the server to "schedule" these packets, a buffer which can drift one way or the other over time and introduces another headache. Giving all sub-packets of a video frame the same SRT send time does make sense, though, for sure, as there is no benefit to waiting slightly longer for the last packet to arrive before releasing it to the player.

*on Apple platforms. This approach won't work on Windows, as the default granularity seems to be 15ms, which is way too high.
Wow, you stole one of my ideas, buddy! XD

But then, if it's not MPEG-TS, and you are using the idea of sending difference frames only, except for the first I-frame after a discontinuity, then you should have even fewer problems with splitter synchronization. All difference frames should be able to be sent long ahead of their play time, so there's quite a lot of jitter buffer.

For this type of transfer I think SRT should better support a slightly different working mode: apply a delay as normally happens while you are waiting for the first I-frame (or you simply send it as the first data portion anyway), and AFTER the receiver has confirmed its reception, the application sets an "immediate mode", that is, data are delivered to the application from SRT without any delay, as soon as they are available. Waiting in the buffer for the "play time" would only happen in the case of lost packets; that is, you apply a drop only when the play time comes for the packet following the drop gap, but when a packet is available in the buffer's first cell, the application gets it, no matter whether its play time has come or not. This mode would still require cooperation with the application, so that the application can also restore the "normal live mode" when it detects that the next I-frame is coming. But that's not a problem, because the application will receive this packet anyway, and since it's the first of the bunch, it can set the normal mode again before the last packet of the I-frame is delivered.

This could just as well be done a little differently: the immediate mode is set for the whole time, and the application always receives a packet immediately, except that it also gets the delivery time information. Then the application can decide for itself whether the packet should be sent to the splitter right now (that is, when exactly it will be submitted to the decoder) or up to which time it has to wait.
Earlier I saw no sense in doing this because it complicates application development, but maybe it could be helpful in certain use cases.
If you mean the idea of sending an I-frame only at the start or on a discontinuity, then it would seem an ideal enhancement to SRT, as I assume most use cases of SRT are one-to-one transmission? (I'm not sure about that.)
Well, not if you want latency to be as low as possible across capture-encode-transmit-receive-decode-display.
It makes sense, but perhaps rather than the application setting a mode, it could be a flag set on the packet at send time? I.e. scheduled_packet, meaning deliver this according to its timestamp vs. deliver this to the application asap. Besides, if all the packets for a video frame were given the same send time, does this add anything, as you'd get all the subsequent packets as soon as they are available in any case?
Yes, that one. It occurred to me some time ago that MPEG-TS was designed for use over UDP, possibly with FEC; that's why you need the I-frame refreshed periodically and the configuration tables repeated. But if you have a reliable connection with SRT, you can send the most extensive information just at the beginning, and then simply send only difference frames.
Exactly when you want as low a latency as possible. Encoding and sending an I-frame will take a lot of time anyway, and you will have to spend that time anyway, including on the reception side before you send it to the decoder. But sending the subsequent difference frames will take less time, and they can therefore be submitted to the decoder much earlier than the I-frame was.
Could be, but this requires some changes, if not inventions, in the protocol, while this immediate mode can simply be implemented in the library.
Hmm, you know, I didn't think about it. But you may be right. There's just one small problem, though: earlier delivery times may mean an earlier play time for a packet, which means that an earlier drop preference may kill all efforts at retransmission. What I was talking about was that the ready-delivery can happen earlier (once the packet is available), but the drop preference remains the same as it was; SRT would wait for the proper play time before it delivers a packet following a drop. Another thing that I missed a bit is that for such a protocol you should unfortunately turn off TLPKTDROP. It was relatively harmless for MPEG-TS (a drop can spoil the video only up to the next I-frame), but in a system where you have only one I-frame, one dropped packet can ruin the whole tapestry. The only way to respond to a choking network would be to drop frames on the sender side and force a discontinuity (possibly with changed parameters).
Again, though, if you want latency to be the absolute minimum, then you want to deliver complete frames on time: not early, not late, just on time. If you deliver early, then what do you do? If you play them early, what happens when the next one is on time? There is potentially a gap in playback. So maybe you buffer the early frame, but then what have you gained? SRT could have just held onto it until the play time... Any difference in decoding time between an I-frame and a non-I-frame is negligible with hardware decoding. Maybe we're talking a little at cross purposes here, I'm not sure :) Either way, at some point a more accurate release of packets would be helpful. I think maybe something like USE_BUSY_WAITING applied to the TSBPD release would be a start, but as mentioned above, it still doesn't seem to stop oversleep on Apple platforms anyway. I found that a hybrid approach (iterative half-sleep + busy wait) does work, but it seems annoying to have any element of busy wait on Apple platforms if the iterative half-sleep is sufficient. So maybe the busy wait is a Windows-only thing (I'd be interested to see if applying the iterative half-sleep approach would improve send-time accuracy on non-Windows platforms and remove the need for a busy loop).
Whatever is done with the frames that your application receives is up to your application. I'm talking about SRT, and whether it can deliver the packets for a particular frame earlier, so that your application is given some time advantage that it can just as well waste, if it wishes, or use for something else. If SRT delivers these packets exactly on time, it doesn't leave the application any choice. Playback is a completely different topic; it should always be done in the rhythm of the frame timestamps, not at the time frames come out of the decoder, should they come out any earlier.
Yep. I think the point I am making is that I don't want the application to have to make a choice. A choice means to play or not, to decode or not, and likely involves another buffer.
Interestingly, in my use case, with my current test, I am not even using the timestamps. I know that sounds peculiar, but with the latency now so low, hardware encoding and decoding on each side, CBR as the encode mode, and more accurate SRT TSBPD release, all I need is a three-frame video frame buffer. I essentially populate this buffer as frames come in, and for each screen refresh I play back the next frame by deciding which frame number should be played at the current screen refresh (taking into account any mismatch between refresh rate and frame rate). Any variation in decode time, or whatever else, is "absorbed" by the small video frame buffer. Audio works similarly. I previously had exactly that: waiting until a frame's play time according to PTS. But that is a buffer, and that buffer gets screwed by drift between the SRT packet release time and the play time according to PTS. And then we're back to managing that buffer as it fluctuates.
Just picking up on this, as it's an interesting concept. Some brief background: my application allows control and streaming of a remote TV source. For example, I have an Apple TV set up in my brother's house in Stockholm (I am in London; latency is around 40ms), and this allows me to stream and control that Apple TV and watch stuff that I can't necessarily watch here in the UK. I have another similar setup in Istanbul (latency is around 80ms).

The app needs to allow responsive control of the remote UI, so low latency is critical. At the same time, when the user is not navigating the UI, one wants a stable, glitch-free stream. At the moment I make a minor adjustment to my final video frame "jitter buffer" depending on which mode we are in: if the user has controlled the remote source in the last 5s, I reduce the jitter buffer to the minimum; otherwise it's larger, so that any fluctuations in decode time or packet reception are smoothed out. However, I'm still limited by whatever the SRT latency is set to for the stream (and it's messy to restart the stream with a different latency etc.). So in the case of Istanbul, it's around 320ms.

What I'm wondering is whether I could use the sort of setting you describe above to receive packets asap when in low-latency mode and present them asap. This would lead to a more jittery picture, but only while controlling the remote source; once the user stops controlling it, the app would revert to normal mode where packets are released on time. Or, as you suggest, packets are always released asap but with timing information, so that my application can sleep until the play time instead. This would actually be preferable, as I would probably only want to present the video packets asap in this mode and keep the audio stream stable, since audio breaking up and having to insert silence is more annoying than a jittery picture.
I am streaming from a custom server application to a custom player, and the following scenario is over the LAN:
Latency set to 2000ms - no packets lost, video and audio packets decoded and playback is smooth
Latency set to 100ms - no packets lost, video and audio packets decoded but playback is not smooth
Latency set to 200ms - no packets lost, video and audio packets decoded but playback is better than with a 100ms buffer but still occasional stutter
The reason it is not smooth seems to be that, from time to time, some packets are received from SRT either earlier or later than expected (I'm not sure which yet) when using the small buffer, so my video player has to adjust to this and the result is stutter.
This probably isn't a new thing; lately I've been trying to see how small I can get the end-to-end server-to-player latency. Previously I had a 100ms "jitter" buffer in my video player code, which likely concealed this oddity. Having removed this buffer, and trying to feed packets more directly into my video player code in order to reduce latency, I now see this issue, but only with a small buffer, and that's the bit I don't understand.
This was observed on an Apple TV. The server is Linux Jetson (ARM).