Packets are dropped when net buffer is full even if receiver is trying to read #1341
Comments
Another solution we talked about is to smear on the sender side; the sender shouldn't actually be able to send all 100 packets within a single nanosecond, should it?
I think the sender-side approach breaks when you have a large number of senders sending packets to the same receiver socket all at the same instant. For UDP, where you don't have a socket buffer per "connection pair", I think this breaks. (For TCP that would only happen on listening sockets, and those packets don't enter the read queue (the control packets that create the connection don't contain user data), so we should be OK.)
In some sense that seems realistic: if many senders each send at the receiver's full capacity simultaneously, some packets are going to be dropped, right? OTOH in a real network there'd be some latency jitter, and maybe some "smoothing" where packets from multiple senders arrive at a single upstream router and are buffered before being forwarded on, making it less likely that they'd all arrive "simultaneously" in practice. Maybe your proposal is an easier solution than trying to model those things :)
Maybe just using a bigger "receive" buffer would work, since it's actually modeling not just the local buffer, but all of the upstream router buffers? (EDIT: and currently the sender's "send" buffer, since we're not buffering/smearing on the sender side)
We do model the upstream routers and upstream router buffering. We did verify that bigger receive buffers avoid the problem here.
More generally, I think the problem is an artifact of how we choose to line things up in our event queue for a given nanosecond, and we could certainly line them up differently per my suggestion. The network interface modeling won't allow us to exceed our configured network rate in any case, so we don't have to worry about that. A nice side effect of my suggestion is that the Shadow network sockets may actually need to store
Just documenting here that whether the new event gets added before or after the other 99 packets depends on the relative host IDs. If the receiving host has a host ID smaller than the host ID of the packets' source host, then the new event will be added before the other 99 packet events. Otherwise the new event is added after the other 99 packet events. This isn't ideal since the network behaviour depends on the host order, and changing the hostnames can significantly change the network throughput and drop rate in some cases.
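For illustration only, here is a minimal sketch of how ordering same-time events by the scheduling host's ID produces this effect; the struct and field names are made up and are not Shadow's actual types:

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative event type; not Shadow's actual struct. */
typedef struct {
    unsigned long long time_ns; /* simulated time of the event */
    unsigned int host_id;       /* id of the host that scheduled the event */
    const char *label;
} Event;

/* Events at the same nanosecond end up ordered by host id, so whether the
 * receiver's wake-up task runs before or after a burst of packet events
 * depends on the relative host ids. */
static int event_cmp(const void *a, const void *b) {
    const Event *x = a, *y = b;
    if (x->time_ns != y->time_ns)
        return x->time_ns < y->time_ns ? -1 : 1;
    if (x->host_id != y->host_id)
        return x->host_id < y->host_id ? -1 : 1;
    return 0;
}

int main(void) {
    Event queue[] = {
        {0, 7, "packet from sender (host 7)"},
        {0, 3, "wake up receiver (host 3)"},
        {0, 7, "packet from sender (host 7)"},
    };
    qsort(queue, 3, sizeof(Event), event_cmp);
    for (int i = 0; i < 3; i++)
        printf("%s\n", queue[i].label);
    /* Receiver id 3 < sender id 7, so the wake-up sorts first; swap the
     * ids and it sorts after every packet event instead. */
    return 0;
}
```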
I'm not sure I'm convinced anymore :) If the application is set up to send for example 100,000 bytes at a time, but only uses a receive buffer size of 10,000 bytes, that seems like a problem with the application, not Shadow. It would be nice to know how this example application would perform in the real world. Would the OS really be able to schedule the process fast enough for it to handle all the packets, or would it have the same issue where all the packets arrive before the OS has a chance to schedule the process and it would drop some? I agree that it seems like network jitter would play a role here, and would maybe be a more realistic way of solving this.
I don't see how we could choose a value of
I agree that it would be more realistic, but it's not a viable way to solve the problem without making runtime performance significantly worse.

Conceptually, each host currently organizes its packet sending into "batches": it sends all the packets it is allowed to send for one millisecond (enforced by refilling our bandwidth token buckets every 1 millisecond). We get much better runtime performance by batching this way. The alternative suggestion, as I understand it, is to reduce the batch time from one millisecond down to the time it takes to send a single packet (based on the host's configured bandwidth), in other words eliminating batched sending. I agree that this would introduce a network jitter effect and be more realistic, and in fact this is how I first implemented the sending behavior over 10 years ago. However, without batching, the number of send events required to send the same number of packets would significantly increase and very likely reduce runtime performance (which is why I moved to batch sending in the first place).

I don't think it's even worth checking, but if you wanted to, you could reduce the batch time by reducing the token refill interval in `shadow/src/main/host/network_interface.c`, lines 82 to 124 at 578d972.
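As a rough illustration of the batching trade-off described above (this is not the actual code in `network_interface.c`; the rate, interval, and names are made up):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative token bucket; not the real implementation in
 * src/main/host/network_interface.c. */
typedef struct {
    uint64_t capacity_bytes;     /* bytes allowed per refill interval */
    uint64_t tokens_bytes;       /* bytes remaining in this interval */
    uint64_t refill_interval_ns; /* 1 ms in the current scheme */
    uint64_t last_refill_ns;
} TokenBucket;

/* Refill once per interval. With a 1 ms interval, a host sends its whole
 * millisecond's worth of packets as one batch; shrinking the interval
 * toward one packet's transmit time removes the batching but multiplies
 * the number of send events. */
static void tb_refill(TokenBucket *tb, uint64_t now_ns) {
    if (now_ns - tb->last_refill_ns >= tb->refill_interval_ns) {
        tb->tokens_bytes = tb->capacity_bytes;
        tb->last_refill_ns = now_ns;
    }
}

/* Returns true if a packet of this size may be sent now. */
static bool tb_try_send(TokenBucket *tb, uint64_t now_ns, uint64_t packet_bytes) {
    tb_refill(tb, now_ns);
    if (tb->tokens_bytes < packet_bytes)
        return false; /* out of tokens: wait for the next interval */
    tb->tokens_bytes -= packet_bytes;
    return true;
}

int main(void) {
    /* 1 Gbit/s is 125000 bytes per 1 ms interval. */
    TokenBucket tb = {125000, 125000, 1000000, 0};
    int sent = 0;
    for (int i = 0; i < 200; i++)      /* try to send 200 x 1500 B at t=0 */
        if (tb_try_send(&tb, 0, 1500))
            sent++;
    printf("first 1 ms batch: %d of 200 packets\n", sent); /* 83 */
    return 0;
}
```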
It does seem to me that the minimum buffer size required to enable smooth sending behavior in Shadow would be larger than it would be in Linux, because Shadow is operating with a batched sending algorithm and Linux is operating in real time. So I think we could choose any
We allow for packets up to 64 KiB with UDP, so we would need the receive buffer size to be at least 64*p KiB. But what should we do if the application sets its own non-default buffer size? For example if
Changing the buffer size like that does seem odd. The way I would implement it is dynamic, i.e., don't use a fixed
Then we just need to make sure the socket buffers can always hold at least one packet (which we should probably be doing in any case). So if the application tries to set a buffer size smaller than that, we'd bump it up so it can hold at least one packet. Does this work?
Okay this seems like it would work. Right now there's no way for the code to get the socket from the event, since the event has a callback and the callback passes the packet to the network interface. We'll need to change the host's event system to deal with packets and the network interface directly and only use events for local tasks.
I don't see why we'd need to do this? If the buffer size is smaller than the packet size I think we want to just drop the packet, and I think that's compatible with the above code.
OK, sounds good!
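As a rough, compile-and-run sketch of the dynamic check and the drop-if-it-doesn't-fit behavior discussed above (the type and function names are hypothetical, not Shadow's actual API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical socket receive buffer, for illustration only. */
typedef struct {
    size_t capacity_bytes; /* configured receive buffer size */
    size_t used_bytes;     /* bytes currently queued for the application */
} SocketBuffer;

/* The "dynamic" part: instead of a fixed per-round packet budget, check the
 * destination socket's free space before delivering another packet. */
static bool socket_has_room(const SocketBuffer *sb, size_t packet_bytes) {
    return sb->capacity_bytes - sb->used_bytes >= packet_bytes;
}

/* Per the discussion above: if the packet doesn't fit, drop it (this also
 * covers a buffer configured smaller than a single packet); otherwise queue
 * it. When the buffer is full, the caller would stop pulling packet events
 * so the receiver's wake-up task can run and drain the buffer. */
static bool socket_deliver(SocketBuffer *sb, size_t packet_bytes) {
    if (!socket_has_room(sb, packet_bytes))
        return false; /* dropped */
    sb->used_bytes += packet_bytes;
    return true;
}

int main(void) {
    SocketBuffer sb = {10 * 1500, 0}; /* room for ten 1500-byte packets */
    int delivered = 0, dropped = 0;
    for (int i = 0; i < 100; i++) {
        if (socket_deliver(&sb, 1500)) delivered++;
        else dropped++;
    }
    /* Without a reader draining in between: delivered 10, dropped 90. */
    printf("delivered %d, dropped %d\n", delivered, dropped);
    return 0;
}
```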
We should enable this test when we fix shadow#1341
Once we get this working, we should check if we want to increase the speed threshold in the tgen tests from 85% up to something higher (e.g., 90%).
Suppose a UDP receive buffer fits 10 packets. The receiver has called `read()` and is waiting for incoming packets. The sender sends 100 packets all at the same time.
Those 100 packets arrive at the receiver all at t=0. After the first one gets added to the buffer, we notice that the receiver needs to be woken up to read. We add an event to the queue to wake up the receiver. But if that event gets added after the other 99 packet events already in the queue, then the buffer fills up after the first 10 packets and the remaining 90 are dropped before the receiver ever gets a chance to read.
Clearly, we need a way to wake up the receiver in between some number of packet receive events, so they get a chance to read out some of the packets, so we don't drop any of them.
We have verified that an infinite size buffer fixes the problem.
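For reference, a minimal stand-alone program in the spirit of the scenario above (port, sizes, and the 100-packet burst are arbitrary; error handling is omitted):

```c
/* Send a burst larger than the receive buffer before the reader runs and
 * count what survives. On a real kernel the exact numbers vary (e.g. Linux
 * doubles SO_RCVBUF and enforces a minimum), but the shape is the same. */
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    int rcvbuf = 10 * 1500; /* roughly ten packets' worth */
    setsockopt(rx, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(9000);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    bind(rx, (struct sockaddr *)&addr, sizeof(addr));

    /* "The sender sends 100 packets all at the same time." */
    char payload[1400];
    memset(payload, 'x', sizeof(payload));
    for (int i = 0; i < 100; i++)
        sendto(tx, payload, sizeof(payload), 0,
               (struct sockaddr *)&addr, sizeof(addr));

    /* Drain whatever actually made it into the receive buffer. */
    char buf[2048];
    int received = 0;
    while (recv(rx, buf, sizeof(buf), MSG_DONTWAIT) > 0)
        received++;
    printf("received %d of 100 packets\n", received);

    close(tx);
    close(rx);
    return 0;
}
```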
One possible way to deal with this is that each host keeps 2 separate queues: one for packets (which other hosts can insert into), and one for local host tasks (which a host can only insert into itself). Then at a specific nanosecond `t`, with a packet processing threshold `p`, we:

1. execute the local tasks with time=`t`
2. process at most `p` packets with time=`t`
3. go back to step 1 while there is still work remaining at time `t`

This gives us better control over executing simulator tasks vs processing new packets.
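A small compile-and-run toy of this loop (the queue types and function names are made up just to show the alternation between local tasks and packets):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-ins for the two per-host queues in the proposal above; the
 * real data structures in Shadow would look nothing like this. */
typedef struct { int remaining; } TaskQueue;   /* local tasks, inserted only by this host */
typedef struct { int remaining; } PacketQueue; /* packets, inserted by other hosts */

/* Run one local task scheduled for time t; false when none remain. */
static bool task_run_next(TaskQueue *q, uint64_t t) {
    if (q->remaining == 0) return false;
    q->remaining--;
    printf("t=%llu: ran local task (e.g. wake up the receiver)\n",
           (unsigned long long)t);
    return true;
}

/* Deliver one packet with time t; false when none remain. */
static bool packet_deliver_next(PacketQueue *q, uint64_t t) {
    (void)t;
    if (q->remaining == 0) return false;
    q->remaining--;
    return true;
}

/* The proposed per-nanosecond loop: alternate between the queues,
 * delivering at most p packets before local tasks get another turn, so a
 * receiver wake-up scheduled mid-burst runs before the buffer overflows. */
static void host_process_time(TaskQueue *tasks, PacketQueue *packets,
                              uint64_t t, int p) {
    bool progress = true;
    while (progress) {
        progress = false;
        while (task_run_next(tasks, t))       /* 1. all local tasks at t */
            progress = true;
        for (int i = 0; i < p; i++) {         /* 2. at most p packets at t */
            if (!packet_deliver_next(packets, t)) break;
            progress = true;
        }
        /* 3. loop back to step 1 while anything remains at time t */
    }
}

int main(void) {
    TaskQueue tasks = {1};       /* one pending wake-up task */
    PacketQueue packets = {100}; /* the 100 incoming packets */
    host_process_time(&tasks, &packets, 0, 10); /* threshold p = 10 */
    printf("packets left unprocessed: %d\n", packets.remaining);
    return 0;
}
```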