
Why do you send the same file over multiple sockets? #602

Closed
AsafFisher opened this issue Sep 12, 2023 · 23 comments

@AsafFisher

AsafFisher commented Sep 12, 2023

Describe the bug

I do not understand why you use multiple sockets to transfer files... It seems to me like one socket for data transfer would be enough.

@AsafFisher AsafFisher added the bug label Sep 12, 2023
@schollz
Owner

schollz commented Sep 20, 2023

It can be faster. Try A/B testing yourself to see if it does: you can set up a croc relay with one port and then one with more than one port. I did testing back in 2019-2020, and it was faster with four ports for Windows/Linux machines on the networks I needed to use it on.

@AsafFisher
Author

AsafFisher commented Sep 28, 2023

Hmm, but that makes no sense... it should be the same speed...
Are you sure there is no other reason for the non-multiplexed approach to be slower?

Using 4 sockets does not mean that you send 4 packets at once... the packets are still sent in a serial manner...

  • Maybe you didn't use async on the single-threaded version? Then the multithreaded version had more chances to read, while on the single-threaded version some sockets blocked and others didn't?

  • Maybe there is overhead after each read that causes the reads to be less frequent? Then you'd need a two-threaded program: one thread that reads packets and one that processes them?
    (I have a feeling that that's what happened)

@schollz
Owner

schollz commented Sep 28, 2023

Did you try it?

@AsafFisher
Author

AsafFisher commented Sep 28, 2023

Not yet, but theoretically speaking this should not improve performance... also, if it did, platforms like Netflix would've done it.

(Btw, side note since we're talking: I implemented PAKE/SIEC in Rust if you want to see.)

@schollz
Owner

schollz commented Sep 28, 2023

Yeah, I would love to see a SIEC implementation! Is it public?

The benefits might be OS-specific, but they were beneficial in my testing between Windows and Linux. Maybe it's different now; it deserves testing.

@shidenkai0

> Hmm, but that makes no sense... it should be the same speed... Are you sure there is no other reason for the non-multiplexed approach to be slower?
>
> Using 4 sockets does not mean that you send 4 packets at once... the packets are still sent in a serial manner...
>
> • Maybe you didn't use async on the single-threaded version? Then the multithreaded version had more chances to read, while on the single-threaded version some sockets blocked and others didn't?
> • Maybe there is overhead after each read that causes the reads to be less frequent? Then you'd need a two-threaded program: one thread that reads packets and one that processes them?
>   (I have a feeling that that's what happened)

Just dropping into the conversation as I stumbled upon this issue randomly while scrolling. I'm not a contributor to this repo, but I wanted to add my two cents on the networking side of things.
Using multiple connections to speed up file transfers is an old trick, in use since the download managers of the early 2000s.
It's a common misconception that a single socket should provide the same speed as multiple sockets; in practice this is often not the case, and here is why:

  1. Parallelism:

    • When using multiple sockets, different parts of the file can be sent in parallel over different connections. This parallel transmission can potentially speed up the overall file transfer, especially on networks with high latency or packet loss.
  2. Serial vs Parallel Transmission:

    • You mentioned that packets are sent in a serial manner, which is true for a single socket. However, when using multiple sockets, each socket can transmit its packets independently and concurrently with the others, essentially moving from a serial to a parallel transmission model.
  3. Congestion Control:

    • TCP has a "slow start" phase to discover the available bandwidth and to avoid congesting the network. During slow start, the window size starts small and doubles with each successful acknowledgment, until it reaches a threshold or encounters packet loss. By using multiple sockets, each with its own window size, we can essentially bypass the slow start phase for each individual connection, potentially allowing for faster transmission, especially for smaller files, where the slow start phase may be more prevalent.
    • Each socket operates its own congestion control mechanism. In networks with good conditions, having multiple congestion windows (one per socket) allows for more data to be in flight simultaneously, which can improve the utilization of the available bandwidth.
  4. Error Handling:

    • In a scenario where packet loss occurs, a single socket would need to wait for retransmissions, potentially stalling the transfer. With multiple sockets, if one socket encounters an error, the others can continue transmitting data, thus making the process more resilient to network issues.
  5. Async and Multithreading:

    • While implementing async operations or multithreading can improve the efficiency of a single-socket setup, the potential benefits of having multiple sockets (and thus multiple connections) often surpass those optimizations, especially in challenging network conditions.
  6. Overcoming Bandwidth-Delay Product limitations (e.g. on high-latency networks):

    • The Bandwidth-Delay Product (BDP) is the product of a network link's capacity and its round-trip time (RTT). A single TCP connection may not fully utilize the available bandwidth, especially on high-latency networks, even if the bandwidth is high (think of a satellite link with 100 Mbps of bandwidth but hundreds of milliseconds of latency). Multiple sockets can help overcome this limitation by allowing more data to be in flight at the same time, thus better utilizing the available bandwidth. (A worked example of this point follows the list.)
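To make that last point concrete, here is a back-of-the-envelope sketch in Go. The link parameters (100 Mbit/s, 200 ms RTT) and the 64 KB window cap are illustrative assumptions, not measurements of croc:

```go
package main

import "fmt"

func main() {
	// Hypothetical high-latency link: 100 Mbit/s with a 200 ms round trip.
	const bandwidthBits = 100e6 // bits per second
	const rttSeconds = 0.2

	// Bandwidth-Delay Product: bytes that must be in flight to fill the pipe.
	bdp := bandwidthBits / 8 * rttSeconds
	fmt.Printf("BDP: %.0f bytes (%.1f MB)\n", bdp, bdp/1e6)

	// A connection capped at a 64 KB window can move at most one window
	// per round trip, regardless of the link's capacity.
	const window = 64 * 1024.0
	single := window / rttSeconds * 8 / 1e6
	fmt.Printf("single-socket ceiling: %.1f Mbit/s\n", single)

	// Four independent windows quadruple the data in flight.
	fmt.Printf("four-socket ceiling:   %.1f Mbit/s\n", 4*single)
}
```

On those assumptions, one 64 KB window yields roughly 2.6 Mbit/s while four connections reach about 10.5 Mbit/s. Modern stacks auto-tune windows well past 64 KB, which is one reason the real-world benefit is OS- and network-dependent.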

@AsafFisher
Author

> Yeah, I would love to see a SIEC implementation! Is it public?
>
> The benefits might be OS-specific, but they were beneficial in my testing between Windows and Linux. Maybe it's different now; it deserves testing.

Yes!
https://github.com/AsafFisher/rust-croc

@AsafFisher
Author

AsafFisher commented Oct 4, 2023

> > Hmm, but that makes no sense... it should be the same speed... Are you sure there is no other reason for the non-multiplexed approach to be slower?
> > Using 4 sockets does not mean that you send 4 packets at once... the packets are still sent in a serial manner...
> >
> > • Maybe you didn't use async on the single-threaded version? Then the multithreaded version had more chances to read, while on the single-threaded version some sockets blocked and others didn't?
> > • Maybe there is overhead after each read that causes the reads to be less frequent? Then you'd need a two-threaded program: one thread that reads packets and one that processes them?
> >   (I have a feeling that that's what happened)
>
> Just dropping into the conversation as I stumbled upon this issue randomly while scrolling. I'm not a contributor to this repo, but I wanted to add my two cents on the networking side of things. Using multiple connections to speed up file transfers is an old trick, in use since the download managers of the early 2000s. It's a common misconception that a single socket should provide the same speed as multiple sockets; in practice this is often not the case, and here is why:

Hmm, I didn't read the whole thing because it's 2 AM, but as for points 1 and 2, they're not true.
Even if you have multiple sockets transmitting from different threads, there is only a single network device that these packets go through...
Let's assume your network device has multiple cores and can transmit in parallel (which would be weird): you still have a single layer-1 limitation, you can't send multiple packets on a single physical link at once... You can't send serial data over multiple sockets and expect it to be faster. It might be faster because the CPU gives more runtime to your threads, but that's due to scheduling, not parallelism.
Literally, the Linux kernel's e1000 driver has a single serial RX/TX buffer to send and receive packets.

> 1. Parallelism:
>
>    • When using multiple sockets, different parts of the file can be sent in parallel over different connections. This parallel transmission can potentially speed up the overall file transfer, especially on networks with high latency or packet loss.
>
> 2. Serial vs Parallel Transmission:
>
>    • You mentioned that packets are sent in a serial manner, which is true for a single socket. However, when using multiple sockets, each socket can transmit its packets independently and concurrently with the others, essentially moving from a serial to a parallel transmission model.

Also, on most OSes you can change the window size using setsockopt and the like... That's not a good reason to use multiple threads, either to trick the OS into a bigger window size or to get more runtime (you should use something like nice for that instead). (A minimal sketch of the setsockopt route follows.)
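For reference, a minimal Go sketch of that setsockopt approach. The relay address is a placeholder, and the OS may clamp the requested sizes (on Linux, net.core.rmem_max and net.core.wmem_max set the ceiling), which is one reason a larger effective window isn't guaranteed this way:

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Placeholder address; substitute a real relay host and port.
	conn, err := net.Dial("tcp", "relay.example.com:9009")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	tcp := conn.(*net.TCPConn)
	// These wrap setsockopt(SO_RCVBUF) / setsockopt(SO_SNDBUF).
	if err := tcp.SetReadBuffer(4 << 20); err != nil { // request 4 MiB
		log.Fatal(err)
	}
	if err := tcp.SetWriteBuffer(4 << 20); err != nil {
		log.Fatal(err)
	}
}
```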

> 3. Congestion Control:
>
>    • TCP has a "slow start" phase to discover the available bandwidth and to avoid congesting the network. During slow start, the window size starts small and doubles with each successful acknowledgment, until it reaches a threshold or encounters packet loss. By using multiple sockets, each with its own window size, we can essentially bypass the slow start phase for each individual connection, potentially allowing for faster transmission, especially for smaller files, where the slow start phase may be more prevalent.
>    • Each socket operates its own congestion control mechanism. In networks with good conditions, having multiple congestion windows (one per socket) allows for more data to be in flight simultaneously, which can improve the utilization of the available bandwidth.

That is a good point, but I haven't seen a dropped packet since something like 5 months ago...

> 4. Error Handling:
>
>    • In a scenario where packet loss occurs, a single socket would need to wait for retransmissions, potentially stalling the transfer. With multiple sockets, if one socket encounters an error, the others can continue transmitting data, thus making the process more resilient to network issues.
>
> 5. Async and Multithreading:
>
>    • While implementing async operations or multithreading can improve the efficiency of a single-socket setup, the potential benefits of having multiple sockets (and thus multiple connections) often surpass those optimizations, especially in challenging network conditions.

I don't use satellite, I have fiber optics, but I still don't understand why it matters, given that you can't truly send in parallel on a single physical link.

> 6. Overcoming Bandwidth-Delay Product limitations (e.g. on high-latency networks):
>
>    • The Bandwidth-Delay Product (BDP) is the product of a network link's capacity and its round-trip time (RTT). A single TCP connection may not fully utilize the available bandwidth, especially on high-latency networks, even if the bandwidth is high (think of a satellite link with 100 Mbps of bandwidth but hundreds of milliseconds of latency). Multiple sockets can help overcome this limitation by allowing more data to be in flight at the same time, thus better utilizing the available bandwidth.

I bet you that I can make a single-threaded async version of it in Rust that is faster (I'm actually working on it).
Also, and I don't like playing this card:
https://stackoverflow.com/questions/29277554/is-transmitting-a-file-over-multiple-sockets-faster-than-just-using-one-socket

@ferebee
Sponsor Contributor

ferebee commented Oct 5, 2023

That’s great, but did you test it? It looks like you can disable the parallel transmission with the option --no-multi.

I used a 670 MB file, macOS on both ends, 100 Mbit/s downstream DSL on the receiving end, multiple runs.

With defaults, multiplexing enabled, it took an average of 78 seconds. Without multiplexing, it took 92 seconds. Seems worthwhile to me.

@AsafFisher
Author

AsafFisher commented Oct 5, 2023

I understand that it's faster right now... My feeling is that it's faster not because of the multiplexing but because of the design or some other scheduling reason (maybe try benchmarking on a single core with nice -20 croc). I might be wrong, but that's my feeling (:

Anyway, I'm planning on investigating that...

Did you make the timed benchmark start right when the file transmission starts?

In general, just because something works faster doesn't mean it's the solution to that problem.

@ferebee
Sponsor Contributor

ferebee commented Oct 5, 2023

Sure! I did

`time croc -yes 1234-xyz`

on the receiving end each time, and alternated sending the same file with and without --no-multi. Here are the times in seconds.

| | no-multi | multi |
| ---: | ---: | ---: |
| | 101 | 82 |
| | 94 | 75 |
| | 92 | 82 |
| | 90 | 73 |
| | 79 | 76 |
| | 97 | 78 |
| average | 92 | 78 |

Why not run some tests yourself? And I’m sure if you can build something that’s consistently faster than the current code, people will be interested to see your test results.

@AsafFisher
Author

> Sure! I did
>
> `time croc -yes 1234-xyz`
>
> on the receiving end each time, and alternated sending the same file with and without --no-multi. Here are the times in seconds.
>
> | | no-multi | multi |
> | ---: | ---: | ---: |
> | | 101 | 82 |
> | | 94 | 75 |
> | | 92 | 82 |
> | | 90 | 73 |
> | | 79 | 76 |
> | | 97 | 78 |
> | average | 92 | 78 |
>
> Why not run some tests yourself? And I’m sure if you can build something that’s consistently faster than the current code, people will be interested to see your test results.

I don't have a lot of time (: plus I believe you when you say it's faster... but the math isn't mathing for me 😊
As I said, I will take a look into why this happens, but yeah.

@AsafFisher
Author

Hey, so I think the decryption part in receiveData is the bottleneck. If you sent the received data to a separate decryption thread and kept reading packets, it might be faster in a single-threaded environment than in a multithreaded one. (A sketch of that split is below.)
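Here is a hedged Go sketch of that reader/decryptor split. Everything in it is hypothetical: receive and decrypt are stand-ins, not croc's actual receiveData or cipher, and the chunk size is arbitrary:

```go
package transfer

import (
	"io"
	"net"
)

// decrypt is a placeholder for the real cipher; it just passes bytes through.
func decrypt(chunk []byte) []byte { return chunk }

// receive keeps the socket drained in one goroutine while a second goroutine
// decrypts and writes, so crypto work never stalls the reads.
func receive(conn net.Conn, out io.Writer) error {
	chunks := make(chan []byte, 16) // buffered so the reader rarely waits
	done := make(chan error, 1)

	// Decryptor goroutine: drains the channel and writes plaintext out.
	// After a write error it keeps draining so the reader can never block.
	go func() {
		var werr error
		for c := range chunks {
			if werr != nil {
				continue
			}
			if _, err := out.Write(decrypt(c)); err != nil {
				werr = err
			}
		}
		done <- werr
	}()

	// Reader loop: hand each chunk off, then immediately read the next one.
	for {
		buf := make([]byte, 64*1024) // fresh buffer, since it crosses goroutines
		n, err := conn.Read(buf)
		if n > 0 {
			chunks <- buf[:n]
		}
		if err != nil {
			close(chunks)
			werr := <-done
			if err == io.EOF {
				return werr
			}
			return err
		}
	}
}
```

The buffered channel is what decouples the two stages: the socket read for chunk N+1 can overlap with the decryption of chunk N, which is the overlap the comment above is hypothesizing about.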

@ferebee
Sponsor Contributor

ferebee commented Oct 5, 2023

It’s great that you want to speed up croc, but a concrete implementation might be more helpful than unproven hypotheses.

You can disable compression with --no-compress.

I ran another quick test, different file and different endpoint, so not directly comparable to the last.

Compression disabled
default multiplexing: 76 seconds
--no-multi: 111 seconds

@schollz
Owner

schollz commented Oct 5, 2023

Thanks so much @ferebee for running some experiments! A 16% improvement seems pretty good.

Also thanks @shidenkai0 for weighing in with a very detailed overview!! That is greatly appreciated.

@AsafFisher I went ahead and tested your hypothesis about decryption with and without multiplexing. I removed encryption and tried sending files. Without encryption, with multiplexing: 105 seconds. Without encryption, without multiplexing: 115 seconds. Not a 16% improvement like @ferebee, but an improvement nonetheless with multiplexing and without encryption.

@AsafFisher now it's your turn! Please try running your own tests and see for yourself.

@ferebee
Sponsor Contributor

ferebee commented Oct 5, 2023

Oh my, there I was looking at compression while @AsafFisher was talking about encryption. Good to see the results hold up anyway.

I’ll mention that in the ’90s we were developing some custom software to send 3D rendering jobs over multiple ISDN lines in parallel. We had several well-reasoned theories about how the Linux TCP/IP stack would behave, and they were all wrong. We ended up multiplexing.

@AsafFisher
Author

AsafFisher commented Oct 6, 2023

> Thanks so much @ferebee for running some experiments! A 16% improvement seems pretty good.
>
> Also thanks @shidenkai0 for weighing in with a very detailed overview!! That is greatly appreciated.
>
> @AsafFisher I went ahead and tested your hypothesis about decryption with and without multiplexing. I removed encryption and tried sending files. Without encryption, with multiplexing: 105 seconds. Without encryption, without multiplexing: 115 seconds. Not a 16% improvement like @ferebee, but an improvement nonetheless with multiplexing and without encryption.
>
> @AsafFisher now it's your turn! Please try running your own tests and see for yourself.

I'll do my tests sometime. Anyway, if it's possible to add a decryption thread and make the file transfer even faster, that's a good outcome (:

@schollz
Owner

schollz commented Feb 8, 2024

It's been four months @AsafFisher, what were the results of your tests?

@AsafFisher
Author

What is the file size you used?

@schollz
Owner

schollz commented Feb 9, 2024

don't remember

@AsafFisher
Author

AsafFisher commented Feb 9, 2024

On my Rust client, using your relay at croc.schollz.com, it takes 4.8 seconds to transfer 670 MB (while encrypting the data using PAKE). I use async programming with one thread and one socket for the data transfer.
I am checking it again now just to make sure it's correct, because my benchmark is an order of magnitude faster than @ferebee's.
[screenshot attached]

@schollz
Owner

schollz commented Feb 9, 2024

You need to test croc between the same two computers against your result; otherwise it is not controlled for all the network infrastructure.

@schollz
Owner

schollz commented May 23, 2024

@AsafFisher it's been a bit. Have you been able to run tests comparing multiplexing and no multiplexing on the same system?

@schollz schollz closed this as completed May 26, 2024