Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code formating and refine #4

Open
wants to merge 20 commits into
base: master
Choose a base branch
from
Open

Conversation

ssrlive
Copy link

@ssrlive ssrlive commented Dec 17, 2023

update do latest mio and support windows.

@ssrlive ssrlive force-pushed the master branch 3 times, most recently from dbcdad2 to 75bcfbb Compare December 17, 2023 17:32
@flan
Copy link
Member

flan commented Dec 19, 2023

Hey, thank you very much for taking an interest in this project!

I would like to welcome the changes you have submitted, but I am afraid the sudden, and rather large, pull-request caught me off-guard. I have only just now finished reading through the changes you have made, so I was unable to respond sooner.

Usually, refactoring and alterations on this scale would come after some sort of discussion with the project's maintainer to determine what needs to be done and why. Because I do not yet know your intentions, I am afraid that I cannot accept this pull-request yet, but I would like to figure out how we can avoid creating an unnecessary fork of rperf.

Aside from a few small details, which I will comment on in the commits themselves, I do not see anything problematic, so the bulk of my concern rests with your statement about publishing your revisions to https://crates.io/ after one month has passed.

I do not see the value of rperf existing as a Crate. In my opinion, it is an application, not a library. Furthermore, neither I nor my team have the bandwidth to enable its use as a library: its internal interfaces are not documented for use by other developers, its implementation may change unexpectedly, and the amount of support requests that follow from having a supply-chain-hosted library with varying versions would likely kill the project.

If you need to have rperf exist as a public Crate, please explain why so we can figure out how it can be maintained going forward. cargo can pull from private repositories and other sources, so your build pipeline should be unaffected if one of those suits your needs, and it is a GPLv3 project, so you are free to fork it and distribute your own version if you require some changes that do not belong upstream.

Another detail of importance is attribution. If your changes are to be merged, for GPLv3 compliance (copyright, proof of ownership), it is important that you provide some sort of sign-off that indicates that you grant use of your changes to the project under the terms of the license agreement. This is usually as simple as putting your name and an e-mail address in the header of any files you have materially changed (a lot of your commits are whitespace and linting changes, so those do not need to be updated); if there is some reason for which you cannot do this, an online handle and a link to your GitHub profile should be fine.

Related to your recent changes for Windows, I ended up porting the TCP stream logic to socket2 when I tackled the problems associated with mio and Poll-reassociation, though it is possible that your tcp_stream_try_clone() implementation may serve as an alternate solution. See #3 and master...socket2 for details -- if you are experiencing any reliability issues, you may want to check your logic against that branch. I was planning to publish a new version based on it before the end of the year, now that I finally have access to a Mac so I can verify that macOS (Apple Silicon) works properly, but it appears that your code was derived from master, which is an older branch.

exiting.join().expect("unable to join SIGINT handler thread");
if service.is_err() {
log::error!("unable to run server: {}", service.unwrap_err());
std::process::exit(4);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exit-code is important to help determine whether an environmental error has occurred. I must ask that it not be suppressed -- the changes here would replace it with 1.

let execution = client::execute(&args);
if execution.is_err() {
log::error!("unable to run client: {}", execution.unwrap_err());
std::process::exit(4);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exit-code is also important.

} else {
use clap::CommandFactory;
let mut cmd = args::Args::command();
cmd.print_help().unwrap();
std::process::exit(2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this one, too. 2 is commonly used to indicate that a configuration error occurred and this exit-code is part of rperf's established process interface.

src/server.rs Outdated
poll.poll(&mut events, Some(POLL_TIMEOUT))?;
if let Err(err) = poll.poll(&mut events, Some(POLL_TIMEOUT)) {
if err.kind() == std::io::ErrorKind::Interrupted {
log::debug!("Poll interrupted: \"{err}\", ignored, continue polling");
Copy link
Member

@flan flan Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message isn't consistent with other logging statements in this project: they all start with a lower-case letter and employ "[state]; [infinitive-action]" grammar, rather than "[state], [declarative-decision]"

Perhaps "poll interrupted, \"{err}\" ignored; resuming poll"

@ssrlive
Copy link
Author

ssrlive commented Dec 20, 2023 via email

@ssrlive
Copy link
Author

ssrlive commented Dec 20, 2023 via email

@ssrlive
Copy link
Author

ssrlive commented Dec 20, 2023 via email

@flan
Copy link
Member

flan commented Dec 20, 2023

Ah, I'm sorry that the project gave the impression that it was abandoned.

It's an open project I maintain out of interest and on behalf of my employer, because iperf2 is very unstable and iperf3 is not well-suited to continuous operation, in addition to its flaws, regressions relative to iperf3, and very long patch-review process.

rperf is an important part of the products we develop, so it has some internal QA requirements, preventing experimental ideas from being published early and slowing down the official release cycle. That, combined with my needing hardware (and time) to ensure macOS can be supported, is why the branches from August and September have not yet been promoted. I do try to respond quickly to issues as they come up in all of my projects, though.

It hasn't received frequent updates because it hasn't really needed them. It has been very stable and close to feature-complete in all contexts in which my company has needed it, which is a relatively finite scope, given that its purpose is to test network reliability and Layer-3 networking is a well-understood problem that does not change very quickly.

As for distribution, my plan was to provide precompiled binaries for Microsoft and Apple platforms (and Linux above some reasonable version of libc, likely whatever shipped with Debian Bullseye) after the changes in the September branch had been validated through field-testing, bumping the version to 1.0 at the same time.

While I do see the appeal of making it easy to install on any system that has a Rust build-chain, I suspect the most common use-case is that of a network tech just wanting to run a test, probably carrying it on a USB drive, and, at least in my experience, the moment they have to think about the state of what's installed on their computer, they look for another option.

I view crates.io as being another thing to maintain, but it's possible that I just don't know enough about its update pipeline. I do worry that it would increase the amount of support that would be required without providing significant benefit to most users.

@ssrlive
Copy link
Author

ssrlive commented Dec 20, 2023

I have tested on macOS, it works fine.
But It's not fine on windows, it's always report timeout. it should be fixed.

@@ -42,10 +42,10 @@ pub fn send(stream: &mut TcpStream, message: &serde_json::Value) -> BoxResult<()
let serialised_message = serde_json::to_vec(message)?;

log::debug!(
"sending message of length {}, {:?}, to {}...",
"sending message to {} length {}, {:?}...",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a comma or other delimiter here to break up the elements: "sending message to {}, length..."

}

/// receives the length-count of a pending message over a client-server communications stream
fn receive_length(stream: &mut TcpStream, alive_check: fn() -> bool, results_handler: &mut dyn FnMut() -> BoxResult<()>) -> BoxResult<u16> {
fn receive_length(stream: &mut TcpStream, alive_check: fn() -> bool, handler: &mut dyn FnMut() -> BoxResult<()>) -> BoxResult<usize> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this really be usize?

The protocol itself encodes the length of a message into two bytes, so u16 covers all possibilities. I know that for most processor architectures, optimal performance is achieved when working at a native wordsize, but the performance cost of truncating a value here is negligible (it gets checked once, then it's discarded, and it's not like any mainstream 64-bit CPU doesn't have 16-bit register operations for comparison).

However, a bigger problem exists when working with 8-bit (or smaller) architectures, for which rperf has known use-cases. usize will be a single byte on those platforms, resulting in truncation of the full 16-bit value.

match serde_json::from_slice(&buffer) {
Ok(v) => {
log::debug!("received {:?} from {}", v, stream.peer_addr()?);
log::debug!("received from {} {:?}", stream.peer_addr()?, v);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same idea about delimiting fields here.

I get that the more salient bit of information when looking at a stream of events is the source address (though if you have multiple active connections when debugging, you're already going to be dealing with a ton of information), but when you're only dealing with a single peer, which is more common when troubleshooting, seeing the message first may be more helpful because that's the part that is actually interesting.

If, as with the previous logging change, you want to keep this order of information, but also make it flow naturally in English, consider "received message from {}: {:?}"

@ssrlive
Copy link
Author

ssrlive commented Dec 22, 2023

Adjusted following suggestion.

src/args.rs Outdated
/// primarily to avoid including TCP ramp-up in averages;
/// using this option may result in disagreement between bytes sent and received,
/// since data can be in-flight across time-boundaries
#[arg(short, long, default_value = "0", value_name = "seconds")]
Copy link
Member

@flan flan Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

short = 'O' is required here.

The unit-tests, and existing real-world code, are broken without it, so this seems like a (minor) refactoring regression.

src/args.rs Outdated

/// run in client mode; value is the server's address
#[arg(short, long, value_name = "host", conflicts_with = "server")]
pub client: Option<std::net::IpAddr>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you're going for here, but this breaks compatibility with hostname-based resolution.

Not everything will, or should, be specified as an IP address.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A string?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, specifically a hostname as defined in https://datatracker.ietf.org/doc/html/rfc952 or an IP address, v4 or v6, per https://datatracker.ietf.org/doc/html/rfc1123#page-13, but I'm perfectly content for it to just be a string and for the user to figure it out if they enter something nonsensical, rather than having the tool try to hold their hand on something basic like this.

@flan
Copy link
Member

flan commented Dec 22, 2023

I'm not entirely sure where this broke, but https://github.com/opensource-3d-p/rperf/blob/socket2/test.py#L128 is failing.

======================================================================
FAIL: test_udp_reverse (__main__.TestIpv4.test_udp_reverse)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/flan/projects/ssrlive/rperf/./test.py", line 143, in test_udp_reverse
    self.assertEqual(result['summary']['bytes_received'], result['summary']['bytes_sent'])
AssertionError: 0 != 1000224

It seems to be related to the way omit is being handled. The time-value can be increased to -t 10, but if -O 1 is specified, it's like the bytes-received event isn't propagated at all.

https://github.com/ssrlive/rperf/blob/master/src/protocol/results.rs#L648 is where the problem is manifesting, since the results only has a single interval for UDP receive:

{
    ...
    "streams": [
    {
      "abandoned": false,
      "failed": false,
      "intervals": {
        "receive": [
          {
            "bytes_received": 5001120,
            "duration": 10.000581741333008,
            "jitter_seconds": 4.642256044462556e-6,
            "packets_duplicated": 0,
            "packets_lost": 0,
            "packets_out_of_order": 0,
            "packets_received": 4140,
            "timestamp": 1703266093.4379778,
            "unbroken_sequence": 4140
          }
        ],
        //there should be ten of these, around one second each
        //instead, we only see one, of approximately ten seconds
    ...

This is probably happening because something changed in the UDP receive loop so it isn't emitting a result after a second (https://github.com/ssrlive/rperf/blob/master/src/stream/mod.rs#L26) has elapsed. Might have been an over-simplification in refactoring.

For the others, having delta values being a bit off in the tests isn't ideal, but I wouldn't consider them an outright failure, rather something to be assessed manually. A value of 0, however, indicates that something isn't working at all.

@ssrlive
Copy link
Author

ssrlive commented Dec 22, 2023

changed.

@flan
Copy link
Member

flan commented Dec 22, 2023

Verified that the tests are mostly working now, except for the UDP receive thing. The remaining issues seem to be related to the environment in which I'm conducting tests, since there are a bunch of tunnels.

It affects both forward and reverse streams, so it's almost certainly just that the loop needs to emit updates every time the interval elapses, like the TCP-receive loop.

@ssrlive
Copy link
Author

ssrlive commented Dec 22, 2023

So I don't know how to deal with it.

@flan
Copy link
Member

flan commented Dec 22, 2023

I'll try to get to it this weekend. It shouldn't be complex, but I'll need to set aside a bit of time to experiment.

I've still got a lot of work I need to finish this week and next, and I wasn't expecting to work on rperf at all before January, so I'm kind of scrambling to find time right now.

@flan
Copy link
Member

flan commented Dec 25, 2023

It doesn't seem to show up in this thread, but I pushed a fix for the UDP regression to your master branch: ssrlive@fb68b59

There was an inner loop that seems to have been introduced as part of removing mio.

All other tests seem to be operating within reason right now, so hopefully I'll be able to get non-Linux platform testing done this week, then I'll be glad to merge this, though I won't be ready to call it 1.0 until a few other people in my team at work have had an opportunity to try to break it.

@ssrlive
Copy link
Author

ssrlive commented Dec 26, 2023

Here is the result in my windows. it looks OK.

PS C:\rperf> cargo r -- -c 192.168.3.12 -u
==========
UDP send result over 10.11s | streams: 1
stream-average bytes per second: 124473.545 | megabits/second: 0.996
total bytes: 1259040 | per second: 124473.545 | megabits/second: 0.996
packets: 1220 per second: 120.614
==========
UDP receive result over 10.01s | streams: 1
stream-average bytes per second: 68575.377 | megabits/second: 0.549
total bytes: 686280 | per second: 68575.377 | megabits/second: 0.549
packets: 665 | lost: 555 (45.5%) | out-of-order: 0 | duplicate: 0 | per second: 66.449
jitter: 0.018021s over 86 consecutive packets

PS C:\rperf> cargo r -- -c 192.168.3.12 -u -R
==========
UDP send result over 19.38s | streams: 1
stream-average bytes per second: 87810.693 | megabits/second: 0.702
total bytes: 1701768 | per second: 87810.693 | megabits/second: 0.702
packets: 1649 per second: 85.088
==========
UDP receive result over 19.67s | streams: 1
stream-average bytes per second: 86504.380 | megabits/second: 0.692
total bytes: 1701768 | per second: 86504.380 | megabits/second: 0.692
packets: 1649 | lost: 0 (0.0%) | out-of-order: 0 | duplicate: 0 | per second: 83.822
jitter: 0.005764s over 1649 consecutive packets

@ssrlive
Copy link
Author

ssrlive commented Dec 26, 2023

And it's OK in macOS.

@flan
Copy link
Member

flan commented Dec 29, 2023

I've also been able to do some testing between Linux (amd64 and arm), macOS (arm), and Windows (x64 and arm) and what's in this branch seems to work, though I still need to let others in my team try to break it before finalising anything.

Still missing is some way for me to attribute your work, though. How would you like to be added to the relevant copyright lists?

Also, I see that you are working on a NAT-traversal branch. I'm interested in seeing how that turns out, but I am scheduled to be on vacation for the next two weeks, so aside from answering questions, I will not be able to do any testing or review any code that might come out of it.

@ssrlive
Copy link
Author

ssrlive commented Dec 30, 2023

I've completed support for NAT, just on the nat branch, and it seems to be performing well so far.

After the current merge work is completed, I will submit another PR and I will ask for your opinion whether to choose the refactor branch or the nat branch.

The refactor branch does not add new functionality, it just refactors the code to make it more readable.

@ssrlive
Copy link
Author

ssrlive commented Jan 13, 2024

Hi @flan , today is 2024. Now you can review this PR?

@flan
Copy link
Member

flan commented Jan 15, 2024

I'm back from vacation now and I've gone through the changes; I think the master branch is in good shape.

I'm going to try to arrange for some testing with our QA team on Thursday. If we can't find any broken edge-cases, I'll proceed with merging, though I do still need to know how to credit your contributions for GPL reasons.

At that point, we can either call it 1.0 and I'll provide binaries or we can defer the jump to 1.0 until after the NAT changeset is complete.

@ssrlive
Copy link
Author

ssrlive commented Jan 19, 2024

This step can apply after merged.

I do still need to know how to credit your contributions for GPL reasons.

It can be added my name next line of your name.

@flan
Copy link
Member

flan commented Jan 20, 2024

Testing on Thursday went well, so I'm ready to merge and drop my older branches.

I really do need to be able to credit you in some way before the code lands in the main repository, though, even if it's just attribution like ssrlive <https://github.com/ssrlive> instead of a more traditional name and e-mail address.

@ssrlive
Copy link
Author

ssrlive commented Jan 20, 2024

I hope that subsequent submissions will be merged before adding my signature.
Otherwise, it will cause many conflicts in subsequent merges.

@ssrlive
Copy link
Author

ssrlive commented Feb 10, 2024

beep

@flan
Copy link
Member

flan commented Feb 13, 2024

I'm still ready to merge this branch once I know how to provide attribution for your work.

I can put it into a staging location where the signature details can be added, and I can do that part on my own, so it doesn't need to be present in this repo. I just need to know what you want to see in the license headers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants