Support for unit testing #143

Open
jedbrown opened this issue Dec 20, 2022 · 4 comments

Comments

@jedbrown
Contributor

I would like to enable MPI unit testing and doctesting, with IDE integration and command-line tooling. I found the rusty-fork library, which I think is a viable strategy. Unfortunately, it is unmaintained, so it would probably be better to fork it into an MPI-equipped testing crate. Also, nextest is fast, produces nice output, and supports tests that require different numbers of slots, so tests can run in parallel while preventing or limiting oversubscription. In all cases, we need a macro that integrates with #[test].

How to spawn

  1. Use mpiexec, which we need to identify (via heuristics, configuration, or an environment variable), handling any implementation-specific options. This cannot give us an intercommunicator for collating results, but it is the best-supported approach as far as MPI implementations are concerned (see the sketch after this list).
  2. Use MPI_Comm_spawn to launch the parallel job. This allows the caller to interact with the parallel job via an intercommunicator. Unfortunately, spawning from a singleton init is not well supported by MPI implementations. For example, MPICH needs the environment to be carefully crafted because it just runs the first mpiexec from PATH. Implementations could make this a reliable approach, with better performance than using external job launchers and without sensitivity to environment.
  3. "Fork" (actually spawn self like rusty-fork does because fork() isn't portable to Windows) before MPI_Init. Use MPI_Open_port and send the port to the child (e.g., over stdin); use MPI_Comm_accept on the parent and MPI_Comm_connect on the child. This creates an intercommunicator similar to MPI_Comm_spawn. Support for this feature requires more environment shenanigans, such as running ompi-server as a daemon and crafting environment variables to interact with the server. I think this also doesn't solve problems for us at present, but it could if MPI implementations avoided the need for these external channels.

Collective assertions

Standard assert_eq!(left, right) and friends can leave MPI processes hanging, and they don't produce nice output when the values diverge across ranks. I think a collective coll_assert_eq!(comm, left, right) that collates output is desirable. This could be implemented with an intercommunicator MPI_Gather if we used the MPI_Comm_spawn model above, but that is likely not feasible with current implementations; a rough intracommunicator fallback is sketched below.
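
As a rough illustration of the collation idea with current implementations, an intracommunicator fallback could look something like the following, using rsmpi's all_gather_into; the function mirrors the proposed coll_assert_eq! but is only a sketch, not a worked-out design.

use mpi::traits::*;

// Collective assertion sketch: every rank reports whether its local check
// passed, all ranks learn the global outcome, and all ranks panic together
// with a per-rank summary instead of leaving some ranks hanging.
fn coll_assert_eq<C, T>(comm: &C, left: T, right: T)
where
    C: CommunicatorCollectives,
    T: PartialEq + std::fmt::Debug,
{
    let ok = (left == right) as u8;
    let mut results = vec![0u8; comm.size() as usize];
    comm.all_gather_into(&ok, &mut results[..]);
    if results.iter().any(|&r| r == 0) {
        panic!(
            "collective assertion failed on rank {} (per-rank pass/fail: {:?}); local: left = {:?}, right = {:?}",
            comm.rank(), results, left, right
        );
    }
}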

Cc: @jtronge @hppritcha

@2pt0

2pt0 commented Apr 12, 2023

Hi. I am also interested in this. I'm wondering if a simpler solution is possible.

Consider segregating MPI tests into their own test crate and using a test harness (check out this article) that calls MPI_Init before the MPI tests are executed. Each MPI test would receive the Universe and perform its test. However, we must consider the following:

  1. Per the standard, MPI_Init can only be called once per process. We could not loop over the tests and call MPI_Init inside the loop to create a clean Universe.
  2. Calling MPI_Init outside of the loop and handing the Universe object to each test allows a test to mutate the hidden MPI state (split communicators, create reduction operations, etc.). Subsequent tests would not have a clean Universe.

Following the second option (in the style of the linked article) would look something like

use mpi::environment::Universe;

struct Test<'a> {
    name: String,
    test: Box<dyn Fn(&'a Universe)>,
}

fn main() {
    let universe = mpi::initialize().unwrap();

    // Assemble tests in a `Vec<Test<'_>>` somehow ...
    let tests: Vec<Test> = Vec::new();

    for test in tests.into_iter() {
        (test.test)(&universe);
    }
}

Finally, the tests must be executed sequentially rather than on individual threads (the default behavior of cargo test), and the collection of tests must maintain the same order on each MPI process to ensure correctness.

@jedbrown
Contributor Author

How do we get a parallel job in your model? Using MPI_Comm_spawn?

I'm familiar with test harnesses and have tinkered with them to some extent. I'm a big fan of nextest, which is way faster than standard cargo test due to better parallelism and its use of processes. It supports threads-required, so we can avoid oversubscription (a sketch of that configuration follows below). I think the harness should mpiexec each test case, mostly because MPI process-management implementations are somewhere between fragile and buggy. https://nexte.st/book/threads-required.html
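
For reference, threads-required is set per test in the nextest configuration; roughly like this, where the filter expression and slot count are made up for illustration:

# .config/nextest.toml
[[profile.default.overrides]]
filter = 'test(mpi_)'
threads-required = 4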

@2pt0

2pt0 commented Apr 12, 2023

Not sure. It's desirable to have an interface like cargo mpirun test. The problem is that Rust generates a test binary per test target (the crate's unit tests plus each integration-test file) rather than per test, so there would have to be a way to call mpirun for each test binary.

On the other hand, if it's possible for the test harness to call mpirun for each test, I think that should be the first implementation.

@jedbrown
Contributor Author

The interface would be cargo nextest run. Custom test harnesses can be implemented using libtest-mimic or manually (nextest docs). If using libtest-mimic, it'll require a small modification to be able to run the inner job without printing "harness" stuff.
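
If we go the libtest-mimic route, the harness's main would look roughly like the sketch below; the test name and the inner mpiexec launch are placeholders, not a worked-out design.

use libtest_mimic::{Arguments, Trial};

fn main() {
    let args = Arguments::from_args();

    // Each Trial would wrap the logic that launches the test under mpiexec
    // (or runs the body directly when already inside an MPI job).
    let tests = vec![
        Trial::test("ring_exchange", || {
            // e.g. spawn `mpiexec -n 4 <self> --inner ring_exchange` and turn a
            // non-zero exit status into a Failed result
            Ok(())
        }),
    ];

    libtest_mimic::run(&args, tests).exit();
}

The test target also needs harness = false in Cargo.toml so that this main replaces the default harness; the modification mentioned above (running the inner job without printing harness output) would come on top of that.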
