Difference between using make_shared on distributed matrix vs. solver #248

allaffa opened this issue Jan 25, 2023 · 1 comment
allaffa commented Jan 25, 2023

I see that different examples propose different options for solving a linear system with a distributed matrix.

Option 1 is described in this example:
https://github.com/ddemidov/amgcl/blob/master/examples/mpi/mpi_solver.cpp

  1. Read chunks of the matrix across the different processes.
  2. Call std::make_shared on the solver directly:
solve = std::make_shared<Solver>(comm,
                amgcl::adapter::block_matrix<val_type>(std::tie(chunk, ptr, col, val)),
                prm, bprm);

Option 2 is described in this example:
https://amgcl.readthedocs.io/en/latest/tutorial/poisson3DbMPI.html

  1. Read chunks of the matrix across the different processes.
  2. Create the distributed matrix from the local parts:
auto A = std::make_shared<amgcl::mpi::distributed_matrix<DBackend>>(
            world, std::tie(chunk, ptr, col, val));
  3. Initialize the solver, this time without calling make_shared:
prof.tic("setup");
Solver solve(world, A);
prof.toc("setup");

I have two questions:

Question 1: Are the two options equivalent, or should one be preferable over the other depending on the situation (e.g., scale, sparsity pattern, ...)?
Question 2: In the example https://amgcl.readthedocs.io/en/latest/tutorial/poisson3DbMPI.html, the matrix is re-partitioned before being passed as an argument to the solver's constructor. Is this repartitioning needed, or can one use the original partitioning as well?

    if (world.size > 1) {
        prof.tic("partition");
        Partition part;

        // part(A) returns the distributed permutation matrix:
        auto P = part(*A);
        auto R = transpose(*P);

        // Reorder the matrix:
        A = product(*R, *product(*A, *P));

        // and the RHS vector:
        std::vector<double> new_rhs(R->loc_rows());
        R->move_to_backend(typename DBackend::params());
        amgcl::backend::spmv(1, *R, rhs, 0, new_rhs);
        rhs.swap(new_rhs);

        // Update the number of the local rows
        // (it may have changed as a result of permutation):
        chunk = A->loc_rows();
        prof.toc("partition");
    }

Thank you again for your support and timely response to the questions :-)

ddemidov (Owner) commented

The difference between using shared_ptr and not is just a matter of convenience. Look closely at what's happening in the example at your first link: the solver is constructed inside a function and returned to the caller for further use. A shared pointer makes it convenient to transfer ownership in this case, so that the solver is properly destroyed and no memory is leaked once it is no longer used. It also ensures that only the pointer to the solver is copied around, rather than the large structure itself. Internally, the two options are completely equivalent.
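
To make the ownership point concrete, here is a minimal sketch (the Solver below is a dummy stand-in, not the real amgcl solver type) contrasting a helper function that builds the solver and returns it through a shared_ptr with a solver that lives entirely on the stack:

#include <iostream>
#include <memory>

// Dummy stand-in for an amgcl solver (e.g. amgcl::mpi::make_solver<...>);
// the real object is large and expensive to copy.
struct Solver {
    explicit Solver(int n) : n(n) { std::cout << "setup on " << n << " rows\n"; }
    int n;
};

// Option 1 style: the solver is built inside a helper and handed back to
// the caller. The shared_ptr transfers ownership cheaply and guarantees
// the solver is destroyed once the last user releases it.
std::shared_ptr<Solver> make_solver(int n) {
    return std::make_shared<Solver>(n);
}

int main() {
    // Option 1: only the pointer is copied around, never the solver itself.
    auto solve1 = make_solver(1000);

    // Option 2 style: the solver never leaves this scope, so a plain
    // stack object is enough and shared_ptr buys nothing extra.
    Solver solve2(1000);

    // Internally the two are equivalent; only lifetime management differs.
}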

Re-partitioning becomes more beneficial as the number of MPI processes grows. You can get away with naive partitioning (just reading chunks of consecutive rows of the input system) for a small number of processes (probably up to 8-16). As the number of processes grows, the ratio of boundary to interior points in the subdomains grows, and communication becomes the bottleneck of the solution. Using a proper partitioning also improves the convergence speed and reduces the number of required iterations. This is briefly discussed around Fig. 3 here: https://amgcl.readthedocs.io/en/latest/tutorial/poisson3DbMPI.html.
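
For reference, the "naive partitioning" mentioned above is simply what the tutorials do when reading the system: assign consecutive, roughly equal row ranges to the ranks. A small illustrative sketch (the function name is hypothetical, not part of the amgcl API):

#include <algorithm>
#include <cstddef>
#include <utility>

// Split the n global rows into consecutive, roughly equal chunks, one per
// MPI rank, and return the half-open row range [row_beg, row_end) owned by
// the given rank. No attention is paid to the sparsity pattern, which is
// what a proper partitioner (ParMETIS, PT-SCOTCH) would improve on.
std::pair<std::ptrdiff_t, std::ptrdiff_t> naive_partition(
        std::ptrdiff_t n, int world_size, int rank)
{
    std::ptrdiff_t chunk   = (n + world_size - 1) / world_size;
    std::ptrdiff_t row_beg = std::min<std::ptrdiff_t>(n, chunk * rank);
    std::ptrdiff_t row_end = std::min<std::ptrdiff_t>(n, row_beg + chunk);
    return {row_beg, row_end};
}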
