Use of Matrix Market routines to distribute matrix reading across MPI processes #247

Open
allaffa opened this issue Jan 25, 2023 · 1 comment

allaffa commented Jan 25, 2023

Hello,

I am trying to read a matrix from a file in Matrix Market (MM) format within a distributed computing framework using MPI.
I familiarized myself with the approach explained in the following example:
https://github.com/ddemidov/amgcl/blob/master/examples/mpi/mpi_solver.cpp

Following the example above, I implemented the function below, which is supposed to read the matrix in parallel, distributing the rows in contiguous chunks across the MPI processes:

ptrdiff_t read_matrix_market(
        amgcl::mpi::communicator comm,
        const std::string &A_file,
        const std::string &rhs_file,
        int block_size,
        std::vector<ptrdiff_t> &ptr,
        std::vector<ptrdiff_t> &col,
        std::vector<double>    &val,
        std::vector<double>    &rhs
        )
{
    amgcl::io::mm_reader A_mm(A_file);
    ptrdiff_t n = A_mm.rows();

    // Split the rows into approximately equal chunks,
    // rounded up to a multiple of block_size:
    ptrdiff_t chunk = (n + comm.size - 1) / comm.size;
    if (chunk % block_size != 0) {
        chunk += block_size - chunk % block_size;
    }

    ptrdiff_t row_beg = std::min(n, chunk * comm.rank);
    ptrdiff_t row_end = std::min(n, row_beg + chunk);

    std::cout << "Rank: " << comm.rank << " - Min row: " << row_beg << std::endl;
    std::cout << "Rank: " << comm.rank << " - Max row: " << row_end << std::endl;

    chunk = row_end - row_beg;

    // Read the local strip [row_beg, row_end) of the matrix:
    A_mm(ptr, col, val, row_beg, row_end);

    try {
        amgcl::io::mm_reader rhs_mm(rhs_file);
        rhs_mm(rhs, row_beg, row_end);
    } catch (std::exception const &e) {
        if (comm.rank == 0) {
            std::cout << "The following error occurred opening the RHS file "
                      << rhs_file << ": " << e.what() << std::endl;
            std::cout << "The code will continue, setting all entries of the RHS to 1.0" << std::endl;
        }
        rhs.resize(chunk);
        std::fill(rhs.begin(), rhs.end(), 1.0);
    }

    return chunk;
}

Later on, in the main program, I call the function as follows:

    using amgcl::prof;

    int block_size = prm.get("precond.coarsening.aggr.block_size", 1);

    prof.tic("read problem");
    std::vector<ptrdiff_t> ptr;
    std::vector<ptrdiff_t> col;
    std::vector<double>    val;
    std::vector<double>    rhs;

    ptrdiff_t chunk = read_matrix_market(
            world, A_file, rhs_file, block_size, ptr, col, val, rhs
            );

    prof.toc("read problem");

However, I notice that the time spent reading the matrix from the MM file increases with the number of MPI processes, rather than decreasing.
Am I doing something wrong? If you are interested, I can also share the entire main.cpp file with you.

Thank you very much in advance for your attention to this issue.

ddemidov (Owner) commented

My guess is that reading a single file by many processes stresses your I/O subsystem. I prefer to convert the matrix and the RHS to a binary format, which significantly decreases the read times.

You can convert the system with

./build/examples/mm2bin -i A.mtx -o A.bin
./build/examples/mm2bin -i b.mtx -o b.bin

An example of reading the matrix in this format can be found here: https://amgcl.readthedocs.io/en/latest/tutorial/poisson3DbMPI.html
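
For reference, here is a minimal sketch of the binary read path, modeled on the linked tutorial. It assumes amgcl::io::crs_size, amgcl::io::read_crs, and amgcl::io::read_dense from <amgcl/io/binary.hpp>; the helper name read_binary is hypothetical and mirrors the read_matrix_market function above:

#include <string>
#include <vector>
#include <algorithm>
#include <amgcl/io/binary.hpp>
#include <amgcl/mpi/util.hpp>

// Hypothetical counterpart of read_matrix_market for the binary format.
ptrdiff_t read_binary(
        amgcl::mpi::communicator comm,
        const std::string &A_file,   // e.g. "A.bin" produced by mm2bin
        const std::string &rhs_file, // e.g. "b.bin" produced by mm2bin
        std::vector<ptrdiff_t> &ptr,
        std::vector<ptrdiff_t> &col,
        std::vector<double>    &val,
        std::vector<double>    &rhs
        )
{
    // The global number of rows comes from the file header,
    // so no process has to parse the whole file:
    ptrdiff_t n = amgcl::io::crs_size<ptrdiff_t>(A_file);

    // Split the rows into approximately equal chunks:
    ptrdiff_t chunk = (n + comm.size - 1) / comm.size;
    ptrdiff_t row_beg = std::min(n, chunk * comm.rank);
    ptrdiff_t row_end = std::min(n, row_beg + chunk);
    chunk = row_end - row_beg;

    // Each process reads only its own strip [row_beg, row_end):
    amgcl::io::read_crs(A_file, n, ptr, col, val, row_beg, row_end);

    ptrdiff_t rows, cols;
    amgcl::io::read_dense(rhs_file, rows, cols, rhs, row_beg, row_end);

    return chunk;
}

Because the binary files store the CRS arrays directly, each process can seek straight to its portion of the data instead of parsing the whole text file, which is what keeps the read times low as the process count grows.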
