Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mesh reader/writer with ADIOS2 #2618

Closed
wants to merge 34 commits into from
Closed

Conversation

jorgensd
Copy link
Sponsor Member

@jorgensd jorgensd commented Apr 11, 2023

This is the first step towards having function check-pointing in DOLFINx.

The code uses ADIOS2 as backend for reading and writing data.
No MPI communication required in either stage (until mesh creation/partitioning).

The following C++ code runs nicely

#include <dolfinx/common/MPI.h>
#include <dolfinx/io/ADIOS2Writers.h>
#include <dolfinx/mesh/Mesh.h>
#include <dolfinx/mesh/generation.h>
#include <filesystem>
#include <mpi.h>

using namespace dolfinx;

int main(int argc, char* argv[])
{
  dolfinx::init_logging(argc, argv);
  MPI_Init(&argc, &argv);

  {
    auto mesh
        = std::make_shared<mesh::Mesh<float>>(mesh::create_rectangle<float>(
            MPI_COMM_WORLD, {{{0.0, 0.0}, {1.0, 1.0}}}, {2, 3},
            mesh::CellType::triangle,
            mesh::create_cell_partitioner(mesh::GhostMode::none)));

#ifdef HAS_ADIOS2
    std::filesystem::__cxx11::path filename = "test";
    io::Checkpointer<float> outfile(mesh->comm(),
                                    filename.replace_extension("bp"), mesh);
    outfile.write(2.0);
    outfile.close();
    io::Checkpointer<float> infile(mesh->comm(),
                                   filename.replace_extension("bp"));
    mesh::Mesh<float> mesh_no_ghost
        = infile.read_mesh(2.0, mesh::GhostMode::none);
    mesh::Mesh<float> mesh_shared
        = infile.read_mesh(2.0, mesh::GhostMode::shared_facet);

    std::stringstream cc;
    const int rank = dolfinx::MPI::rank(MPI_COMM_WORLD);
    cc << "-" << rank << "-"
       << "\n";
    cc << "Mesh no ghosts: "
       << mesh_no_ghost.topology()
              ->index_map(mesh_no_ghost.topology()->dim())
              ->size_local()
       << " "
       << mesh_no_ghost.topology()
              ->index_map(mesh_no_ghost.topology()->dim())
              ->num_ghosts()
       << "\n";
    cc << "Mesh shared_facet: "
       << mesh_shared.topology()
              ->index_map(mesh_shared.topology()->dim())
              ->size_local()
       << " "
       << mesh_shared.topology()
              ->index_map(mesh_shared.topology()->dim())
              ->num_ghosts()
       << "\n";
    infile.close();

    io::Checkpointer<float> infile_self(MPI_COMM_SELF,
                                        filename.replace_extension("bp"));
    mesh::Mesh<float> mesh_self
        = infile_self.read_mesh(2.0, mesh::GhostMode::none);
    infile_self.close();
    cc << "MEsh self: "
       << mesh_self.topology()
              ->index_map(mesh_self.topology()->dim())
              ->size_local()
       << " "
       << mesh_self.topology()
              ->index_map(mesh_self.topology()->dim())
              ->num_ghosts()
       << "\n";
    std::cout << cc.str();

    io::VTXWriter<float> vtx_ng(
        mesh_no_ghost.comm(), "mesh_no_ghost.bp",
        std::make_shared<mesh::Mesh<float>>(mesh_no_ghost));
    vtx_ng.write(0.0);
    vtx_ng.close();

    io::VTXWriter<float> vtx_s(
        mesh_no_ghost.comm(), "mesh_shared.bp",
        std::make_shared<mesh::Mesh<float>>(mesh_shared));
    vtx_s.write(0.0);
    vtx_s.close();

    io::VTXWriter<float> vtx_self(
        mesh_self.comm(), "mesh" + std::to_string(rank) + ".bp",
        std::make_shared<mesh::Mesh<float>>(mesh_self));
    vtx_self.write(0.0);
    vtx_self.close();

#endif
  }
  MPI_Finalize();

  return 0;
}
 mpirun -n 3 ./build-dir-real/demo_interpolation-io 
-0-
Mesh no ghosts: 4 0
Mesh shared_facet: 4 2
MEsh self: 12 0
-1-
Mesh no ghosts: 4 0
Mesh shared_facet: 4 2
MEsh self: 12 0
-2-
Mesh no ghosts: 4 0
Mesh shared_facet: 4 4
MEsh self: 12 0

It would be good to get feedback on this design prior to implementing the python interface.

Once this is merged, I can start working on the function checkpointing, with the approach used in https://github.com/jorgensd/adios4dolfinx/blob/main/src/adios4dolfinx/checkpointing.py#L91-L173

Copy link
Member

@garth-wells garth-wells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some preliminary comments.

Could this be divided into a 'read' PR, and a checkpoint PR after?

cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Show resolved Hide resolved
@jorgensd
Copy link
Sponsor Member Author

Some preliminary comments.

Could this be divided into a 'read' PR, and a checkpoint PR after?

No, as this reads the mesh from the mesh format this PR creates a writer for

@jorgensd
Copy link
Sponsor Member Author

jorgensd commented May 2, 2023

Some preliminary comments.

Could this be divided into a 'read' PR, and a checkpoint PR after?

Changes have been made according to your suggestions, any further comments?

@jorgensd
Copy link
Sponsor Member Author

@garth-wells Any further comments?

@jorgensd jorgensd requested a review from jhale June 1, 2023 06:43
@garth-wells
Copy link
Member

@jorgensd could you make clearer in the PR comment what this changes does in practice?

Copy link
Member

@garth-wells garth-wells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another round of comments.

Would be helpful to have clear statement in the PR of what functionality is being add.

cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
int(geometry.cmaps()[0].variant()));
}

/// @brief Create a ADIOS2 checkpointer for reading a mesh from file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a class? Would a function suffice?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, it depends on how we want to do it when adding functions to the mix. have stand-alone functions for each of the operations, or having them interact in a class.

cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
@jorgensd
Copy link
Sponsor Member Author

Another round of comments.

Would be helpful to have clear statement in the PR of what functionality is being add.

The code can write meshes to file on N processes, and read it back in on M processes. This is the first step towards having full checkpointing, as a next PR would add in writing/reading functions.

This is the reason for making the checkpointer a class, as one would write the mesh and function to the same file (and keep it alive during the process), avoiding duplicate initialization statements.

What differs in this code from the already existing Fides/VTXWriter is that:

  1. the data is written to global arrays (so that they can be read in again on a different number of processes).
  2. the topology is stored in DOLFINx order, i.e. the dofs are not permuted to VTK or GMSH ordering.

@jorgensd jorgensd linked an issue Jul 6, 2023 that may be closed by this pull request
@garth-wells
Copy link
Member

What differs in this code from the already existing Fides/VTXWriter is that:

  1. the data is written to global arrays (so that they can be read in again on a different number of processes).

What's a 'global array'? An ADIOS2 concept?

The VTX files we write do have a local-to-global map array.

  1. the topology is stored in DOLFINx order, i.e. the dofs are not permuted to VTK or GMSH ordering.

Do we want to guarantee that the DOLFINx remains unchanged? An advantage of VTK ordering is that we can freely change the internal DOLFINx ordering.

@jorgensd
Copy link
Sponsor Member Author

jorgensd commented Jul 6, 2023

What's a 'global array'? An ADIOS2 concept?

Yes. Consider writing out a mesh with the VTXWriter in serial:

root@dokken-XPS-9320:~/shared/dolfinx/cpp/demo/poisson# ./build-dir-real64/demo_poisson 
root@dokken-XPS-9320:~/shared/dolfinx/cpp/demo/poisson# bpls -l mesh.bp/
  uint32_t  NumberOfCells        {1} = 1024 / 1024
  uint32_t  NumberOfNodes        {1} = 561 / 561
  int64_t   connectivity         [1]*{1024, 4} = 0 / 560
  double    geometry             [1]*{561, 3} = 0 / 2
  double    step                 scalar = 0
  uint32_t  types                scalar = 69
  uint8_t   vtkGhostType         [1]*{561} = 0 / 0
  int64_t   vtkOriginalPointIds  [1]*{561} = 0 / 560

is what you get in serial, but in parallel you get

root@dokken-XPS-9320:~/shared/dolfinx/cpp/demo/poisson# mpirun -n 4 ./build-dir-real64/demo_poisson 
root@dokken-XPS-9320:~/shared/dolfinx/cpp/demo/poisson# bpls -l mesh.bp/
  uint32_t  NumberOfCells        {4} = 272 / 288
  uint32_t  NumberOfNodes        {4} = 166 / 182
  int64_t   connectivity         [4]*{__, 4} = 0 / 181
  double    geometry             [4]*{__, 3} = 0 / 2
  double    step                 scalar = 0
  uint32_t  types                scalar = 69

which means that the data is split into block structures (equal to the number of processors).

This makes the data very fast to write, but very hard to read on any other number of processes than the one that wrote it.

A global array is what we are creating when we specify both local and global dimensions in put as well as the global offset, like:

adios2::Variable<T> local_geometry = impl_adios2::define_variable<T>(
      io, "geometry", {num_xdofs_global, gdim}, {local_range[0], 0},
      {num_xdofs_local, gdim});

which gives you a .bp file that looks like:

root@dokken-XPS-9320:~/shared/dolfinx/cpp/demo/checkpoint# mpirun -n 4 ./build-dir-real64/demo_interpolation-io 
root@dokken-XPS-9320:~/shared/dolfinx/cpp/demo/checkpoint# bpls -l test.bp/
  uint32_t  CellPermutations  {12} = 0 / 6
  int64_t   connectivity      {12, 3} = 0 / 11
  float     geometry          {12, 2} = 0 / 1
  double    step              scalar = 2

cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
}

/// @brief Move constructor
Checkpointer(Checkpointer&& writer) = default;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noexcept?

~Checkpointer() { close(); }

/// @brief Move assignment
Checkpointer& operator=(Checkpointer&& writer) = default;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noexcept?

cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
cpp/dolfinx/io/ADIOS2Writers.h Outdated Show resolved Hide resolved
Copy link
Member

@garth-wells garth-wells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think checkpointing needs a good high-level plan and discussion before merging. We need a summary of what we want to be able to do to inform the design. Parts of this PR mirror the VTX and Fides writers, but these designs are probably not ideal for different types.

When reading a file we don't know a priori what the types are. We need to able to (i) query this, or (ii) maybe return a std::variant.

std::span<const T> x = geometry.x();
std::size_t gdim = geometry.dim();

// Trim geometry to gdim and trunctate for owned entities only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit unusual for a 'checkpoint', which I would expect would align closely with the internal storage.

auto geometry_dmap = geometry.dofmap();

// Convert local geometry to global geometry
std::vector<std::int64_t> connectivity_out(num_cells_local
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The terminology here is confusing. The variable name includes 'connectivity', but that data that gets copied in is geometry data.

engine.Put<std::int64_t>(local_topology, connectivity_out.data());

// Compute mesh permutations and store them to file. These are only
// needed for when we want to checkpoint functions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'when we want to checkpoint fem::Functions.'


/// Read and write mesh::Mesh objects with ADIOS2
template <std::floating_point T>
class Checkpointer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem logical that this class write and read - beyond the ADIOS2 handles the member data and methods are not shared across read and write. And the ADIOS2Writer class seems to already handle the ADISO2 handles.

std::unique_ptr<adios2::IO> _io;
std::unique_ptr<adios2::Engine> _engine;
std::shared_ptr<const mesh::Mesh<T>> _mesh;
U _u;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used?

@jorgensd jorgensd closed this May 22, 2024
@jorgensd jorgensd deleted the dokken/adios2-mesh-checkpointer branch May 22, 2024 06:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

Support reading of meshes from VTX/ADIOS2 files
4 participants