Runtime failure of tutorial example 03 copy on amd-epyc #572

Open
staleyLANL opened this issue May 8, 2019 · 13 comments

@staleyLANL
Contributor

Danny Shevitz copied flecsi-tutorial example 03 to a location outside flecsi-tutorial, in preparation for integrating some of its contents into a separate application built with CMake/Make rather than with flecsit. The copy fails with cryptic runtime errors, even though the same code built with flecsit runs as intended.

We have explored various leads, including but not limited to:

1. Ensuring that the CMake/Make build uses the same underlying compiler and flags as flecsit.
2. Looking into debug- vs. release-mode issues.
3. Making sure the paths pulled in the same specialization in both cases, rather than inadvertently picking up a different, incompatible one in the failing case.
4. Issues related to modules and libraries.

We also hand-#included all relevant material from specialization/ into the main code, producing a single self-contained file that replicates the original tutorial example. The FleCSI Static Analyzer reports no known problems, just as it does for the original tutorial code.

So far we have had no luck tracking down the error. We'll keep exploring and report what turns out to be the problem.

@charest
Contributor

charest commented May 8, 2019

What are the runtime errors?

@shevitz

shevitz commented May 8, 2019

Martin's description of the problem is good. To restate it in my own words: the goal is to build a single-file version of tutorial 3 (the first tutorial that accesses a mesh) with a CMake/Make tool chain rather than the flecsi-tutorial/flecsit tool chain.

Presumably the issue is configuration. What makes it hard is that we are getting runtime errors, not build-time errors.

Martin's standalone app generates the following runtime error:

terminate called after throwing an instance of 'std::runtime_error'
what(): FATAL ERROR storage.h:143
The data_client type you are trying to access with key 9893712834078802846 does not exist!
Make sure it has been properly registered!

[cn4002:08266] *** Process received signal ***
[cn4002:08266] Signal: Aborted (6)
[cn4002:08266] Signal code: (-6)
[cn4002:08266] [ 0] /lib64/libc.so.6(+0x36280)[0x14b5b4a7c280]
[cn4002:08266] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x14b5b4a7c207]
[cn4002:08266] [ 2] /lib64/libc.so.6(abort+0x148)[0x14b5b4a7d8f8]
[cn4002:08266] [ 3] /projects/opt/centos7/gcc/8.2.0/lib64/libstdc++.so.6(+0x8b933)[0x14b5b53b8933]
[cn4002:08266] [ 4] /projects/opt/centos7/gcc/8.2.0/lib64/libstdc++.so.6(+0x91a96)[0x14b5b53bea96]
[cn4002:08266] [ 5] /projects/opt/centos7/gcc/8.2.0/lib64/libstdc++.so.6(+0x91ad1)[0x14b5b53bead1]
[cn4002:08266] [ 6] /projects/opt/centos7/gcc/8.2.0/lib64/libstdc++.so.6(+0x91d04)[0x14b5b53bed04]
[cn4002:08266] [ 7] ./slinky-marty[0x41b093]
[cn4002:08266] [ 8] ./slinky-marty[0x418388]
[cn4002:08266] [ 9] ./slinky-marty[0x416fe1]
[cn4002:08266] [10] ./slinky-marty[0x414b0f]
[cn4002:08266] [11] ./slinky-marty(_ZN6flecsi9execution14runtime_driverEiPPc+0x791)[0x43204e]
[cn4002:08266] [12] /home/shevitz/local/flecsi/lib64/libFleCSI.so(_ZN6flecsi9execution20mpi_context_policy_t10initializeEiPPc+0x58)[0x14b5b6087ae6]
[cn4002:08266] [13] ./slinky-marty[0x43ea32]
[cn4002:08266] [14] /lib64/libc.so.6(__libc_start_main+0xf5)[0x14b5b4a683d5]
[cn4002:08266] [15] ./slinky-marty[0x414799]
[cn4002:08266] *** End of error message ***
Aborted (core dumped)

My original standalone app generates the following runtime error:

terminate called after throwing an instance of 'std::runtime_error'
what(): FATAL ERROR context.h:610 invalid index space: 0

[cn4002:08248] *** Process received signal ***
[cn4002:08248] Signal: Aborted (6)
[cn4002:08248] Signal code: (-6)
[cn4002:08248] [ 0] /lib64/libc.so.6(+0x36280)[0x15464c908280]
[cn4002:08248] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x15464c908207]
[cn4002:08248] [ 2] /lib64/libc.so.6(abort+0x148)[0x15464c9098f8]
[cn4002:08248] [ 3] /projects/opt/centos7/gcc/8.2.0/lib64/libstdc++.so.6(+0x8b933)[0x15464d244933]
[cn4002:08248] [ 4] /projects/opt/centos7/gcc/8.2.0/lib64/libstdc++.so.6(+0x91a96)[0x15464d24aa96]
[cn4002:08248] [ 5] /projects/opt/centos7/gcc/8.2.0/lib64/libstdc++.so.6(+0x91ad1)[0x15464d24aad1]
[cn4002:08248] [ 6] /projects/opt/centos7/gcc/8.2.0/lib64/libstdc++.so.6(+0x91d04)[0x15464d24ad04]
[cn4002:08248] [ 7] ./slinky-mesh[0x426ff8]
[cn4002:08248] [ 8] ./slinky-mesh[0x423559]
[cn4002:08248] [ 9] ./slinky-mesh[0x41ff11]
[cn4002:08248] [10] ./slinky-mesh[0x41cd54]
[cn4002:08248] [11] ./slinky-mesh[0x4198cf]
[cn4002:08248] [12] ./slinky-mesh[0x418383]
[cn4002:08248] [13] ./slinky-mesh[0x41844c]
[cn4002:08248] [14] ./slinky-mesh[0x41718e]
[cn4002:08248] [15] ./slinky-mesh(_ZN6flecsi9execution14runtime_driverEiPPc+0x791)[0x43a28e]
[cn4002:08248] [16] /home/shevitz/local/flecsi/lib64/libFleCSI.so(_ZN6flecsi9execution20mpi_context_policy_t10initializeEiPPc+0x58)[0x15464df13ae6]
[cn4002:08248] [17] ./slinky-mesh[0x446b8a]
[cn4002:08248] [18] /lib64/libc.so.6(__libc_start_main+0xf5)[0x15464c8f43d5]
[cn4002:08248] [19] ./slinky-mesh[0x416e09]
[cn4002:08248] *** End of error message ***
Aborted (core dumped)

@staleyLANL
Contributor Author

I'll also note that the error ending with "Make sure it has been properly registered!" might appear to be caused by a simple misuse of FleCSI's register/get-handle/execute macro pattern, but I don't believe it is. The code uses those macros correctly, just as index-spaces.cc (tutorial example 03) does in the last couple dozen lines of its source.

@charest
Contributor

charest commented May 8, 2019

Please provide the source.

@staleyLANL
Contributor Author

#include <array>
#include <iostream>
#include <vector>

#include <flecsi/data/common/privilege.h>
#include <flecsi/data/data_client_handle.h>
#include <flecsi/data/dense_accessor.h>
#include <flecsi/data/ragged_accessor.h>
#include <flecsi/data/ragged_mutator.h>
#include <flecsi/data/sparse_accessor.h>
#include <flecsi/data/sparse_mutator.h>
#include <flecsi/data/data.h>
#include <flecsi/execution/execution.h>

#include <flecsi/topology/mesh_types.h>
#include <flecsi/topology/mesh.h>
#include <flecsi/topology/mesh_topology.h>



namespace flecsi {
namespace tutorial {

// Point type.
using point_t = std::array<double, 2>;


// Vertex type.
struct vertex_t : public flecsi::topology::mesh_entity_u<0, 1> {

  vertex_t(point_t & p) : p_(p) {}

  point_t const & coordinates() const {
    return p_;
  }

  void print(const char * string) {
    std::cout << string << " My id is " << id<0>() << std::endl;
  } // print

private:
  point_t p_;

}; // struct vertex_t


// Edge type.
struct edge_t : public flecsi::topology::mesh_entity_u<1, 1> {
}; // struct edge_t


// Cell type.
struct cell_t : public flecsi::topology::mesh_entity_u<2, 1> {
  using id_t = flecsi::utils::id_t;

  void print(const char * string) {
    std::cout << string << " My id is " << id<0>() << std::endl;
  } // print

  std::vector<size_t> create_entities(id_t cell_id,
    size_t dim,
    flecsi::topology::domain_connectivity_u<2> & c,
    id_t * e) {
    id_t * v = c.get_entities(cell_id, 0);

    e[0] = v[0];
    e[1] = v[2];

    e[2] = v[1];
    e[3] = v[3];

    e[4] = v[0];
    e[5] = v[1];

    e[6] = v[2];
    e[7] = v[3];

    return {2, 2, 2, 2};
  } // create_entities
}; // struct cell_t


// index_spaces
enum index_spaces : size_t {
  vertices,
  edges,
  cells,
  vertices_to_cells,
  cells_to_vertices
}; // enum index_spaces

struct specialization_mesh_policy_t {

  using id_t = flecsi::utils::id_t;

  flecsi_register_number_dimensions(2);
  flecsi_register_number_domains(1);

  flecsi_register_entity_types(
    flecsi_entity_type(index_spaces::vertices, 0, vertex_t),
    flecsi_entity_type(index_spaces::cells, 0, cell_t));

  flecsi_register_connectivities(
    flecsi_connectivity(index_spaces::cells_to_vertices, 0, cell_t, vertex_t));

  flecsi_register_bindings();

  template<size_t M, size_t D, typename ST>
  static flecsi::topology::mesh_entity_base_u<1> * create_entity(
    flecsi::topology::mesh_topology_base_u<ST> * mesh,
    size_t num_vertices,
    id_t const & id) {
    return nullptr;
  } // create_entity

}; // struct specialization_mesh_policy_t



//----------------------------------------------------------------------------//
// Mesh Specialization
//----------------------------------------------------------------------------//

struct specialization_mesh_t
  : public flecsi::topology::mesh_topology_u<specialization_mesh_policy_t> {

  void print(const char * string) {
    std::cout << string << std::endl;
  } // print

  auto cells() {
    return entities<2, 0>();
  } // cells

  auto cells(partition_t p) {
    return entities<2, 0>(p);
  } // cells

#if 0
  template< typename E, size_t M>
  auto cells(flecsi::topology::domain_entity_u<M, E> & e) {
    return entities<2, 0>(e);
  } // cells
#endif

  auto vertices() {
    return entities<0, 0>();
  } // vertices

  template<typename E, size_t M>
  auto vertices(flecsi::topology::domain_entity_u<M, E> & e) {
    return entities<0, 0>(e);
  } // vertices

}; // specialization_mesh_t

using mesh_t = specialization_mesh_t;



//----------------------------------------------------------------------------//
// Type Definitions
//----------------------------------------------------------------------------//

template<size_t PRIVILEGES>
using mesh = data_client_handle_u<mesh_t, PRIVILEGES>;

template<size_t SHARED_PRIVILEGES>
using field = dense_accessor<double, rw, SHARED_PRIVILEGES, ro>;

template<size_t SHARED_PRIVILEGES>
using ragged_field = ragged_accessor<double, rw, SHARED_PRIVILEGES, ro>;

using ragged_field_mutator = ragged_mutator<double>;

template<size_t SHARED_PRIVILEGES>
using sparse_field = sparse_accessor<double, rw, SHARED_PRIVILEGES, ro>;

using sparse_field_mutator = sparse_mutator<double>;

} // namespace tutorial
} // namespace flecsi



//----------------------------------------------------------------------------//
// Main Code
//----------------------------------------------------------------------------//

using namespace flecsi;
using namespace flecsi::tutorial;

namespace example {

void
simple(mesh<ro> mesh) {

  // Iterate over the vertices index space

  for(auto v : mesh.vertices()) {
    v->print("Hello World! I'm a vertex!");
  } // for

  // Iterate over the cells index space, and then over
  // the vertices index space.

  for(auto c : mesh.cells(owned)) {
    c->print("Hello World! I am a cell!");

    for(auto v : mesh.vertices(c)) {
      v->print("I'm a vertex!");
    } // for
  } // for

} // simple

// Task registration is as usual...

flecsi_register_task(simple, example, loc, single);

} // namespace example



namespace flecsi {
namespace execution {

void
driver(int argc, char ** argv) {

  // Get a data client handle as usual...

  auto m = flecsi_get_client_handle(mesh_t, clients, mesh);

  // Task execution is as usual...

  flecsi_execute_task(simple, example, single, m);

} // driver

} // namespace execution
} // namespace flecsi

/* vim: set tabstop=2 shiftwidth=2 expandtab fo=cqt tw=72 : */

@staleyLANL
Contributor Author

That's just Tutorial 03 with the specialization material integrated directly into one source file.
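For anyone reading the specialization code above: cell_t::create_entities fills e with the cell's edges as consecutive vertex pairs, namely (v0, v2), (v1, v3), (v0, v1), (v2, v3), and the returned {2, 2, 2, 2} gives the vertex count of each of those four edges. The standalone sketch below reproduces only that index pattern so it can be checked in isolation; the vertex_id alias and the quad_edges helper are hypothetical and do not use FleCSI's own id_t or connectivity types.

```cpp
// Standalone sketch of the edge layout written by cell_t::create_entities.
// vertex_id and quad_edges are hypothetical; FleCSI's id_t is not used here.
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

using vertex_id = std::uint64_t;

// Given a quad cell's four vertex ids, return its four edges as vertex pairs,
// in the same order create_entities writes them into its output buffer.
std::array<std::pair<vertex_id, vertex_id>, 4>
quad_edges(const std::array<vertex_id, 4> & v) {
  // Flat buffer with two vertex ids per edge, as in create_entities.
  const std::array<vertex_id, 8> e = {v[0], v[2], v[1], v[3],
                                      v[0], v[1], v[2], v[3]};
  std::array<std::pair<vertex_id, vertex_id>, 4> edges{};
  for(std::size_t i = 0; i < edges.size(); ++i)
    edges[i] = {e[2 * i], e[2 * i + 1]};
  return edges;
}
```

With vertex ids {10, 11, 12, 13}, the edges come out as (10, 12), (11, 13), (10, 11), (12, 13).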

@shevitz
Copy link

shevitz commented May 8, 2019

Source file slinky-marty.cc (placed in src/, at the same level as CMakeLists.txt):


#include <array>
#include <iostream>
#include <vector>

#include <flecsi/data/common/privilege.h>
#include <flecsi/data/data_client_handle.h>
#include <flecsi/data/dense_accessor.h>
#include <flecsi/data/ragged_accessor.h>
#include <flecsi/data/ragged_mutator.h>
#include <flecsi/data/sparse_accessor.h>
#include <flecsi/data/sparse_mutator.h>
#include <flecsi/data/data.h>
#include <flecsi/execution/execution.h>

#include <flecsi/topology/mesh_types.h>
#include <flecsi/topology/mesh.h>
#include <flecsi/topology/mesh_topology.h>



namespace flecsi {
namespace tutorial {

// Point type.
using point_t = std::array<double, 2>;


// Vertex type.
struct vertex_t : public flecsi::topology::mesh_entity_u<0, 1> {

  vertex_t(point_t & p) : p_(p) {}

  point_t const & coordinates() const {
    return p_;
  }

  void print(const char * string) {
    std::cout << string << " My id is " << id<0>() << std::endl;
  } // print

private:
  point_t p_;

}; // struct vertex_t


// Edge type.
struct edge_t : public flecsi::topology::mesh_entity_u<1, 1> {
}; // struct edge_t


// Cell type.
struct cell_t : public flecsi::topology::mesh_entity_u<2, 1> {
  using id_t = flecsi::utils::id_t;

  void print(const char * string) {
    std::cout << string << " My id is " << id<0>() << std::endl;
  } // print

  std::vector<size_t> create_entities(id_t cell_id,
    size_t dim,
    flecsi::topology::domain_connectivity_u<2> & c,
    id_t * e) {
    id_t * v = c.get_entities(cell_id, 0);

    e[0] = v[0];
    e[1] = v[2];

    e[2] = v[1];
    e[3] = v[3];

    e[4] = v[0];
    e[5] = v[1];

    e[6] = v[2];
    e[7] = v[3];

    return {2, 2, 2, 2};
  } // create_entities
}; // struct cell_t


// index_spaces
enum index_spaces : size_t {
  vertices,
  edges,
  cells,
  vertices_to_cells,
  cells_to_vertices
}; // enum index_spaces

struct specialization_mesh_policy_t {

  using id_t = flecsi::utils::id_t;

  flecsi_register_number_dimensions(2);
  flecsi_register_number_domains(1);

  flecsi_register_entity_types(
    flecsi_entity_type(index_spaces::vertices, 0, vertex_t),
    flecsi_entity_type(index_spaces::cells, 0, cell_t));

  flecsi_register_connectivities(
    flecsi_connectivity(index_spaces::cells_to_vertices, 0, cell_t, vertex_t));

  flecsi_register_bindings();

  template<size_t M, size_t D, typename ST>
  static flecsi::topology::mesh_entity_base_u<1> * create_entity(
    flecsi::topology::mesh_topology_base_u<ST> * mesh,
    size_t num_vertices,
    id_t const & id) {
    return nullptr;
  } // create_entity

}; // struct specialization_mesh_policy_t



//----------------------------------------------------------------------------//
// Mesh Specialization
//----------------------------------------------------------------------------//

struct specialization_mesh_t
  : public flecsi::topology::mesh_topology_u<specialization_mesh_policy_t> {

  void print(const char * string) {
    std::cout << string << std::endl;
  } // print

  auto cells() {
    return entities<2, 0>();
  } // cells

  auto cells(partition_t p) {
    return entities<2, 0>(p);
  } // cells

#if 0
  template< typename E, size_t M>
  auto cells(flecsi::topology::domain_entity_u<M, E> & e) {
    return entities<2, 0>(e);
  } // cells
#endif

  auto vertices() {
    return entities<0, 0>();
  } // vertices

  template<typename E, size_t M>
  auto vertices(flecsi::topology::domain_entity_u<M, E> & e) {
    return entities<0, 0>(e);
  } // vertices

}; // specialization_mesh_t

using mesh_t = specialization_mesh_t;



//----------------------------------------------------------------------------//
// Type Definitions
//----------------------------------------------------------------------------//

template<size_t PRIVILEGES>
using mesh = data_client_handle_u<mesh_t, PRIVILEGES>;

template<size_t SHARED_PRIVILEGES>
using field = dense_accessor<double, rw, SHARED_PRIVILEGES, ro>;

template<size_t SHARED_PRIVILEGES>
using ragged_field = ragged_accessor<double, rw, SHARED_PRIVILEGES, ro>;

using ragged_field_mutator = ragged_mutator<double>;

template<size_t SHARED_PRIVILEGES>
using sparse_field = sparse_accessor<double, rw, SHARED_PRIVILEGES, ro>;

using sparse_field_mutator = sparse_mutator<double>;

} // namespace tutorial
} // namespace flecsi



//----------------------------------------------------------------------------//
// Main Code
//----------------------------------------------------------------------------//

using namespace flecsi;
using namespace flecsi::tutorial;

namespace example {

void
simple(mesh<ro> mesh) {

  // Iterate over the vertices index space

  for(auto v : mesh.vertices()) {
    v->print("Hello World! I'm a vertex!");
  } // for

  // Iterate over the cells index space, and then over
  // the vertices index space.

  for(auto c : mesh.cells(owned)) {
    c->print("Hello World! I am a cell!");

    for(auto v : mesh.vertices(c)) {
      v->print("I'm a vertex!");
    } // for
  } // for

} // simple

// Task registration is as usual...

flecsi_register_task(simple, example, loc, single);

} // namespace example



namespace flecsi {
namespace execution {

void
driver(int argc, char ** argv) {

  // Get a data client handle as usual...

  auto m = flecsi_get_client_handle(mesh_t, clients, mesh);

  // Task execution is as usual...

  flecsi_execute_task(simple, example, single, m);

} // driver

} // namespace execution
} // namespace flecsi

/* vim: set tabstop=2 shiftwidth=2 expandtab fo=cqt tw=72 : */

Current CMakeLists.txt:

#[[
This file is part of the Ristra slinky project.
Please see the license file at the root of this repository.
]]

# cmake_minimum_required() should be called before project()
cmake_minimum_required(VERSION 3.11.0)
project(SLINKY)

# We need C++17
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED on)
set(CMAKE_CXX_EXTENSIONS off)

#add_definitions(-DDEBUG -DOMPI_SKIP_MPICXX -DMPICH_SKIP_MPICXX)
#add_definitions(-DCINCH_OVERRIDE_DEFAULT_INITIALIZATION_DRIVER)
#add_definitions(-DFLECSI_ENABLE_SPECIALIZATION_TLT_INIT -DFLECSI_ENABLE_SPECIALIZATION_SPMD_INIT -DCINCH_OVERRIDE_DEFAULT_INITIALIZATION_DRIVER)

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

# We are going to use this app with MPI (for now; later we can add Legion)
find_package(MPI REQUIRED)

#set(CMAKE_C_COMPILER ${MPI_C_COMPILER} CACHE FILEPATH "C compiler to use" FORCE)
#set(CMAKE_CXX_COMPILER ${MPI_CXX_COMPILER} CACHE FILEPATH "C++ compiler to use" FORCE)

# External libraries

if(FLECSI_DIR)
  find_package(FleCSI REQUIRED
               HINTS ${FLECSI_DIR}/lib64/cmake ${FLECSI_DIR}/lib/cmake)

  include_directories(${FLECSI_INCLUDE_DIRS})

  message(STATUS "FleCSI_LIBRARIES ${FleCSI_LIBRARIES}")
else()
  # FATAL_ERROR (not FATAL) is required to actually stop configuration
  message(FATAL_ERROR "FLECSI_DIR must be defined")
endif(FLECSI_DIR)


#slinky-marty
add_executable(slinky-marty ${PROJECT_SOURCE_DIR}/src/slinky-marty.cc  ${FLECSI_RUNTIME_DRIVER} ${FLECSI_RUNTIME_MAIN})
target_link_libraries(slinky-marty ${FleCSI_LIBRARIES} ${FLECSI_LIBRARY_DEPENDENCIES} ${FLECSI_RUNTIME_LIBRARIES})

Invocation from the build/ directory:

cmake -DFLECSI_DIR=/dir_of_flecsi_install ..

(in my case, ~/local/flecsi)

@charest
Contributor

charest commented May 8, 2019

I don't see a flecsi_register_data_client, which the error suggests is needed.

@staleyLANL
Contributor Author

It looks like one missing piece (good eye, @charest) is:

 namespace flecsi {
 namespace tutorial {
    flecsi_register_data_client(mesh_t, clients, mesh);
 } // namespace tutorial
 } // namespace flecsi

as in flecsi-tutorial/specialization/mesh.cc. Omitting it produces neither a compile-time nor a link-time error; the failure only shows up at run time.

The script flecsi/build/CMakeFiles/flecsi-tutorial-install.sh sets:

 export FLECSIT_LIBRARIES=/usr/local/lib/libFleCSI-Tut.so

Linking that library supplies the inline bool (and hence the registration) that the flecsi_register_data_client(mesh_t, clients, mesh) macro inserts.

This explains why the flecsit build had what it needed while the more direct build was missing something.
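To make that failure mode concrete, here is a toy C++17 model of registration by static initialization. It is a sketch under assumptions, not FleCSI's implementation: client_registry, client_key, and clients_mesh_registered are hypothetical names, and the hash-based key is invented for illustration. What it shows is that a namespace-scope inline bool initializer performs the registration before main(), and that nothing checks for the entry until a lookup runs.

```cpp
// Toy model (hypothetical names, not FleCSI's implementation) of a registry
// that is populated by namespace-scope initializers before main() runs.
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

class client_registry {
public:
  // Function-local static avoids the static-initialization-order problem.
  static client_registry & instance() {
    static client_registry r;
    return r;
  }

  // Returns bool so a registration can initialize a namespace-scope variable.
  bool add(std::size_t key, std::string type_name) {
    clients_.emplace(key, std::move(type_name));
    return true;
  }

  bool contains(std::size_t key) const { return clients_.count(key) != 0; }

  // Lookup happens at run time; a missing entry is only detected here.
  const std::string & get(std::size_t key) const {
    auto it = clients_.find(key);
    if(it == clients_.end())
      throw std::runtime_error("data_client with key " + std::to_string(key) +
                               " does not exist! Make sure it has been "
                               "properly registered!");
    return it->second;
  }

private:
  std::unordered_map<std::size_t, std::string> clients_;
};

// Hypothetical key derivation: a hash of "namespace::name".
inline std::size_t client_key(const std::string & nspace,
                              const std::string & name) {
  return std::hash<std::string>{}(nspace + "::" + name);
}

// Roughly what a registration macro might expand to (assumption). If this
// definition lives in a library that is never linked, the program still
// compiles and links; get() simply throws later.
inline bool clients_mesh_registered = client_registry::instance().add(
  client_key("clients", "mesh"), "mesh_t");
```

Deleting the clients_mesh_registered definition from this model still compiles and links cleanly; the first get() for that key then throws, which matches the shape of the storage.h:143 error reported earlier in this thread.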

@shevitz, if you insert:

 flecsi_register_data_client(mesh_t, clients, mesh);

it gets us past the registration error.

Now I'm still getting:

 terminate called after throwing an instance of 'std::runtime_error'
   what():  FATAL ERROR context.h:610 invalid index space: 0
 Aborted

on my own machine, but it's progress.

@charest
Contributor

charest commented May 9, 2019

Where are your SPMD and TLT init functions? This example is not complete.
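The toy model below illustrates how a missing initialization hook could surface as "invalid index space: 0", under one assumption: that, as the question implies, the SPMD and TLT init functions are what set up the mesh's index spaces before the driver runs. All names here (toy_context, toy_runtime, and their members) are hypothetical and are not FleCSI's API. In the real build, the relevant switches appear to be the commented-out FLECSI_ENABLE_SPECIALIZATION_TLT_INIT and FLECSI_ENABLE_SPECIALIZATION_SPMD_INIT definitions in the CMakeLists.txt posted earlier. Index space 0 is vertices in the index_spaces enum, which is also the first index space simple() iterates over.

```cpp
// Toy model (hypothetical names, not FleCSI's API): optional initialization
// hooks run before the user driver and are responsible for registering the
// index spaces the driver later uses.
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

struct toy_context {
  std::map<std::size_t, std::size_t> index_spaces; // index space id -> size

  // Throws for ids that no hook registered, mimicking "invalid index space".
  std::size_t index_space_size(std::size_t id) const {
    auto it = index_spaces.find(id);
    if(it == index_spaces.end())
      throw std::runtime_error("invalid index space: " + std::to_string(id));
    return it->second;
  }
};

struct toy_runtime {
  std::function<void(toy_context &)> tlt_init;  // optional top-level setup
  std::function<void(toy_context &)> spmd_init; // optional per-rank setup
  std::function<void(toy_context &)> driver;    // user driver (required)

  // Runs whichever hooks were supplied, then the driver.
  void run(toy_context & ctx) const {
    if(tlt_init)
      tlt_init(ctx);
    if(spmd_init)
      spmd_init(ctx);
    driver(ctx);
  }
};
```

In this model, a runtime whose hooks are left empty throws "invalid index space: 0" as soon as the driver queries index space 0, while supplying a spmd_init that registers that index space lets the same driver run.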

@shevitz

shevitz commented May 9, 2019

The invalid index space error is the one we were originally getting.

@staleyLANL
Contributor Author

Probably close to a resolution. Danny and I will meet on Monday.

@staleyLANL
Contributor Author

Made good progress at this morning's meeting. Participants were Danny, Evgeny, Irina, Martin, and Navamita.

@tuxfan added the PERSIST label on Sep 17, 2019
4 participants