Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use struct Slab in grow() so that tape elements do not relocate #795

Conversation

MashyBasker
Copy link

@MashyBasker MashyBasker commented Mar 4, 2024

Release Notes:

  • N/A

Change explanation

The current implementation of the grow() function for Tape uses a destroy() function whenever there is a memory allocation failure with AllocateRawStorage and relocates the old data to new a Tape.

My change uses a struct which contains array of size 32('elements`) and a pointer to the next Slab struct.

Workflow:

  • Dynamically allocate a slab with the new keyword and set the pointer to a nullptr.
  • If the Tape capacity is 0, meaning it is empty, then our newly allocated slab is the first and the _data pointer is updated to point to the newly allocated slab
  • If the Tape is not empty, then the chain of Slabs are iterated using the pointer variable until it reaches the last Slab and is appended to it.
  • After the addition of a new slab, the overall capacity of the Tape gets doubled.

Fixes #793

graph TD;
  Slab0;
  Slab1;
  Slab2;
  Slab0-->Slab1;
  Slab1-->Slab2;

Signed-off-by: Maharshi Basu <basumaharshi10@gmail.com>
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

/// Initial capacity (allocated whenever a value is pushed into empty tape).

// Slab struct to store the elements. Contains the pointer to the next slab
template <typename T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: declaration of 'T' shadows template parameter [clang-diagnostic-error]

    template <typename T>
                       ^
Additional context

include/clad/Differentiator/Tape.h:13: template parameter is declared here

  template <typename T>
                     ^

// Slab struct to store the elements. Contains the pointer to the next slab
template <typename T>
struct Slab {
T elements[32]; // can store 32 elements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not declare C-style arrays, use std::array<> instead [cppcoreguidelines-avoid-c-arrays]

      T elements[32]; // can store 32 elements
      ^

struct Slab {
T elements[32]; // can store 32 elements
Slab *nextSlab;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: expected ';' after struct [clang-diagnostic-error]

Suggested change
}
};

destroy(begin(), end());
printf("Allocation failure during tape resize! Aborting.\n");
trap(EXIT_FAILURE);
Slab<T>* newSlab = new Slab<T>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: initializing non-owner 'Slab *' with a newly created 'gsl::owner<>' [cppcoreguidelines-owning-memory]

      Slab<T>* newSlab = new Slab<T>;
      ^

destroy(begin(), end());
printf("Allocation failure during tape resize! Aborting.\n");
trap(EXIT_FAILURE);
Slab<T>* newSlab = new Slab<T>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: use auto when initializing with new to avoid duplicating the type name [modernize-use-auto]

Suggested change
Slab<T>* newSlab = new Slab<T>;
auto* newSlab = new Slab<T>;

_data = newSlab;
} else {
// find the last slab by iterating the chain of slabs
Slab<T>* currentSlab = static_cast<Slab<T>*>(_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: use auto when initializing with a cast to avoid duplicating the type name [modernize-use-auto]

Suggested change
Slab<T>* currentSlab = static_cast<Slab<T>*>(_data);
auto* currentSlab = static_cast<Slab<T>*>(_data);

Signed-off-by: Maharshi Basu <basumaharshi10@gmail.com>
Signed-off-by: Maharshi Basu <basumaharshi10@gmail.com>
static_cast<const volatile void*>(_data)));
_data = new_data;
// Double the capacity on each slab addition
_capacity *= 2;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should have same size slabs. Can you create/run some of the benchmarks and see how this new implementation works?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am setting the size of each Slab to 32, which is the size of the elements array inside the Slab struct. Did you mean to suggest using _capacity += 32 instead of _capacity *= 2 for maintaining same-sized slabs?

 template <typename SlabT>
    struct Slab {
	  std::array<SlabT, 32> elements; // can store 32 elements
	  std::unique_ptr<SlabT> nextSlab;
    };

I tried to run the existing benchmarks, but got this error:

In file included from /home/okabe/open-source/clad/unittests/Kokkos/main.cpp:1:
In file included from /usr/include/Kokkos_Core.hpp:45:
In file included from /usr/include/KokkosCore_Config_DeclareBackend.hpp:22:
In file included from /usr/include/decl/Kokkos_Declare_OPENMP.hpp:21:
In file included from /usr/include/OpenMP/Kokkos_OpenMP.hpp:235:
/usr/include/OpenMP/Kokkos_OpenMP_Instance.hpp:23:2: error: "You enabled Kokkos OpenMP support without enabling OpenMP in the compiler!"
#error \
 ^

Does this mean I have to build the OpenMP part of LLVM in debug mode as well?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have kokkos installed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have installed Kokkos with spack. But running cmake -DLLVM_DIR=../llvm-project/build -DCMAKE_BUILD_TYPE=DEBUG -DLLVM_EXTERNAL_LIT="$(which lit)" ../, gives me this error

CMake Warning at unittests/CMakeLists.txt:37 (find_package):
  By not providing "FindKokkos.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Kokkos", but
  CMake did not find one.

  Could not find a package configuration file provided by "Kokkos" with any
  of the following names:

    KokkosConfig.cmake
    kokkos-config.cmake

  Add the installation prefix of "Kokkos" to CMAKE_PREFIX_PATH or set
  "Kokkos_DIR" to a directory containing one of the above files.  If "Kokkos"
  provides a separate development package or SDK, be sure it has been
  installed.

Is the KokkosConfig.cmake or kokkos-config.cmake file provided by the Kokkos installation or do I have to create it myself?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect you are hitting a similar issue as described in #798. Kokkos is an optional dependency. If you remove it clad will still run...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. But interestingly the tests run fine in master branch despite giving the error message:

warning: input 'LIT' contained no tests

If there were problems with the build, then it should have made the tests in the master branch fail as well. But they only fail in the branch I am working on the grow() function. Could it be because of my implementation?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, I did not realize that the tests ran. Yes, if they pass with the master and fail in your branch it's probably due to the changes that were made.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please give a feedback on my implementation. The only changes I've made are adding the Slab struct and the changes to the grow() function to use this struct for allocating data.

namespace clad {
    
  /// Dynamically-sized array (std::vector-like), primarily used for storing
  /// values in reverse-mode AD inside loops.
  template <typename T>
  class tape_impl {
    struct Slab {
	    T* elements; // can store 32 elements
	    Slab *nextSlab;
    };
    T* _data = nullptr;
    std::size_t _size = 0;
    std::size_t _capacity = 0;
    Slab* _last_added_slab = nullptr;
    //
    // other member functions
    //
    constexpr static std::size_t _init_capacity = 32;
    CUDA_HOST_DEVICE void grow() {
	    auto newSlab = (Slab*) new Slab;

      if (!_capacity) {
        // the Tape is empty and we can update
        // we can set the capacity with the initial capacity
        _capacity = this->_init_capacity;
      } else {
        // double the capacity if Slab exists
        _capacity *= 2;
      }
      auto* new_data = this->AllocateRawStorage(_capacity);
      if(!new_data) {
        delete newSlab; // delete the allocated slab on data allocation failure
        printf("Allocation failure during tape resize! Aborting\n");
      } else {
        newSlab->elements = new_data;
        _last_added_slab = newSlab;
      }
    }

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looks good as a start. Now you need to find where it falls short when clad uses it. Maybe you can start with some of the failing tests that explicitly test the Tape. Maybe TapeMemory.C is something that is worth looking into...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mistakenly thinking that the grow() function was also responsible for pushing the data, where it is actually emplace_back which is doing it. I was keeping a T* elements field in the struct to keep this data. But since we do not need to put data in grow() we need to change the struct fields.

I was considering we need to also change the emplace_back function since we are moving to structs and convert T* elements to T element, but I am unsure if it would be the correct way to do it. Could you please give me a feedback on this approach?

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

destroy(begin(), end());
printf("Allocation failure during tape resize! Aborting.\n");
trap(EXIT_FAILURE);
std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: template argument for template type parameter must be a type [clang-diagnostic-error]

	  std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
                        ^
Additional context

include/clad/Differentiator/Tape.h:118: template parameter is declared here

    template <typename SlabT>
                       ^

destroy(begin(), end());
printf("Allocation failure during tape resize! Aborting.\n");
trap(EXIT_FAILURE);
std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: use of undeclared identifier 'SlabT'; did you mean 'Slab'? [clang-diagnostic-error]

	  std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
                        ^

this fix will not be applied because it overlaps with another fix

destroy(begin(), end());
printf("Allocation failure during tape resize! Aborting.\n");
trap(EXIT_FAILURE);
std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: use of undeclared identifier 'SlabT'; did you mean 'llabs'? [clang-diagnostic-error]

	  std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
                        ^

this fix will not be applied because it overlaps with another fix

destroy(begin(), end());
printf("Allocation failure during tape resize! Aborting.\n");
trap(EXIT_FAILURE);
std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variable 'newSlab' is not initialized [cppcoreguidelines-init-variables]

Suggested change
std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
std::unique_ptr<Slab<SlabT>> newSlab = 0 = std::make_unique<Slab<SlabT>>();

destroy(begin(), end());
printf("Allocation failure during tape resize! Aborting.\n");
trap(EXIT_FAILURE);
std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: template argument for template type parameter must be a type [clang-diagnostic-error]

	  std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
                                                                ^
Additional context

include/clad/Differentiator/Tape.h:118: template parameter is declared here

    template <typename SlabT>
                       ^

destroy(begin(), end());
printf("Allocation failure during tape resize! Aborting.\n");
trap(EXIT_FAILURE);
std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: use of undeclared identifier 'SlabT'; did you mean 'Slab'? [clang-diagnostic-error]

	  std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
                                                                ^

this fix will not be applied because it overlaps with another fix

destroy(begin(), end());
printf("Allocation failure during tape resize! Aborting.\n");
trap(EXIT_FAILURE);
std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: use of undeclared identifier 'SlabT'; did you mean 'llabs'? [clang-diagnostic-error]

	  std::unique_ptr<Slab<SlabT>> newSlab = std::make_unique<Slab<SlabT>>();
                                                                ^

this fix will not be applied because it overlaps with another fix

_data = newSlab;
} else {
// find the last slab by iterating the chain of slabs
auto* currentSlab = static_cast<Slab<SlabT>*>(_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: template argument for template type parameter must be a type [clang-diagnostic-error]

        auto* currentSlab = static_cast<Slab<SlabT>*>(_data);
                                             ^
Additional context

include/clad/Differentiator/Tape.h:118: template parameter is declared here

    template <typename SlabT>
                       ^

_data = newSlab;
} else {
// find the last slab by iterating the chain of slabs
auto* currentSlab = static_cast<Slab<SlabT>*>(_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: use of undeclared identifier 'SlabT'; did you mean 'Slab'? [clang-diagnostic-error]

        auto* currentSlab = static_cast<Slab<SlabT>*>(_data);
                                             ^

this fix will not be applied because it overlaps with another fix

_data = newSlab;
} else {
// find the last slab by iterating the chain of slabs
auto* currentSlab = static_cast<Slab<SlabT>*>(_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: use of undeclared identifier 'SlabT'; did you mean 'llabs'? [clang-diagnostic-error]

        auto* currentSlab = static_cast<Slab<SlabT>*>(_data);
                                             ^

this fix will not be applied because it overlaps with another fix

@parth-07
Copy link
Collaborator

parth-07 commented Mar 5, 2024

Can we use std::deque to represent the data of the tape instead of explicitly adding slabs to the tape when they are required and maintaining all the slab-related data structures?

std::deque may fit the bill. From the official documentation:

As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays, with additional bookkeeping, which means indexed access to deque must perform two pointer dereferences, compared to vector's indexed access which performs only one.

The storage of a deque is automatically expanded and contracted as needed. Expansion of a deque is cheaper than the expansion of a std::vector because it does not involve copying of the existing elements to a new memory location

std::deque seems to be doing what we are trying to achieve here.

@vgvassilev
Copy link
Owner

Can we use std::deque to represent the data of the tape instead of explicitly adding slabs to the tape when they are required and maintaining all the slab-related data structures?

std::deque may fit the bill. From the official documentation:

As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays, with additional bookkeeping, which means indexed access to deque must perform two pointer dereferences, compared to vector's indexed access which performs only one.

The storage of a deque is automatically expanded and contracted as needed. Expansion of a deque is cheaper than the expansion of a std::vector because it does not involve copying of the existing elements to a new memory location

std::deque seems to be doing what we are trying to achieve here.

My expectation is that we will need a lot more customizations and we might be able to make it more our tape efficient than the std::deque. Another consideration is that we want this tape to work for C too.

@parth-07
Copy link
Collaborator

parth-07 commented Mar 7, 2024

My expectation is that we will need a lot more customizations and we might be able to make it more our tape efficient than the std::deque. Another consideration is that we want this tape to work for C too.

Oh, yes. You are right. I totally forgot about the C-compatibility and we should be able to make our tape more efficient than std::deque over time.

@vgvassilev
Copy link
Owner

@MashyBasker, can you take a look at the comments. How we can make progress here?

@MashyBasker MashyBasker closed this by deleting the head repository May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do not relocate the tape elements
3 participants