Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable cache for parallel::CellWeights. #16537

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

marcfehling
Copy link
Member

@marcfehling marcfehling commented Jan 25, 2024

In most cases of weighted load balancing, we only require information about the number of degrees of freedom. Currently, we calculate these weights for each cell individually. This is wasteful, as cells with the same finite element assigned will ultimately get the same weight.

This PR proposes to introduce a cache option: we compute the weights for each finite element in advance, and use these values as weights for each cell.


Currently, the interface of the WeightingFunction type uses both a cell_iterator and a finite element. We might discuss if we want to keep that interface for the future, or if we would like to simplify it to only take a finite element as an argument. The only possible use cases for this class at the moment are those in which weights are only determined via finite elements, so I would favor to keep it simple. Please tell me if you use this class for other purposes, or if you see other use cases.

I moved the selection of possible weighting functions to the top of the h and cc files. We might also want to move them into a separate namespace.

Copy link
Member

@bangerth bangerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I even look at this: What is the problem you're trying to solve here? Do you actually see the computation of weights in profiles? It's hard to imagine that computing small powers of integers is ever going to be a worthwhile target for caching.

@marcfehling
Copy link
Member Author

marcfehling commented Jan 26, 2024

When writing the Stokes code matrixfree with two DoFHandlers, I had to write a separate function for weighted load balancing as parallel::CellWeights only supports one DoFHandler. For some reason, it didn't come to my mind that I could simply instantiate one object per DoFHandler, but I noticed that all the information we need really is in the FiniteElement, and not in the DoFCellAccessor. We can precompute the weights for each FiniteElement while registering the weighting callback function, and then simply access them via FE indices for each cell.

With this PR, I wanted to make my observation/implementation available in deal.II. I did not identify it as a bottleneck. I just recently stumbled over the weighting part in my code again and thought it would be handy to have it in the library as well.

In my opinion, I over-engineered the class in #7264, #8687 (comment). The only application it was built for was to weight cells by the number of degrees of freedom. There was never a reason to use a DoFCellAccessor to access properties, and I still can't think of one now. In the long term, I would like to change and simplify the interface of the WeightingFunction accordingly to only take a FiniteElement. If a user needs information on a DoFCellAccessor, they can still build their own function (like I did for the Stokes code) and attach it to the signal.

@kronbichler
Copy link
Member

Even though my experience is different from

In most cases of weighted load balancing, we only require information about the number of degrees of freedom.

because most applications I've dealt with have specific requirements (think of local time stepping, cells which do more work than others for other reasons than for different numbers of DoFs, such as CutFEM algorithms, higher cost of cells with hanging nodes compared to unconstrained cells), I think the present class was primarily designed for use in hp methods where this might be justified. In fact, all applications I've worked with manually connect to the signals, where the number of DoFs is part of the metric, but the actual cell is also needed to collect the information. Therefore, I would support the interface @marcfehling thinks is most useful, as he has most experience with the present class.

Now I should of course question whether it would make sense for some of my applications to use CellWeights. The question would be if my use cases fit into the class or not. Opinions, @marcfehling?

@marcfehling
Copy link
Member Author

marcfehling commented Jan 31, 2024

I think the present class was primarily designed for use in hp methods

Yes, exactly that's the motivation behind it. We kept an option open to potentially extend the class to use information from cell_iterators, but as this class was never really designed for this case, I would rather like to make the purpose of the class clearer, limit its functionality, and optimize for it. Keeping it simple for vanilla hp applications. Thus, in a follow-up PR I would like to deprecate the old interface, and adjust the documentation. Maybe we should even move it in the hp namespace. What's your opinion?

Users which need more sophisticated weighting functions can still write their own function and connect it to the signal, as you already pointed out.

@kronbichler Do you think your uses cases can be generalized to become part of the selection of weighting functions? If so, I would be happy to leave the interface as is.

template <int dim, int spacedim>
typename CellWeights<dim, spacedim>::WeightingFunction
CellWeights<dim, spacedim>::constant_weighting(const unsigned int factor)
{
return [factor](const typename DoFHandler<dim, spacedim>::cell_iterator &,
const FiniteElement<dim, spacedim> &) -> unsigned int {
return factor;
};
}
template <int dim, int spacedim>
typename CellWeights<dim, spacedim>::WeightingFunction
CellWeights<dim, spacedim>::ndofs_weighting(
const std::pair<float, float> &coefficients)
{
return [coefficients](
const typename DoFHandler<dim, spacedim>::cell_iterator &,
const FiniteElement<dim, spacedim> &future_fe) -> unsigned int {
const float result =
std::trunc(coefficients.first *
std::pow(future_fe.n_dofs_per_cell(), coefficients.second));
Assert(result >= 0. &&
result <=
static_cast<float>(std::numeric_limits<unsigned int>::max()),
ExcMessage(
"Cannot cast determined weight for this cell to unsigned int!"));
return static_cast<unsigned int>(result);
};
}
template <int dim, int spacedim>
typename CellWeights<dim, spacedim>::WeightingFunction
CellWeights<dim, spacedim>::ndofs_weighting(
const std::vector<std::pair<float, float>> &coefficients)
{
return [coefficients](
const typename DoFHandler<dim, spacedim>::cell_iterator &,
const FiniteElement<dim, spacedim> &future_fe) -> unsigned int {
float result = 0;
for (const auto &pair : coefficients)
result +=
pair.first * std::pow(future_fe.n_dofs_per_cell(), pair.second);
result = std::trunc(result);
Assert(result >= 0. &&
result <=
static_cast<float>(std::numeric_limits<unsigned int>::max()),
ExcMessage(
"Cannot cast determined weight for this cell to unsigned int!"));
return static_cast<unsigned int>(result);
};
}

@bangerth bangerth mentioned this pull request Jan 31, 2024
Copy link
Member

@bangerth bangerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're currently keeping the cache in the lambda function, but I think that's asking for trouble. It's not unreasonable if people use an idiom like

  FECollection fe;
  fe.push_back(FE_Q(1));
  dof_handler.distribute_dofs(fe);
  ...solve with a Q1 element...

  fe.push_back(FE_Q(2));
  ...refine mesh., assign active_fe_indices...
  dof_handler.distribute_dofs(fe); // now uses a different FECollection

In the best case, you'll get a segfault here. You would need to find a way to invalidate the cache whenever distribute_dofs() is called, and to re-build it upon first use.

include/deal.II/distributed/cell_weights.h Outdated Show resolved Hide resolved
include/deal.II/distributed/cell_weights.h Outdated Show resolved Hide resolved
source/distributed/cell_weights.cc Outdated Show resolved Hide resolved
source/distributed/cell_weights.cc Outdated Show resolved Hide resolved
@marcfehling
Copy link
Member Author

marcfehling commented Apr 11, 2024

You would need to find a way to invalidate the cache whenever distribute_dofs() is called, and to re-build it upon first use.

For re-building the cache in this case, I would need to introduce a signal in the distribute_dofs() function, to which the CellWeights object connects.

Instead, I opted for comparing sizes of weight cache and the FECollection in use. If they don't match, an assertion is triggered. Do you think this is enough?

// Check if cache still matches FECollection.
AssertDimension(dof_handler.get_fe_collection().size(),
weight_cache.size());

In addition, I can extend the check to compare each element of the FECollection. I would then also copy-capture the FECollection for validation purposes. I proposed this in a separate commit -- if we don't like it I can remove it quickly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants