Enable cache for parallel::CellWeights. #16537

marcfehling · 2024-01-25T01:47:56Z

In most cases of weighted load balancing, we only require information about the number of degrees of freedom. Currently, we calculate these weights for each cell individually. This is wasteful, as cells with the same finite element assigned will ultimately get the same weight.

This PR proposes to introduce a cache option: we compute the weights for each finite element in advance, and use these values as weights for each cell.

Currently, the interface of the WeightingFunction type uses both a cell_iterator and a finite element. We might discuss if we want to keep that interface for the future, or if we would like to simplify it to only take a finite element as an argument. The only possible use cases for this class at the moment are those in which weights are only determined via finite elements, so I would favor to keep it simple. Please tell me if you use this class for other purposes, or if you see other use cases.

I moved the selection of possible weighting functions to the top of the h and cc files. We might also want to move them into a separate namespace.

bangerth

Before I even look at this: What is the problem you're trying to solve here? Do you actually see the computation of weights in profiles? It's hard to imagine that computing small powers of integers is ever going to be a worthwhile target for caching.

marcfehling · 2024-01-26T22:27:43Z

When writing the Stokes code matrixfree with two DoFHandlers, I had to write a separate function for weighted load balancing as parallel::CellWeights only supports one DoFHandler. For some reason, it didn't come to my mind that I could simply instantiate one object per DoFHandler, but I noticed that all the information we need really is in the FiniteElement, and not in the DoFCellAccessor. We can precompute the weights for each FiniteElement while registering the weighting callback function, and then simply access them via FE indices for each cell.

With this PR, I wanted to make my observation/implementation available in deal.II. I did not identify it as a bottleneck. I just recently stumbled over the weighting part in my code again and thought it would be handy to have it in the library as well.

In my opinion, I over-engineered the class in #7264, #8687 (comment). The only application it was built for was to weight cells by the number of degrees of freedom. There was never a reason to use a DoFCellAccessor to access properties, and I still can't think of one now. In the long term, I would like to change and simplify the interface of the WeightingFunction accordingly to only take a FiniteElement. If a user needs information on a DoFCellAccessor, they can still build their own function (like I did for the Stokes code) and attach it to the signal.

kronbichler · 2024-01-27T09:50:56Z

Even though my experience is different from

In most cases of weighted load balancing, we only require information about the number of degrees of freedom.

because most applications I've dealt with have specific requirements (think of local time stepping, cells which do more work than others for other reasons than for different numbers of DoFs, such as CutFEM algorithms, higher cost of cells with hanging nodes compared to unconstrained cells), I think the present class was primarily designed for use in hp methods where this might be justified. In fact, all applications I've worked with manually connect to the signals, where the number of DoFs is part of the metric, but the actual cell is also needed to collect the information. Therefore, I would support the interface @marcfehling thinks is most useful, as he has most experience with the present class.

Now I should of course question whether it would make sense for some of my applications to use CellWeights. The question would be if my use cases fit into the class or not. Opinions, @marcfehling?

marcfehling · 2024-01-31T00:01:32Z

I think the present class was primarily designed for use in hp methods

Yes, exactly that's the motivation behind it. We kept an option open to potentially extend the class to use information from cell_iterators, but as this class was never really designed for this case, I would rather like to make the purpose of the class clearer, limit its functionality, and optimize for it. Keeping it simple for vanilla hp applications. Thus, in a follow-up PR I would like to deprecate the old interface, and adjust the documentation. Maybe we should even move it in the hp namespace. What's your opinion?

Users which need more sophisticated weighting functions can still write their own function and connect it to the signal, as you already pointed out.

@kronbichler Do you think your uses cases can be generalized to become part of the selection of weighting functions? If so, I would be happy to leave the interface as is.

dealii/source/distributed/cell_weights.cc

Lines 64 to 122 in 2aa0f75

    
           template <int dim, int spacedim> 
        
           typename CellWeights<dim, spacedim>::WeightingFunction 
        
           CellWeights<dim, spacedim>::constant_weighting(const unsigned int factor) 
        
           { 
        
             return [factor](const typename DoFHandler<dim, spacedim>::cell_iterator &, 
        
                             const FiniteElement<dim, spacedim> &) -> unsigned int { 
        
               return factor; 
        
             }; 
        
           } 
        
           template <int dim, int spacedim> 
        
           typename CellWeights<dim, spacedim>::WeightingFunction 
        
           CellWeights<dim, spacedim>::ndofs_weighting( 
        
             const std::pair<float, float> &coefficients) 
        
           { 
        
             return [coefficients]( 
        
                      const typename DoFHandler<dim, spacedim>::cell_iterator &, 
        
                      const FiniteElement<dim, spacedim> &future_fe) -> unsigned int { 
        
               const float result = 
        
                 std::trunc(coefficients.first * 
        
                            std::pow(future_fe.n_dofs_per_cell(), coefficients.second)); 
        
               Assert(result >= 0. && 
        
                        result <= 
        
                          static_cast<float>(std::numeric_limits<unsigned int>::max()), 
        
                      ExcMessage( 
        
                        "Cannot cast determined weight for this cell to unsigned int!")); 
        
               return static_cast<unsigned int>(result); 
        
             }; 
        
           } 
        
           template <int dim, int spacedim> 
        
           typename CellWeights<dim, spacedim>::WeightingFunction 
        
           CellWeights<dim, spacedim>::ndofs_weighting( 
        
             const std::vector<std::pair<float, float>> &coefficients) 
        
           { 
        
             return [coefficients]( 
        
                      const typename DoFHandler<dim, spacedim>::cell_iterator &, 
        
                      const FiniteElement<dim, spacedim> &future_fe) -> unsigned int { 
        
               float result = 0; 
        
               for (const auto &pair : coefficients) 
        
                 result += 
        
                   pair.first * std::pow(future_fe.n_dofs_per_cell(), pair.second); 
        
               result = std::trunc(result); 
        
               Assert(result >= 0. && 
        
                        result <= 
        
                          static_cast<float>(std::numeric_limits<unsigned int>::max()), 
        
                      ExcMessage( 
        
                        "Cannot cast determined weight for this cell to unsigned int!")); 
        
               return static_cast<unsigned int>(result); 
        
             }; 
        
           }

bangerth

You're currently keeping the cache in the lambda function, but I think that's asking for trouble. It's not unreasonable if people use an idiom like

  FECollection fe;
  fe.push_back(FE_Q(1));
  dof_handler.distribute_dofs(fe);
  ...solve with a Q1 element...

  fe.push_back(FE_Q(2));
  ...refine mesh., assign active_fe_indices...
  dof_handler.distribute_dofs(fe); // now uses a different FECollection

In the best case, you'll get a segfault here. You would need to find a way to invalidate the cache whenever distribute_dofs() is called, and to re-build it upon first use.

include/deal.II/distributed/cell_weights.h

source/distributed/cell_weights.cc

marcfehling · 2024-04-11T19:08:03Z

You would need to find a way to invalidate the cache whenever distribute_dofs() is called, and to re-build it upon first use.

For re-building the cache in this case, I would need to introduce a signal in the distribute_dofs() function, to which the CellWeights object connects.

Instead, I opted for comparing sizes of weight cache and the FECollection in use. If they don't match, an assertion is triggered. Do you think this is enough?

dealii/source/distributed/cell_weights.cc

Lines 278 to 280 in fab350d

    
           // Check if cache still matches FECollection. 
        
           AssertDimension(dof_handler.get_fe_collection().size(), 
        
                           weight_cache.size());

In addition, I can extend the check to compare each element of the FECollection. I would then also copy-capture the FECollection for validation purposes. I proposed this in a separate commit -- if we don't like it I can remove it quickly.

marcfehling added hp adaptivity Parallel labels Jan 25, 2024

bangerth reviewed Jan 26, 2024

View reviewed changes

bangerth mentioned this pull request Jan 31, 2024

Minor doc update. #16573

Merged

bangerth reviewed Jan 31, 2024

View reviewed changes

Enable cache for parallel::CellWeights.

842ce7e

marcfehling force-pushed the weight_cache branch from fab350d to 842ce7e Compare April 11, 2024 19:10

Validate weight_cache with the complete fe_collection.

1508b14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable cache for parallel::CellWeights. #16537

Enable cache for parallel::CellWeights. #16537

marcfehling commented Jan 25, 2024 •

edited

bangerth left a comment

marcfehling commented Jan 26, 2024 •

edited

kronbichler commented Jan 27, 2024

marcfehling commented Jan 31, 2024 •

edited

bangerth left a comment

marcfehling commented Apr 11, 2024 •

edited

Enable cache for parallel::CellWeights. #16537

Are you sure you want to change the base?

Enable cache for parallel::CellWeights. #16537

Conversation

marcfehling commented Jan 25, 2024 • edited

bangerth left a comment

Choose a reason for hiding this comment

marcfehling commented Jan 26, 2024 • edited

kronbichler commented Jan 27, 2024

marcfehling commented Jan 31, 2024 • edited

bangerth left a comment

Choose a reason for hiding this comment

marcfehling commented Apr 11, 2024 • edited

marcfehling commented Jan 25, 2024 •

edited

marcfehling commented Jan 26, 2024 •

edited

marcfehling commented Jan 31, 2024 •

edited

marcfehling commented Apr 11, 2024 •

edited