Enable multiple precisions in CeedVector #948

nbeams · 2022-04-22T20:38:36Z

In this branch, I have attempted to start collecting the result of many discussions, particularly with @jeremylt and @jedbrown, about expanding CeedVectors to be able to contain multiple precisions. This PR is meant as a place for further discussion and refinement, whether or not we end up merging from it.

Summary of changes so far (as of opening of the draft PR, April 22):

Vector access functions (Get/Set/Take, as well as Sync) have changed to *Generic at the backend level, that is, the functions that get added to the CeedVector_private struct. They now take void pointers instead of CeedScalar, and an additional parameter for the data type. (These can hopefully be used within other backend functions in the future where we allow mixed precision.) This has been changed in ref, cuda-ref, and hip-ref.
At the user level, these functions remain the same, but they now simply call the Generic version with CEED_SCALAR_TYPE as the type parameter in their implementations in interface/ceed-vector.c
At the user level, additional versions for specific precisions have been added (named *FP32 and *FP64). These also just call the Generic version "under the hood"

in the hip-ref backend ONLY so far:

A first implementation of all the Generic functions, aimed at implementing the previously-discussed multiprecision vector access rules for validity
The new CeedScalarArray struct is used to store data

In fd75fdf I added a user-interface-level function for checking the current array status, inspired by the way MFEM returns memory status flags. I was using this in my own local tests. We may or may not want to keep something like this in the future.

Related issue: #778

…tions to generic version with precision parameter

…ns to hip-ref

…for different precisions

backends/hip-ref/ceed-hip-ref-vector.c

…ackend functions

jedbrown

Sorry I'm slow on this. It's looking really good. I made a couple comments that I hope will help keep CEED_SCALAR_TYPE out of library code. My intent is to use C11 _Generic to dispatch to the appropriate *FP32 or *FP64 when available, otherwise fall back to a default that the user can select by including ceed-f64.h or ceed-f32.h.

I think we should be consistent, and either call the functions *F32 or name the header ceed-fp32.h; they're currently *FP32 and ceed-f32.h.

jedbrown · 2022-05-03T01:23:30Z

include/ceed/ceed.h

+  CEED_SCALAR_FP64,
+  /// Total number of allowed scalar precision types
+  CEED_NUM_PRECISIONS,
 } CeedScalarType;
+/// Struct for holding data in multiple precisions for mixed-precision-enabled
+/// backends
+typedef struct {
+  void *values[CEED_NUM_PRECISIONS]; // Size equals CEED_NUM_PRECISIONS
+} CeedScalarArray;


This automatically gets the right sizes. Can we move CeedScalarArray to ceed-impl.h or backend.h?

jedbrown · 2022-05-03T01:28:34Z

include/ceed/ceed.h

+/// backends
+typedef struct {
+  void *values[CEED_NUM_PRECISIONS]; // Size equals CEED_NUM_PRECISIONS
+} CeedScalarArray;
 /// Base scalar type for the library to use: change which header is 
 /// included to change the precision.
 #include "ceed-f64.h"


This would allow a user to include the type-specific header before ceed.h, and thus be able to select the default at file granularity.

Suggested change

#include "ceed-f64.h"

#ifndef CEED_SCALAR_TYPE

# include "ceed-f64.h"

#endif

Do you think we can avoid referencing CEED_SCALAR_TYPE in library code? I think that's the eventual goal.

jedbrown · 2022-05-03T01:32:52Z

interface/ceed-vector.c

+int CeedVectorSetArray(CeedVector vec, CeedMemType mem_type,
+                       CeedCopyMode copy_mode,
+                       CeedScalar *array) {
+  return CeedVectorSetArrayGeneric(vec, mem_type, CEED_SCALAR_TYPE, copy_mode,
+                                   (void **) array);
+}


I think we can have ceed.h use a macro to make CeedVectorSetArray map to either CeedVectorSetArrayFP32 or CeedVectorSetArrayFP64 so that we don't need an actual symbol in the library for this, and thus can remove this use of CEED_SCALAR_TYPE.

Okay, so I've spent a bit of time looking into this, because I didn't realize removal of CEED_SCALAR_TYPE was a goal.

What would be the benefit of this? Just that the user wouldn't have to re-compile the library in order to reset the default precision?

One thing I am not sure about is how it would (or wouldn't) affect the bindings for other languages. I remember the CeedGetScalarType interface function was added specifically for the Julia bindings, for example. We also use CEED_SCALAR_TYPE in the Python bindings right now.

I went through and made a list of all the functions in headers that take CeedScalar or CeedScalar * in the prototype, which would need to have versions for every allowed precision if CeedScalar is not set at library compile time (as is the case if no CEED_SCALAR_TYPE/no re-compiling). They can be broken into several groups:

Functions that we would likely already be modifying or adding multiple versions of for different precisions:

the Get/Set/Take/RestoreArray functions for Vectors, obviously, as well as SetValue, Norm, Scale, and AXPY (which we haven't yet decided what to do with, but are clear candidates for multiple precisions).

basis creation: CreateTensorH1, CreateH1, CreateHdiv : makes sense that if we allow different computation and/or input/output precisions for Basis objects (mixed-precision setting), we'd maybe want versions allowing the user to create the object with quadrature/interp/gradient data in the precision that will be used in computation. So that makes sense

other basis stuff: GetQRef, GetQWeights, GetInterp, GetInterp1D, GetGrad, GetGrad1D, GetDiv, GaussQuadrature, LobattoQuadrature, QRFactorization, SymmetricSchurDecomposition, SimultaneousDiagonalization -- most of these probably would also make sense to have versions for different precisions in a mixed-precision setting, but we are definitely starting to "explode" the number of added functions at this point (especially if we expand to more precisions in the future...)

the definition of QFunctionUser function pointer prototype -- we already need a solution for mixed-precision QFunctions anyway

Miscellaneous other stuff:

Two preconditioning Operator functions: MultigridLevelCreateTensorH1 and MultigridLevelCreateH1 -- not sure about those

a few functions in ceed/backend.h: BasisGetCollocatedGrad, HouseholderApplyQ, TensorContractApply, MatrixMultiply. These are in backend.h instead of ceed.h like the rest -- this may affect how easy it is to switch versions without re-compiling?
But at any rate, we'd need "generic" versions of most/all these functions, as well, for the library to use internally, to avoid re-compilation if we change CeedScalar, right? Though for many of the computational routines, we'd still need separate implementations for each precision, I think, unlike the vector access functions...

We would also need to change the definitions of some of the objects (private structs), like Basis and QFunctionContext, which include CeedScalar types.

use a macro to make CeedVectorSetArray map to either CeedVectorSetArrayFP32 or CeedVectorSetArrayFP64 so that we don't need an actual symbol in the library for this

Hmm, I guess I had thought we would define different versions of all the above functions from ceed.h in ceed-f32 and ceed-f64 with the corresponding types -- to have a version that would still work with C99, too. Unless we want to require C11? I thought not? I guess a macro would achieve the same thing.

I'm not sure how any of these options with affect the bindings for other languages, though, as mentioned previously.

I thought CEED_SCALAR_TYPE was a stop-gap until we could get rid of it with decent ergonomics. The foreign language bindings are fine (simpler even) because they'll just call to the correct type-specialized one.

I don't know that CreateH1 and other quadrature/basis stuff need specializations for f32 (they can easily be obtained by casting the f64 specification). Doing the setup in f64 and casting the result when it starts operating on problem-sized data is more accurate at equivalent cost. They would need specializations if we add __float128 support sometime. But not urgent in any case.

I think the internal stuff like tensor contraction needs specializations anyway to actually do the basis apply work in the selected precision.

IMO, we can't require C11 _Generic (not least because we need to support C++ callers). The ceed-f{32,64}.h headers would just define CeedScalar so people can be oblivious to precision. Foreign language callers wouldn't touch it. C11 _Generic or C++ templates would enable callers to do mixed precision stuff without explicitly acknowledging it when calling libceed.

I think I'm still missing something. What's the point of getting rid of CEED_SCALAR_TYPE (which defines CeedScalar at compile time) unless we want to select the type of CeedScalar without re-compiling the library? In which case, we'd need multiple versions of and/or a generic version of any function with CeedScalar in its prototype whether or not we intend to expose it to users, right?

Actually, if CeedScalar isn't known at compile time, we couldn't use it internally for computation anywhere in the library...?

Yeah, I envision that everyone will be content using f64 to compute basis data, except a rare few who want to experiment with __float128. Is this something we could reasonably roll into this PR or should we shoot to merge this with libceed-build-time CeedScalar and make that next step in a future PR?

Another thing to consider with the basis data, I guess, is that we may also want the storage to be CeedScalarArray type, so if, for example, the compute precision doesn't match CeedBasisScalar, we would have a convenient place to store the converted array and re-use every time the basis is applied.

Is this something we could reasonably roll into this PR or should we shoot to merge this with libceed-build-time CeedScalar and make that next step in a future PR?

Not sure. I feel like this will already be a big PR -- I haven't even added/finalized any tests for making sure the multiprecision vectors are working, and once hip-ref is more or less finalized, I will add the same changes for cuda-ref (and applicable ones for ref).

We might also want to add in this PR the extra versions/changes for the other vector functions: SetValue, Norm, Scale, and AXPY...

I should also highlight a few other discussion points in this draft that I haven't pointed out yet; I'll do that now.

There's a merge conflict with main now thanks to the SyncArray update PR, making rebasing less straightforward (when I tried it previously). If we're okay with squash-merging this eventually, it will be fine, but that may also be something to consider...unless I start a "clean" branch with the finalized Vector changes, delete this branch, and we continue the PR from there.

Another question about the possible CeedBasisScalar, this would be defined at libCEED compile time, right? And could not be changed without re-compiling? And only things that we don't switch to CeedBasisScalar would need to be available in multiple versions in order to select the correct one with ceed-f32.h?

(Also, I agree about the naming being consistent -- from your much earlier review comment. I like FP better, so I'd vote for ceed-fp32.h, but would it be too much of a hassle to change the header names at this point since we've already released versions with the current names?)

Exactly right in the first paragraph.

I think our norms and axpy are mostly just used for testing without PETSc or MFEM -- libCEED doesn't intend to be a complete solvers library so any real caller has a library for that stuff. We could just use one precision (CeedBasisScalar or maybe there's a better name for this concept that it's not problem-sized) or we could make it dispatch by having the internal computation always work in CeedBasisScalar, but promote f32 inputs to AXPY and demote outputs of Norm (rather than plumbing both variants into backends).

I don't especially care on the names, just consistency. There may not be many direct includes in the wild. f32 and f64 are the Rust type names, fwiw.

I just don't want any unintuitive surprises for users...for example, in the norm implementation for CPU backends (done in the interface), we call GetArrayRead followed by using that data to compute the norm. But in a multiprecision case, what precision should be used to get the array? If the vector only has float data and CeedBasisScalar (or whatever we want to call it) is double, for example, calling GetArrayRead in CeedBasisScalar would do unnecessary conversion (and maybe allocation, too) in order to do the computation in CeedBasisScalar when the expected behavior would be to use float.

So, maybe it would be useful to make the backend versions of Norm, SetValue, Scale, and AXPY be Generic like the Get/Set/Take/Sync functions, and have different interface versions which call the generic function with the correct precision? (And also move the Norm implementation into the ref backend instead of interface-level.)

Then what if the interface level had, as an example for CeedVectorNorm:

CeedVectorNormFP32: calls the generic version with CEED_SCALAR_FP32

CeedVectorNormFP64: calls the generic version with CEED_SCALAR_FP64

CeedVectorNorm : result is always CeedBasisScalar (for the prototype, known at compile time), but computation happens in whatever precision the function deems "most likely to be intended" by the call (if only one precision is currently valid for the vector, use that precision; if two or more precisions are valid, pick one based on some heuristic: most precise? Least, for better performance?)

Of course, I guess that would ruin the plan to have *FP32/*FP64/etc. functions defined as their corresponding "regular" versions in the ceed-fp32 or ceed-fp64 headers as a way to set CeedScalar without recompiling the library, if CeedVectorNorm already had a distinct use beyond being a stand-in for one of the specific precision versions. So maybe we shouldn't do that. But I think the intended behavior of Norm based on whatever data is currently valid in the vector is unclear...

Then there is the issue with AXPY: in the case where the input vectors do not have the same valid precisions, what should we do? Should we specify what precision we want to use, or let the function pick based on some heuristic?
In the same vein, we would also need to decide how we want PointwiseMult to behave in the case of multiple precisions, even thought it doesn't take a scalar in its prototype, with the same questions as AXPY.

For Scale, I'd assume we'd want the behavior to be to scale all currently valid precisions in the vector, so that's easy. It could take the CeedBasisScalar type to avoid needing multiple versions.

(We really probably should have a different name for that type...ideas? CeedBaseScalar? CeedDataScalar?)

nbeams · 2022-05-09T20:40:07Z

backends/hip-ref/ceed-hip-ref-vector.c

+static inline int CeedScalarTypeGetSize_Hip(Ceed ceed, CeedScalarType prec,
+    size_t *size) {
+  switch(prec) {
+  case CEED_SCALAR_FP32:
+    *size = sizeof(float);
+    break;
+  case CEED_SCALAR_FP64:
+    *size = sizeof(double);
+    break;
+  default:


I put this in the backend for now while testing with hip-ref, but is this something that could be useful/necessary to have at an interface level for all backends, rather than each ref backend duplicating it?

nbeams · 2022-05-09T20:44:04Z

interface/ceed-vector.c

+/**
+  @brief Check the current status of the CeedVector's data arrays.
+
+  This function sets unsigned int parameters indicating for which precisions
+  the data for mem_type is not NULL, for borrowed and owned arrays, and also
+  which precisions are currently valid.
+  The values can be checked bitwise, such that, e.g.,
+  (valid_status & (1 << CEED_SCALAR_FP32)) will be true if the FP32 array is
+  currently valid, and false otherwise.
+
+  @param vec              CeedVector for which to check status
+  @param mem_type         Mem type for which to check status
+  @param valid_status     Status indicator for valid data
+  @param borrowed_status  Status indicator for borrowed arrays
+  @param owned_status     Status indicator for owned arrays
+**/
+int CeedVectorCheckArrayStatus(CeedVector vec, CeedMemType mem_type,


I added this function to the interface level, thinking it would be used by tests for multiprecision vector functionality (and potentially, users may want this feature).
It also has a backend implementation, so it's added to the CeedVector_private struct. Since this affects the user interface, I wanted to make sure and point it out to see what people thought.

If we keep this, we may also want an interface function for convenient "pretty" printing of status for all vector arrays, or we could implement such a function in the tests directly.

nbeams added 6 commits April 4, 2022 16:11

WIP multiprecision vectors: Change Vector read/write/set backend func…

7016e0b

…tions to generic version with precision parameter

WIP multiprecision vectors: adding multiprecision storage and some fc…

775eb4c

…ns to hip-ref

WIP hip-ref: use backend fcn for sync, streamline logic

3b67d1a

WIP: make style

88a3338

Add specific FP32/FP64 Vector access functions to interface

3e94c0d

WIP, experimental: add user-level function for checking array status …

fd75fdf

…for different precisions

nbeams added design interface GPU CPU 0-WIP labels Apr 22, 2022

nbeams commented Apr 22, 2022

View reviewed changes

backends/hip-ref/ceed-hip-ref-vector.c Show resolved Hide resolved

nbeams mentioned this pull request Apr 25, 2022

Use backend functions for SyncArray in CUDA and HIP #950

Merged

nbeams and others added 3 commits April 26, 2022 17:30

WIP: use Generic vector access functions and void pointers in other b…

80606a9

…ackend functions

include: cleaner CEED_NUM_PRECISIONS

5c8e206

WIP: backends/cuda: fix type mismatch

df27fe0

jedbrown reviewed May 3, 2022

View reviewed changes

nbeams commented May 9, 2022

View reviewed changes

Merge branch 'main' into natalie/multiprec-vec

ed90cc3

nbeams mentioned this pull request Aug 26, 2022

Mixed precision functionality #778

Open

nbeams added 2 commits August 26, 2022 15:14

Merge branch 'main' into natalie/multiprec-vec

36d404f

Resolve conflicts when updating from main

112a04a

nbeams force-pushed the natalie/multiprec-vec branch from 703f404 to 112a04a Compare August 26, 2022 19:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable multiple precisions in CeedVector #948

Enable multiple precisions in CeedVector #948

nbeams commented Apr 22, 2022 •

edited

jedbrown left a comment

jedbrown May 3, 2022

jedbrown May 3, 2022

jedbrown May 3, 2022

nbeams May 5, 2022

jedbrown May 6, 2022

nbeams May 6, 2022

nbeams May 6, 2022

jedbrown May 9, 2022

nbeams May 9, 2022 •

edited

nbeams May 11, 2022

jedbrown May 11, 2022

nbeams May 12, 2022

nbeams May 9, 2022

nbeams May 9, 2022

nbeams May 9, 2022

-#include "ceed-f64.h"
+#ifndef CEED_SCALAR_TYPE
+#  include "ceed-f64.h"
+#endif

Enable multiple precisions in CeedVector #948

Are you sure you want to change the base?

Enable multiple precisions in CeedVector #948

Conversation

nbeams commented Apr 22, 2022 • edited

jedbrown left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nbeams May 9, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nbeams commented Apr 22, 2022 •

edited

nbeams May 9, 2022 •

edited