Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify CAMLalign and use C11 max_align_t #13139

Open
wants to merge 3 commits into
base: trunk
Choose a base branch
from

Conversation

MisterDA
Copy link
Contributor

@MisterDA MisterDA commented Apr 30, 2024

Some cleanups removing checks and workarounds for older compilers, assuming that the compiler supports C11 or C++11 out of the box. We may use _Alignas (since C11) or alignas (since C23) directly, and use the max_align_t type. Unfortunately, support for max_align_t is missing from the Windows C standard library.

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. +1 for using _Alignas and alignas in preference to attributes. One suggestion below concerning the MSVC fallback.


struct pool_block {
#ifdef DEBUG
intnat magic;
#endif
struct pool_block *next;
struct pool_block *prev;
union max_align data[]; /* not allocated, used for alignment purposes */
max_align_t data[]; /* not allocated, used for alignment purposes */
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One downside with this use of double as fallback max_align_t type is that double has rather low alignment: 8 on x86_64, 4 on x86_32, while most SSE vector instructions require 16-alignment. (In Linux x86_64 and macOS x86_64, max_align_t has 16 alignment.) What about using an explicit 16 alignment as the fallback case?

struct pool_block {
#ifdef DEBUG
  intnat magic;
#endif
  struct pool_block *next;
  struct pool_block *prev;
#ifdef HAVE_MAX_ALIGN_T
  max_align_t data[]; /* not allocated, used for alignment purposes */
#else
  CAMLalign(16) char data[];  /* 16 is a reasonable alignment default */
#endif
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd tend to align (ha, ha) with your suggestion, but: MSVC C++ cstddef.h header uses double alignment for max_align_t, and clang-cl uses double too, which I think makes it a reasonable default for Windows.
As for a general fallback, I hope that other compilers+libc aren't as buggy, and I wonder if we do need to provide a definition, or rather catch a missing definition as a compilation failure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSVC C++ cstddef.h header uses double alignment for max_align_t

It is in error, then. The program below, compiled with MSVC, prints 16, showing that there are types with alignment greater than 8.

#include <iostream>
#include <xmmintrin.h>

int main() {
    std::cout << alignof(__m128) << '\n';
    return 0;
}

In this PR, you're not trying to emulate whatever strange choices MSVC does, but to make sure OCaml's pool allocator works as intended. If the intent is to align for the maximal alignment constraint of the target platform, the alignment must be >= 16 on x86, because SSE instructions. If the intent is to align for the biggest datatype OCaml stores in heap blocks, word-alignment is enough and you don't need to add anything to struct pool_block to guarantee it, as it already contains two word-sized pointers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about defaulting to long double rather than double then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in error, then. The program below, compiled with MSVC, prints 16, showing that there are types with alignment greater than 8.

At least for C++, I don't think there's an error.

The type max_align_t is a POD type whose alignment requirement is at least as great as that of every scalar type, and whose alignment requirement is supported in every context. [C++11]

#include <iostream>
#include <xmmintrin.h>
#include <cstddef>
#include <type_traits>

int main() {
    std::cout << alignof(__m128) << std::endl
              << alignof(std::max_align_t) << std::endl
              << std::is_scalar<__m128>() << std::endl;
    return 0;
}

shows 16, 8, 0, indicating that __m128 isn't considered a scalar type, and thus the definition of std::max_align_t is consistent (with respect to the __m128 type).

The definition for C is more vague and I guess it could be argued that 16 would be a correct value.

max_align_t which is an object type whose alignment is as great as is supported by the implementation in all contexts; [C11]

What about defaulting to long double rather than double then?

long double is identical to double under MSVC 1.

Thanks for the thorough review. I think the real question is indeed whether the intent is to align for the maximal alignment constraint of the target platform, or to align for the biggest datatype OCaml stores in heap blocks. None of the fields in the former union max_align had a 16 bytes alignment, and all worked well, hasn't it? Now it's not clear to me that this field is actually needed.

Footnotes

  1. https://learn.microsoft.com/en-us/cpp/c-language/type-long-double?view=msvc-170

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also #12212. My guess is that allocation pools are realigned no matter how struct pool_block is defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use the same fallback as GCC and clang do. They don't seem to account for vector extensions (also, why stop at __m128 when there's also 256-bits vectors?). My macOS M1 which supports NEON still defines alignof(max_align_t) == 8.

#if !defined(HAVE_MAX_ALIGN_T)
typedef struct {
  CAMLalign(long long) long long ll;
  CAMLalign(long double) long double ld;
} max_align_t;
#endif

MinGW-w64 defines a 16-bytes alignment for max_align_t, but contrary to MSVC, it supports long double, aligned to 16 bytes.

The original justification comes from this comment:

That trick with max_align is not needed at all for correctness (void* would do just fine as far as correctness goes); however, it is supposed to increase the chances of getting a more favourable alignment of data in terms of performance (it is important, since data is going to be accessed much more often than the header of the block).

This patch would move data from an 8-bytes boundary to a 16-bytes boundary. I don't know if it would affect performance, but it would likely waste a bit of space.

Thinking about it, the data field type probably shouldn't be max_align_t but rather char as in your first suggestion.

  CAMLalign(max_align_t) char data[];

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"this comment" refers to the OCaml 4 memory allocator, which has been extensively rewritten for OCaml 5. Please see #12212 and the corresponding OCaml 5 code to check how alignment is actually handled in pool blocks.

Copy link
Member

@gasche gasche Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the impression that you are talking about two different things, which very conveniently have the exact same name in the codebase. In memory.c (which the PR is hacking on), "pool" refers to a single global pool of blocs (struct pool_block) that have been caml_stat_alloced by the runtime, and is used to make sure that they are all freed on program termination -- I think that it is only used in cleanup-at-exit mode? There is a single memory pool, which is basically a doubly-linked list, and each caml_stat_alloc adds one element to it.

In shared_heap.c, struct pool refers to one block or "slab" of memory in the memory, of size 4096 words, and owned by a domain-local caml_heap_state structure; each pool has its own free list. I think that the alignment constraints that @xavierleroy has in mind apply to values stored in the shared_heap.c pools, but that those are in fact currently not stored in the global memory.c pool, as they are allocated by caml_mem_map in shared_heap.c:pool_acquire. (Large objects are allocated differently in large_allocate, and oddly enough they seem to use malloc and not caml_stat_alloc.)

@MisterDA MisterDA force-pushed the alignas-simplify-CAMLalign branch from d226d67 to 3b09e44 Compare June 3, 2024 13:57
In our public headers, we're using either:
- C23 where `alignas` is a keyword;
- C++11 or later where `alignas` is also available;
- C11/C17 where `_Alignas` is available.
For C++, MSVC defines `using max_align_t = double`. Take inspiration
from GCC and LLVM for the fallback implementation.

It's unlikely that we need to carry a fallback implementation for
other compilers. If so, the following could be used:

    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L || \
        defined(__cplusplus)
    #define CAMLalignof(n) alignof(n)
    #else
    #define CAMLalignof(n) _Alignof(n)
    #endif

    typedef struct {
      long long ll CAMLalign(CAMLalignof(long long));
      long double ld CAMLalign(CAMLalignof(long double));
    } max_align_t;

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/ginclude/stddef.h
https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__stddef_max_align_t.h

https://en.cppreference.com/w/c/types/max_align_t
@MisterDA MisterDA force-pushed the alignas-simplify-CAMLalign branch from 3b09e44 to bc1c885 Compare June 3, 2024 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants