ARM64 cycle count #216

Open
hannesm opened this issue Mar 19, 2024 · 2 comments

hannesm commented Mar 19, 2024

Since we switched from Cstruct.t to string, some tests fail on arm64 because read_virtual_count returns the same value in consecutive calls (see tests/test_entropy.ml).

What we use:

#if defined (__arm__)
/*
 * The ideal timing source on ARM are the performance counters, but these are
 * presently masked by Xen.
 * It would work like this:

#if defined (__ARM_ARCH_7A__)
  // Disable counter overflow interrupts.
  __asm__ __volatile__ ("mcr p15, 0, %0, c9, c14, 2" :: "r"(0x8000000f));
  // Program the PMU control register.
  __asm__ __volatile__ ("mcr p15, 0, %0, c9, c12, 0" :: "r"(1 | 16));
  // Enable all counters.
  __asm__ __volatile__ ("mcr p15, 0, %0, c9, c12, 1" :: "r"(0x8000000f));

  // Read:
  unsigned int res;
  __asm__ __volatile__ ("mrc p15, 0, %0, c9, c13, 0": "=r" (res));
*/
#if defined(__ocaml_freestanding__) || defined(__ocaml_solo5__)
static inline uint32_t read_virtual_count (void)
{
  uint32_t c_lo, c_hi;
  __asm__ __volatile__("mrrc p15, 1, %0, %1, c14":"=r"(c_lo), "=r"(c_hi));
  return c_lo;
}
#else
/* see https://github.com/mirage/mirage-crypto/issues/113 and
https://chromium.googlesource.com/external/gperftools/+/master/src/base/cycleclock.h
   The performance counters are only available in kernel mode (or if enabled via
   a kernel module also in user mode). Use clock_gettime as fallback.
 */
#include <time.h>
static inline uint32_t read_virtual_count (void)
{
  uint32_t pmccntr;
  uint32_t pmuseren;
  uint32_t pmcntenset;
  // Read the user mode perf monitor counter access permissions.
  __asm__ __volatile__ ("mrc p15, 0, %0, c9, c14, 0" : "=r" (pmuseren));
  if (pmuseren & 1) {  // Allows reading perfmon counters for user mode code.
    __asm__ __volatile__ ("mrc p15, 0, %0, c9, c12, 1" : "=r" (pmcntenset));
    if (pmcntenset & 0x80000000ul) {  // Is it counting?
      __asm__ __volatile__ ("mrc p15, 0, %0, c9, c13, 0" : "=r" (pmccntr));
      // The counter is set up to count every 64th cycle
      return pmccntr;
    }
  }
  struct timespec now;
  clock_gettime (CLOCK_MONOTONIC, &now);
  return now.tv_nsec;
}
#endif /* __ocaml_freestanding__ || __ocaml_solo5__ */
#endif /* arm */

#if defined (__aarch64__)
#define	isb() __asm__ __volatile__("isb" : : : "memory")
static inline uint64_t read_virtual_count(void)
{
  uint64_t c;
  isb();
  __asm__ __volatile__("mrs %0, cntvct_el0":"=r"(c));
  return c;
}
#endif /* aarch64 */

I suspect the aarch64 variant, "mrs %0, cntvct_el0":"=r"(c), is the one being called. It is used for both freestanding (i.e. MirageOS) and Unix applications (the failing test mentioned above is only executed on Unix).

How should this be dealt with?

  • conditional compilation, and use e.g. clock_gettime(CLOCK_MONOTONIC, _) for Unix applications?
  • use some other instruction? (I'm not an expert on arm64, so I'd like to understand what should be used there.)

The issue with conditional compilation is potential bitrot: code that is executed less often may turn out to be broken once you deploy it for real. But since we already have conditional compilation for arm32 and riscv, it may be the way to go.
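To make the conditional-compilation option concrete, here is a minimal sketch, not the project's actual code. The name read_count_sketch is hypothetical, and for simplicity the #else branch is taken on every target that is not a freestanding aarch64 build. Note the caveat: on arm64 Linux the monotonic clock is often derived from the same architected timer, so this may not by itself guarantee distinct values on consecutive calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict C modes */
#endif
#include <stdint.h>
#include <time.h>

#if defined(__aarch64__) && \
    (defined(__ocaml_freestanding__) || defined(__ocaml_solo5__))
/* Unikernel build: keep reading the virtual counter, as in the excerpt. */
static inline uint64_t read_count_sketch(void)
{
  uint64_t c;
  __asm__ __volatile__("isb" : : : "memory");
  __asm__ __volatile__("mrs %0, cntvct_el0" : "=r"(c));
  return c;
}
#else
/* Unix build (assumption: any hosted target): use the monotonic clock
   and fold seconds and nanoseconds into one 64-bit value. */
static inline uint64_t read_count_sketch(void)
{
  struct timespec now;
  clock_gettime(CLOCK_MONOTONIC, &now);
  return (uint64_t)now.tv_sec * 1000000000ull + (uint64_t)now.tv_nsec;
}
#endif
```

Both branches return a uint64_t so callers do not need their own #if, which limits how much code can bitrot to the body of one function.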

Any opinions?

dinosaure commented Mar 19, 2024

> I suspect the "mrs %0, cntvct_el0":"=r"(c) is what is called.

I can confirm that this is what utime uses for aarch64:
https://github.com/robur-coop/utime/blob/main/lib/tscclock.c#L80-L83

hannesm commented Mar 19, 2024

From https://cpucycles.cr.yp.to/counters.html:

  • arm64-pmc: Requires a 64-bit ARMv8-A platform. Uses mrs %0, PMCCNTR_EL0 to read the cycle counter. Requires user access to the cycle counter, which is not enabled by default but can be enabled under Linux via a kernel module.

  • arm64-vct: Requires a 64-bit ARMv8-A platform. Uses mrs %0, CNTVCT_EL0 to read a "virtual count" timer. This is an off-core clock, typically running at 24MHz. Results are scaled by libcpucycles.

So if the vct that we're using runs at "only" 24MHz, it's no wonder that we sometimes retrieve the same value twice: one tick then lasts about 41.7 ns, which can be longer than the time between two back-to-back reads. At the same time, I'm not sure what to do, especially since I don't use any arm64 device. Open questions include whether this is good enough in general, whether it is good enough for MirageOS, or whether we should change to something else. Any opinions?
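The arithmetic behind the 24MHz observation can be sketched as follows. The function names are hypothetical. The sketch assumes that cntfrq_el0 is readable from user mode (as it typically is where cntvct_el0 is), and on other targets it simply falls back to the 24MHz figure quoted above.

```c
#include <stdint.h>

/* Frequency of the virtual counter in Hz. On aarch64 the cntfrq_el0
   register reports it; elsewhere we assume the 24MHz figure from the
   libcpucycles description (an assumption for illustration only). */
static inline uint64_t counter_frequency_hz(void)
{
#if defined(__aarch64__)
  uint64_t f;
  __asm__ __volatile__("mrs %0, cntfrq_el0" : "=r"(f));
  return f;
#else
  return 24000000ull;
#endif
}

/* Length of one counter tick in picoseconds, rounded down.
   Returns 0 if the frequency is reported as 0. */
static inline uint64_t tick_period_ps(uint64_t hz)
{
  if (hz == 0)
    return 0;
  return 1000000000000ull / hz;
}
```

For a 24MHz counter, tick_period_ps gives 41666 ps, i.e. roughly 41.7 ns per tick. Any two reads that complete within one tick see the same counter value.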
