Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizated calculation of shared power of 2 in bn_gcd #24332

Closed
wants to merge 14 commits into from

Conversation

landgrafhomyak
Copy link
Contributor

No description provided.

@openssl-machine openssl-machine added the hold: cla required The contributor needs to submit a license agreement label May 5, 2024
@landgrafhomyak
Copy link
Contributor Author

https://www.openssl.org/policies/cla.html:

Every non-trivial contribution needs to be covered by a signed Contributor License Agreement (CLA) from all original authors.

It's trivial contribution, so CLA not needed.

@kroeckx
Copy link
Member

kroeckx commented May 5, 2024 via email

@landgrafhomyak
Copy link
Contributor Author

This looks like it undoes constant time code.

Execution time of this function depends on size of input numbers, so it's not constant initially.

@kroeckx
Copy link
Member

kroeckx commented May 5, 2024 via email

@paulidale
Copy link
Contributor

I don't think this is a trivial contribution from a copyright viewpoint.

I'm also concerned about the loss of constant time here.

Do you have any performance figures for comparison?

@landgrafhomyak
Copy link
Contributor Author

Updated optimization to be consttime.

I didn't do perfomance measures, but there is 64 (32 on old CPUs) times less loop iterations so it must be faster.

mask = r->d[i] | g->d[i];
if ((pow2_flag == 0) | (mask == 0)) {
shifts += BN_BITS2 & pow2_flag;
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes everything non-constant time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent of input numbers (except a==0 || b==0, but this case checked before) it will run loop to check each bit in chunk exactly once:

mask    branch
0000    mask == 0, add 64 and continue
0000    mask == 0, add 64 and continue
0000    mask == 0, add 64 and continue
...
0000    mask == 0, add 64 and continue
1234    mask != 0, check each bit
5678    pow2_flag == 0, add 64&0 and continue
0000    pow2_flag == 0, add 64&0 and continue
...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the if and continue concern me for being non-constant time. I guess that you can write code so that the if is evaluated exactly the same amount of times, but the CPU will speculatively execute different code and you end up being able to measure the difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we talk about constant time code, we're mostly talking about branch free code, where the for loop is the only branch code.

@landgrafhomyak
Copy link
Contributor Author

landgrafhomyak commented May 6, 2024

I made some measures of perfomance. New code faster ~25 times when compiled with gcc -O0 on WSL:

Old | Size: 100000 Clocks per test: 27043
New | Size: 100000 Clocks per test: 1251
Test program (spoiler)
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <iostream>

#define SIZE (100000)

#define TESTS (1000)

static std::uint64_t r[SIZE];
static std::uint64_t g[SIZE];

template<bool new_impl>
void test();

int main() {
  std::size_t i;
  std::srand(123);
  for (i = 0; i < SIZE; i++) {
      r[i] = (((std::uint64_t) std::rand()) << 32) | (std::uint64_t) std::rand();
      g[i] = (((std::uint64_t) std::rand()) << 32) | (std::uint64_t) std::rand();
  }

  auto start = std::clock();

  for (i = 0; i < TESTS; i++)
      test<false>();

  std::cout << "Old | Size: " << SIZE << " Clocks per test: " << (std::clock() - start) / TESTS << std::endl;

  start = std::clock();

  for (i = 0; i < TESTS; i++)
      test<true>();

  std::cout << "New | Size: " << SIZE << " Clocks per test: " << (std::clock() - start) / TESTS << std::endl;
}

template<bool new_impl>
[[clang::noinline, gnu::noinline]]
void test() {

  if constexpr (new_impl) {
      std::uint64_t pow2_temp, pow2_mask;
      int pow2_flag, pow2_chunk_index, pow2_shifts, i, j;
      pow2_flag = 1;
      pow2_chunk_index = 0;
      pow2_shifts = 0;
      for (i = 0; i < SIZE; i++) {
          pow2_mask = r[i] | g[i];
          pow2_temp = (pow2_flag != 0) & (pow2_mask != 0);
          pow2_flag &= !pow2_temp;
          pow2_temp = ((~pow2_temp & (pow2_temp - 1)) >> 63) - 1;
          pow2_shifts += 1uL & pow2_temp;
          pow2_temp = (i ^ pow2_chunk_index) & pow2_temp;
          pow2_chunk_index ^= pow2_temp;
      }
      pow2_shifts *= 64;
      pow2_mask = r[pow2_chunk_index] | g[pow2_chunk_index];
      pow2_flag = 1;
      for (j = 0; j < 64; j++) {
          pow2_flag &= pow2_mask;
          pow2_shifts += pow2_flag;
          pow2_mask >>= 1;
      }
  } else {
      std::uint64_t mask;
      int j, bit, shifts=0, i;
      bit = 1;
      for (i = 0; i < SIZE; i++) {
          mask = ~(r[i] | g[i]);
          for (j = 0; j < 64; j++) {
              bit &= mask;
              shifts += bit;
              mask >>= 1;
          }
      }
  }
}

crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved
crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved
@t8m t8m added branch: master Merge to master branch tests: exempted The PR is exempt from requirements for testing triaged: performance The issue/pr reports/fixes a performance concern labels May 6, 2024
crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved
crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved
@github-actions github-actions bot added the severity: fips change The pull request changes FIPS provider sources label May 7, 2024
@kroeckx
Copy link
Member

kroeckx commented May 7, 2024

We have a lot of test failing in CI, so I assume the result of the calculation is wrong now.

@kroeckx
Copy link
Member

kroeckx commented May 7, 2024

@romen: Is this something you can/want to review?

@t8m
Copy link
Member

t8m commented May 7, 2024

I still have some suspicions about how the compiler might make the logical ! operations non-const time. Perhaps the const_time_select...() functions could be used instead? Otherwise this looks fairly good to me.

@kroeckx
Copy link
Member

kroeckx commented May 7, 2024

If this is merged, we might want to reevaluate if ossl_bn_rsa_fips186_4_derive_prime() should use gcd or not.

@paulidale
Copy link
Contributor

I made some measures of perfomance. New code faster ~25 times when compiled with gcc -O0 on WSL:

Old | Size: 100000 Clocks per test: 27043
New | Size: 100000 Clocks per test: 1251

I think it would be enlightening to see the min and max times. That would confirm constant time execution.

@romen
Copy link
Member

romen commented May 8, 2024

I will try to find the time for a proper review, but I am a bit swamped at the moment.

The first thing that came to mind inspecting the code changes, is that as a general rule of thumb, one generally wants to avoid mixing bitwise operations and the ! operator.
Quite often the ! is processed by the compiler in a way that introduces a branch in the resulting binary code, or that could be handled as a branch by the hw optimization heuristics.
I computing this kind of bitwise masks, we tend to prefer using only bitwise operators &¦^~ as that usually prevents optimization steps to unfold the computation into branches.

I might try to summon @cpereida and @bbbrumley if they can volunteer their expertise.

@bbbrumley
Copy link
Contributor

Thank you for adding the unit tests! I think this is a solid contribution.

@paulidale paulidale added the approval: review pending This pull request needs review by a committer label May 12, 2024
@t8m t8m added tests: present The PR has suitable tests present and removed tests: exempted The PR is exempt from requirements for testing labels May 13, 2024
@t8m t8m added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer approval: done This pull request has the required number of approvals labels May 13, 2024
@t8m
Copy link
Member

t8m commented May 13, 2024

Commits to be squashed when merging.

@t8m t8m added the approval: done This pull request has the required number of approvals label May 13, 2024
crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved
crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved
crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved
crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved
@t8m t8m added approval: done This pull request has the required number of approvals and removed approval: done This pull request has the required number of approvals labels May 14, 2024
@openssl-machine openssl-machine added approval: ready to merge The 24 hour grace period has passed, ready to merge and removed approval: done This pull request has the required number of approvals labels May 15, 2024
@openssl-machine
Copy link
Collaborator

This pull request is ready to merge

@t8m
Copy link
Member

t8m commented May 15, 2024

Squashed and merged to the master branch. Thank you for your contribution.

@t8m t8m closed this May 15, 2024
openssl-machine pushed a commit that referenced this pull request May 15, 2024
Reviewed-by: Paul Dale <ppzgs1@gmail.com>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #24332)
openssl-machine pushed a commit that referenced this pull request May 15, 2024
Reviewed-by: Paul Dale <ppzgs1@gmail.com>
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #24332)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval: ready to merge The 24 hour grace period has passed, ready to merge branch: master Merge to master branch severity: fips change The pull request changes FIPS provider sources tests: present The PR has suitable tests present triaged: performance The issue/pr reports/fixes a performance concern
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants