Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add library for compile time instantiation of elliptic curves #3979

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

randombit
Copy link
Owner

@randombit randombit commented Apr 6, 2024

This new approach has a lot of benefits. It avoids the side channels that plague anything based on BigInt (or any general purpose arbitrary length integer type), it also avoids almost all heap allocations and inlines nicely.

Performance advantage depends on curve and compiler. For smaller curves (eg secp256r1) ECDSA seems be almost twice as fast. For larger such as secp521r1 the advantage is more like 40%. Clang seems to be able to optimize everything much better than GCC here, for reasons I haven't been able to nail down.

Performance can be improved significantly from here; besides P-521 we're not taking advantage of specialized reductions possible with many primes, nor is there a system to express the precomputed addition chains used for inversions. The BigInt based code makes use of both of these already.

One disadvantage is that the code size is quite significant. The worst overhead is actually the symbol names. I've applied various tricks to reduce these but still over half the size of math_pcurves.o is (per Google's bloaty tool) symbols. I'd like to reduce these in the future, but I don't think it's a blocker to merge.

@randombit randombit added this to the Botan 3.5.0 milestone Apr 6, 2024
@randombit randombit requested a review from reneme April 6, 2024 10:24
@reneme
Copy link
Collaborator

reneme commented Apr 6, 2024

Awesome! I'll certainly have a look. Not before tomorrow, though.

@randombit
Copy link
Owner Author

randombit commented Apr 6, 2024

No rush! This is certainly not going into 3.4 so there is plenty of time.

Most interesting thing about the sea of red in CI - GCC 11.4 on x86 fails with constexpr timeout, while same version of GCC on say Aarch64 or S390 accepts. So constexpr limits are architecture dependent [*] :( and thus intrinsically flaky since you can be near some limit without knowing it

[*] And also version dependent since GCC 13 on my machine is fine with the code without increasing the constexpr limit.

@randombit
Copy link
Owner Author

Clang seems to have a nasty bug where std::is_constant_evaluated returns false incorrectly in constexpr context, then it rejects the code because we used asm or volatile in constexpr context in the !std::is_constant_evaluated branch 😡

It works fine for me in Clang 17 on my machines so I'm assuming this was a Clang bug that was fixed subsequently. I'd vote for just bumping the Clang minimum version - that's an absolutely insane bug and nearly impossible to work around - but unfortunately the version in Android NDK also has the bug and we are stuck with that version at least until the next NDK release.

@coveralls
Copy link

coveralls commented Apr 6, 2024

Coverage Status

coverage: 91.771% (-0.1%) from 91.868%
when pulling 256adc3 on jack/pcurves
into 1a5cf87 on master.

@randombit
Copy link
Owner Author

Downside of this approach is that (due to name mangling being somewhat horribly thought out for this case) object sizes are really big. Even with just 3 curves, on my machine the object file is one of the largest from the whole library, that will get a lot worse with 27. And compile times may prove untenable.

I'm sure 90% of this can be salvaged but I suspect in the end the string based instantiation won't work in practice 😭 which is a shame since it's quite pretty imo

@randombit
Copy link
Owner Author

Clang bug might be llvm/llvm-project#55638

@randombit
Copy link
Owner Author

This error

 call to consteval function 'Botan::p_minus<(unsigned char)'\x01', unsigned long, 4UL>' is not a constant expression

Looks like llvm/llvm-project#51182

@randombit
Copy link
Owner Author

I can repro the std::if_constant_evaluated problem using Clang 16 on my machine so it looks to be a bug fixed in Clang 17, though I don't see anything that looks relevant in the release notes.

@randombit randombit force-pushed the jack/pcurves branch 4 times, most recently from 3496464 to cd40575 Compare April 7, 2024 13:17
Copy link
Collaborator

@reneme reneme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! 😄

I merely skimmed through the changes and left a few suggestions and comments here and there. No thorough review whatsoever.

*/

#ifndef BOTAN_PCURVES_UTIL_H_
#define BOTAN_PCURVES_UTIL_H_
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment for this file (and perhaps other places):

I'd suggest to use std::span<const W, N> instead of std::array<>& for the parameters of those utilities. No need to restrict those functions to arrays, IMO. And a static-length span should provide the same guarantees and optimization opportunities as an array, no?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main problem is for some technical reason I'm not bothering to learn the details of, C++ can't deduce the conversion from a statically sized array to a statically sized span. https://stackoverflow.com/questions/70983595

This can be worked around with some additional noise but the additional flexibility doesn't buy us anything in this context so I'd rather keep things as simple as possible.

src/lib/pubkey/pcurves/pcurves_util.h Outdated Show resolved Hide resolved
src/lib/pubkey/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
Comment on lines 217 to 225
auto v = bigint_monty_redc(m_val, Self::P, Self::P_dash);
std::reverse(v.begin(), v.end());
auto bytes = store_be(v);

if constexpr(Self::BYTES == Self::N * WordInfo<W>::bytes) {
return bytes;
} else {
// Remove leading zero bytes
const size_t extra = Self::N * WordInfo<W>::bytes - Self::BYTES;
std::array<uint8_t, Self::BYTES> out;
copy_mem(out.data(), &bytes[extra], Self::BYTES);
return out;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could get away without the if constexpr and the potential memory copy entirely?

Suggested change
auto v = bigint_monty_redc(m_val, Self::P, Self::P_dash);
std::reverse(v.begin(), v.end());
auto bytes = store_be(v);
if constexpr(Self::BYTES == Self::N * WordInfo<W>::bytes) {
return bytes;
} else {
// Remove leading zero bytes
const size_t extra = Self::N * WordInfo<W>::bytes - Self::BYTES;
std::array<uint8_t, Self::BYTES> out;
copy_mem(out.data(), &bytes[extra], Self::BYTES);
return out;
}
auto v = bigint_monty_redc(m_val, Self::P, Self::P_dash);
std::reverse(v.begin(), v.end());
std::array<uint8_t, Self::BYTES> out = {0};
static_assert(Self::N * WordInfo<W>::bytes >= Self::BYTES); // is this guranteed somehow anyway?
const size_t zero_padding_offset = Self::N * WordInfo<W>::bytes - Self::BYTES;
store_be(std::span{out}.template subspan<zero_padding_offset>(), v);
return out;

... this is untested. Just a sketch of the idea. store_be with a statically-sized out-param will static_assert that the byte buffer matches the input range exactly.

src/lib/pubkey/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
src/lib/pubkey/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
void conditional_assign(bool cond, const Self& pt) {
m_x.conditional_assign(cond, pt.x());
m_y.conditional_assign(cond, pt.y());
m_z.conditional_assign(cond, pt.z());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is needed for a constant-time implementation? So far, I saw it as an unspoken rule that all methods that are "supposed to be constant-time" are prefixed with ct_.... I found this a helpful convention, that might be applicable here as well.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this code I'm going for the opposite convention namely that everything is constant time by default and anything that is intentionally variable time has a _vartime suffix.

(There are some current exceptions eg in the point multiplication that will be fixed before merging.)

@randombit
Copy link
Owner Author

GCC 11 miscompilation, excellent

@randombit randombit force-pushed the jack/pcurves branch 12 times, most recently from 2feb181 to 3bde391 Compare April 14, 2024 09:51
@randombit
Copy link
Owner Author

@reneme I think this is ready to go.

Copy link
Collaborator

@reneme reneme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments/questions/nits. I'm still figuring out the structure of this. So far: really nice!
Currently, my biggest concerns are:

  • Why are the instances of the PrimeOrderCurveImpl template singletons? My concern is that the singletons' states will become an issue in testing (for instance). And can those singletons be avoided? I didn't get deep enough into the implementation to be able to have a decent opinion on that.
  • Should there be a way to select specific curves at compile-time? At the moment, its "all-or-nothing" and, I feel, especially for embedded use cases it might be helpful for binary size to be able to fine-tune which curves should be compiled.

And one general wish:

Some Doxygen comments describing the rough use case of each class and also their relationship would really help to get started with this. 😅

src/tests/test_pcurves.cpp Outdated Show resolved Hide resolved
src/tests/tests.cpp Show resolved Hide resolved
Comment on lines 96 to 97
/// Creates a generic non-optimized version
//static std::shared_ptr<const PrimeOrderCurve> from_params(...);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO or left over?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at this point a left over - removed.

It was (somewhat still is) unclear how to handle curves that are not baked in (either application specific curves but also the various "misc" curves like secp192k1, x962_p239v1, etc). One approach would be to somehow implement PrimeOrderCurve for these in a generic way and that's what this commented out decl was leading to.

In practice though, examining the list of curves remaining, some have some rather unappealing properties that would be difficult to fit into this model. Particularly curves where p % 4 != 3, or where the order one bit larger than the field.

So I think what's going to happen, at least in the short term (ie prior to Botan 4 where we can maybe remove some of the most unappealing builtin curves, plus remove explicit encoding of EC curves in keys/ASN.1, plus put some real restrictions on what applications use for custom curves) is that for anything that's not a builtin, we default back to the old generic BigInt approach.

Comment on lines +99 to +59
typedef std::array<word, StorageWords> StorageUnit;
typedef std::shared_ptr<const PrimeOrderCurve> CurvePtr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move up those definitions to use CurvePtr for from_name() and from_id(), perhaps?

src/lib/math/pcurves/pcurves.h Show resolved Hide resolved
src/lib/math/pcurves/pcurves_util.h Outdated Show resolved Hide resolved
src/lib/math/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
src/lib/math/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
src/lib/math/pcurves/pcurves_impl.h Show resolved Hide resolved
src/lib/math/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
@reneme
Copy link
Collaborator

reneme commented May 28, 2024

For the record: I'm working on some high-level overview to structure my reading. Perhaps its helpful for others as well:

Updated: 30.05.2024

image
Excalidraw.com

Copy link
Collaborator

@reneme reneme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I mostly looked into the structure of the implementation. We'll probably go deeper into the math next week (when @FAlbertDev is back from vacation 😰). Some more nits and suggestions I found on the way are below.

src/lib/math/pcurves/pcurves.cpp Show resolved Hide resolved
Comment on lines +35 to +43
class Params final : public EllipticCurveParameters<
"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFFFF0000000000000000FFFFFFFF",
"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFFFF0000000000000000FFFFFFFC",
"B3312FA7E23EE7E4988E056BE3F82D19181D9C6EFE8141120314088F5013875AC656398D8A2ED19D2A85C8EDD3EC2AEF",
"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFC7634D81F4372DDF581A0DB248B0A77AECEC196ACCC52973",
"AA87CA22BE8B05378EB1C71EF320AD746E1D3B628BA79B9859F741E082542A385502F25DBF55296C3A545E3872760AB7",
"3617DE4A96262C6F5D9E98BF9292DC29F8F41DBD289A147CE9DA3113B5F0B8C00A60B1CE1D7E819D7A431D7C90EA0E5F",
-12> {
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the compactness of this! Though, you could replace the generic EllipticCurveParameters struct with a concept and re-define the parameter struct for each curve. That would "label" each parameter value to their intended parameter name and also remove the need for the StringLiteral helper. Like shown in this GodBolt.

Admittedly, after typing it out, I'm not super convinced about it myself anymore. However, the concept might still be worthwhile to restrict the typename Params in the EllipticCurve class and elsewhere (eg. MontgomeryRep). Also, this concept could be extended to statically check further requirements of the parameter relationships as needed.

src/lib/math/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
src/lib/math/pcurves/pcurves.h Outdated Show resolved Hide resolved
src/lib/math/pcurves/pcurves.h Show resolved Hide resolved
src/lib/math/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
src/lib/math/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
src/lib/math/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
Comment on lines +254 to +223
std::array<W, 2 * N> z;
comba_mul<N>(z.data(), a.data(), b.data());
return Self(Rep::redc(z));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Perhaps implement this in terms of operator*= like:

Suggested change
std::array<W, 2 * N> z;
comba_mul<N>(z.data(), a.data(), b.data());
return Self(Rep::redc(z));
auto result = a;
result *= b;
return result;

... same for operator+ and the others, of course. That would reduce code duplication a tiny bit.

src/lib/math/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
Comment on lines 1174 to 1292
/*
It's impossible for this addition to ever be a doubling.

Consider the sequence of points that are operated on, and specifically
their discrete logarithms. We start out at at the point at infinity
(dlog 0) and then add the initial window which is precisely P*w_0

We then perform WindowBits doublings, so accum's dlog at the point
of the addition in the first iteration of the loop (when i == 1) is
at least 2^W * w_0.

Since we know w_0 > 0, then in every iteration of the loop, accums
dlog will always be greater than the dlog of the table element we
just looked up (something between 0 and 2^W-1), and thus the
addition into accum cannot be a doubling.
*/
accum += AffinePoint::ct_select(m_table, w_i);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true for all cases? Since we use blinding (with scalars bigger than the group order), we could end up with an accumulator of, for example, $accum = (order + 1)w_0$ (i.e., we have a dlog of $order + 1 \equiv 1$ ), which is effectively $w_0$.
Of course, this would be extremely unlikely.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good point, this analysis did not consider "overflow" of the dlog of the accumulator due to blinding. Alas.

Possibly there is some other scalar multiplication algorithm we can use here, I'll have to investigate. IMO it doesn't block merging -- the situation wrt side channels here is still strictly better than with the BigInt based code, it even uses the same algorithm for this -- but this should be addressed.

src/lib/math/pcurves/pcurves_wrap.h Show resolved Hide resolved

constexpr size_t n_words = C::NW.size();

uint8_t maskb[mask_bytes] = {0};

This comment was marked as outdated.

size_t get_window(size_t offset) const { return read_window_bits<WindowBits>(std::span{m_bytes}, offset); }

private:
secure_vector<uint8_t> m_bytes;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we need a dynamic container here? Edit: I'm guessing you don't want to allocate the entire lookup table on the stack. Perhaps leave a comment for future-us?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a stack allocation here is pretty cheap, the main motivation was to avoid putting the scalar onto the stack. But maybe we can just admit that zeroization is a lost cause and use the stack for everything.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just do it in the destructor of this helper class?

src/lib/math/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
src/lib/math/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
src/lib/math/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
src/lib/math/pcurves/pcurves_impl.h Outdated Show resolved Hide resolved
Comment on lines +1192 to +1296
if(i <= 3) {
accum.randomize_rep(rng);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any rationale why randomizations aren't necessary after i == 3?

Copy link
Owner Author

@randombit randombit Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH no. This is not well motivated. The idea is that in the first few iterations we only have a few secret bits that have been introduced into accum, and so rerandomziation might be helpful. But this is extremely handwavey.

I'd be open to removing this entirely, or just doing rerandomization once during the first iteration.

If you have contacts at BSI -- they may have better ideas on how to approach this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related: Given that the entire multiplication is implemented as constant-time: in particular the table lookup and the point operations; Do we even need blinding here? Or is that merely an additional precaution?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need blinding here?

  • Programmers make mistakes
  • Compilers are smart (sometimes)
  • Might help with EM / power analysis (questionable)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defense in depth. Thanks for the background.

Comment on lines 993 to 1014
constexpr size_t mask_words = BlindingBits / WordInfo<W>::bits;
constexpr size_t mask_bytes = mask_words * WordInfo<W>::bytes;

constexpr size_t n_words = C::NW.size();

uint8_t maskb[mask_bytes] = {0};
rng.randomize(maskb, mask_bytes);

W mask[n_words] = {0};
load_le(mask, maskb, mask_words);
mask[mask_words - 1] |= WordInfo<W>::top_bit;

W mask_n[2 * n_words] = {0};

const auto sw = scalar.to_words();

// Compute masked scalar s + k*n
comba_mul<n_words>(mask_n, mask, C::NW.data());
bigint_add2_nc(mask_n, 2 * n_words, sw.data(), sw.size());

std::reverse(mask_n, mask_n + 2 * n_words);
m_bytes = store_be<secure_vector<uint8_t>>(mask_n);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would become more compact if comba_mul and bigint_add2_nc would be able to handle spans.

Suggested change
constexpr size_t mask_words = BlindingBits / WordInfo<W>::bits;
constexpr size_t mask_bytes = mask_words * WordInfo<W>::bytes;
constexpr size_t n_words = C::NW.size();
uint8_t maskb[mask_bytes] = {0};
rng.randomize(maskb, mask_bytes);
W mask[n_words] = {0};
load_le(mask, maskb, mask_words);
mask[mask_words - 1] |= WordInfo<W>::top_bit;
W mask_n[2 * n_words] = {0};
const auto sw = scalar.to_words();
// Compute masked scalar s + k*n
comba_mul<n_words>(mask_n, mask, C::NW.data());
bigint_add2_nc(mask_n, 2 * n_words, sw.data(), sw.size());
std::reverse(mask_n, mask_n + 2 * n_words);
m_bytes = store_be<secure_vector<uint8_t>>(mask_n);
constexpr size_t mask_words = BlindingBits / WordInfo<W>::bits;
constexpr size_t n_words = C::NW.size();
std::array<W, n_words> mask = {};
std::array<W, 2 * n_words> masked_scalar = {};
auto effective_mask = std::span{mask}.template first<mask_words>();
load_le(effective_mask, rng.random_array<effective_mask.size_bytes()>());
effective_mask.back() |= WordInfo<W>::top_bit;
const auto sw = scalar.to_words();
// Compute masked scalar s + k*n
comba_mul<n_words>(masked_scalar.data(), mask.data(), C::NW.data());
bigint_add2_nc(masked_scalar.data(), masked_scalar.size(), sw.data(), sw.size());
std::reverse(masked_scalar.begin(), masked_scalar.end());
m_bytes = store_be<secure_vector<uint8_t>>(masked_scalar);

... also, this assumes that we add the following overload to RandomNumberGenerator:

/**
* Create an array of @p N random bytes.
*
* @tparam N  number of desired random bytes
* @return    an array of @p N random bytes
* @throws Exception if RNG fails
*/
template <size_t N>
std::array<uint8_t, N> random_array() {
   std::array<uint8_t, N> result;
   random_vec(std::span{result});
   return result;
}

@reneme
Copy link
Collaborator

reneme commented Jun 4, 2024

While reviewing the multiplication table helpers, we introduced a few refactorings and added some comments for better comprehension (in #4095): 030c919

Copy link
Collaborator

@reneme reneme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot! Thanks for bearing with the load of comments we've had about that. There are certainly some more nits to be found, but I feel its more efficient if we address them in follow-ups.

@randombit
Copy link
Owner Author

👍 Thanks for the reviews! Some of your review comments still seem worth fixing here, also I need to clean up this dumpster fire of a commit history before merge. I'd also like #4096 to merge prior to this landing so we have some assurance wrt constant time, just in case more substantial reworking is required.

@randombit randombit mentioned this pull request Jun 6, 2024
@randombit randombit force-pushed the jack/pcurves branch 3 times, most recently from e98e9d0 to 0ef936b Compare June 7, 2024 02:24
Copy link
Collaborator

@reneme reneme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the added documentation and the co-authorship! 🤩

* are set then add G or H resp. If both bits are set, add GH.
*/
template <typename C, size_t W>
class WindowedMul2Table final {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fabian and I had done a refactoring on this, here: 030c919

I don't remember receiving feedback about this commit from you. Just making sure it doesn't fall through the cracks.
At this point it might not apply neatly anymore, though. I'll rebase later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased: b6e847e

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the rebase. I did see the commit but still need to think that one over.

Comment on lines +1310 to +1315
* The W = 1 case is simple; we precompute an extra point GH = G + H,
* and then examine 1 bit in each of x and y. If one or the other bits
* are set then add G or H resp. If both bits are set, add GH.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #4095 we added a bit of ASCII art for the W = 2 case. Perhaps that's worth adding:

     /**
       * The example below is a precomputed table for W=2. The flattened table
       * begins at (x_i,y_i) = (1,0), i.e. the identity element is omitted.
       * The indices in each cell refer to the cell's location in m_table.
       *
       *  x->           0          1          2         3
       *       0  |/ (ident) |0  x     |1  2x      |2  3x     |
       *       1  |3    y    |4  x+y   |5  2x+y    |6  3x+y   |
       *  y =  2  |7    2y   |8  x+2y  |9  2(x+y)  |10 3x+2y  |
       *       3  |11   3y   |12 x+3y  |13 2x+3y   |14 3x+3y  |
       */

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this is also part of #3979 (comment))

randombit and others added 5 commits June 7, 2024 19:46
This is designed from the start to avoid side channels and for good
performance. On most benchmarks it is approximately 2x faster.

Co-Authored-By: René Meusel <rene.meusel@rohde-schwarz.com>
Co-Authored-By: Fabian Albert <fabian.albert@rohde-schwarz.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants