Generalised version of improved fp2 #212

jon-chuang · 2021-02-06T23:52:55Z

Related to #207

The main contribution of this PR is to remove several redundant functions and to unify all the mul-by-small-constant methods into default implementations.

Results vis a vis pre #207:

Often, we get speedups for free (for instance for bw6). I'm still investigating why this is.

For mnt4_298, the G2 base field non-residue is 17, so it's not worthwhile there.

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

Targeted PR against correct branch (master)
Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
Wrote unit tests
Updated relevant documentation in the code
Added a relevant changelog entry to the Pending section in CHANGELOG.md
Re-reviewed Files changed in the Github PR explorer

jon-chuang · 2021-02-07T00:04:30Z

@ValarDragon @Pratyush

jon-chuang · 2021-02-07T00:40:06Z

I think to really generalise this, we need some way to define multiplication by small integers for fields. Or, we should have some way to indicate that a field element is really small. This should be compile-time data, indicating that a particular field element is a small constant.

However, the threshold for how small would be considered small would depend on the number of limbs in the field.

Further, this only makes sense for constant data. So this is sad. However, we can use const generics to achieve this...

Nonetheless, we actually don't need so much generality, so I will investigate tweaking the concrete FpN impls to use more efficient autogenerated versions of mul by residue if we make them available.

Pratyush · 2021-02-07T03:27:22Z

What about adding a SmallValue associated type to Field?

pub trait Field: Mul<<Self as Field>::SmallValue, Output = Self> {
	...
	type SmallValue: Neg + Mul<Self, Output = Self> + Into<Self>;
}

We can use it as follows:

// Impl for prime fields FpN:
impl Field for FpN {
	type SmallValue = i8;
}

// Impl for quadratic extension fields:
impl Field for QuadExtField {
	type SmallValue = Pair<Self::BaseField>;
}

pub trait QuadExtParameters {
	const NONRESIDUE: Pair<Self::BaseField>; // No need for separate NONRESIDUE_SMALL
	// No need for mul_by_nonresidue, because it's subsumed by the trait bounds above
}

pub struct Pair<F: Field> {
	// The coordinates can be either small or full-sized.
	c0: Either<F, F::SmallValue>, 
	c1: Either<F, F::SmallValue>,
}

(Similarly for cubic ext field)
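To make the proposal a bit more concrete, here is a minimal, self-contained sketch of the core idea: the non-residue is declared with the base field's SmallValue type, so multiplying by it always goes through the cheap path and there is no separate mul_by_nonresidue hook to forget. This is only an illustration under several assumptions. ToyFp, MODULUS, ToyFp2Params, the explicit mul_by_small method and the free function mul_by_nonresidue are all hypothetical names, the Mul<SmallValue> operator bound from the proposal is replaced by a plain method to keep the sketch small, and the Pair/Either machinery for extension fields is left out.

```rust
/// Hypothetical toy modulus; a real Fp would be multi-limb.
pub const MODULUS: u64 = 103;

/// Simplified stand-in for the `Field` trait from the proposal.
/// Assumption: an explicit method replaces the `Mul<SmallValue>` bound.
pub trait Field: Copy {
    type SmallValue: Copy;
    fn mul_by_small(self, s: Self::SmallValue) -> Self;
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyFp(pub u64);

impl Field for ToyFp {
    type SmallValue = i8;

    fn mul_by_small(self, s: i8) -> Self {
        // For brevity this reduces the signed constant and multiplies
        // directly; a real implementation would use an addition chain.
        let m = (s as i64).rem_euclid(MODULUS as i64) as u64;
        ToyFp(self.0 % MODULUS * m % MODULUS)
    }
}

/// The non-residue is typed as a small value of the base field,
/// so no separate NONRESIDUE_SMALL or mul_by_nonresidue is needed.
pub trait QuadExtParameters {
    type BaseField: Field;
    const NONRESIDUE: <Self::BaseField as Field>::SmallValue;
}

pub struct ToyFp2Params;

impl QuadExtParameters for ToyFp2Params {
    type BaseField = ToyFp;
    // 103 = 3 (mod 4), so -1 is a quadratic non-residue.
    const NONRESIDUE: i8 = -1;
}

/// Generic code can only reach the non-residue through the small-value path.
pub fn mul_by_nonresidue<Q: QuadExtParameters>(x: Q::BaseField) -> Q::BaseField {
    x.mul_by_small(Q::NONRESIDUE)
}
```

For example, mul_by_nonresidue::<ToyFp2Params>(ToyFp(5)) multiplies 5 by -1 modulo 103.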

jon-chuang · 2021-02-07T03:37:21Z

Hmm, that was something that crossed my mind, but I wanted something that could be dropped in so that one wouldn't have to rewrite any formulas. However, this seemed too complicated. Do you think your idea might have any advantages over what this PR does?

Hence, in the end I just resorted to defining a specific method, mul_by_i8, where the integer is meant to be a constant or literal, so that LLVM can optimise out the branches and loops.

Do you know if there are any methods apart from nonresidue muls where mul_by_i8 could also be deployed? Meaning, places that could benefit from a generic mul_by_small_constant type function?
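For readers who want to see the shape of such a method, below is a minimal sketch of a double-and-add mul_by_i8 on a toy single-word prime field. It is not the code from this PR: ToyFp, MODULUS and the helper methods are assumptions made for illustration, and a real Fp would work on multi-limb Montgomery values. The point is only that the loop uses nothing but additions, doublings and a final negation, so with a literal argument and inlining the compiler has a chance to unroll it.

```rust
/// Hypothetical toy modulus standing in for a multi-limb field modulus.
pub const MODULUS: u64 = 1_000_000_007;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyFp(pub u64);

impl ToyFp {
    /// Build an element, reducing the input modulo MODULUS.
    pub fn new(v: u64) -> Self {
        ToyFp(v % MODULUS)
    }

    fn add(self, other: Self) -> Self {
        // Both operands are below MODULUS, so the sum fits in a u64.
        ToyFp((self.0 + other.0) % MODULUS)
    }

    fn double(self) -> Self {
        self.add(self)
    }

    fn neg(self) -> Self {
        ToyFp((MODULUS - self.0) % MODULUS)
    }

    /// Multiply by a small signed constant using only doublings and additions.
    #[inline(always)]
    pub fn mul_by_i8(self, k: i8) -> Self {
        // unsigned_abs also handles i8::MIN without overflow.
        let mut n = k.unsigned_abs();
        let mut acc = ToyFp(0);
        let mut base = self;
        while n != 0 {
            if n & 1 == 1 {
                acc = acc.add(base);
            }
            base = base.double();
            n >>= 1;
        }
        if k < 0 { acc.neg() } else { acc }
    }
}
```

A call such as ToyFp::new(5).mul_by_i8(-3) takes two loop iterations and one negation, and performs no general field multiplication.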

jon-chuang · 2021-02-07T03:40:47Z

I should probably write some tests for mul_by_i8.

Pratyush · 2021-02-07T03:47:40Z

The benefit of the foregoing approach is primarily that it enables multiplication by arbitrary small field elements cheaply. For example, one could use it also to replace the mul_by_a methods in SWParameters, and similarly for TEParameters.

It would also reduce some complexity in writing new curves and fields. For example, right now, when implementing a new extension field, one could forget to specialize the mul_by_nonresidue method, resulting in slowdowns that are difficult to trace. By going this way, there's no way to forget. (similarly for mul_by_a)

We could also use it for #209

Overall it would allow us to remove many unseemly hacks from the codebase =)

jon-chuang · 2021-02-07T03:52:43Z

Hmm, but do we have the same idea of what mul_by_small_element entails?

There are essentially two possibilities:

1. the double and add method used in this PR
2. an improvement over Montgomery mul that can multiply field elements by small numbers which are too large for method 1.

In my mind, it makes sense to define these as two separate functionalities, and one can already create these functions generically in the PrimeField trait or Fp struct, for instance.

Pratyush · 2021-02-07T03:57:40Z

> There are essentially two possibilities:
>
> 1. the double and add method used in this PR
> 2. an improvement over Montgomery mul that can multiply field elements by small numbers which are too large for method 1.

Why not have a threshold? For small enough numbers, we use custom addition chains, and then after that we use the specialized montgomery mul (which we can add at a later date)

jon-chuang · 2021-02-07T04:03:54Z

Well, determining that threshold is probably not that straightforward. I would rather let the downstream user decide which method they'd like to deploy, or if they'd like to set a threshold.

Btw, regarding mul_by_a, I noticed that

#[inline(always)]
pub fn sub_noborrow(x: &mut [u64; 6], other: &[u64; 6]) -> bool {
    let mut borrow = 0u8;

    unsafe {
        borrow = core::arch::x86_64::_subborrow_u64(borrow, x[0], other[0], &mut x[0]);
        borrow = core::arch::x86_64::_subborrow_u64(borrow, x[1], other[1], &mut x[1]);
        borrow = core::arch::x86_64::_subborrow_u64(borrow, x[2], other[2], &mut x[2]);
        borrow = core::arch::x86_64::_subborrow_u64(borrow, x[3], other[3], &mut x[3]);
        borrow = core::arch::x86_64::_subborrow_u64(borrow, x[4], other[4], &mut x[4]);
        borrow = core::arch::x86_64::_subborrow_u64(borrow, x[5], other[5], &mut x[5]);
    }

    borrow != 0
}

pub fn zero() -> [u64; 6] {[0u64; 6]}

pub fn f(x: &mut [u64; 6]) {
    sub_noborrow(x, &zero());
}

results in non-trivial assembly for f. This means that Rust probably can't optimise away arithmetic with an all-zero operand, sadly.

Hence I will also submit a PR to rewrite some formulas in adding results of mul_by_a, when a is zero.

OK, short weierstrass already does this for the addition formula.
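As a rough illustration of the kind of rewrite meant here, the sketch below guards the "a times x" term with an explicit check on the curve constant instead of relying on the optimiser to remove arithmetic with zero. Everything in it is an assumption made for the example: the CurveParams trait, the COEFF_A constant, the toy modulus and the add_a_times helper are hypothetical and do not mirror the actual SWParameters API. Because the condition depends only on an associated constant, the compiler can typically resolve the branch per instantiation in optimised builds.

```rust
/// Hypothetical toy modulus; products of two reduced values fit in a u64.
pub const MODULUS: u64 = 1_000_000_007;

/// Hypothetical stand-in for curve parameters with a small coefficient a.
pub trait CurveParams {
    const COEFF_A: u64;
}

pub struct AZero;
impl CurveParams for AZero {
    const COEFF_A: u64 = 0;
}

pub struct AFive;
impl CurveParams for AFive {
    const COEFF_A: u64 = 5;
}

/// Returns acc + a * x (mod MODULUS), skipping the whole term when a == 0.
pub fn add_a_times<C: CurveParams>(acc: u64, x: u64) -> u64 {
    if C::COEFF_A == 0 {
        // Nothing to multiply or add: the formula is rewritten, not optimised.
        acc % MODULUS
    } else {
        let term = (C::COEFF_A % MODULUS) * (x % MODULUS) % MODULUS;
        (acc % MODULUS + term) % MODULUS
    }
}
```

With AZero the function reduces to returning acc, while with AFive it computes the full term.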

Pratyush · 2021-02-07T04:08:50Z

> Well, determining that threshold is probably not that straightforward. I would rather let the downstream user decide which method they'd like to deploy, or if they'd like to set a threshold.

For the case of PrimeFields, we can support that use case by also requiring Mul<i64, Output = Self>, which uses the Montgomery mul, while Mul<i8, Output = Self> uses addition chains and maybe delegates to the i64 impl at some threshold.

EDIT:

Actually, you don't even need separate cases: you just define Fp::SmallValue = SmallInt, where SmallInt is defined as

pub enum SmallInt {
	UseAdditionChain(i8),
	UseSpecialMontMul(i64),
}

This way the choice is left to the user.
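A minimal sketch of how such a SmallInt could drive dispatch is shown below, on a toy single-word field. The enum matches the one above; everything else is assumed for illustration. ToyFp and MODULUS are hypothetical, and mul_wide is only a placeholder that uses a plain u128 modular product, since the specialised Montgomery multiplication discussed in this thread does not exist yet.

```rust
use std::ops::Mul;

/// Hypothetical toy modulus standing in for a multi-limb field modulus.
pub const MODULUS: u64 = 1_000_000_007;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyFp(pub u64);

#[derive(Clone, Copy, Debug)]
pub enum SmallInt {
    UseAdditionChain(i8),
    UseSpecialMontMul(i64),
}

impl ToyFp {
    fn add(self, other: Self) -> Self {
        ToyFp((self.0 + other.0) % MODULUS)
    }

    fn neg(self) -> Self {
        ToyFp((MODULUS - self.0) % MODULUS)
    }

    /// Double-and-add path for very small constants.
    fn mul_addition_chain(self, k: i8) -> Self {
        let mut n = k.unsigned_abs();
        let (mut acc, mut base) = (ToyFp(0), self);
        while n != 0 {
            if n & 1 == 1 {
                acc = acc.add(base);
            }
            base = base.add(base);
            n >>= 1;
        }
        if k < 0 { acc.neg() } else { acc }
    }

    /// Placeholder for the specialised multiplication (assumption: a plain
    /// u128 modular product is used here purely so the sketch runs).
    fn mul_wide(self, k: i64) -> Self {
        let m = (k.unsigned_abs() % MODULUS) as u128;
        let prod = (self.0 as u128 * m % MODULUS as u128) as u64;
        if k < 0 { ToyFp(prod).neg() } else { ToyFp(prod) }
    }
}

impl Mul<SmallInt> for ToyFp {
    type Output = ToyFp;

    /// The caller picks the strategy by choosing the enum variant.
    fn mul(self, rhs: SmallInt) -> ToyFp {
        match rhs {
            SmallInt::UseAdditionChain(k) => self.mul_addition_chain(k),
            SmallInt::UseSpecialMontMul(k) => self.mul_wide(k),
        }
    }
}
```

Both variants must agree on the result for any constant that fits in an i8; only the cost model differs, which is what leaves the threshold decision with the caller.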

jon-chuang · 2021-02-07T06:04:06Z

You are indeed right that having a faster mod mul for, let's say, u64 would already be of great value. Unfortunately, it does not fit within the Montgomery representation framework, because values that are small as big integers become large values in Montgomery representation.

Hence, I do not have any straightforward idea on how to take advantage of small values that are not suitable for the double and add chain via mul_by_i8.

I have confirmed some positive results for converting some mul_by_a to use mul_by_i8:

Affected functions are double, mul, and deser (subgroup check).

jon-chuang · 2021-02-08T03:13:55Z

Turns out I made a mistake in the implementation: by default, I used the naive mul_assign with Self::Residue rather than mul_base_field_by_residue. Now it should be back to how it was before for tower fields.

jon-chuang force-pushed the quadfield_autoconst branch 2 times, most recently from 123f692 to 4f711ab Compare February 7, 2021 00:49

jon-chuang added 2 commits February 7, 2021 08:52

init

5619a49

merge

3163912

jon-chuang force-pushed the quadfield_autoconst branch from c35c023 to 3163912 Compare February 7, 2021 00:53

jon-chuang added 2 commits February 7, 2021 09:00

remove extraneous.

b8559fe

Improve Fp6_3on2 with generic non_residue

1ae82dd

jon-chuang requested review from ValarDragon and Pratyush February 7, 2021 02:56

use mul_base_field...

184d649

minor correction

bcef957

jon-chuang mentioned this pull request Mar 11, 2021

Support operations over small field elements #225
