
WIP: Wide int ergonomic #544

Draft
wants to merge 21 commits into base: master

Conversation


@PabloLION PabloLION commented Sep 26, 2022

Moved from PR #541. For issue #246

Context

  • I tried to fix Wide Arithmetic Support (128bit uint) #236 in WIP: Wide arithmetic #541, but found it very different from what I wanted to do, so I created a new branch instead of reverting all of its commits.
  • This PR is based on Multivalue-opcodes #543, after which addw, expw, mulw, divmodw should be implemented with MultiValue.
  • This PR focuses on ergonomic features like operator overloading, WideSummation and WideRatio (names TBD).
  • I'm trying to finish this ASAP, with tests, as our team needs it urgently.

Tasks

  • Add a new class WideInt, similar to MultiValue, and use it in "wideexpr" to substitute MultiValue. For a variable w: WideInt, w.high() loads the high part of the wide integer from scratch space and w.low() loads the low part. WideInt has 3 constructor overloads (see next section)
  • Update WideExpr to return a WideInt (in scratch space)
  • Operator Overloading II: use + (__pos__) to convert an Int to a WideInt (with high word = 0)
  • Convert / cast a WideInt to an Int. Fail when it has a non-zero high word. This is a shortcut for w.low.load() if w.high.load() == 0 else ERROR
  • More operator overloading for Int OP WideInt and WideInt OP WideInt, where OP can be +, -, *, etc. This was done here and I'll update them (TBD) if allowed.
  • Add ergonomics ModW, MinusW, WideAdd, WideMul, WideSummation and WideRatio to perform calculations between WideInts
  • Docstrings with small examples for the docs (I think the docs are auto-generated from the docstrings)
  • An example in "/examples"
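To make the task list concrete, here is a pure-Python model of the intended WideInt semantics. This is only a sketch of the behavior described above (the high/low split, the failing WideInt-to-Int cast, and a wrapping __add__); WideIntModel and its method names are illustrative stand-ins, not the PR's actual PyTeal API.

```python
M = 1 << 64  # word base, 2**64


class WideIntModel:
    """Pure-Python stand-in for the proposed WideInt: a 128-bit value
    held as two 64-bit words, `high` and `low`."""

    def __init__(self, num: int):
        if not 0 <= num < M * M:
            raise ValueError("must fit in 128 bits")
        self.high, self.low = divmod(num, M)

    def value(self) -> int:
        return self.high * M + self.low

    def to_int(self) -> int:
        # The proposed WideInt -> Int cast: fail on a non-zero high word.
        if self.high != 0:
            raise OverflowError("non-zero high word")
        return self.low

    def __add__(self, other: "WideIntModel") -> "WideIntModel":
        # 128-bit wrapping add, as an addw-based WideAdd would compute it.
        return WideIntModel((self.value() + other.value()) % (M * M))
```

For example, WideIntModel((1 << 64) + 1) has high word 1 and low word 1, and casting it with to_int() fails because the high word is non-zero.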

WideInt constructors

WideInt(num: int) -> WideInt
WideInt(expr_high: Expr, expr_low: Expr) -> WideInt
WideInt(expr_high: ScratchSlot, expr_low: ScratchSlot) -> WideInt

Create a new WideInt.

  • For int:
    Pass the int as the only argument. For example, WideInt((1<<64)+1).
  • For two Expr() objects:
    Pass two Exprs. They must be of type TealType.uint64. For example, WideInt(Int(1), Int(1)).
  • For two ScratchSlot() objects:
    Pass two ScratchSlots. For example, WideInt(ScratchSlot(), ScratchSlot()).
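One way to keep a single __init__ across the three overloads above is isinstance-based dispatch. The sketch below uses empty stand-in classes for pyteal.Expr and pyteal.ScratchSlot, and only records which overload was matched; it illustrates the dispatch pattern, not the PR's implementation.

```python
class Expr:  # illustrative stand-in for pyteal.Expr
    pass


class ScratchSlot:  # illustrative stand-in for pyteal.ScratchSlot
    pass


class WideInt:
    def __init__(self, *args):
        # Dispatch over the three documented constructor overloads.
        if len(args) == 1 and isinstance(args[0], int):
            self.kind = "int"        # WideInt(num)
        elif len(args) == 2 and all(isinstance(a, Expr) for a in args):
            self.kind = "exprs"      # WideInt(expr_high, expr_low)
        elif len(args) == 2 and all(isinstance(a, ScratchSlot) for a in args):
            self.kind = "slots"      # WideInt(slot_high, slot_low)
        else:
            raise TypeError("unsupported WideInt constructor arguments")
```

Paired with typing.overload stubs, this gives callers precise signatures while keeping one runtime constructor.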

To be decided

  • Are the names WideInt, WideExpr, WideAdd, WideMul good? Alternatives are WideUint64, MultiExpr, AddWW, MulWW respectively.
  • Is using + (__pos__) to convert an Int to a WideInt (high word = 0) good practice?
  • Too much constructor overloading? (will explain below)
  • Should I change DivW so it accepts WideInt params? (I think not; add a WideDiv instead.)
  • Should I touch the code here to perform the universal operator overloading?

def __add__(self, other):
    from pyteal.ast.naryexpr import Add

    return Add(self, other)

@PabloLION PabloLION mentioned this pull request Sep 26, 2022

PabloLION commented Sep 26, 2022

Too much constructor overloading

In the last commit a11490a, I created a class WideInt which can be constructed from an Int or from a pair of Exprs.

However, when moving to the next task, where I want mulw to return a WideInt, I saw that I might need a new overload taking two scratch slots.
I think this would be too much and would make the code hard to understand, but it seems a must if we want to use MultiValue.

My solution was to use @classmethods like fromInt, fromExpr, fromExprExpr, etc. to construct a WideInt, but this makes the creation process harder. I don't have a satisfying solution... Please help 🥲

Should I use multiple dispatch @dispatch in this case?
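For comparison, the @classmethod alternative mentioned above might look like the sketch below. The names fromInt and fromExprExpr come from the comment; everything else (the two-argument __init__, the stand-in Expr type) is an illustrative assumption, not code from this PR.

```python
M = 1 << 64


class Expr:  # illustrative stand-in for pyteal.Expr
    pass


class WideInt:
    def __init__(self, high, low):
        # Single, unambiguous constructor; named classmethods do the dispatch.
        self.high, self.low = high, low

    @classmethod
    def fromInt(cls, num: int) -> "WideInt":
        # Split a Python int into its 64-bit high and low words.
        return cls(*divmod(num, M))

    @classmethod
    def fromExprExpr(cls, expr_high: Expr, expr_low: Expr) -> "WideInt":
        return cls(expr_high, expr_low)
```

The trade-off is exactly as stated: the constructor logic stays simple, but callers must pick the right factory, e.g. WideInt.fromInt((1 << 64) + 7) instead of WideInt((1 << 64) + 7).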


PabloLION commented Sep 26, 2022

I found an ugly solution with plum 🤪 which passes pytest but fails flake8. I'll try to move back to @overload.

@PabloLION

I agree that __invert__ should be reserved for BitwiseNot. I now use __pos__ instead, to convert uint64 to uint128.


PabloLION commented Sep 27, 2022

I'm sorry that this WIP PR is progressing slowly due to inactive support. Our team has decided to pause here.
The unfinished part is not that hard with the ideas below in mind. I hope someone can finish it, or I'll come back to finish it (much) later. I can offer a VSCode Live Share when help is needed.

  1. Assume that we always start with an empty stack and a set of WideInts (scratch variables); to save "load / store" costs, we only load what's needed. So we should use more TealSimpleBlock. I suggest creating a new class that takes __teal__ as a parameter of __init__.
  2. Notation: denote the list of WideInts for summation/production as W1, W2, ..., Wn; their high words as h1, h2, ..., hn and their low words as l1, l2, ..., ln. Denote by S_k the partial sum of the first k terms.
    .low and .high access the corresponding part of a WideInt: hk = Wk.high, lk = Wk.low.
    Let M = 2**64; then W1 = h1*M + l1, W2 = h2*M + l2, ..., Wn = hn*M + ln.
  3. An IntSum: *Ints => WideInt. Denote by N the length of the input. In each iteration run addw then uncover {N-1}. Starting with the stack i1, i2, ..., iN, after k iterations we have the stack i_(k+2), i_(k+3), ..., i_N, carrier2, carrier3, ..., carrier_(k+1), S_(k+1).low. After (N-1) iterations the stack becomes carrier2, carrier3, ..., carrierN, S.low. Store S.low, then run add (N-2) times to get S.high. Store S.high and we have both parts of the final sum S.
  4. In an N-ary WideSum we can add the low words together and the high words together, so it should have a different __teal__ than a chain of binary WideSums. The N-ary S = WideSum(W1, W2, ..., Wn) can be split into 2 parts:
    a. Load l1, l2, ..., ln onto the stack. Perform L: WideInt = IntSum(l1, l2, ..., ln). Now we have the low word of the final result: S.low = L.low. Store it and leave L.high on the stack.
    b. Load h1, h2, ..., hn onto the stack. Perform H: WideInt = IntSum(L.high, h1, h2, ..., hn). Overflow if H.high != 0. Now we have the high word of the final result: S.high = H.low.
  5. For production, denote by P_k (or Pk) the partial product of the first k factors, and by i_k (or ik) the k-th Int (factor).
  6. N-ary IntProd: use a loop of multiplyFactors from "widemath.py".
  7. A binary WideProd produces the product of two WideInts; only one of them can have a non-zero high word (otherwise the product overflows 128-bit uint). So we can cast the one with high word == 0 to an Int, then use multiplyFactors from "widemath.py": [..., A*C + highword(B*C), lowword(B*C)]
  8. To perform an N-ary WideProd, at most 1 of the multiplicands can have a non-zero high word (overflow otherwise). Say it's W1. Then the product P is:

$$ \begin{aligned} P &= W_1\cdot W_2\cdot \ldots \cdot W_n \\ &= (h_1\cdot M+l_1) \cdot (0\cdot M+l_2) \cdot \ldots \cdot (0\cdot M+l_n) \\ &= (h_1\cdot l_2\cdot l_3\cdot \ldots \cdot l_n) \cdot M + (l_1\cdot l_2\cdot l_3\cdot \ldots \cdot l_n) \\ &= h_1\cdot (l_2\cdot l_3\cdot \ldots \cdot l_n) \cdot M + l_1\cdot (l_2\cdot l_3\cdot \ldots \cdot l_n) \end{aligned} $$

If we regard (l2*l3*...*ln) == (W2*W3*...*Wn) as H, and W = W1, the algorithm here is the same as for W*H.
So we only need IntProd and a binary WideProd to implement the N-ary WideProd.
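The arithmetic behind items 3 and 8 can be checked numerically with a small pure-Python model. This is only a sketch of the word-level math (int_sum is a hypothetical name); it does not model the stack manipulation or op costs, just that the carry-counting sum and the product factorization are correct.

```python
import math
import random

M = 1 << 64  # word base, 2**64


def int_sum(xs):
    """Model of item 3's IntSum: N uint64 values -> (high, low) of their
    128-bit sum. The low word is the running sum mod M; the high word
    counts the carry bits that addw would produce."""
    low, carries = 0, 0
    for x in xs:
        low += x
        if low >= M:      # the carry bit addw leaves on the stack
            low -= M
            carries += 1
    return carries, low


# Sum of N uint64s reassembles exactly from (high, low).
xs = [M - 1, M - 2, 7, 1]
high, low = int_sum(xs)
assert high * M + low == sum(xs)

# Item 8's factorization: when only W1 has a non-zero high word,
# W1*W2*...*Wn == h1*(l2*...*ln)*M + l1*(l2*...*ln).
random.seed(0)
h1, l1 = 3, random.randrange(M)
lows = [random.randrange(1, 1 << 14) for _ in range(4)]  # keep product within 128 bits
prod_rest = math.prod(lows)
assert (h1 * M + l1) * prod_rest == h1 * prod_rest * M + l1 * prod_rest
```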


zejoeh commented Oct 3, 2022

Just chiming in to say this is really important work. There are many scenarios in which you cannot safely multiply two integers on Algorand, since e.g. base units are capped at the entire 64 bits, so the lack of proper 128-bit support shoehorns you into using byte arrays for everything (or writing raw TEAL), even when they're otherwise overkill. For my use case it was enough to create my own Expr subclasses, but it certainly is not an ideal developer experience to need to do such things.


PabloLION commented Oct 4, 2022

@zejoeh Thanks for your appreciation, although this is not the priority of the team right now (see #543 (comment)). I tried to achieve the purpose you described with this PR (but didn't finish it due to time limitations).
As a quick-and-dirty fix, I've reduced the decimals on everything so 128-bit uint is not that heavily required. But still, I had to do similar things, like writing some Expr or Seq, to get the functionality I needed.

Speaking of Seq, it occurs to me that more ergonomic features are needed, like the following CachedSeq. I'm thinking of merging them into pyteal-utils or this repo after this intensive period.

from typing import overload

from pyteal import Expr, ScratchVar, Seq


class CachedSeq:
    cached: bool

    @overload
    def __init__(self, *exprs: Expr) -> None:  # overload_1
        pass

    @overload
    def __init__(self, exprs: list[Expr]) -> None:  # overload_2
        pass

    def __init__(self, *exprs):  # type: ignore : type copied from seq.py
        # Accept either varargs or a single list, mirroring Seq's overloads.
        if len(exprs) == 1 and isinstance(exprs[0], list):
            exprs = exprs[0]
        self.expr = Seq(*exprs)
        self.cache_scratch = ScratchVar(self.expr.type_of())
        self.cached = False

    def load(self):
        # First load evaluates the Seq and stores the result;
        # later loads reuse the cached scratch value.
        if not self.cached:
            self.cached = True
            return Seq(self.cache_scratch.store(self.expr), self.cache_scratch.load())
        return self.cache_scratch.load()
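The behavior of CachedSeq.load() can be illustrated with a toy model that uses strings as stand-ins for Exprs (CachedSeqModel is a hypothetical name, purely for demonstration). One thing to watch: since `cached` is Python-side build-time state, the first load() built must also be the first one executed on-chain, otherwise a later code path would read the scratch slot before it is stored.

```python
class CachedSeqModel:
    """Toy model of CachedSeq.load(): the first call emits
    store-then-load, later calls emit only a load."""

    def __init__(self, expr: str):
        self.expr = expr          # stand-in for the wrapped Seq
        self.cached = False       # build-time flag, as in CachedSeq

    def load(self) -> str:
        if not self.cached:
            self.cached = True
            return f"store({self.expr}); load()"
        return "load()"
```

For example, CachedSeqModel("x + y").load() yields the store-then-load form first, and plain "load()" on every subsequent call.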


zejoeh commented Oct 4, 2022

@PabloLION Since you are in control of decimals, maybe you have a similar situation as I did. I use the following much-less-general solution, but given the amount of work you put into this I'm sure you already have something similar or better for what you need:

class MulwDivw(pt.Expr):
    """Wide multiplication of two 64 bit uints a and b followed by wide division by a 64 bit uint c."""

    def __init__(self, a: pt.Expr, b: pt.Expr, c: pt.Expr):
        super().__init__()
        self.a = a
        self.b = b
        self.c = c

    def __teal__(self, options: "CompileOptions"):
        # mulw of a and b
        mulw_start, mulw_end = pt.TealSimpleBlock.FromOp(options, pt.TealOp(self, pt.Op.mulw), self.a, self.b)
        # divw of result and c
        divw_start, divw_end = pt.TealSimpleBlock.FromOp(options, pt.TealOp(self, pt.Op.divw), self.c)
        mulw_end.setNextBlock(divw_start)
        return mulw_start, divw_end

    def __str__(self):
        return f"(MulwDivw ({self.a},{self.b},{self.c}))"

    def type_of(self):
        return pt.TealType.uint64

    def has_return(self) -> bool:
        return False

For me this was enough to replace a lot of byte array operations in intermediary computations after verifying that the results all still fit in 64 bits, without having to handle the multiple return values problem. Using c=2**64-1 it also lets you anticipate and handle overflow at runtime with some branching.

Of course with only this I still need to make sure my domain is otherwise bounded by 64 bits (or use byte arrays where it can't be). It would be great to have native 16-byte (128-bit) and 32-byte (256-bit) arrays instead - a man can dream. 😄
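The semantics of the MulwDivw expression above can be summarized in plain Python: mulw produces the full 128-bit product, and divw (per the AVM spec) fails if the divisor is 0 or the quotient overflows 64 bits. The function name mulw_divw is just an illustrative stand-in for checking the arithmetic.

```python
M = 1 << 64  # uint64 bound, 2**64


def mulw_divw(a: int, b: int, c: int) -> int:
    """Model of MulwDivw: the 128-bit product a*b divided by c.
    Like the AVM divw opcode, fail if c == 0 or if the quotient
    does not fit in 64 bits."""
    if c == 0:
        raise ZeroDivisionError("divw: C == 0")
    q = (a * b) // c  # Python ints are unbounded, so no intermediate overflow
    if q >= M:
        raise OverflowError("divw: result overflows uint64")
    return q
```

This also shows the overflow-anticipation trick from the comment: with c = 2**64 - 1 the quotient is roughly a*b / 2**64, so a zero result means the product fits in 64 bits, which a program can branch on at runtime.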


PabloLION commented Oct 4, 2022

@zejoeh I see this is correct and useful because divw will conveniently "Fail if C == 0 or if result overflows". I actually don't have a similar implementation yet, although this is my most wanted functionality. Thanks for sharing. But on one occasion I still need 128-bit add, so I'm still planning to finish this later. The difficulty, to be honest, is allowing the user to use a 128-bit uint variable at any time, so scratch slots seem inevitable to me.

I was implementing WideAdd (two 128-bit uints => one 128-bit uint) with a lot of scratch slots. After seeing your code, I think my method may be far less performant than yours.

With my plan, the same calculation would be done with WideMul then WideDiv. The generated TEAL would first convert two 64-bit uints to two 128-bit uints with high word == 0 (opcodes: int 0 ×2, store ×4), then load them (op cost 4) and run a 5-op-cost method between the two 128-bit values. This sums up to 15 op cost. The divide procedure should be another ~15 op cost, totalling around 30.
In comparison, your MulwDivw has an op cost of 5. Pessimistically speaking, seeing your code added more tasks for this PR 🤪

Also FYI, it's already possible to do 512-bit calculations with byteslice arithmetic (docs at https://pyteal.readthedocs.io/en/stable/arithmetic_expression.html#byteslice-arithmetic)
