Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a vector of integers instead of a string for faster computation #24

Open
faheel opened this issue Mar 12, 2018 · 19 comments
Open

Use a vector of integers instead of a string for faster computation #24

faheel opened this issue Mar 12, 2018 · 19 comments
Labels
enhancement New feature or request priority: high High priority

Comments

@faheel
Copy link
Owner

faheel commented Mar 12, 2018

Instead of having the number's magnitude stored in a string, it would be much more efficient, in terms of both time and space, to have it stored in a vector of unsigned 64-bit integers (std::vector<uint64_t>). That is, store a number's digits in base 264.

The following table lists how the number's magnitude will be represented in base 264:

Magnitude Representation Comment
0 {0} 0⋅(264)0
1 {1} 1⋅(264)0
2 {2} 2⋅(264)0
264 - 1 {18446744073709551615} 18446744073709551615⋅(264)0
264 {0, 1} 1⋅(264)1
264 + 1 {1, 1} 1⋅(264)0 + 1⋅(264)1
264 + 2 {2, 1} 2⋅(264)0 + 1⋅(264)1
265 {0, 2} 2⋅(264)1
550 {11670147153572883801, 4814824860968089} 11670147153572883801⋅(264)0 + 4814824860968089⋅(264)1
2128 {0, 0, 1} 1⋅(264)2
2192 {0, 0, 0, 1} 1⋅(264)3
2256 {0, 0, 0, 0, 1} 1⋅(264)4

The way how it could speed up performance is that, for example, when adding two BigInts, instead of adding their corresponding decimal digits together, their corresponding base 264 digits will be added, each digit in a single operation. Moreover, this addition (or any other arithmetic operation) of vectors can be vectorised by using SSE or AVX, which would further improve performance by a few folds.

Say, for instance, that you were adding two 5000 digit numbers. In base 10 they have 5000 digits, which means that it would take 5000 operations to add them together. Instead, if they were in base 264, they would have 260 digits, and would therefore require only 260 operations. Using AVX-512, vectors with up to 8 64-bit digits can be added together in a single operation. Which means that by using vectorized operations, two 5000 digit numbers could be added using just 33 operations.

@faheel faheel added enhancement New feature or request suggestion labels Mar 12, 2018
@surajumang
Copy link

The idea of using numbers in base 2^64 seems interesting. But the representation you have used is quite opposite of what is used commonly i.e Leftmost digit is the most significant one and rightmost digit is the least significant. Is there any reason behind this choice?

@faheel
Copy link
Owner Author

faheel commented Apr 2, 2018

@surajumang Since all arithmetic operations (except division) are done from right to left, it's simpler to store the right-most digit as the first digit in an array since it will be processed first. But this is just to make the code simpler. When written formally, the most significant digits are always written first, followed by less significant digits to the right.

@surajumang
Copy link

I would like to work on this but think it will require a complete overhaul of the existing codebase like redefining all the arithmetic operations. What do you say?

@faheel
Copy link
Owner Author

faheel commented Apr 3, 2018

@surajumang Yes, it would require a complete rewrite of all the arithmetic operators and a few other methods.

I'll be creating a new branch and will implement some basic stuff like converting from base 10 to base 264. Once that's done, I'll notify you and you can contribute to that branch.

@faheel
Copy link
Owner Author

faheel commented Apr 9, 2018

@SingleJourney English please! Google Translate isn't perfect :)

@OmriHab
Copy link

OmriHab commented Apr 9, 2018

Could you let me know when you've added that branch?
I would like to help contribute to the math functions using the new representation and anything else.

@faheel
Copy link
Owner Author

faheel commented Apr 10, 2018

@surajumang @OmriHab I've added the branch base-2-to-the-64. The changes made so far on this branch can be tracked by comparing the branch with master.

@OmriHab
Copy link

OmriHab commented Apr 10, 2018

Can i recommend using uint64_t from cstdint, instead of unsigned long long?
unsigned long long is promised to be at least 64 bit but not certainly exactly 64.
I assume that could create some problems if you use calculations that assume base 2^64.

@OmriHab
Copy link

OmriHab commented Apr 10, 2018

Alternatively unsigned long long can be used and just take into consideration that it isn't necessarily 64 bit.
E.g the maximum would be numeric_limits<unsigned long long>::max() instead of UINT_64_MAX

@surajumang
Copy link

if you are implementing any feature then probably you should create a issue and then implement that particular feature like first create issue Implement sum of BigInts in vectorized implementation and then sumbit PR for that particular issue. This way different people can work on it

@faheel
Copy link
Owner Author

faheel commented Apr 10, 2018

@OmriHab Good suggestion! I'll change the type to uint64_t.

@faheel faheel added priority: high High priority and removed suggestion labels May 14, 2018
@ankur54
Copy link

ankur54 commented Oct 3, 2018

Is the issue still open? If it is, I will start working on it.

@faheel faheel pinned this issue Jan 2, 2019
@DarkenedOrigins
Copy link

Hi, I see the issue as open but seeing as its been over a year I would like to touch base and ask what I should work on as I would like to help.

@faheel
Copy link
Owner Author

faheel commented Jul 13, 2020

@DarkenedOrigins If you're interested in working on it then I would be glad to help you by reviewing the code and helping with the logic.

You can start by creating a draft PR in which you can implement a utility function that takes a string representation of an integer and converts it to its base 2⁶⁴ integer vector representation.

If anyone else is interested in working on this too then they can contribute in the same PR.

@DarkenedOrigins
Copy link

Awesome, would you prefer that I create this in its own class called BigInt64 or i bolt it on to the current implementation

@faheel
Copy link
Owner Author

faheel commented Jul 14, 2020

@DarkenedOrigins The existing class can be renamed to BigIntString and then you can create a new BigInt class.

@DarkenedOrigins
Copy link

inorder to be less destructive I made a new class called BigInt64. My implementation actually uses the string based BigInt internally. The only problem I had was difficulty in compiling since I have never used cmake before nor have I worked with such fragmented code where each implementation almost has its own file. Some help would be greatly appreciated.

@faheel
Copy link
Owner Author

faheel commented Jul 16, 2020

inorder to be less destructive I made a new class called BigInt64

Good call! We can do the renaming later.

The only problem I had was difficulty in compiling [...]

Yeah, the documentation related to development is not very good at the moment. I'll add details regarding how and when to create a new header file, how to write tests for it, and how to compile and run the tests using CMake and CTest. If you need help on anything more specific, you can leave a comment in PR #58 that you opened and @mention me in it. I'll get back to you whenever I can.

@szdytom
Copy link

szdytom commented Feb 10, 2021

use std:bitset<>, long long calculations are much slower than 2 ints on some CPU.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request priority: high High priority
Projects
None yet
Development

No branches or pull requests

6 participants