
custom implementation of data storage #18

Open
yegor256 opened this issue Oct 19, 2022 · 16 comments
Comments

@yegor256
Member

yegor256 commented Oct 19, 2022

At the moment, every time we call put, a new heap allocation happens (if I'm not mistaken). This may be very expensive if we have many put calls with small data chunks (just a few bytes). Let's investigate how we can optimize this and then implement a new method. Before we do that, let's implement a performance test in order to measure the speed of the current implementation. Then, compare the new implementation with the current one.

@yegor256
Member Author

@UARTman try this one, please

@UARTman
Contributor

UARTman commented Oct 20, 2022

Acknowledged. Before I start writing perf tests, we need to make a couple of decisions.

First, the benchmarking instrument. We can either use Bencher, which only lets us measure basic performance, or criterion, a more comprehensive solution that can draw graphs, supports benchmarks with inputs, and does some statistical analysis, but is a bit bulkier due to its extra complexity.

The second is whether we use Rust's custom test harness feature. Enabling it would make the benchmarking code cleaner (by implementing a macro similar to #[test]), but it would require switching to nightly, since the feature is unstable. Not using it means separating all our benchmarks into what is essentially another crate, with all the API-boundary pain that implies.
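For illustration, a criterion benchmark with inputs could look roughly like this (a hypothetical `benches/sodg_benchmark.rs` registered with `harness = false` in Cargo.toml; the `Sodg::empty()`/`add()` calls are placeholders based on this discussion, not a settled API):

```rust
// Hypothetical benches/sodg_benchmark.rs: a criterion benchmark with inputs.
// The Sodg calls (empty, add returning a Result) are assumptions for illustration.
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use sodg::Sodg;

fn bench_add(c: &mut Criterion) {
    let mut group = c.benchmark_group("Sodg::add");
    // Run the same benchmark for several graph sizes, so criterion can
    // plot how the cost of adding vertices scales.
    for size in [1_000u32, 10_000, 100_000] {
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &size| {
            b.iter(|| {
                let mut g = Sodg::empty();
                for v in 0..size {
                    g.add(v).unwrap();
                }
            });
        });
    }
    group.finish();
}

criterion_group!(benches, bench_add);
criterion_main!(benches);
```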

@yegor256
Member Author

@UARTman I vote in favor of criterion and nightly.

@UARTman
Contributor

UARTman commented Oct 20, 2022

I was leaning this way myself, so I'm glad we're on the same page. I'll start prototyping performance tests with those tools; as soon as I have a working proof-of-concept that benchmarks some operations, I'll link the branch here.

UARTman added a commit to UARTman/sodg that referenced this issue Oct 20, 2022
@UARTman
Contributor

UARTman commented Oct 20, 2022

I have some bad news: even with the nightly features on, we'll still have to keep the benchmarks outside of our src tree. This, however, means we don't actually need nightly right now.

Another possible issue is that Sodg::add performance seems to degrade linearly with the number of vertices already present in the Sodg. Whether this is actually a problem, I don't know, since even with tens of thousands of nodes, inserting a new one costs about 35–40 microseconds (µs). That's probably acceptable. With millions of nodes, though, it rises into milliseconds.

Regarding the choice of benchmarking framework, we definitely made the right one: the graphing and granular input loops have already proven very useful.

@yegor256
Member Author

yegor256 commented Oct 20, 2022

@UARTman I think it's only logical to keep test code (benchmarking or whatever) outside of src. BTW, it would be great to guarantee O(1) (or something close to it) cost for all operations. Right now it's O(log n) for most of them, I believe? Our graph will not grow forever. It will grow once, when the program is loaded into memory. Then it will start "surging", going up and down in size, usually dancing around the original size.

@UARTman
Contributor

UARTman commented Oct 20, 2022

In this case, it's probably best to count how many nodes a program uses (depending on its complexity), so that we know whether we're dealing with hundreds, thousands, or millions of nodes. That will tell us whether this is even worth optimizing.

BTW, I've got a question about the size of the chunks you use in Sodg::put. What is the median and the upper bound on a typical small data chunk? Are we talking about basic integers and such? At what point should we switch to heap allocations?

@yegor256
Member Author

@UARTman a small program will have 5-10K nodes. A big one may have 100K+. I don't think we will ever reach the point of 1M nodes, no matter how large the program is. It's just a guess for now.

We will store anything in put, from single bytes up to large chunks of data (kilobytes). I don't think we'll have megabytes often. Maybe even never.

@yegor256
Member Author

@UARTman any progress?

@UARTman
Contributor

UARTman commented Oct 31, 2022

The last two weeks were very busy for me, so I haven't been able to work on the problem, but exam week is over, so I'm returning to it. I do have an idea for dealing with the unnecessary allocations, as well as a backlog of operations I haven't benchmarked yet.

The idea is to make Hex an enum, holding either a Vec<u8> for chunks of more than 16 bytes, or a [u8; 16] plus a usize denoting the size of the object inside. Essentially, if the chunk (plus its size info) fits into the space that a Vec would take, we just put it there. This will cut down on allocations while costing us almost nothing in terms of memory.

To make it work properly, I would probably need to redesign a couple of APIs around Hex to work with slices instead of Vecs.
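To make the idea concrete, here is a minimal sketch of that layout (the variant names and the from_slice constructor are purely illustrative, not the final API):

```rust
// Illustrative sketch only: inline storage for small chunks, heap for large ones.
#[derive(Clone, PartialEq, Eq)]
pub enum Hex {
    /// Up to 16 bytes stored inline; the usize records how many are actually used.
    Small([u8; 16], usize),
    /// Larger chunks fall back to a heap-allocated Vec.
    Large(Vec<u8>),
}

impl Hex {
    /// Build a Hex from a slice, choosing inline or heap storage by length.
    pub fn from_slice(bytes: &[u8]) -> Self {
        if bytes.len() <= 16 {
            let mut buf = [0u8; 16];
            buf[..bytes.len()].copy_from_slice(bytes);
            Hex::Small(buf, bytes.len())
        } else {
            Hex::Large(bytes.to_vec())
        }
    }

    /// View the stored bytes as a slice, regardless of where they live.
    pub fn bytes(&self) -> &[u8] {
        match self {
            Hex::Small(buf, len) => &buf[..*len],
            Hex::Large(vec) => vec.as_slice(),
        }
    }
}
```

The slice-based constructor and accessor are roughly what the API redesign mentioned above would look like in practice.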

@UARTman
Contributor

UARTman commented Oct 31, 2022

A remark on the current state of benchmarks: they're not particularly reliable right now, since they tend to produce pretty inconsistent results. So this is definitely something to look into later.

@yegor256
Member Author

@UARTman if you put a heavy load on Sodg, I think any benchmark will give you reasonable data. Just run a billion operations and measure time (should be in minutes). Then, do the same after your refactoring. Obviously, if there is a positive effect, you will see it.
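Something as crude as this would be enough (a sketch only; the Sodg and Hex calls shown here — empty, add, put, from_i64 — are assumptions about the current API, and the count is scaled down for illustration):

```rust
// Rough end-to-end wall-clock timing; the sodg API calls are assumptions.
use std::time::Instant;
use sodg::{Hex, Sodg};

fn main() {
    let total: u32 = 10_000_000; // scale up as needed
    let mut g = Sodg::empty();
    let start = Instant::now();
    for v in 0..total {
        g.add(v).unwrap();
        g.put(v, Hex::from_i64(42)).unwrap();
    }
    println!("{} add+put pairs in {:?}", total, start.elapsed());
}
```

Run it once on the current code and once after the refactoring; with enough operations, a real improvement will dominate run-to-run noise.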

UARTman added a commit to UARTman/sodg that referenced this issue Nov 6, 2022
@UARTman
Contributor

UARTman commented Nov 6, 2022

Please look at my basic re-implementation of Hex. The only significant difference in API is that bytes is now a method, not a field. All tests are passing.

@yegor256
Member Author

yegor256 commented Nov 6, 2022

@UARTman where should I look? :) Please, submit a pull request

@UARTman UARTman mentioned this issue Nov 6, 2022
@yegor256
Member Author

@UARTman any more progress?

@UARTman
Contributor

UARTman commented Nov 23, 2022

@yegor256 For now, I've been stalling on this because there isn't anything obvious to do. I've tried to make the benchmark results meaningful and have been mostly unsuccessful: the timing of, say, 10000 operations still varies by ~80% up or down between runs of the same code.
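One option I could still try is giving criterion more data per benchmark: a larger sample count plus longer warm-up and measurement windows via a custom config, roughly like this:

```rust
// Sketch of a criterion bench entry point tuned against run-to-run noise.
use std::time::Duration;
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_ops(c: &mut Criterion) {
    c.bench_function("10000_ops", |b| {
        b.iter(|| {
            // Placeholder workload; the real 10000-operation Sodg loop goes here.
            let mut acc = 0u64;
            for i in 0..10_000u64 {
                acc = acc.wrapping_add(black_box(i));
            }
            acc
        });
    });
}

fn noise_resistant_config() -> Criterion {
    // More samples and longer windows than criterion's defaults.
    Criterion::default()
        .sample_size(200)
        .warm_up_time(Duration::from_secs(3))
        .measurement_time(Duration::from_secs(10))
}

criterion_group! {
    name = benches;
    config = noise_resistant_config();
    targets = bench_ops
}
criterion_main!(benches);
```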
