[IMPROVEMENT] Investigate performance bottleneck in v1 data path #8436
Comments
This is potentially the 3rd bottleneck:

We already know the input
Agree with this idea. I will give it a try to see if it can improve the read speed.
Sidetracking a little bit: do you think there is a danger of integer overflow when we keep increasing the counter over here https://github.com/longhorn/longhorn-engine/blob/be5e072e02cd959ab371a04264957624223b01fb/pkg/replica/revision_counter.go#L143 ? If we keep writing long enough, it might become negative.
It is
You are right. If we are writing at a speed of 50k IOPS, it would take 5,849,424 years to overflow. None of us will see that day :)))
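A quick back-of-the-envelope check of that figure (a standalone sketch; it assumes one counter increment per write and 365-day years):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const iops = 50_000                    // counter increments per second
	const secondsPerYear = 365 * 24 * 3600 // ignoring leap years
	const seconds = math.MaxInt64 / iops   // seconds until the counter wraps
	fmt.Println("years until int64 overflow:", seconds/secondsPerYear)
	// Prints: years until int64 overflow: 5849424
}
```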
Update: a tiny improvement to the revision counter logic.

Current implementation: We are converting the int64 to a string and then to a slice of bytes: https://github.com/longhorn/longhorn-engine/blob/be5e072e02cd959ab371a04264957624223b01fb/pkg/replica/revision_counter.go#L45-L46

The modification: use LittleEndian to encode and decode the int64:

```go
import (
	"encoding/binary"
	"fmt"

	"github.com/pkg/errors"
)

// revisionCounterBuf is the preallocated 8-byte buffer the counter is
// written through (assumed to exist in the surrounding file).
var revisionCounterBuf = make([]byte, 8)

func (r *Replica) writeRevisionCounter(counter int64) error {
	if r.revisionFile == nil {
		return fmt.Errorf("BUG: revision file wasn't initialized")
	}
	copy(revisionCounterBuf, int64ToBytes(counter))
	_, err := r.revisionFile.WriteAt(revisionCounterBuf, 0)
	if err != nil {
		return errors.Wrap(err, "failed to write to revision counter file")
	}
	return nil
}

// int64ToBytes encodes the counter as 8 little-endian bytes.
func int64ToBytes(n int64) []byte {
	bytes := make([]byte, 8)
	binary.LittleEndian.PutUint64(bytes, uint64(n))
	return bytes
}

// bytesToInt64 decodes 8 little-endian bytes back into an int64.
func bytesToInt64(bytes []byte) int64 {
	return int64(binary.LittleEndian.Uint64(bytes))
}
```

Result: A small improvement in write performance, maybe 1-2%.
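For anyone who wants to reproduce the comparison, a hypothetical micro-benchmark (not from the PR; the function names are mine) isolating the two encodings could look like this:

```go
package replica

import (
	"encoding/binary"
	"strconv"
	"testing"
)

// Old approach: format the counter as a decimal string, then copy it
// into a byte slice; this allocates on every write.
func BenchmarkCounterToString(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = []byte(strconv.FormatInt(int64(i), 10))
	}
}

// New approach: fixed-width little-endian encoding into a reused buffer.
func BenchmarkCounterToLittleEndian(b *testing.B) {
	buf := make([]byte, 8)
	for i := 0; i < b.N; i++ {
		binary.LittleEndian.PutUint64(buf, uint64(i))
	}
}
```

Run it with `go test -bench=.` in the package directory.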
@derekbit Looks like the performance gets worse, so we will not do this:

@shuo-wu This is the CPU usage of
As discussed with @shuo-wu in the sync-up, we reran some of the Broadcom tests. The setup is the same as in our initial Broadcom report in the SUSE data center.

Case 1: Under CPU/RAM-limited constraints

Case 2: No CPU/RAM constraints and workloads NOT rate-limited

With the master-head engine:

With the PR:

Conclusion
Btw, this article is very helpful for understanding a common mistake in Go concurrency: https://eli.thegreenplace.net/2019/go-internals-capturing-loop-variables-in-closures/
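For context, a minimal illustration of the pitfall the article describes (my own sketch; note that Go 1.22 later changed the semantics so each iteration gets a fresh loop variable):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	words := []string{"a", "b", "c"}
	var wg sync.WaitGroup

	for _, w := range words {
		wg.Add(1)
		// BUG before Go 1.22: every goroutine captures the same loop
		// variable w, so they may all print the last value.
		go func() {
			defer wg.Done()
			fmt.Println(w)
		}()
	}
	wg.Wait()

	for _, w := range words {
		wg.Add(1)
		// Fix: pass the current value as an argument (or shadow it with w := w).
		go func(w string) {
			defer wg.Done()
			fmt.Println(w)
		}(w)
	}
	wg.Wait()
}
```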
Update: There is potentially another bottleneck in the read flow in the replica. For every read, we have to make a system call
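Based on the cached-size idea discussed in the next comments, a minimal sketch of what caching the snapshot file size could look like (hypothetical names, not the actual replica code; it assumes the snapshot file is immutable while open):

```go
package replica

import (
	"io"
	"os"
)

// cachedFile stats the snapshot file once at open time and reuses the
// size on the read path, avoiding a size-lookup system call per read.
// This is only safe because snapshots are immutable (guarded elsewhere
// by the ctime/checksum mechanism mentioned below).
type cachedFile struct {
	f    *os.File
	size int64
}

func openCached(path string) (*cachedFile, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	fi, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, err
	}
	return &cachedFile{f: f, size: fi.Size()}, nil
}

func (c *cachedFile) ReadAt(p []byte, off int64) (int, error) {
	if off >= c.size { // bounds check against the cached size: no syscall
		return 0, io.EOF
	}
	return c.f.ReadAt(p, off)
}
```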
Good catch! I am fine with using the cached size. The snapshot immutability should be guaranteed by the ctime/checksum mechanism.
After testing, the snapshot file caching idea doesn't seem to improve performance in practice, even though it sounds good in theory. CPU usage is also about the same. Let's abandon it. Sorry for the noise.
Pre Ready-For-Testing Checklist
I don't think there is much danger here, especially around compatibility, as the changes are all self-contained within the engine. On the other hand, connection count will increase (and possibly CPU utilization as well) as a tradeoff for the performance gain. Maybe that's surprising in a patch update? I lean slightly towards backporting, but I'm also fine with the gains being associated with a new minor version of Longhorn. I'll defer to whatever you decide.
Test Plan:
Verified on master-head 20240508
The test steps
Result: Passed
v1.6.1

=========================
FIO Benchmark Summary
For: test_device
CPU Idleness Profiling: disabled
Size: 30G
Quick Mode: disabled
=========================
IOPS (Read/Write)
Random: 10,329 / 4,684
Sequential: 17,677 / 10,204
Bandwidth in KiB/sec (Read/Write)
Random: 413,252 / 141,958
Sequential: 461,723 / 170,358
Latency in ns (Read/Write)
Random: 431,773 / 553,568
Sequential: 357,682 / 547,949

master-head

=========================
FIO Benchmark Summary
For: test_device
CPU Idleness Profiling: disabled
Size: 30G
Quick Mode: disabled
=========================
IOPS (Read/Write)
Random: 11,024 / 5,082
Sequential: 18,149 / 10,672
Bandwidth in KiB/sec (Read/Write)
Random: 415,402 / 137,600
Sequential: 445,611 / 167,325
Latency in ns (Read/Write)
Random: 409,800 / 525,464
Sequential: 342,326 / 525,112

Here is the summary table:

| Metric | v1.6.1 | master-head |
| --- | --- | --- |
| Random IOPS (R/W) | 10,329 / 4,684 | 11,024 / 5,082 |
| Sequential IOPS (R/W) | 17,677 / 10,204 | 18,149 / 10,672 |
| Random bandwidth, KiB/s (R/W) | 413,252 / 141,958 | 415,402 / 137,600 |
| Sequential bandwidth, KiB/s (R/W) | 461,723 / 170,358 | 445,611 / 167,325 |
| Random latency, ns (R/W) | 431,773 / 553,568 | 409,800 / 525,464 |
| Sequential latency, ns (R/W) | 357,682 / 547,949 | 342,326 / 525,112 |
Hi @PhanLe1010
Is your improvement request related to a feature? Please describe (👍 if you like this request)
Investigate performance bottleneck in v1 data path. So far we have identified 2 bottlenecks:
We are still investigating whether there are more bottlenecks.
Describe the solution you'd like
Resolve the bottlenecks and increase the performance of the v1 data path.
Additional Context:
This is the test result on Equinix Metal m3.small.x86, Ubuntu 22.04, kernel 5.15.0-101-generic: