Custom storage performance tuning questions #7657
Comments
Hi @tjjh89017. It appears the mmap storage in libtorrent 2.0 indeed performs worse than the 1.2 disk I/O in your scenario. There are quite a few knobs that can affect the transfer rate: the disk buffer size is one, but so is making sure peers are saturated with piece requests (to cover the bandwidth-delay product). There are heuristics for determining the number of outstanding piece requests, which also interact with the timeout logic. There's essentially a "slow-start" mechanism, where the number of outstanding piece requests is doubled until the download rate plateaus.
The main tool to tune and optimize disk I/O throughput is the stats logging that can be enabled in libtorrent. This python script interprets the resulting output: https://github.com/arvidn/libtorrent/blob/RC_2_0/tools/parse_session_stats.py
To enable the logging:
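A minimal sketch of what enabling the stats logging can look like, assuming the libtorrent 2.0 alert API (the periodic call and the file handling here are illustrative, not the only way to wire it up):

```cpp
#include <libtorrent/session.hpp>
#include <libtorrent/alert_types.hpp>
#include <fstream>
#include <vector>

// Periodically ask the session to post a session_stats_alert and write each
// alert's message() line to a log file that parse_session_stats.py can read.
// Call this roughly once a second from the application's main loop.
void log_session_stats(lt::session& ses, std::ofstream& log)
{
    ses.post_session_stats();

    std::vector<lt::alert*> alerts;
    ses.pop_alerts(&alerts);
    for (lt::alert* a : alerts)
    {
        // both the header alert (column names, posted once) and the
        // recurring stats alerts must end up in the log for the parser
        if (lt::alert_cast<lt::session_stats_header_alert>(a)
            || lt::alert_cast<lt::session_stats_alert>(a))
        {
            log << a->message() << "\n";
        }
    }
}
```

The resulting file is then fed to the script, e.g. `python tools/parse_session_stats.py stats.log`.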
The resulting log file can then be parsed by that script.
Regarding the read cache: the current disk I/O backend in libtorrent 2.0 (mmap) does not have its own read buffer; the intention is to rely on the OS block cache. The patch I'm working on for a multi-threaded pread/pwrite backend also doesn't have a read buffer per se; it has a store buffer for blocks that are waiting to be flushed to disk. It's possible that one performance benefit libtorrent 1.2 has is that it actually holds a disk buffer in user space, implementing an ARC cache. That might save syscalls when pulling data from the cache.
I read this PR already.
I think I should try this first.
OK, it seems that in my case these will be hard to control.
I also changed my implementation to pread/pwrite, without an ARC cache.
I think implementing ARC on my own would take too much effort. In today's testing, on an NVMe SSD with a 10 Gbps network, the pread/pwrite approach is faster than mmap. Thank you!
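For reference, the pread() path boils down to something like this self-contained sketch (the function name and the 16 KiB block size follow the descriptions in this thread; error handling is simplified):

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Read one 16 KiB block at a given file offset with pread(). Unlike a
// read() after lseek(), pread() does not move the file offset, so several
// disk threads can share a single file descriptor without locking.
std::vector<char> read_block(int fd, off_t offset, std::size_t len = 16 * 1024)
{
    std::vector<char> buf(len);
    ssize_t n = pread(fd, buf.data(), buf.size(), offset);
    if (n < 0) return {};                   // I/O error
    buf.resize(static_cast<std::size_t>(n)); // short read near end of file
    return buf;
}
```

Compared to mmap, this avoids page-fault-driven I/O entirely; each block is fetched with one explicit syscall.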
Hi @arvidn, and I hope you won't be affected by those people using "strong" wording in the PR.
Please provide the following information
Hi,
I wrote a program named EZIO, a disk/partition filesystem deployment tool that uses libtorrent's custom_storage feature.
We have found that the old version of EZIO, built on libtorrent 1.x, is consistently faster than the current EZIO built on libtorrent 2.0.
Even after changing mmap to pread/pwrite to avoid page faults, which did improve performance somewhat, it is still slower than the old version. (Our scenario mostly reads pieces from disk, with frequent cache misses.)
For example, with our first release with Clonezilla, we published a journal paper showing deployment to 32 machines at nearly the full 1 Gbps line rate.
But right now we only get about half that performance in the same environment. (As an aside, another odd thing: multicast deployment is much faster than it was a few years ago.)
In my environment, I could always show that BitTorrent is faster and more stable than multicast for deployment.
The Clonezilla team and I guess it may be related to the cache model. In libtorrent 1.x, libtorrent manages the cache itself, and it can suggest cached pieces to other peers.
But libtorrent 2.0 is not aware of the OS cache (whether via mmap or pread/pwrite).
It all depends on the custom storage buffer implementation.
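Since libtorrent 2.0 leaves caching to the OS, a custom storage can at least hint the kernel about upcoming reads. This is a sketch of that idea, not something libtorrent does for you; the function name is illustrative:

```cpp
#include <fcntl.h>
#include <unistd.h>

// POSIX_FADV_WILLNEED asks the kernel's page cache to start reading the
// given byte range in the background, which is a cheap form of read-ahead
// for pieces we expect peers to request soon. (POSIX_FADV_SEQUENTIAL is
// another option: it enlarges the kernel's read-ahead window for the fd.)
// Returns 0 on success, an errno value on failure.
int prefetch_range(int fd, off_t offset, off_t len)
{
    return posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
}
```

The hint is advisory; whether the data is actually resident when the read happens still depends on memory pressure.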
Our current buffer implementation is a fixed-length array (16 MB), split into 16 KB units.
In our testing, we found the cache never fills to even half; the disk_buffer_holder is always released immediately. Does that mean libtorrent doesn't always suggest pieces that are in the cache?
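The buffer layout described above amounts to a fixed pool carved into blocks. This is a minimal sketch of such a pool for illustration; the names are hypothetical, not EZIO's actual code:

```cpp
#include <cstddef>
#include <vector>

// One fixed 16 MB array split into 16 KiB blocks, with a free list that
// hands blocks out and takes them back when the disk_buffer_holder
// equivalent is released. 16 MB / 16 KiB = 1024 blocks in total.
class block_pool
{
public:
    static constexpr std::size_t block_size = 16 * 1024;
    static constexpr std::size_t pool_size  = 16 * 1024 * 1024;

    block_pool() : storage_(pool_size)
    {
        for (std::size_t off = 0; off < pool_size; off += block_size)
            free_list_.push_back(storage_.data() + off);
    }

    // returns nullptr when all blocks are in flight
    char* alloc()
    {
        if (free_list_.empty()) return nullptr;
        char* b = free_list_.back();
        free_list_.pop_back();
        return b;
    }

    void free(char* b) { free_list_.push_back(b); }

    std::size_t available() const { return free_list_.size(); }

private:
    std::vector<char>  storage_;
    std::vector<char*> free_list_;
};
```

With this layout, "the cache never fills to even half" means `available()` never drops below 512, i.e. blocks are returned as fast as they are handed out.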
I wanted to do some "read-ahead" and put pieces into the suggested-pieces pool.
I found an API to ask the torrent_handle to read specific pieces with read_piece, and to suggest them to other peers with suggest_read_cache.
But I want to confirm when the store_buffer or disk_buffer_holder is released, especially for the read cache.
If the disk_buffer_holder is only released from the cache after I read the alert, that means I need to keep my eyes on the alerts to make sure the cache doesn't get entirely occupied by the read-ahead mechanism. The goal is to make the process faster in case the bottleneck is not HDD random-read speed, but gaps where libtorrent isn't reading data and the disk sits idle.
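A sketch of the read-ahead idea using the libtorrent public API (read_piece posts a read_piece_alert once the piece is in memory; the handler below is illustrative, and whether anything stays resident in libtorrent's own structures afterwards is exactly the open question here):

```cpp
#include <libtorrent/session.hpp>
#include <libtorrent/torrent_handle.hpp>
#include <libtorrent/alert_types.hpp>

// Ask libtorrent to pull one piece into memory. The data arrives later
// via a read_piece_alert in the session's alert stream.
void read_ahead(lt::torrent_handle& th, lt::piece_index_t piece)
{
    th.read_piece(piece);
}

void on_alert(lt::alert* a)
{
    if (auto* rp = lt::alert_cast<lt::read_piece_alert>(a))
    {
        if (rp->error) return;
        // rp->buffer holds the piece data and rp->piece identifies it;
        // the buffer belongs to the application at this point, so keeping
        // it warm is up to your own cache. suggest_read_cache (the
        // settings_pack::suggest_mode setting) controls advertising
        // cached pieces to peers.
    }
}
```

The asker's concern applies here: since each read_piece_alert carries its own buffer, aggressive read-ahead consumes application memory until the alerts are drained.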
That said, I know this may already be the fastest the HDD can go without any tuning. In the past few years, sometimes we left all peers in a chaotic state, and that produced the fastest deployments ever, but we don't know why.
We also tested with an NVMe SSD and a 10 Gbps network. It reached about 200~300 MB/s, which we thought could be faster.
Of course, we always try to keep EZIO as simple as possible, because the Clonezilla team and I all have full-time jobs.
EZIO is just a side project; we won't put in a lot of effort for only a small gain at the cost of higher maintenance difficulty.
Do you have any suggestions for profiling or benchmarking to find the bottleneck?
Thank you!
And thank you again for this awesome project and your contributions; it helps us a lot that we don't need to implement BitTorrent from scratch.
libtorrent version (or branch): RC2.0
platform/architecture: Linux / amd64 (Debian sid)
compiler and compiler version: gcc 11