Arvid Norberg edited this page Jan 15, 2017 · 7 revisions

goal

  1. Significantly lower the memory pressure libtorrent inflicts on systems
  2. Significantly improve how efficiently RAM is used on a system running libtorrent.
  3. Hopefully simplify some of the disk I/O code.

1 and 2 are achieved by using memory mapped I/O and using the kernel's page cache as the cache. The kernel can evict pages and flush dirty pages at its leisure.
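As a rough illustration of what writing through the page cache could look like, here is a minimal POSIX-only sketch. The helper name write_via_mmap is hypothetical and not part of libtorrent; it maps the whole file for simplicity and deliberately does not call msync(), leaving the flush to the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not a libtorrent API). Copies `len` bytes from `buf`
// into the file at `path`, starting at byte `offset`, via a shared mapping.
// Returns false on failure.
bool write_via_mmap(char const* path, std::size_t offset
    , void const* buf, std::size_t len)
{
    int const fd = ::open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return false;

    struct stat st;
    if (::fstat(fd, &st) != 0) { ::close(fd); return false; }

    // grow the file if the write extends past its current end
    std::size_t map_len = static_cast<std::size_t>(st.st_size);
    if (map_len < offset + len)
    {
        map_len = offset + len;
        if (::ftruncate(fd, static_cast<off_t>(map_len)) != 0)
        {
            ::close(fd);
            return false;
        }
    }
    if (len == 0) { ::close(fd); return true; }

    // MAP_SHARED makes the stores land in the kernel's page cache
    void* base = ::mmap(nullptr, map_len, PROT_READ | PROT_WRITE
        , MAP_SHARED, fd, 0);
    ::close(fd); // the mapping remains valid after the descriptor is closed
    if (base == MAP_FAILED) return false;

    std::memcpy(static_cast<char*>(base) + offset, buf, len);

    // no msync(): the kernel flushes the dirty pages at its leisure
    return ::munmap(base, map_len) == 0;
}
```

A real implementation would keep mappings alive across jobs rather than mapping and unmapping per write.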

non-goals

  • reduce the amount of memory copying (or implement zero copy, where page-cache memory is passed directly into a socket)

features

copy file data into anonymous memory

  1. Linux tries harder to avoid evicting anonymous pages than file-backed page cache pages. We don't want the kernel to evict anything unless we have written it to a file-backed dirty page, or read it from a file-backed memory-mapped page. Other memory is either in a socket send buffer to be sent (and will be accessed in the near future) or in a write queue to be copied to a file-backed page.
  2. It is believed that a significant portion of client connections are encrypted in some form, requiring buffers read from disk to be mutated (encrypted) before being hung onto a socket send buffer.
  3. Since the disk cache will be stored in file-space (and not piece-space), buffer stitching will need to occur when reading. It would be complicated to make the disk interface return a (potentially very long) list of buffers to be hung onto the socket send queue.
  4. End-game mode requires receiving the same block from multiple peers. This special case would also require receiving into buffers separate from the page cache.
  5. This means still maintaining a disk buffer pool allocator and a very similar (if not identical) disk_interface for peers and torrents to interact with the disk I/O subsystem.

multi-index write job queue

In order to be able to serve blocks to other peers before we've written them to the page-cache, make the write job queue indexable by its (torrent, piece, block)-triple. Design the job queue with this in mind.

Write jobs must stay in the queue, or at least remain indexable and available for cache hits, while they are being copied into the page-cache as well. If this feature is not provided, guaranteeing ordering between a write and a subsequent read would be very difficult, given more than one disk thread.
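One possible shape for such a queue is sketched below. All names (block_key, write_queue, claim_next, find, complete) are made up for illustration, and the sketch assumes each block is queued at most once. Jobs are kept in FIFO order in a list and indexed by the triple in a map; a claimed job stays indexed until the writer reports completion, so reads can still hit it while it is in flight.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>
#include <map>
#include <mutex>
#include <tuple>
#include <utility>
#include <vector>

// (torrent, piece, block), all hypothetical plain integer ids
using block_key = std::tuple<std::uint32_t, std::uint32_t, std::uint32_t>;

class write_queue
{
public:
    // Queue a block for writing. Assumes the key is not already queued.
    void push(block_key const& key, std::vector<char> buf)
    {
        std::lock_guard<std::mutex> l(m_mutex);
        assert(m_index.count(key) == 0);
        m_jobs.push_back(job{key, std::move(buf), false});
        m_index.emplace(key, std::prev(m_jobs.end()));
    }

    // Claim the oldest job no writer thread has taken yet (linear scan,
    // fine for a sketch). The job stays indexed until complete() is called.
    bool claim_next(block_key& out)
    {
        std::lock_guard<std::mutex> l(m_mutex);
        for (auto& j : m_jobs)
        {
            if (j.in_flight) continue;
            j.in_flight = true;
            out = j.key;
            return true;
        }
        return false;
    }

    // Cache lookup: copies the buffer out if the block is queued or in flight.
    bool find(block_key const& key, std::vector<char>& out) const
    {
        std::lock_guard<std::mutex> l(m_mutex);
        auto const it = m_index.find(key);
        if (it == m_index.end()) return false;
        out = it->second->buffer;
        return true;
    }

    // Called once the block has been copied into the page cache.
    void complete(block_key const& key)
    {
        std::lock_guard<std::mutex> l(m_mutex);
        auto const it = m_index.find(key);
        if (it == m_index.end()) return;
        m_jobs.erase(it->second);
        m_index.erase(it);
    }

private:
    struct job { block_key key; std::vector<char> buffer; bool in_flight; };
    mutable std::mutex m_mutex;
    std::list<job> m_jobs; // FIFO order; list iterators stay valid on erase
    std::map<block_key, std::list<job>::iterator> m_index;
};
```

A reader would call find() before posting a read job, and a writer thread would call claim_next(), copy the buffer into the mapped file, then call complete().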

storage customization point

The new storage customization point could be the disk_interface interface. This would still let systems not supporting memory mapped I/O easily implement lighter-weight I/O subsystems (Android for instance). This would mean removing the storage_interface customization point.

There should probably be a simple, single-threaded implementation of disk_interface that can be used for embedded or old systems (lacking mmap() support or a 64-bit virtual address space). Ideally it would be implemented entirely with the standard C library.
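To give a feel for how small such a fallback could be, here is a sketch of blocking, single-file reads and writes using only standard C I/O. The class stdio_storage and its members are hypothetical; this is not the disk_interface signature, and it ignores multi-file torrents, part files and priorities.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-file, single-threaded storage using only <cstdio>.
class stdio_storage
{
public:
    explicit stdio_storage(std::string path) : m_path(std::move(path)) {}

    // Write `buf` at byte `offset`. Opens and closes the file per call.
    bool write(long offset, std::vector<char> const& buf)
    {
        // "r+b" preserves existing contents; "w+b" creates the file
        std::FILE* f = std::fopen(m_path.c_str(), "r+b");
        if (f == nullptr) f = std::fopen(m_path.c_str(), "w+b");
        if (f == nullptr) return false;

        bool const ok = std::fseek(f, offset, SEEK_SET) == 0
            && std::fwrite(buf.data(), 1, buf.size(), f) == buf.size();
        bool const closed = std::fclose(f) == 0;
        return ok && closed;
    }

    // Fill `buf` (buf.size() bytes) from byte `offset`.
    // Fails on a short read.
    bool read(long offset, std::vector<char>& buf)
    {
        std::FILE* f = std::fopen(m_path.c_str(), "rb");
        if (f == nullptr) return false;

        bool const ok = std::fseek(f, offset, SEEK_SET) == 0
            && std::fread(buf.data(), 1, buf.size(), f) == buf.size();
        std::fclose(f);
        return ok;
    }

private:
    std::string m_path;
};
```

Note that fseek() takes a long, which limits file offsets on systems where long is 32 bits.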

I/O threads

  1. Use a small number of threads dedicated to write jobs (probably no more than 2 or 3).
  2. Write threads mmap() files and copy block buffers into file buffers (which is a scatter operation, based on the torrent file layout).
  3. The file_pool may need to be modified to also handle mapped views of files, and the file abstraction adjusted to be appropriate for mmap and MapViewOfFile. This probably means file descriptors and HANDLE.
  4. There should be a dynamically sized pool of reader threads. Whenever a read thread has been idle for a certain amount of time, the pool size is decremented. If all read threads are busy, and the number of threads is below a threshold, spawn a new one to handle the new job.
  5. Read threads stitch together memory from multiple locations (files) into a single bittorrent block (in piece-space). This is a gather operation, the inverse of the write thread's scatter operation.
  6. Before attempting to post a read job to a thread, see if the write queue can satisfy the read.
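The scatter and gather steps both rest on the same mapping from a piece-space range to one or more file-space ranges. A minimal sketch of that mapping follows, assuming files are laid out back to back in torrent order. The names file_range and map_to_files are hypothetical and the function is independent of libtorrent's own file layout classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One contiguous range inside a single file (hypothetical type).
struct file_range
{
    std::size_t file;      // index into the file list
    std::int64_t offset;   // byte offset within that file
    std::int64_t length;   // number of bytes
};

// Map `length` bytes starting at `offset_in_piece` within `piece` onto the
// files they span. Bytes past the end of the last file are not mapped.
std::vector<file_range> map_to_files(
    std::vector<std::int64_t> const& file_sizes
    , std::int64_t piece_length, std::int64_t piece
    , std::int64_t offset_in_piece, std::int64_t length)
{
    std::vector<file_range> out;
    // position in the torrent's flat byte space
    std::int64_t pos = piece * piece_length + offset_in_piece;
    std::int64_t file_start = 0;
    for (std::size_t i = 0; i < file_sizes.size() && length > 0; ++i)
    {
        std::int64_t const file_end = file_start + file_sizes[i];
        if (pos < file_end)
        {
            // take as much as this file holds, then continue in the next
            std::int64_t const n = std::min(length, file_end - pos);
            out.push_back({i, pos - file_start, n});
            pos += n;
            length -= n;
        }
        file_start = file_end;
    }
    return out;
}
```

A write thread would copy consecutive slices of the block buffer to each returned range (scatter), and a read thread would copy from each range into consecutive slices of one block buffer (gather).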

error handling

The actual copy operations in the reader and writer threads should handle signals, specifically SIGBUS and SIGSEGV. These may be raised if the disk is full. On POSIX systems, this could probably be done by installing a signal handler just for those threads (this is presumably hairy and may involve running pthread_sigmask() in all other threads). On Windows we could use __try/__except. This appears to be working in try_signal.

challenges

1. We would still need the concept of fence jobs to allow for jobs that cannot run concurrently with any other access to a torrent's storage. For instance:
  1. renaming files
  2. moving files
  3. generating resume data (stat()ing files)
  4. releasing file handles and unmapping storage
2. If disk_interface is the new customization point, how are torrents kept track of? The current storage_interface is instantiated once per torrent, letting the implementation track state like save-path, stat-cache, partfile instance, file priority (to decide whether to direct accesses to the partfile or not), and possibly some other settings. Perhaps the interface would have to be extended to be explicitly notified when torrents are added and removed.