
Why is libvips quick

John Cupitt edited this page Sep 9, 2021 · 5 revisions

We have a "How it works" page that goes into more detail, but briefly:

  • Fully demand-driven: libvips doesn't process entire images in memory. Instead, images are streamed through your computer as a series of small regions, each computed only when something downstream asks for it. This keeps memory use low.

  • Fast operations: The libvips primitives are implemented carefully, and some use techniques such as run-time code generation. For example, the convolution operator examines the matrix and the image and, at run time, writes a short SSE3 program to implement exactly that convolution on exactly that image.

  • Threaded image input-output (IO) system: In most image processing libraries, threading lives inside each operation. Every operation contains code, generally built on a framework such as OpenMP, that allocates a section of each image to a thread. libvips instead puts the threading into the image IO system and gives each thread a separate (very lightweight) copy of the entire image pipeline to work on. This style of horizontal threading makes better use of processor caches, because a thread carries its region through every stage of the pipeline while those pixels are still in cache. It also reduces locking.

  • Overlapped input and output: libvips can run the load, process and save parts of a pipeline in parallel, even though most image format libraries are single-threaded. It uses a set of threads for input and processing (these queue up to call the load library one at a time). It also runs an extra background write-behind thread whenever a set of output scanlines is completed.

  • libvips is (almost) tile-less: Most image processing systems split images into tiles for processing. Tiles are non-overlapping areas of pixels that can be cached and reused. Keeping tiles from overlapping forces threads to synchronise continually, and tile edges need special treatment. libvips instead uses regions (rectangular areas of an image that can overlap), which removes a lot of housekeeping. It has a set of rules that try to keep region overlap, and therefore recomputation, to a minimum.

  • libvips is (almost) lock-less: Threads need to talk to each other to coordinate their work, and on systems with a large number of threads this can become expensive. Because libvips gives each thread a private copy of the entire image processing pipeline, it needs only one mutex on file read and one on file write. The rest of the system needs no locking or synchronisation.

  • Variety of pixel formats: libvips supports 10 pixel formats, from 8-bit unsigned integer to 128-bit double-precision complex, and almost all operations work on any format. This means it can usually process data directly, with no need to repack it into another format for computation.
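The demand-driven point above can be made concrete with a small Python sketch. This is not libvips code. `Source`, `Invert` and `stream_sum` are hypothetical names, and a "region" is simplified to a horizontal strip of rows. The sink asks the last operation for one strip at a time, and each operation asks its input only for the pixels it needs. As a result, only one strip is held in memory at once, however large the image is.

```python
from typing import List

Strip = List[List[int]]  # a horizontal strip: a list of pixel rows


class Source:
    """Hypothetical image source that computes pixels only when asked."""

    def __init__(self, width: int, height: int) -> None:
        self.width, self.height = width, height

    def region(self, top: int, height: int) -> Strip:
        return [[(x + y) % 256 for x in range(self.width)]
                for y in range(top, top + height)]


class Invert:
    """Hypothetical operation: pulls a strip from its input on demand."""

    def __init__(self, source) -> None:
        self.source = source
        self.width, self.height = source.width, source.height

    def region(self, top: int, height: int) -> Strip:
        return [[255 - p for p in row] for row in self.source.region(top, height)]


def stream_sum(image, strip_height: int = 16):
    """Sink: sums every pixel while holding at most one strip in memory."""
    total, peak_rows = 0, 0
    for top in range(0, image.height, strip_height):
        strip = image.region(top, min(strip_height, image.height - top))
        peak_rows = max(peak_rows, len(strip))
        total += sum(sum(row) for row in strip)
    return total, peak_rows


total, peak = stream_sum(Invert(Source(500, 300)))
```

Here `peak` is 16. The sink never holds more than one 16-row strip of the 300-row image, and nothing is computed until the sink asks for it.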
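The run-time code generation idea behind the fast convolution can be imitated in Python. libvips generates SIMD code. The sketch below only mimics the principle, and `make_convolver` is a hypothetical helper, not the libvips convolution. It inspects a 1-D kernel once, writes source code with the coefficients baked in and zero taps dropped, and compiles it. By contrast, the generic version loops over every tap for every output pixel.

```python
from typing import Callable, List, Sequence, Tuple


def convolve_generic(row: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Reference 1-D 'valid' convolution (kernel applied unflipped)."""
    k = len(kernel)
    return [sum(kernel[j] * row[i + j] for j in range(k))
            for i in range(len(row) - k + 1)]


def make_convolver(kernel: Sequence[float]) -> Tuple[Callable, str]:
    """Generate and compile a function specialised to this exact kernel."""
    # One term per non-zero tap; coefficients become literals in the source.
    terms = [f"{c!r} * row[i + {j}]" for j, c in enumerate(kernel) if c != 0]
    expr = " + ".join(terms) or "0"
    src = (
        "def convolve(row):\n"
        f"    return [{expr} for i in range(len(row) - {len(kernel) - 1})]\n"
    )
    namespace: dict = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["convolve"], src


edge, src = make_convolver([1, 0, -1])
print(src)
print(edge([1, 4, 9, 16, 25]))  # [-8, -12, -16]
```

The generated function contains no loop over the kernel and no multiply by the zero tap. That is the kind of saving specialisation buys, on a much smaller scale than real SIMD code.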
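The threaded IO and write-behind points can be sketched with Python threads. The sketch shows structure only: CPython's global interpreter lock means the pixel code will not actually run in parallel. All the names (`make_pipeline`, `worker`, `writer`, `process`) are hypothetical, and libvips coordinates its threads differently in detail. Three things matter here:

- Each worker builds its own private copy of the pipeline, so its scratch state is never shared.
- The two queues are the only objects the workers share.
- A separate write-behind thread appends finished strips to the output in order while the workers keep computing.

```python
import queue
import threading
from typing import Callable, Dict, List

WIDTH, HEIGHT, STRIP = 64, 100, 8
Rows = List[List[int]]


def make_pipeline() -> Callable[[int, int], Rows]:
    """Build one thread's private copy of a hypothetical two-stage pipeline."""
    scratch = [0] * WIDTH  # per-thread buffer: never shared, so never locked

    def run(top: int, height: int) -> Rows:
        out = []
        for y in range(top, top + height):
            for x in range(WIDTH):
                scratch[x] = (x * y) % 256          # stage 1: generate pixels
            out.append([255 - v for v in scratch])  # stage 2: invert
        return out

    return run


def worker(tasks: "queue.Queue", done: "queue.Queue") -> None:
    pipeline = make_pipeline()  # each thread gets its own pipeline copy
    while (top := tasks.get()) is not None:
        done.put((top, pipeline(top, min(STRIP, HEIGHT - top))))


def writer(done: "queue.Queue", n_strips: int, output: Rows) -> None:
    """Write-behind thread: appends strips in order as soon as they are ready."""
    pending: Dict[int, Rows] = {}
    next_top = 0
    for _ in range(n_strips):
        top, rows = done.get()
        pending[top] = rows
        while next_top in pending:
            output.extend(pending.pop(next_top))
            next_top += STRIP


def process(n_threads: int = 4) -> Rows:
    tasks: "queue.Queue" = queue.Queue()
    done: "queue.Queue" = queue.Queue()
    tops = list(range(0, HEIGHT, STRIP))
    for top in tops:
        tasks.put(top)
    for _ in range(n_threads):
        tasks.put(None)  # one stop signal per worker
    output: Rows = []
    w = threading.Thread(target=writer, args=(done, len(tops), output))
    workers = [threading.Thread(target=worker, args=(tasks, done))
               for _ in range(n_threads)]
    for t in [w, *workers]:
        t.start()
    for t in [*workers, w]:
        t.join()
    return output


result = process()
```

Strips can finish out of order, so the writer keeps early arrivals in `pending` until the strip it is waiting for turns up.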
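The cost of overlapping regions can be estimated with a little arithmetic. As an illustration, not libvips's actual rules, take an operation with a 3×3 neighbourhood. Every output region then needs its input enlarged by a one-pixel margin, so adjacent output strips request overlapping input rows, and those rows are computed twice. The hypothetical `Rect` class and `input_rows_computed` function below count the input rows computed for a given strip height.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    width: int
    height: int

    def expand(self, margin: int) -> "Rect":
        """Grow by `margin` pixels on every side."""
        return Rect(self.left - margin, self.top - margin,
                    self.width + 2 * margin, self.height + 2 * margin)

    def intersect(self, other: "Rect") -> "Rect":
        left, top = max(self.left, other.left), max(self.top, other.top)
        right = min(self.left + self.width, other.left + other.width)
        bottom = min(self.top + self.height, other.top + other.height)
        return Rect(left, top, max(0, right - left), max(0, bottom - top))


def input_rows_computed(width: int, height: int, strip: int, margin: int) -> int:
    """Total input rows computed when the output is produced strip by strip."""
    bounds = Rect(0, 0, width, height)
    total = 0
    for top in range(0, height, strip):
        out = Rect(0, top, width, min(strip, height - top))
        total += out.expand(margin).intersect(bounds).height
    return total


for strip in (8, 32):
    print(strip, input_rows_computed(64, 64, strip, margin=1))
# 8 78
# 32 66
```

For a 64-row image, 8-row strips compute 78 input rows, about 22% extra. 32-row strips compute 66, about 3% extra. Rules that favour larger regions keep the recomputation small.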
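For reference, the ten pixel formats correspond to the `VipsBandFormat` values in the libvips C API. The Python table below pairs each one with a `struct` format string for a single band. That pairing is this sketch's own assumption, used only to compute sizes. It confirms that the range runs from 8 to 128 bits per band.

```python
import struct

# libvips band formats (VipsBandFormat names) paired with a struct format
# string describing one band. The struct pairing is illustrative only.
BAND_FORMATS = {
    "VIPS_FORMAT_UCHAR": "<B",       # 8-bit unsigned
    "VIPS_FORMAT_CHAR": "<b",        # 8-bit signed
    "VIPS_FORMAT_USHORT": "<H",      # 16-bit unsigned
    "VIPS_FORMAT_SHORT": "<h",       # 16-bit signed
    "VIPS_FORMAT_UINT": "<I",        # 32-bit unsigned
    "VIPS_FORMAT_INT": "<i",         # 32-bit signed
    "VIPS_FORMAT_FLOAT": "<f",       # 32-bit float
    "VIPS_FORMAT_COMPLEX": "<ff",    # two 32-bit floats (real, imaginary)
    "VIPS_FORMAT_DOUBLE": "<d",      # 64-bit float
    "VIPS_FORMAT_DPCOMPLEX": "<dd",  # two 64-bit floats (real, imaginary)
}

for name, fmt in BAND_FORMATS.items():
    print(f"{name:24} {8 * struct.calcsize(fmt):4} bits")
```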
