
News_(2010)

John Cupitt edited this page Mar 29, 2017 · 1 revision


16 December 2010

7.24 is now officially done. We'll announce to the world on Monday, hopefully. What's New in 7.24 lists the major features of this release.

13 December 2010

I've added OpenCV to the Speed and Memory Use page. vips-7.24 just scrapes a win on a two-cpu machine, phew.

8 November 2010

I've got convolution with Orc going significantly quicker:

$ time vips --vips-novector im_conv wtc.v wtc2.v blur3x3.con
real   0m7.956s
user   0m7.810s
sys    0m0.920s
$ time vips im_conv wtc.v wtc2.v blur3x3.con
real   0m4.881s
user   0m2.280s
sys    0m0.880s

So that's about 3.5x faster than C in user time. It does array tiling, so you can have masks up to about 10x10 non-zero elements.

1 November 2010

The Orc work has now landed in trunk. We have vectorised versions of dilate, erode, conv and add. These all go about 4x faster.

$ time vips --vips-novector im_dilate wtc1bit.v wtc2.v morph.mor
real    0m0.727s
user    0m0.730s
sys     0m0.290s
$ time vips im_dilate wtc1bit.v wtc2.v morph.mor
real    0m0.453s
user    0m0.150s
sys     0m0.260s

This is for a 4-connected dilate of a 10,000 x 10,000 8-bit mono image running on a 2 x Opteron 254 machine at 2.7 GHz.

These only work for 8-bit unsigned char images, though, and im_conv() only speeds up for small masks.

28 October 2010

vips trunk now loads FITS images using cfitsio. The FITS format is widely used in astronomy.

It should be able to load 1, 2 and 3 dimensional images in any format except 64-bit int, though I've only tested 2D 16-bit unsigned. It ought to load all the metadata and comments too. Any feedback very welcome.

18 October 2010

The Orc branch now has 2D convolution:

http://vips.svn.sourceforge.net/viewvc/vips/vips7/branches/orc/

Orc is a wrapper around sse/sse2/sse3/mmx/altivec/arm/ti-dsp etc. You write a bit of code in a pseudo-assembly language and at runtime it "compiles" it down to real instructions for the exact host CPU, using whatever capabilities it has. The idea is that many apps have to maintain multiple code paths for various annoying media instruction sets, so this thing abstracts that away and lets you write your code just once.

im_conv() goes one step further and generates the Orc code at runtime, optimised for your matrix.

Here's VIPS doing im_conv on a 10,000 x 10,000 pixel 8-bit RGB image:

$ time vips im_conv wtc.v wtc2.v blur.con
real    0m10.833s
user    0m7.660s
sys     0m0.720s

That's with -O2 on a 2.7GHz Opteron 254. Now with Orc:

$ time vips --vips-orc im_conv wtc.v wtc2.v blur.con
real    0m7.117s
user    0m4.040s
sys    0m0.800s

A bit less than a 2x speedup in user time.

The code for im_conv() is here:

http://vips.svn.sourceforge.net/viewvc/vips/vips7/branches/orc/libvips/convolution/im_conv.c?view=markup

The Orc code it generates for the convolution pass is something like:

 mulubw sum s1 c0
 loadoffb value s1 b1
 mulubw product value c1
 addssw sum sum product
 loadoffb value s1 b2
 mulubw product value c2
 addssw sum sum product
 mulubw product s2 c3
 addssw sum sum product
 loadoffb value s2 b1
 mulubw product value c4
 addssw sum sum product
 loadoffb value s2 b2
 mulubw product value c5
 addssw sum sum product
 mulubw product s3 c6
 addssw sum sum product
 loadoffb value s3 b1
 mulubw product value c7
 addssw sum sum product
 loadoffb value s3 b2
 mulubw product value c8
 addssw d1 sum product

Then for the round, scale, offset and clip pass:

 addssw t1 s1 rounding
 divluw t1 t1 scale
 addssw t1 t1 offset
 convsuswb d1 t1
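
For readers who don't speak Orc, here is a rough scalar C sketch of what those two passes compute for one output sample of a 3x3 mask. It is my reading of the opcode names (multiply bytes into a word, saturating signed word add, divide, clip to byte), not the code im_conv() actually runs. The names add_sat16 and conv3x3_sample are made up for illustration, and the multiply is done in plain int, so any 16-bit wraparound in the real multiply is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating signed 16-bit add, standing in for Orc's addssw. */
static int16_t add_sat16(int a, int b)
{
    int sum = a + b;

    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (int16_t) sum;
}

/* Hypothetical helper: one output sample of a 3x3 convolution.
 * px[] holds the nine 8-bit source samples, coef[] the nine mask values.
 */
static uint8_t conv3x3_sample(const uint8_t px[9], const uint8_t coef[9],
    int rounding, int scale, int offset)
{
    int16_t sum = 0;
    int i;

    assert(scale > 0);

    /* Pass 1: multiply-accumulate, saturating the running sum. */
    for (i = 0; i < 9; i++)
        sum = add_sat16(sum, px[i] * coef[i]);

    /* Pass 2: round, scale, offset, then clip to 0..255. */
    int16_t t = add_sat16(sum, rounding);
    t = (int16_t) (t / scale);
    t = add_sat16(t, offset);
    if (t < 0)
        return 0;
    if (t > 255)
        return 255;
    return (uint8_t) t;
}
```

With an all-ones mask, a scale of 9 and a rounding value of 4, a flat patch of 100s comes back as 100, which is what you'd expect from a box blur.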

7 October 2010

nip2 trunk now has a "View workspace as graph" menu item. It uses GraphViz to show the relationships between rows in the workspace. Here's a screenshot:

[Screenshot: graph view of a small nip2 workspace]

The graph updates as you edit the workspace. It's fun! And arguably a bit clearer than the normal workspace view.

However for large workspaces, graph display is less useful. Here's the graph it draws for a more complex workspace:

[Screenshot: graph view of a more complex nip2 workspace]

Useless! The graph view thing looks like a dead-end to me. To get nip2 to scale to larger workspaces I think it'll need some way to turn columns into functions and some restrictions on how columns can be linked together.

2 September 2010

We have a new OS X build system working, see Build on OS X. That's enough for 7.22 to be formally 'done', phew.

4 August 2010

I've put up a tarball of the new Windows build system on the supported download area. Build on windows has some instructions.

This system is based on jhbuild and can build the whole of nip2, including some pre-compiled dependencies, some that have to be built from source, and some that need patching, in a single command. There are a couple of extra scripts which strip the build area down and make a setup.exe installer as well.

Hopefully, we'll have an OS X binary built with almost the same system very soon as well.

31 July 2010

vips-7.23 has picked up its first few features.

Tim Elliott has contributed im_vips2bufpng(), a function that can output a PNG image to a memory buffer. This is useful in web programming: a script can output an image directly to the client without having to go through the filesystem. He has a Ruby binding in preparation as well.

vips has a new open mode which opens via a disc file. At the moment, when you open a large image in a format which does not support random access (such as JPEG), the image is uncompressed to memory and then processed from that. So for example:

$ time vips im_rot180 wtc.jpg wtc90.png
real 0m49.1s
user 0m48.1s
sys  0m0.5s
peak RES 310mb

where wtc.jpg is a 10,000 x 10,000 pixel RGB image, and im_rot180 does a 180 degree rotate, is processed like this:

  • vips allocates a large memory buffer (300MB in this case) and runs im_jpeg2vips() into this buffer.
  • vips creates a "p" virtual image and runs im_rot180() from the memory image into the virtual image.
  • Finally, it runs im_vips2png() from the virtual image to the output file name.

The problem here is that memory is limited and images can be very large. vips-7.23 has a new open mode, "rd", which is used everywhere. This mode allocates a temporary disc file and uncompresses to that rather than to memory.

Here's what you get now:

$ time vips im_rot180 wtc.jpg wtc90.png
real 0m51.8s
user 0m48.1s
sys  0m1.1s
peak RES 10mb

So higher systime, because of all the extra disc IO, but now there is very little memory use and there is no longer any filesize limit. You can use the --vips-disc-threshold command-line flag and the IM_DISC_THRESHOLD environment variable to turn the temp file feature on and off, see the docs for details.
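
The arithmetic behind the memory-or-disc choice is simple enough to sketch in C. This is only an illustration of the idea, not the libvips code: the function names are invented, it assumes one byte per sample, and the 100MB threshold in the usage note is a number I picked for the example rather than the real default.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an uncompressed image, assuming one byte per sample. */
static uint64_t uncompressed_bytes(uint64_t width, uint64_t height,
    uint64_t bands)
{
    assert(bands > 0);

    return width * height * bands;
}

/* Hypothetical decision rule: returns 1 to decompress to a temporary
 * disc file, 0 to decompress to a memory buffer.
 */
static int use_disc_file(uint64_t image_bytes, uint64_t threshold_bytes)
{
    return image_bytes >= threshold_bytes;
}
```

For wtc.jpg, uncompressed_bytes(10000, 10000, 3) is 300,000,000 bytes, the 300MB buffer mentioned above, so with an illustrative threshold of 100MB it would go via a disc file, while a 128 x 128 thumbnail would stay in memory.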

26 May 2010

vips-7.22 is finally being prepared. A late addition has snuck in: we've been sent a translation to German by Chris Leick. We've fixed some problems in the i18n system and we now get:

$ LANG=de_DE.utf8 vips --help
Aufruf:
  vips [OPTION …] - VIPS-Treiberprogramm

Hilfeoptionen:
  -?, --help                        Hilfeoptionen anzeigen
  --help-all                        Alle Hilfeoptionen anzeigen
  --help-vips                       VIPS-Optionen anzeigen
......

Very nice!

19 April 2010

The new threading system is now everywhere in SVN trunk and the old one has been removed.

It really helps screen painting in nip2. The old repaint system could not scale beyond 4 processors and in practice never got more than about 3x faster. The new system should scale just as well as whole image calculation. If you have 4 or more processors (e.g. a 2-core chip with hyperthreading), you should see a good improvement.

22 March 2010

libvips has a new thread scheduler which should help scalability on very-many-way machines.

The current version uses a conventional threadpool system. When libvips generates an image it creates a pool of worker threads and a manager thread. The manager loops over the tiles in an image assigning tasks to workers as they become idle. As each section of tiles completes it sends that batch of pixels off to disc (actually, it's a bit more complicated than this: the manager is also sending evaluation progress messages, and the workers are sending tile-complete messages to an extra background write thread).

This system is simple and flexible, but rather inefficient if you consider the sequence of synchronisation operations needed to keep the threads in step. For each tile we do something like this:

  • the idle list is empty 99% of the time ... then a worker finishes a task
  • the worker locks the idle list, adds itself, and unlocks idle
  • the worker raises the 'idle' semaphore to wake up the manager thread
  • the worker blocks on its 'go' semaphore
  • the manager wakes up, locks the idle list, gets the worker, and unlocks idle again
  • the manager assigns a task, then raises the thread's 'go' semaphore to set it working
  • the manager sleeps again on the 'idle' semaphore

A semaphore operation involves a lock/unlock pair and either a wait or a signal on a condition variable, so in total the above list is 12 mutex operations and 4 condition variable operations (in fact the true picture is more complex than this). We have quite a complicated dance between workers and the manager.
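
To see where that count comes from, here is a minimal C sketch of a counting semaphore built from one mutex and one condition variable using pthreads. It is a generic textbook construction with made-up names, not the semaphore code in libvips.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* A counting semaphore built from one mutex and one condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int count;
} Semaphore;

static void semaphore_init(Semaphore *s, int count)
{
    assert(count >= 0);

    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->count = count;
}

/* Raise: lock, increment, signal, unlock. */
static void semaphore_up(Semaphore *s)
{
    pthread_mutex_lock(&s->lock);
    s->count += 1;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Lower: lock, wait until the count is positive, decrement, unlock. */
static void semaphore_down(Semaphore *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count < 1)
        pthread_cond_wait(&s->cond, &s->lock);
    s->count -= 1;
    pthread_mutex_unlock(&s->lock);
}
```

Each up or down is one lock, one unlock and one condition variable call. The list above has four semaphore operations and two locked accesses to the idle list, which gives 4 x 2 + 2 x 2 = 12 mutex operations and 4 condition variable operations.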

Rather than having the manager pick tasks, what if workers did it themselves? Here's how the new system works (thanks to Christian Blenia for the idea):

  • a worker finishes a task
  • worker locks the assign-task mutex, runs a function to set new parameters, and unlocks
  • the task can be 'generate a tile' or 'job done, you can quit', or 'there has been an error, abort' or anything really
  • worker starts on the next thing

So that's two mutex operations per tile and no context switches, much simpler! (Again, reality is more complex: workers actually send off two messages per tile as well, one to update progress feedback, the other to trigger the background buffer write.)
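
Here is a stripped-down pthreads sketch of the workers-pick-their-own-tasks idea. It only illustrates the locking pattern: all the names (Pool, pool_claim, pool_run) are invented, "generating a tile" is reduced to bumping a counter, and there is no progress feedback, error handling path or background write.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N_WORKERS 4

/* Shared state: the next tile to hand out, guarded by one mutex. */
typedef struct {
    pthread_mutex_t assign_lock;
    int next_tile;
    int n_tiles;
    int *done;      /* done[i] counts how often tile i was processed */
} Pool;

/* A worker claims its own work: lock, take the next tile, unlock.
 * Returns -1 for "job done, you can quit".
 */
static int pool_claim(Pool *pool)
{
    int tile;

    pthread_mutex_lock(&pool->assign_lock);
    tile = pool->next_tile < pool->n_tiles ? pool->next_tile++ : -1;
    pthread_mutex_unlock(&pool->assign_lock);

    return tile;
}

static void *worker_main(void *arg)
{
    Pool *pool = arg;
    int tile;

    /* Each tile is claimed by exactly one worker, so done[tile] needs
     * no extra locking.
     */
    while ((tile = pool_claim(pool)) >= 0)
        pool->done[tile] += 1;  /* stand-in for generating the tile */

    return NULL;
}

/* Process n_tiles tiles with N_WORKERS threads; returns 0 on success. */
static int pool_run(int n_tiles, int *done)
{
    Pool pool = { .next_tile = 0, .n_tiles = n_tiles, .done = done };
    pthread_t threads[N_WORKERS];
    int i, started = 0;

    assert(n_tiles >= 0);

    pthread_mutex_init(&pool.assign_lock, NULL);
    for (i = 0; i < N_WORKERS; i++) {
        if (pthread_create(&threads[i], NULL, worker_main, &pool))
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&pool.assign_lock);

    return started == N_WORKERS ? 0 : -1;
}
```

There is no manager thread and nobody sleeps waiting to be handed work: the only synchronisation per tile is the lock/unlock pair in pool_claim().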

How great is the performance improvement? None at all in normal operation, sadly. On my two-core laptop I get:

$ time vips im_rot90 wtc.v wtc2.v
real    0m5.020s
user    0m2.040s
sys     0m2.860s
$ time vips --vips-wbuffer2 im_rot90 wtc.v wtc2.v
real    0m4.978s
user    0m1.920s
sys     0m2.570s

(wtc.v is a 10,000 x 10,000 pixel RGB image, the --vips-wbuffer2 flag turns on the new system, the improvement in systime is just noise)

However if you switch to tiny tiles (the default is 64x64 pixels) and huge numbers of threads, you can see an improvement. I get these times:

$ time vips --vips-concurrency=1024 --vips-tile-width=16 --vips-tile-height=16 im_rot90 wtc.v wtc2.v
real    0m10.155s
user    0m7.940s
sys     0m2.400s
$ time vips --vips-wbuffer2 --vips-concurrency=1024 --vips-tile-width=16 --vips-tile-height=16 im_rot90 wtc.v wtc2.v
real    0m6.067s
user    0m1.450s
sys     0m0.890s

So we've probably doubled the efficiency of the threading system, though unfortunately the threading stuff is not a bottleneck at the moment for most users.

On a 64-processor computer we did see a loss of linearity above 32 processors so perhaps this change will fix that. We've been offered some time on this monster machine in the next few months --- we'll be testing.

If you'd like to try the new code out, there's a tarball here:

http://www.vips.ecs.soton.ac.uk/development/7.21/vips-7.21.2.tar.gz

or you can build from SVN trunk.

17 March 2010

I've just finished a series of changes to libvips and nip2 which should really help image repaints. The whole system had become a bit wobbly, but it's all overhauled and should now feel much faster, look a lot prettier and be more reliable.

The big changes are:

  • nip2 always repaints images in sections following tile borders. If a tile is not yet ready, it defers that section of the paint action. This means it will never paint a black tile and then a moment later repaint with pixels.
  • Invalidation is handled cleanly. If you paint on an image, downstream caches, including image tile caches, are all marked invalid. When vips later tries to reuse one of these cached areas, it knows to drop cache and recalculate. Invalidation is used no more frequently than necessary.
  • The system for propagating changes through an image and its views has been rewritten and tuned. It should only need the minimum number of paint actions to update a view.

The new changes really help the nip2 paintbox. You can open a complex workspace, paint on one of the images, and it should keep up with your changes and never error or mispaint.

In another (small) improvement, libvips can now tell how many CPUs the host machine has and adjust concurrency for you automatically. You can override the detected setting with --vips-concurrency and IM_CONCURRENCY, as before.
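
The precedence is roughly as in this C sketch. It is an assumption-laden illustration rather than the libvips source: the function names are invented, only the IM_CONCURRENCY variable is modelled (not the command-line flag), and it leans on sysconf(_SC_NPROCESSORS_ONLN), which is widely available on Linux and OS X but is not part of POSIX itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Ask the OS how many processors are online; fall back to 1. */
static int detect_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return n > 0 ? (int) n : 1;
}

/* Hypothetical helper: a positive integer in override wins, otherwise
 * use the detected count. override may be NULL.
 */
static int concurrency_from(const char *override, int detected)
{
    assert(detected > 0);

    if (override) {
        long n = strtol(override, NULL, 10);

        if (n > 0)
            return (int) n;
    }

    return detected;
}

/* Worker count for this process, honouring IM_CONCURRENCY if set. */
static int pick_concurrency(void)
{
    return concurrency_from(getenv("IM_CONCURRENCY"), detect_cpus());
}
```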

4 February 2010

There's a branch in SVN for a vips that uses Orc:

http://vips.svn.sourceforge.net/viewvc/vips/vips7/branches/orc/

Orc is a wrapper around sse/sse2/sse3/mmx/altivec/arm/ti-dsp etc. You write a bit of code in a pseudo-assembly language and at runtime it "compiles" it down to real instructions for the exact host CPU, using whatever capabilities it has. The idea is that many apps have to maintain multiple code paths for various annoying media instruction sets, so this thing abstracts that away and lets you write your code just once.

Here's VIPS doing im_add on a 10,000 x 10,000 pixel 8-bit RGB image:

$ time vips im_add wtc.v wtc.v wtc2.v
real    0m10.699s
user    0m1.890s
sys     0m1.520s

Pretty quick, huh? That's with -O2 on a 2.7GHz Opteron 254. Now with ORC!

$ time vips --vips-orc im_add wtc.v wtc.v wtc2.v
real    0m9.668s
user    0m0.530s
sys     0m1.410s

About a 3x to 4x speedup in user time. You get more with a core2duo, it seems to have a better vector unit.

Orc is still rather experimental, so I don't want to spend too long rewriting operators yet. The next version should add addressing modes and then we'll be able to use it for things like im_conv().

The code for im_add() is here:

 http://vips.svn.sourceforge.net/viewvc/vips/vips7/branches/orc/libvips/arithmetic/im_add.c?view=markup

The ORC inner loop is:

 p = add_programs[IM_BANDFMT_UCHAR];
 orc_program_append_ds_str( p, "convubw", "t1", "s1" );
 orc_program_append_ds_str( p, "convubw", "t2", "s2" );
 orc_program_append_str( p, "addusw", "d1", "t1", "t2" );

ie.

cast s1 up from unsigned byte to word in t1
cast s2 up from unsigned byte to word in t2
unsigned word add of t1 and t2 to make d1
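
In plain C, the same loop would look something like the sketch below. The function name add_uchar is made up for illustration, and this is the scalar equivalent of the three Orc instructions rather than the code vips runs without Orc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar equivalent of the Orc program: add two arrays of
 * unsigned bytes, producing unsigned 16-bit results.
 */
static void add_uchar(const uint8_t *s1, const uint8_t *s2, uint16_t *d1,
    size_t n)
{
    size_t i;

    assert(s1 && s2 && d1);

    for (i = 0; i < n; i++) {
        uint16_t t1 = s1[i];            /* convubw t1, s1 */
        uint16_t t2 = s2[i];            /* convubw t2, s2 */

        d1[i] = (uint16_t) (t1 + t2);   /* addusw d1, t1, t2 */
    }
}
```

The largest possible result is 255 + 255 = 510, which fits comfortably in a 16-bit word, so the add can never overflow here.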

27 January 2010

The improved nohalo interpolators from last summer's Google Summer of Code have now landed in VIPS trunk:

http://socghop.appspot.com/gsoc/student_project/show/google/gsoc2009/gimp/t124022365999

We'll probably tidy up the three or four nohalo interpolators and just have one sensible one.

15 January 2010

Trunk has a new command-line program, vipsthumbnail:

http://vips.svn.sourceforge.net/viewvc/vips/vips7/trunk/tools/iofuncs/vipsthumbnail.c?view=markup

This is a simple program to make image thumbnails. It's fast and needs very little memory. Run it like this:

$ time vipsthumbnail wtc.jpg
real    0m0.452s
user    0m0.410s
sys     0m0.040s

That makes tn_wtc.jpg, the original 10,000 x 10,000 pixel RGB image sized down to fit inside 128 x 128. It needs about 5MB of memory. By contrast, ImageMagick is slower:

$ time convert -define jpeg:size=256x256 wtc.jpg -thumbnail 128x128 -unsharp 0x.5 tn_wtc.jpg
real    0m3.772s
user    0m3.230s
sys     0m0.510s

And it needs about 700MB of memory.

Features:

  • can thumbnail any image format supported by vips
  • colour management
  • three-stage resample: block average by integer factor to size above final dimensions, bilinear resample to final size, sharpen
  • if the decompressed image is below a certain size, vipsthumbnail will decompress to memory before thumbnailing. Above this threshold, it decompresses to a temporary disc file and then shrinks from that. You can use this to limit the maximum memory that vips needs to thumbnail an image
  • command-line options to control colour management, threading, file formats, thumbnail name, location and size, maximum memory use, and so on
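
As a rough illustration of the first two resample stages, this C sketch works out an integer block-average factor and the residual scale left for the bilinear pass. It is a guess at the arithmetic, not vipsthumbnail's code: the names are invented, it assumes the longest edge is what must fit inside the target box, and the sharpen stage is not modelled.

```c
#include <assert.h>

/* Hypothetical plan for a thumbnail: an integer shrink followed by a
 * residual scale for the bilinear resample.
 */
typedef struct {
    int shrink;         /* block-average factor, always >= 1 */
    double residual;    /* scale applied after the integer shrink */
} ThumbPlan;

static ThumbPlan plan_thumbnail(int width, int height, int target)
{
    ThumbPlan plan;
    int longest = width > height ? width : height;

    assert(width > 0 && height > 0 && target > 0);

    /* Stage 1: the largest integer factor that still leaves the image
     * at or above the target size.
     */
    plan.shrink = longest / target;
    if (plan.shrink < 1)
        plan.shrink = 1;

    /* Stage 2: whatever scale remains to hit the target exactly. */
    plan.residual = (double) target * plan.shrink / longest;

    return plan;
}
```

For the 10,000 x 10,000 pixel image and a 128 x 128 box this gives a block average by 78, leaving an image about 128.2 pixels across, and then a bilinear resample by roughly 0.998.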