A recent neuroscience project [*] required drawing first into a small (anywhere from 8x8 to 150x150) pixel map, then blitting a scaled-up copy to the screen. The scaling was always by an integral factor (from 2X to as much as 50X), and resulted in a final size on screen on the order of 300x300. Moreover, it was absolutely critical that we be able to do this at video rates (67 fps), never missing a frame, even when some additional computation (e.g., generation of pseudorandom numbers) was required between frames. Both the source buffer and the screen used 32-bit color, and the stimulus computer was a Macintosh G3.
The CopyBits function of the MacOS toolbox has been extensively optimized, and its performance on straight (unscaled and unmasked) pixel copies is very hard to beat. However, when asked to scale the result, e.g. copying a small pixel map to a large area of the screen, its performance degrades severely. Our initial implementation used CopyBits to do the scaling, but it proved to be too slow in many cases. A faster scaling blitter was needed.
Four different blitters were evaluated at a variety of source buffer sizes and scaling factors. The blitters were as follows:
Tests involved performing 100 or 1000 scaling copies of the source buffer to the screen, measuring the elapsed time with UpTime, and calculating a frame rate. Note that while in the actual application there would be no point in blitting faster than the display's refresh rate, in these tests the framerate was measured independent of the screen rate to get a feel for absolute speed. A higher framerate means faster blitting, which allows more time for other computations.
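As a minimal sketch of the arithmetic behind these measurements (not the original test harness), the frame rate is the number of copies divided by the elapsed time, and the time left over for other work in each frame follows from the display's refresh rate. The function names below are hypothetical, and the elapsed time is assumed to have already been converted to seconds by whatever timer is in use.

```c
#include <assert.h>

/* Frame rate for a timed batch of blits.
   frames:          number of scaling copies performed (e.g. 100 or 1000)
   elapsed_seconds: wall-clock time for the whole batch, already in seconds */
double frames_per_second(int frames, double elapsed_seconds)
{
    assert(frames > 0 && elapsed_seconds > 0.0);
    return (double)frames / elapsed_seconds;
}

/* Milliseconds left in each refresh interval after one blit.
   blit_fps:   measured blitter frame rate
   refresh_hz: display refresh rate
   A negative result means the blitter cannot keep up with the display. */
double spare_ms_per_frame(double blit_fps, double refresh_hz)
{
    assert(blit_fps > 0.0 && refresh_hz > 0.0);
    return 1000.0 / refresh_hz - 1000.0 / blit_fps;
}
```

For example, a blitter measured at twice the refresh rate leaves half of each refresh interval free for other computation.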
All tests were performed on a 1998 Power Macintosh G3, 300 MHz, with 128 MB built-in RAM plus 1 MB of virtual memory, 1 MB backside L2 cache, and equipped with a Rage II (on the motherboard) and a RagePro 3D accelerator card. The system software was MacOS 8.5.1. The display was a Sony Trinitron set to 1152 x 870 32-bit pixels, except for some RAVE tests which required reducing the resolution to 832 x 624.
The frames per second of most blitters increased substantially as the final (onscreen) size decreased. The exception was the RAVE blitter, the framerate of which was fairly constant over a wide range of output sizes. None of the blitters were significantly affected by the source size (except for the unscaled case, i.e., source and output the same size).
The nearly constant performance of the RAVE blitter produces a strange result. At small output sizes, 256x256 or less, the RAVE blitter was the worst performer, yielding about 75 fps. But at larger output sizes, over 400x400 or so, the RAVE blitter's performance (45-60 fps) had degraded less than the others, making it the best of the four.
The other three blitters produced more consistent results: DirectMem always achieved the fastest framerate, followed by Direct, with standard CopyBits coming in last. On average, DirectMem was over twice as fast as CopyBits for scaling blits. The Direct blitter was significantly slower when the destination was not aligned on an 8-byte boundary; the other blitters were not affected by alignment. (Nonaligned results are not shown.)
A closer look at the larger problem sizes (right) shows the improved performance of RAVE when the output is greater than 400x400 pixels. In this regime, it outperforms the other blitters by about 15 fps. However, note that even RAVE performance drops at the largest problem sizes, where output is 512x512 and input is 256x256 (leftmost point of graph). Its performance here is hardly better than Direct and DirectMem. This suggests that the superior performance of RAVE may be restricted to a fairly limited area of the problem space.
This approach does require that the scale factor be a multiple of two, which is an acceptable constraint for our intended use. Its performance is not affected by the alignment of the destination buffer.
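One plausible reason for a multiple-of-two constraint, offered here as an assumption rather than something the text states, is that a blitter can write two identical 32-bit pixels with a single 8-byte store, so each source pixel must expand to a whole number of pairs. The sketch below illustrates that idea in portable C. The function name `scale_blit_pairs` and its parameters are hypothetical and are not taken from ScaleBlit.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: replicate each 32-bit source pixel into a
   scale x scale block of dst, writing two pixels per 8-byte store.
   src is tightly packed (src_w pixels per row); dst_stride is in pixels. */
void scale_blit_pairs(const uint32_t *src, int src_w, int src_h,
                      uint32_t *dst, size_t dst_stride, int scale)
{
    /* Pairs only work if every pixel expands to an even number of columns. */
    assert(scale >= 2 && scale % 2 == 0);

    for (int y = 0; y < src_h; y++) {
        uint32_t *row0 = dst + (size_t)y * (size_t)scale * dst_stride;
        uint32_t *out = row0;

        /* Expand one source row horizontally. */
        for (int x = 0; x < src_w; x++) {
            uint32_t p = src[(size_t)y * (size_t)src_w + (size_t)x];
            /* Both halves are identical, so byte order does not matter. */
            uint64_t pair = ((uint64_t)p << 32) | p;
            for (int k = 0; k < scale / 2; k++) {
                memcpy(out, &pair, sizeof pair); /* one 8-byte store */
                out += 2;
            }
        }

        /* Duplicate the expanded row to fill the remaining scale-1 rows. */
        for (int r = 1; r < scale; r++)
            memcpy(row0 + (size_t)r * dst_stride, row0,
                   (size_t)src_w * (size_t)scale * sizeof(uint32_t));
    }
}
```

Using `memcpy` for the 8-byte store keeps the sketch well defined regardless of destination alignment; a compiler will typically turn it into a single store where the hardware allows.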
The performance of RAVE at small blits was disappointing. Probably RAVE was redrawing the entire screen on each refresh, even when only a small area was changing. It may be possible to direct the RAVE library to redraw only part of the screen, and this may result in superior performance throughout the problem space. This avenue was not pursued further for several reasons:
Usage notes are in the ScaleBlit.h header file. Note that while the source file itself is C++, the interface is C, and it may be called from any language capable of using a standard C interface. This code is public domain. But if you find any bugs or have other suggestions for improvement, please let me know.
* This work was performed in the Chichilnisky Lab at the Salk Institute.
Conclusions
For our purposes, the approach benchmarked as "DirectMem" (blue in the graphs above) provides the best scaling blitter over a variety of problem sizes. For any output under 400x400 pixels, it is superior to any other method tested, over twice as fast as CopyBits.
(But please see the Epilogue for a very important update.)
However, it is worth noting that RAVE brings some unique advantages as well, such as the possibility of scaling smoothly (i.e. with anti-aliasing). For some developers, these advantages and the possible increased speed may outweigh the disadvantages listed above.
Download
The "DirectMem" blitter has been renamed to ScaleBlit, and is available here:
Epilogue
or, Getting There the Hard Way
After the tests above were completed and posted, I made the blitter a little more general by checking for cases where the scale factor is not divisible by 2. In this case, the code now calls CopyBits to do the scaling -- to the offscreen GWorld, just like my fancy blitter -- and then, as usual, another CopyBits (with no scaling) to the screen.
The result? CopyBits under these conditions is nearly as fast as the fancy custom blitter, even after adding cacheline clearing, loop unrolling, etc.
Revised Conclusion
If you need to do scaling in a hurry, you don't need a fancy blitter -- just use two CopyBits calls:
The result is about twice as fast as CopyBits with scaling direct to the screen, basically equivalent to the DirectMem blitter in the graphs above.
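The shape of this two-step approach can be illustrated in portable C. This is only a sketch of the structure, not Toolbox code: `scale_to_offscreen` stands in for the scaling CopyBits into an offscreen buffer, and `copy_to_screen` stands in for the second, unscaled CopyBits. Both names, their parameters, and the plain memory buffers used in place of a GWorld and the screen are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Step 1 (stand-in for the scaling copy): nearest-neighbour scaling by any
   integer factor into a tightly packed offscreen buffer of size
   (src_w * scale) x (src_h * scale). */
void scale_to_offscreen(const uint32_t *src, int src_w, int src_h,
                        uint32_t *off, int scale)
{
    assert(scale >= 1);
    int out_w = src_w * scale;
    int out_h = src_h * scale;
    for (int y = 0; y < out_h; y++)
        for (int x = 0; x < out_w; x++)
            off[(size_t)y * (size_t)out_w + (size_t)x] =
                src[(size_t)(y / scale) * (size_t)src_w + (size_t)(x / scale)];
}

/* Step 2 (stand-in for the unscaled copy): straight row-by-row copy of the
   offscreen buffer to position (left, top) in a screen buffer whose rows
   are screen_stride pixels apart. */
void copy_to_screen(const uint32_t *off, int w, int h,
                    uint32_t *screen, size_t screen_stride, int left, int top)
{
    for (int y = 0; y < h; y++)
        memcpy(screen + (size_t)(top + y) * screen_stride + (size_t)left,
               off + (size_t)y * (size_t)w,
               (size_t)w * sizeof(uint32_t));
}
```

The point of the split is that the pixel replication happens in ordinary memory, and the screen only ever sees a plain, unscaled copy, which is the case the text describes as very hard to beat.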
http://www.strout.net/info/coding/macdev/scaleblit/index.html
Last Updated: 6/21/99
webmaster@strout.net