Well the title's a bit of an exaggeration. But this piece of code is used in ColorLab for calculating the gaussian blur kernel. You get an almost arbitrary blur strength (sigma) for an almost fixed cost (in fact it's faster the stronger the blur). It does this by approximating a little:
- Assumes you first scale down the image using a box filter N times (the algorithm gives you the N).
- You then run the shader pass, using M bilinear texture samples. Most of those samples actually pick up two pixels, for a near 2x speedup.
I haven't actually seen correct calculations of the bilinear sample points online so I worked it out myself. Turned out to be trivial, I wonder why I couldn't find it.
Since this kind of set-up code is hard to come by (for some reason) I'm posting it here:
Click to get the code
I rather like how it turned out. With contrast rich images you can sometimes get "popping" due to the decimations, but by setting MAX_WEIGHTS arbitrarily high you can pretty much bypass the decimation step and still use the generator (and in particular OptimizeBilinear).
Sunday, June 21, 2009
Tuesday, May 19, 2009
Windows Installer
Windows Installer/MSI files are such a nightmare to work with. I think this MSDN page speaks for itself. And that was just one detail of what I just went through.
Wednesday, May 13, 2009
64-bit Win!
I got ColorLab running natively in the public Windows 7 RC today, under Vegas Pro 9 64-bit (yes indeed), and the experience was painless. My ColorLab code is 64-bit safe already, thanks to the Visual C++ compiler checks. There were just a few hiccups (and even one blatant bug) in the DXMedia SDK files.
So the thing is, Sony Vegas as of 2009 still uses smoking hot DirectX Transform plugin technology. It's at least ten years old, DirectX Media saw its last release in the age of DirectX 6 (!), and that's really quite depressing. Why Sony, why? But despite this, ColorLab seems to be working and I'm impressed with Microsoft again. It could have been a nightmare but just... wasn't.
Now I just have to make a 64-bit installer and 1.0 RC2 is nearly at yer doorstep.
So the thing is, Sony Vegas as of 2009 still uses smoking hot DirectX Transform plugin technology. It's at least ten years old, DirectX Media saw its last release in the age of DirectX 6 (!), and that's really quite depressing. Why Sony, why? But despite this, ColorLab seems to be working and I'm impressed with Microsoft again. It could have been a nightmare but just... wasn't.
Now I just have to make a 64-bit installer and 1.0 RC2 is nearly at yer doorstep.
Friday, February 13, 2009
DLL headaches
My ColorLab project has an ATL DLL which exposes a couple of COM objects. These COM objects are plugins that are spawned from the host application (Sony Vegas/Movie Studio), and there may be any number of instances. However, there is only meant to be one back-end "engine" in this DLL running for a Sony Vegas process, independent of how many COM objects are created.
In managing this per-process "engine", proper cleanup was the most problematic. When is it no longer needed? Any DLL "unloading" hooks would be too late. DLL_PROCESS_DETACH is also too late. So I was listening for DLL_THREAD_DETACH in my DllMain, and looking at _Module.GetLockCount() for a kind of "reference count". If this got down to zero, I assumed Vegas was no longer using my DLL and destroyed the engine. How naïve I was. It worked fine for a while, but after other revisions to my code GetModuleCount() didn't seem to reach zero any more.
Turns out, it's quite forbidden to do heavy cleanup involving threads and COM from within DllMain. In fact, don't do anything in DllMain unless it's explicitly allowed.
The solution for me was to explicitly reference count my back-end engine, and let the COM objects created from Vegas do an AddRef() in their constructor, and a Release() in their destructor. So when no more COM objects are active, the back-end is no longer needed and is destroyed. The shortcut I had taken turned out to be a slippery mountain road.
Moral of the story: be very very careful with DLL:s.
In managing this per-process "engine", proper cleanup was the most problematic. When is it no longer needed? Any DLL "unloading" hooks would be too late. DLL_PROCESS_DETACH is also too late. So I was listening for DLL_THREAD_DETACH in my DllMain, and looking at _Module.GetLockCount() for a kind of "reference count". If this got down to zero, I assumed Vegas was no longer using my DLL and destroyed the engine. How naïve I was. It worked fine for a while, but after other revisions to my code GetModuleCount() didn't seem to reach zero any more.
Turns out, it's quite forbidden to do heavy cleanup involving threads and COM from within DllMain. In fact, don't do anything in DllMain unless it's explicitly allowed.
The solution for me was to explicitly reference count my back-end engine, and let the COM objects created from Vegas do an AddRef() in their constructor, and a Release() in their destructor. So when no more COM objects are active, the back-end is no longer needed and is destroyed. The shortcut I had taken turned out to be a slippery mountain road.
Moral of the story: be very very careful with DLL:s.
Saturday, February 7, 2009
White balance and highlight protection
When you fix the white balance of an image, you typically shift the colors using some linear or nonlinear transform. In the case of ColorLab, it's a chromatic adaptation transform based on the CIECAT02 matrix, and it shifts the image colors between two illuminants. The source illuminant is one that the user has helped to define by clicking a pixel in the image that is meant to be white. If this is an orange-tinted color, a light source with these characteristics is found by the algorithm. The destination illuminant is always based on the standard D65 one, which is what modern PAL/NTSC/HDTV as well as computer video assumes.
Anyway, when you shift colors in this way, and some colors are clipped to pure white or nearly pure white in the original image (which has a bad color cast overall), those highlights of the image might look pretty ugly afterwards. See this clock for an example:

As you can see, overall the image has been improved but the clock face is now a dull cyan shade. The linear white balance transform didn't take into account that this was meant to be a highlight. Trying to white balance this image in Adobe Lightroom will give a result that isn't cyan-tinted. However, it seems that Lightroom does this by raising the brightness of the entire image. This causes the clock face to again clip at white. I don't like that, since it might compromise the rest of the image.
So I had to find my own way. After a little experimentation, and discussion with John O., I found a solution that seems to work at least most of the time. I added an option called "Protect highlights" to ColorLab, and made the strength of the effect configurable.

This looks a lot better. :) What the algorithm does is examine the R/G/B value of the original pixel, and depending on how close it is to the maximum (255 for each channel), it shifts the WB-corrected pixel towards a desaturated version of the original. Therefore the luma of the original is retained, while removing the original color cast.
I still have to test this technique on more images, but it looks promising.
Anyway, when you shift colors in this way, and some colors are clipped to pure white or nearly pure white in the original image (which has a bad color cast overall), those highlights of the image might look pretty ugly afterwards. See this clock for an example:
So I had to find my own way. After a little experimentation, and discussion with John O., I found a solution that seems to work at least most of the time. I added an option called "Protect highlights" to ColorLab, and made the strength of the effect configurable.
This looks a lot better. :) What the algorithm does is examine the R/G/B value of the original pixel, and depending on how close it is to the maximum (255 for each channel), it shifts the WB-corrected pixel towards a desaturated version of the original. Therefore the luma of the original is retained, while removing the original color cast.
I still have to test this technique on more images, but it looks promising.
Fast and easy float-to-int conversion with SSE
If you’ve ever found float-to-integer conversion to be a hotspot in your application, you’ve probably run into using fistp in inline assembly as an alternative. This article is a great writeup of why conversion is slow, and benchmarks various ways to do float-to-integer on the x86 platform. However, it omits a nice alternative if you’re not into inline assembly (and on the x64 platform you might not even be allowed to use it by the compiler!), and it’s right in the SSE instruction set (which happens to be guaranteed on x64).
If your compiler has the “xmmintrin.h” header, you can probably use the SSE “intrinsics”, a set of functions replaced pretty much 1:1 with real instructions by the compiler. It will also do automatic register housekeeping for you, so it’s a vastly simplified way of getting access to SIMD instructions from C/C++.
For our conversion, the interesting instruction is _mm_cvtss_si32. It performs conversion of a single precision scalar with rounding, and should behave similarly to fistp. It’s not the fastest way to convert, but it’s a lot better than the standard (int). Going back to the article referenced above, _mm_cvtss_si32 does nearly as well as BitConvert23, yet passes the correctness test!
If you do want truncation, there is _mm_cvttss_si32 which does so, although without performance benefit. There are also actual SIMD versions of the instruction, converting two values at once. This may well be the overall winner, but for a drop-in replacement to (int), here’s a snippet that helped speed up my model data load times:
If your compiler has the “xmmintrin.h” header, you can probably use the SSE “intrinsics”, a set of functions replaced pretty much 1:1 with real instructions by the compiler. It will also do automatic register housekeeping for you, so it’s a vastly simplified way of getting access to SIMD instructions from C/C++.
For our conversion, the interesting instruction is _mm_cvtss_si32. It performs conversion of a single precision scalar with rounding, and should behave similarly to fistp. It’s not the fastest way to convert, but it’s a lot better than the standard (int). Going back to the article referenced above, _mm_cvtss_si32 does nearly as well as BitConvert23, yet passes the correctness test!
If you do want truncation, there is _mm_cvttss_si32 which does so, although without performance benefit. There are also actual SIMD versions of the instruction, converting two values at once. This may well be the overall winner, but for a drop-in replacement to (int), here’s a snippet that helped speed up my model data load times:
__forceinline int FastToInt( float f )
{
return _mm_cvtss_si32( _mm_load_ss( &f ) );
}
Subscribe to:
Comments (Atom)