Friday, February 13, 2009

DLL headaches

My ColorLab project has an ATL DLL which exposes a couple of COM objects. These COM objects are plugins that are spawned from the host application (Sony Vegas/Movie Studio), and there may be any number of instances. However, there is only meant to be one back-end "engine" in this DLL running for a Sony Vegas process, independent of how many COM objects are created.

In managing this per-process "engine", proper cleanup was the most problematic. When is it no longer needed? Any DLL "unloading" hooks would be too late. DLL_PROCESS_DETACH is also too late. So I was listening for DLL_THREAD_DETACH in my DllMain, and looking at _Module.GetLockCount() for a kind of "reference count". If this got down to zero, I assumed Vegas was no longer using my DLL and destroyed the engine. How naïve I was. It worked fine for a while, but after other revisions to my code GetModuleCount() didn't seem to reach zero any more.

Turns out, it's quite forbidden to do heavy cleanup involving threads and COM from within DllMain. In fact, don't do anything in DllMain unless it's explicitly allowed.

The solution for me was to explicitly reference count my back-end engine, and let the COM objects created from Vegas do an AddRef() in their constructor, and a Release() in their destructor. So when no more COM objects are active, the back-end is no longer needed and is destroyed. The shortcut I had taken turned out to be a slippery mountain road.

Moral of the story: be very very careful with DLL:s.

Saturday, February 7, 2009

White balance and highlight protection

When you fix the white balance of an image, you typically shift the colors using some linear or nonlinear transform. In the case of ColorLab, it's a chromatic adaptation transform based on the CIECAT02 matrix, and it shifts the image colors between two illuminants. The source illuminant is one that the user has helped to define by clicking a pixel in the image that is meant to be white. If this is an orange-tinted color, a light source with these characteristics is found by the algorithm. The destination illuminant is always based on the standard D65 one, which is what modern PAL/NTSC/HDTV as well as computer video assumes.

Anyway, when you shift colors in this way, and some colors are clipped to pure white or nearly pure white in the original image (which has a bad color cast overall), those highlights of the image might look pretty ugly afterwards. See this clock for an example:

As you can see, overall the image has been improved but the clock face is now a dull cyan shade. The linear white balance transform didn't take into account that this was meant to be a highlight. Trying to white balance this image in Adobe Lightroom will give a result that isn't cyan-tinted. However, it seems that Lightroom does this by raising the brightness of the entire image. This causes the clock face to again clip at white. I don't like that, since it might compromise the rest of the image.

So I had to find my own way. After a little experimentation, and discussion with John O., I found a solution that seems to work at least most of the time. I added an option called "Protect highlights" to ColorLab, and made the strength of the effect configurable.

This looks a lot better. :) What the algorithm does is examine the R/G/B value of the original pixel, and depending on how close it is to the maximum (255 for each channel), it shifts the WB-corrected pixel towards a desaturated version of the original. Therefore the luma of the original is retained, while removing the original color cast.

I still have to test this technique on more images, but it looks promising.

Fast and easy float-to-int conversion with SSE

If you’ve ever found float-to-integer conversion to be a hotspot in your application, you’ve probably run into using fistp in inline assembly as an alternative. This article is a great writeup of why conversion is slow, and benchmarks various ways to do float-to-integer on the x86 platform. However, it omits a nice alternative if you’re not into inline assembly (and on the x64 platform you might not even be allowed to use it by the compiler!), and it’s right in the SSE instruction set (which happens to be guaranteed on x64).

If your compiler has the “xmmintrin.h” header, you can probably use the SSE “intrinsics”, a set of functions replaced pretty much 1:1 with real instructions by the compiler. It will also do automatic register housekeeping for you, so it’s a vastly simplified way of getting access to SIMD instructions from C/C++.

For our conversion, the interesting instruction is _mm_cvtss_si32. It performs conversion of a single precision scalar with rounding, and should behave similarly to fistp. It’s not the fastest way to convert, but it’s a lot better than the standard (int). Going back to the article referenced above, _mm_cvtss_si32 does nearly as well as BitConvert23, yet passes the correctness test!

If you do want truncation, there is _mm_cvttss_si32 which does so, although without performance benefit. There are also actual SIMD versions of the instruction, converting two values at once. This may well be the overall winner, but for a drop-in replacement to (int), here’s a snippet that helped speed up my model data load times:

__forceinline int FastToInt( float f )
return _mm_cvtss_si32( _mm_load_ss( &f ) );

Dev blg!

Decided to set up another blog where I'll post development diary entries etc. Welcome!