The Aav dev blg

Skylight shadow and stuff.

2010-11-17T23:40:00.005+01:00

Been working on the ambient/sky shadowing. Since the sky shadowing is block based, I had to blur it to avoid artefacts (currently a simple averaging filter). It is quite fuzzy, but better than anticipated. Adding another filter tap directly above each block also turned out to be a nice and simple way to simulate light "bleeding" into columns of blocks that lie in shadow, so the shadows are softened in the downward direction (provided that they neighbor some bright blocks). I think I'm approaching a look that I like... still, lots of tweaking remains.

For OpenGL 3 capable cards, I added support for EXT_texture_array. This helps to solve the texture atlas issue, so now there is full mipmapping and anisotropic filtering in there with zero tiling problems. Just wish it was available on older hw too.

1) Using a favorite 32x32 mid-res texture pack (John Smith Textures v4), this shot shows some light patterns on the floor projected by the ceiling window. This is before the blurring was added, so if you walked up close you'd be annoyed by the vertex interpolation.

2) Same set of textures, and still point filtered in this one.

3) My coplayers modified the train station while I was coding. As this new TCP dump shows, it has even more corners, ledges and interesting light patterns to test with now. This also shows the skylight bleeding in from the cross-shaped "window", and some of the new 128 x 128 texture set. Torches are missing; that's why it's quite dark.

Next up: either torch lighting or some player controls. I want to jump around in there, and not have to use the official client to connect anymore.

Eventually I will port over my sky system from the old game. That'll be quite dramatic.

MC Hammer

2010-11-12T09:18:00.009+01:00

Well, in my limited spare time I found a new obsession. Again of questionable use... but I'll be dumping some progress here. Eventually this will become MC Hammer, an alternate Minecraft client.

First steps. Ran Minecraft and dumped the TCP stream via Wireshark. Thanks to #mcc on irc.esper.net and this excellent wiki documenting the protocol, I managed to parse some wireframe geometry out of it!

Some time later. I have texturing working etc, but it's Minecraft-style so not very pretty. I focused on working out the protocol and now render all world geometry. Next up, implementing my ambient occlusion scheme. Here are some first results, showing AO and some basic ambient shadow. The latter is done per block, and you can clearly see the squares. It's going to be blurred and interpolated.

Ambient shadow is filtered and interpolated... starting to look nice now I think. A little bland though. I need a good texture set.

Fixed cost gaussian blur (or: what no-one else will tell you)

2009-06-21T00:11:00.004+02:00

Well the title's a bit of an exaggeration. But this piece of code is used in ColorLab for calculating the gaussian blur kernel. You get an almost arbitrary blur strength (sigma) for an almost fixed cost (in fact it's faster the stronger the blur). It does this by approximating a little:

- Assumes you first scale down the image using a box filter N times (the algorithm gives you the N).
- You then run the shader pass, using M bilinear texture samples. Most of those samples actually pick up two pixels, for a near 2x speedup.

I haven't actually seen correct calculations of the bilinear sample points online so I worked it out myself. Turned out to be trivial, I wonder why I couldn't find it.

Since this kind of set-up code is hard to come by (for some reason) I'm posting it here:

Click to get the code

I rather like how it turned out. With contrast rich images you can sometimes get "popping" due to the decimations, but by setting MAX_WEIGHTS arbitrarily high you can pretty much bypass the decimation step and still use the generator (and in particular OptimizeBilinear).
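For readers who just want the bilinear trick, here is a minimal sketch of that one idea: two neighbouring taps are replaced by a single bilinear fetch placed between them. This is not the ColorLab generator. The names BilinearTap, gaussianWeights and mergeTaps are invented here, and the decimation logic is left out entirely.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct BilinearTap {
    float offset;  // distance from the centre pixel, in texels
    float weight;  // weight applied to the bilinearly filtered fetch
};

// Normalised 1D gaussian weights for integer offsets 0..radius (one side only).
std::vector<float> gaussianWeights(float sigma, int radius) {
    std::vector<float> w(radius + 1);
    float sum = 0.0f;
    for (int i = 0; i <= radius; ++i) {
        w[i] = std::exp(-(i * i) / (2.0f * sigma * sigma));
        sum += (i == 0) ? w[i] : 2.0f * w[i];  // off-centre taps occur on both sides
    }
    for (float& v : w) v /= sum;
    return w;
}

// Merge taps (1,2), (3,4), ... into single bilinear fetches for one side.
// A fetch at offset o between texels a and a+1 returns (1-t)*a + t*(a+1), t = o - a,
// so weight w0+w1 at offset (a*w0 + (a+1)*w1)/(w0+w1) reproduces w0*a + w1*(a+1).
std::vector<BilinearTap> mergeTaps(const std::vector<float>& w) {
    std::vector<BilinearTap> taps;
    taps.push_back({0.0f, w[0]});  // the centre texel is sampled on its own
    std::size_t i = 1;
    for (; i + 1 < w.size(); i += 2) {
        float sum = w[i] + w[i + 1];
        taps.push_back({(i * w[i] + (i + 1) * w[i + 1]) / sum, sum});
    }
    if (i < w.size()) taps.push_back({static_cast<float>(i), w[i]});  // unpaired last tap
    return taps;
}
```

The shader would sample each returned tap at both +offset and -offset (except the centre), so a radius of 4 costs 5 fetches instead of 9.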

Windows Installer

2009-05-19T16:34:00.003+02:00

Windows Installer/MSI files are such a nightmare to work with. I think this MSDN page speaks for itself. And that was just one detail of what I just went through.

64-bit Win!

2009-05-13T20:08:00.004+02:00

I got ColorLab running natively in the public Windows 7 RC today, under Vegas Pro 9 64-bit (yes indeed), and the experience was painless. My ColorLab code is 64-bit safe already, thanks to the Visual C++ compiler checks. There were just a few hiccups (and even one blatant bug) in the DXMedia SDK files.

So the thing is, Sony Vegas as of 2009 still uses smoking hot DirectX Transform plugin technology. It's at least ten years old, DirectX Media saw its last release in the age of DirectX 6 (!), and that's really quite depressing. Why Sony, why? But despite this, ColorLab seems to be working and I'm impressed with Microsoft again. It could have been a nightmare but just... wasn't.

Now I just have to make a 64-bit installer and 1.0 RC2 is nearly at yer doorstep.

DLL headaches

2009-02-13T14:03:00.002+01:00

My ColorLab project has an ATL DLL which exposes a couple of COM objects. These COM objects are plugins that are spawned from the host application (Sony Vegas/Movie Studio), and there may be any number of instances. However, there is only meant to be one back-end "engine" in this DLL running for a Sony Vegas process, independent of how many COM objects are created.

In managing this per-process "engine", proper cleanup was the most problematic. When is it no longer needed? Any DLL "unloading" hooks would be too late. DLL_PROCESS_DETACH is also too late. So I was listening for DLL_THREAD_DETACH in my DllMain, and looking at _Module.GetLockCount() for a kind of "reference count". If this got down to zero, I assumed Vegas was no longer using my DLL and destroyed the engine. How naïve I was. It worked fine for a while, but after other revisions to my code GetLockCount() didn't seem to reach zero any more.

Turns out, it's quite forbidden to do heavy cleanup involving threads and COM from within DllMain. In fact, don't do anything in DllMain unless it's explicitly allowed.

The solution for me was to explicitly reference count my back-end engine, and let the COM objects created from Vegas do an AddRef() in their constructor, and a Release() in their destructor. So when no more COM objects are active, the back-end is no longer needed and is destroyed. The shortcut I had taken turned out to be a slippery mountain road.
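A minimal sketch of that arrangement, assuming plain C++ stand-ins rather than real ATL/COM classes. Engine, PluginObject and IsRunning are hypothetical names, and the mutex is my own addition to keep the count safe if objects are created from several threads.

```cpp
#include <mutex>

// Hypothetical stand-in for the per-process back-end.
class Engine {
public:
    static void AddRef() {
        std::lock_guard<std::mutex> lock(mtx());
        if (refCount()++ == 0) instance() = new Engine();  // first plugin creates it
    }
    static void Release() {
        std::lock_guard<std::mutex> lock(mtx());
        if (--refCount() == 0) {                            // last plugin destroys it
            delete instance();
            instance() = nullptr;
        }
    }
    static bool IsRunning() {
        std::lock_guard<std::mutex> lock(mtx());
        return instance() != nullptr;
    }
private:
    Engine() = default;
    static std::mutex& mtx()    { static std::mutex m; return m; }
    static int& refCount()      { static int n = 0; return n; }
    static Engine*& instance()  { static Engine* e = nullptr; return e; }
};

// Stand-in for a COM plugin object; a real one would derive from the ATL classes.
class PluginObject {
public:
    PluginObject()  { Engine::AddRef(); }
    ~PluginObject() { Engine::Release(); }
    PluginObject(const PluginObject&) = delete;
    PluginObject& operator=(const PluginObject&) = delete;
};
```

The engine's lifetime now follows the plugin objects themselves, so nothing has to happen inside DllMain.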

Moral of the story: be very, very careful with DLLs.

White balance and highlight protection

2009-02-07T14:58:00.007+01:00

When you fix the white balance of an image, you typically shift the colors using some linear or nonlinear transform. In the case of ColorLab, it's a chromatic adaptation transform based on the CIECAT02 matrix, and it shifts the image colors between two illuminants. The source illuminant is one that the user has helped to define by clicking a pixel in the image that is meant to be white. If this is an orange-tinted color, a light source with these characteristics is found by the algorithm. The destination illuminant is always based on the standard D65 one, which is what modern PAL/NTSC/HDTV as well as computer video assumes.

Anyway, when you shift colors in this way, and some colors are clipped to pure white or nearly pure white in the original image (which has a bad color cast overall), those highlights of the image might look pretty ugly afterwards. See this clock for an example:

As you can see, overall the image has been improved but the clock face is now a dull cyan shade. The linear white balance transform didn't take into account that this was meant to be a highlight. Trying to white balance this image in Adobe Lightroom will give a result that isn't cyan-tinted. However, it seems that Lightroom does this by raising the brightness of the entire image. This causes the clock face to again clip at white. I don't like that, since it might compromise the rest of the image.

So I had to find my own way. After a little experimentation, and discussion with John O., I found a solution that seems to work at least most of the time. I added an option called "Protect highlights" to ColorLab, and made the strength of the effect configurable.

This looks a lot better. :) What the algorithm does is examine the R/G/B value of the original pixel, and depending on how close it is to the maximum (255 for each channel), it shifts the WB-corrected pixel towards a desaturated version of the original. Therefore the luma of the original is retained, while removing the original color cast.

I still have to test this technique on more images, but it looks promising.

Fast and easy float-to-int conversion with SSE

2009-02-07T13:25:00.004+01:00

If you’ve ever found float-to-integer conversion to be a hotspot in your application, you’ve probably run into using fistp in inline assembly as an alternative. This article is a great writeup of why conversion is slow, and benchmarks various ways to do float-to-integer on the x86 platform. However, it omits a nice alternative if you’re not into inline assembly (and on the x64 platform you might not even be allowed to use it by the compiler!), and it’s right in the SSE instruction set (which happens to be guaranteed on x64).

If your compiler has the “xmmintrin.h” header, you can probably use the SSE “intrinsics”, a set of functions replaced pretty much 1:1 with real instructions by the compiler. It will also do automatic register housekeeping for you, so it’s a vastly simplified way of getting access to SIMD instructions from C/C++.

For our conversion, the interesting intrinsic is _mm_cvtss_si32. It converts a single precision scalar with rounding (using the current SSE rounding mode, which is round-to-nearest by default), and should behave similarly to fistp. It’s not the fastest way to convert, but it’s a lot better than the standard (int). Going back to the article referenced above, _mm_cvtss_si32 does nearly as well as BitConvert23, yet passes the correctness test!

If you do want truncation, there is _mm_cvttss_si32 which does so, although without performance benefit. There are also actual SIMD versions of the instruction, converting two values at once (or four with SSE2). This may well be the overall winner, but for a drop-in replacement to (int), here’s a snippet that helped speed up my model data load times:

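#include <xmmintrin.h>  // SSE intrinsics

// Rounds using the current SSE rounding mode instead of truncating like (int).
__forceinline int FastToInt( float f )
{
    return _mm_cvtss_si32( _mm_load_ss( &f ) );
}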
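To make the difference between the rounding and truncating variants concrete, here is a small sketch with both side by side. RoundToInt and TruncToInt are names made up for this example, and it only builds on x86/x64 compilers that ship xmmintrin.h.

```cpp
#include <xmmintrin.h>  // SSE intrinsics; x86/x64 only

// Rounds according to the current MXCSR mode (round-to-nearest-even by default).
inline int RoundToInt(float f) { return _mm_cvtss_si32(_mm_set_ss(f)); }

// Always truncates towards zero, like a plain (int) cast.
inline int TruncToInt(float f) { return _mm_cvttss_si32(_mm_set_ss(f)); }
```

With the default rounding mode, RoundToInt(2.5f) gives 2 and RoundToInt(3.5f) gives 4, since ties go to the even integer, while TruncToInt(2.7f) gives 2. Keep that in mind if the surrounding code relies on the truncating behaviour of (int).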

Dev blg!

2009-02-07T13:14:00.000+01:00

Decided to set up another blog where I'll post development diary entries etc. Welcome!

Compressing normals (and other unit vectors)

2008-05-03T21:57:00.001+02:00

A while ago I was thinking about compact GPU-friendly vertex data formats to use for normals and other unit vectors (tangents, bitangents etc).

Among the most interesting are DEC3N, which is 10 bits per component (-511..511), and the simple method of just quantizing to (-127..127) range and stuffing the vector in UBYTE4 or UBYTE4N. These all let you store a unit vector in a DWORD, but the precision is rather bad. The quantization can make the normals visibly different from the original floating point versions.

So it occurred to me that just using this “shell” of integers for unit vectors isn’t making good use of the representable values. You already know your unit vector is, well, unit length, so all you care about is the direction. And there are far more directions expressible as n-length vectors than there are ones expressible as unit vectors in a 3 byte tuple. Assuming adding a normalize() to your decoding step is cheap, we can exploit this!


I hacked up some code and made a compressor that takes a 3 float vector, uses 3D DDA to scan the “3-byte space” for useful candidates, finds the best of these, and spits out a 3 byte vector for you. The code is downright horrible, and I think I lifted the DDA code from the Interweb, but the idea should be clear enough to run with. Also, the compressor seems to work well for any game data I’ve thrown at it, and gives very low error compared to the original normal.
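For a feel of the idea without the DDA machinery, here is a minimal brute-force sketch. It is a stand-in, not the linked compressor: instead of a 3D DDA scan it simply tries every length from 1 to 127 along the normal, rounds to the nearest integer point, and keeps the best match. Signed bytes and the names Packed, encodeNormal and decodeNormal are assumptions for illustration.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

using Packed = std::array<std::int8_t, 3>;

// Decode: the stored vector only carries direction, so normalise it.
std::array<float, 3> decodeNormal(const Packed& p) {
    float x = p[0], y = p[1], z = p[2];
    float len = std::sqrt(x * x + y * y + z * z);
    return {x / len, y / len, z / len};
}

// Encode: sample points along the normal, round each to the integer grid,
// and keep the candidate whose direction matches best.
// The input is assumed to be unit length (a zero vector is not handled).
Packed encodeNormal(float nx, float ny, float nz) {
    Packed best = {0, 0, 1};
    float bestDot = -2.0f;
    // Scale so the largest component lands exactly on the integer `len`.
    float maxAbs = std::fmax(std::fabs(nx), std::fmax(std::fabs(ny), std::fabs(nz)));
    for (int len = 1; len <= 127; ++len) {
        float s = len / maxAbs;
        Packed c = {static_cast<std::int8_t>(std::lround(nx * s)),
                    static_cast<std::int8_t>(std::lround(ny * s)),
                    static_cast<std::int8_t>(std::lround(nz * s))};
        std::array<float, 3> d = decodeNormal(c);
        float dot = d[0] * nx + d[1] * ny + d[2] * nz;  // cosine of the angular error
        if (dot > bestDot) { bestDot = dot; best = c; }
    }
    return best;
}
```

A normal along (1, 2, 3), for example, has an exact integer representative at a short length, which plain scaling to 127 would miss. On the GPU the only extra decoding cost is the normalize().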

I intend to make a follow-up post with some error numbers, and improved code later. If anyone can find ways to improve it, please comment!

Grab the code here.