Saturday, February 7, 2009

Fast and easy float-to-int conversion with SSE

If you’ve ever found float-to-integer conversion to be a hotspot in your application, you’ve probably run into using fistp in inline assembly as an alternative. This article is a great writeup of why conversion is slow, and benchmarks various ways to do float-to-integer on the x86 platform. However, it omits a nice alternative if you’re not into inline assembly (and on the x64 platform you might not even be allowed to use it by the compiler!), and it’s right in the SSE instruction set (which happens to be guaranteed on x64).

If your compiler has the “xmmintrin.h” header, you can probably use the SSE “intrinsics”, a set of functions replaced pretty much 1:1 with real instructions by the compiler. It will also do automatic register housekeeping for you, so it’s a vastly simplified way of getting access to SIMD instructions from C/C++.

For our conversion, the interesting instruction is _mm_cvtss_si32. It performs conversion of a single precision scalar with rounding, and should behave similarly to fistp. It’s not the fastest way to convert, but it’s a lot better than the standard (int). Going back to the article referenced above, _mm_cvtss_si32 does nearly as well as BitConvert23, yet passes the correctness test!

If you do want truncation, there is _mm_cvttss_si32 which does so, although without performance benefit. There are also actual SIMD versions of the instruction, converting two values at once. This may well be the overall winner, but for a drop-in replacement to (int), here’s a snippet that helped speed up my model data load times:

__forceinline int FastToInt( float f )
{
return _mm_cvtss_si32( _mm_load_ss( &f ) );
}

No comments:

Post a Comment