If your compiler has the “xmmintrin.h” header, you can probably use the SSE “intrinsics”: a set of functions that the compiler translates pretty much 1:1 into real instructions. The compiler also handles register allocation for you, so intrinsics are a vastly simpler way of getting at SIMD instructions from C/C++ than hand-written assembly.
For our conversion, the interesting instruction is _mm_cvtss_si32. It converts a single-precision scalar to a 32-bit integer, rounding according to the current SSE rounding mode (round-to-nearest by default), so it should behave much like fistp. It’s not the fastest way to convert, but it’s a lot faster than a standard (int) cast, which must truncate and, in x87 code, typically has to switch the FPU rounding mode to do so. Going back to the article referenced above, _mm_cvtss_si32 does nearly as well as BitConvert23, yet passes the correctness test!
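To make the rounding difference concrete, here is a minimal sketch, assuming an x86/x86-64 compiler that ships xmmintrin.h and the default MXCSR rounding mode (round-to-nearest, ties-to-even). The helper name sse_round_to_int is hypothetical; it uses _mm_set_ss to build the input register instead of loading from memory.

```c
#include <xmmintrin.h>

/* Hypothetical helper: float -> int via cvtss2si, which rounds using the
   current MXCSR rounding mode (round-to-nearest, ties-to-even by default).
   A plain (int) cast instead truncates toward zero. */
static inline int sse_round_to_int(float f)
{
    /* _mm_set_ss places f in the low lane and zeroes the upper lanes;
       _mm_cvtss_si32 converts only that low lane. */
    return _mm_cvtss_si32(_mm_set_ss(f));
}
```

With the default rounding mode, sse_round_to_int(2.5f) is 2 and sse_round_to_int(3.5f) is 4 (ties go to the even neighbour), and sse_round_to_int(-1.7f) is -2, whereas (int)-1.7f is -1. Out-of-range inputs and NaN yield 0x80000000 (INT_MIN), the “integer indefinite” value.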
If you do want truncation, there is _mm_cvttss_si32, which truncates toward zero like (int), though it brings no performance benefit. There are also true SIMD versions of the instruction: SSE’s _mm_cvtps_pi32 converts two values at once, and SSE2’s _mm_cvtps_epi32 converts four. These may well be the overall winners, but if you just want a drop-in replacement for (int), and rounding instead of truncating is acceptable, here’s a snippet that helped speed up my model data load times:
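#include <xmmintrin.h>
// Rounds per the SSE rounding mode (nearest-even by default); a plain (int) cast truncates.
__forceinline int FastToInt( float f ) { return _mm_cvtss_si32( _mm_load_ss( &f ) ); }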
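For bulk conversions such as loading model data, the packed instructions can process several values per instruction. The sketch below assumes SSE2 (baseline on x86-64) and uses _mm_cvtps_epi32 to convert four floats at a time, falling back to the same scalar conversion as FastToInt for the last few elements. The function name floats_to_ints is hypothetical, and this is an illustration rather than a tuned implementation.

```c
#include <stddef.h>
#include <emmintrin.h> /* SSE2: _mm_cvtps_epi32, _mm_storeu_si128; also pulls in xmmintrin.h */

/* Hypothetical helper: converts n floats to ints, four at a time.
   Rounds like FastToInt (current MXCSR mode, nearest-even by default). */
void floats_to_ints(const float *src, int *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);          /* unaligned load of 4 floats */
        __m128i r = _mm_cvtps_epi32(v);            /* cvtps2dq: 4 conversions at once */
        _mm_storeu_si128((__m128i *)(dst + i), r); /* unaligned store of 4 ints */
    }
    for (; i < n; ++i)                             /* scalar tail for the last 0-3 values */
        dst[i] = _mm_cvtss_si32(_mm_load_ss(src + i));
}
```

If truncation is required instead, SSE2’s _mm_cvttps_epi32 and the scalar _mm_cvttss_si32 are the corresponding truncating forms.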