Flushing denormalised numbers to zero
You're looking for a platform-defined way to set FTZ and/or DAZ in the MXCSR register (on x86 with SSE or x86-64); see https://stackoverflow.com/a/2487733/567292
Usually the function is called something like _controlfp; Microsoft's documentation is at http://msdn.microsoft.com/en-us/library/e9b52ceh.aspx
You can also use the _MM_SET_FLUSH_ZERO_MODE macro: http://msdn.microsoft.com/en-us/library/a8b5ts9s(v=vs.71).aspx - this is probably the most portable method across compilers.
For disabling denormals globally I use these 2 macros:
    #include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

    // Warning: these macros have to be used in the same scope
    #define MXCSR_SET_DAZ_AND_FTZ \
        int oldMXCSR__ = _mm_getcsr();        /* read the old MXCSR setting */ \
        int newMXCSR__ = oldMXCSR__ | 0x8040; /* set DAZ and FZ bits */ \
        _mm_setcsr( newMXCSR__ );             /* write the new MXCSR setting to the MXCSR */

    #define MXCSR_RESET_DAZ_AND_FTZ \
        /* restore old MXCSR settings to turn denormals back on if they were on */ \
        _mm_setcsr( oldMXCSR__ );

I call the first one at the beginning of the process and the second at the end. Unfortunately, this does not seem to work well on Windows.
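Because both macros share the local variable oldMXCSR__, they only compile when expanded in the same scope. One way around that restriction, sketched here under the assumption that C++ is available, is an RAII guard that saves MXCSR on construction and restores it on destruction. The names ScopedFlushDenormals, flush_bits_set and scaled_with_flushing are hypothetical and not part of the original answer:

```cpp
#include <cassert>
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr

// Hypothetical guard (not from the original answer): sets FTZ (bit 15) and
// DAZ (bit 6) for the calling thread, then restores the saved MXCSR value
// when the guard goes out of scope.
class ScopedFlushDenormals {
public:
    ScopedFlushDenormals() : old_mxcsr_(_mm_getcsr()) {
        _mm_setcsr(old_mxcsr_ | 0x8040u);
    }
    ~ScopedFlushDenormals() { _mm_setcsr(old_mxcsr_); }

    ScopedFlushDenormals(const ScopedFlushDenormals&) = delete;
    ScopedFlushDenormals& operator=(const ScopedFlushDenormals&) = delete;

private:
    unsigned int old_mxcsr_;
};

// True if both FTZ and DAZ are currently set for the calling thread.
inline bool flush_bits_set() {
    return (_mm_getcsr() & 0x8040u) == 0x8040u;
}

// Example use: the multiplication runs with flushing enabled, and the
// previous MXCSR value is restored when the function returns.
inline float scaled_with_flushing(float x, float factor) {
    ScopedFlushDenormals guard;
    assert(flush_bits_set());
    volatile float vx = x;                // volatile keeps the work at run time
    volatile float product = vx * factor;
    return product;
}
```

With the guard in place, scaled_with_flushing(1e-30f, 1e-10f) should return zero on an SSE target, since the true product of about 1e-40 is below the smallest normal float.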
To flush denormals locally I use this
    const Float32 k_DENORMAL_DC = 1e-25f;

    inline void FlushDenormalToZero(Float32& ioFloat)
    {
        ioFloat += k_DENORMAL_DC;
        ioFloat -= k_DENORMAL_DC;
    }
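The trick works because a denormal is far smaller than half a unit in the last place of 1e-25f, so the addition rounds to exactly 1e-25f and the subtraction then leaves zero. A small self-contained check of that arithmetic follows. It assumes Float32 is a typedef for a 32-bit IEEE float (the original does not show the definition), and check_flush_denormal is a hypothetical name:

```cpp
#include <cassert>

using Float32 = float;  // assumption: the original Float32 is a 32-bit IEEE float

const Float32 k_DENORMAL_DC = 1e-25f;

inline void FlushDenormalToZero(Float32& ioFloat) {
    ioFloat += k_DENORMAL_DC;  // a denormal input is absorbed by rounding here
    ioFloat -= k_DENORMAL_DC;  // so only the offset is removed again
}

// Hypothetical self-check, not part of the original answer.
inline void check_flush_denormal() {
    Float32 tiny = 1e-40f;     // denormal: below the smallest normal float
    FlushDenormalToZero(tiny);
    assert(tiny == 0.0f);

    Float32 half = 0.5f;       // ordinary values pass through unchanged
    FlushDenormalToZero(half);
    assert(half == 0.5f);
}
```

Two caveats: values only slightly larger than 1e-25f lose precision in the round trip, and a compiler running with relaxed floating-point rules (for example -ffast-math) may simplify the add/subtract pair away.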
To flush denormals to zero, use the Intel intrinsics macros during program startup. For example:

    #include <immintrin.h>

    int main()
    {
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    }
In my version of MSVC, this emitted the following assembly code:
    stmxcsr DWORD PTR tv805[rsp]
    mov     eax, DWORD PTR tv805[rsp]
    bts     eax, 15
    mov     DWORD PTR tv807[rsp], eax
    ldmxcsr DWORD PTR tv807[rsp]
MXCSR is the control and status register, and this code is setting bit 15, which turns flush zero mode on.
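To see the effect of bit 15 on an actual computation, here is a minimal sketch. The helper names tiny_product and ftz_bit_is_set are hypothetical, and the code assumes an x86-64 (or SSE-enabled x86) build where float arithmetic goes through SSE:

```cpp
#include <cassert>
#include <immintrin.h>

// True if bit 15 (flush-to-zero) of MXCSR is set for the calling thread.
inline bool ftz_bit_is_set() {
    return (_mm_getcsr() & (1u << 15)) != 0;
}

// Hypothetical helper: computes 1e-30f * 1e-10f (about 1e-40, a denormal)
// with FTZ switched as requested, then restores the caller's MXCSR.
inline float tiny_product(bool flush_zero) {
    const unsigned int saved = _mm_getcsr();
    _MM_SET_FLUSH_ZERO_MODE(flush_zero ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    assert(ftz_bit_is_set() == flush_zero);

    volatile float a = 1e-30f;      // volatile prevents constant folding
    volatile float b = 1e-10f;
    volatile float product = a * b;

    _mm_setcsr(saved);
    return product;
}
```

With FTZ off the product comes back as a small non-zero denormal; with FTZ on it comes back as zero.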
One thing to note: this only affects denormals produced as the result of a computation. If you also want denormal inputs to be treated as zero, you need to set the DAZ flag (denormals are zero) as well, using the following macro:
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
See https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-setting-the-ftz-and-daz-flags for more information.
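The difference between the two flags shows up when a denormal is an input but the result is a normal number. The sketch below multiplies a denormal by 1e10f: with only FTZ set, the input is used as-is and the result is non-zero; with DAZ also set, the input is read as zero. The helper name scale_denormal_input is hypothetical, and the code assumes an SSE-based build on a CPU that supports DAZ:

```cpp
#include <cassert>
#include <immintrin.h>

// Hypothetical helper: multiplies the denormal 1e-40f by 1e10f with FTZ on
// and DAZ switched as requested, then restores the caller's MXCSR.
inline float scale_denormal_input(bool denormals_are_zero) {
    const unsigned int saved = _mm_getcsr();
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(denormals_are_zero ? _MM_DENORMALS_ZERO_ON
                                                   : _MM_DENORMALS_ZERO_OFF);

    volatile float input = 1e-40f;  // denormal input
    volatile float factor = 1e10f;  // true result is about 1e-30, a normal float
    volatile float result = input * factor;

    _mm_setcsr(saved);
    return result;
}

// Hypothetical self-check, not part of the original answer.
inline void check_daz_effect() {
    assert(scale_denormal_input(false) != 0.0f);  // FTZ alone keeps the input
    assert(scale_denormal_input(true) == 0.0f);   // DAZ reads the input as zero
}
```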
Also note that you need to set MXCSR for each thread, as the values contained are local to each thread.