Is VC++ still broken Sequentially-Consistent-wise? Is VC++ still broken Sequentially-Consistent-wise? multithreading multithreading

Is VC++ still broken Sequentially-Consistent-wise?


The 'volatile' keyword will prevent that kind of optimization. That's exactly what it's for: every use of 'a' will be read or written exactly as shown, and won't be moved in a different order to other volatile variables.

The implementation of the mutex should include compiler-specific instructions to cause a "fence" at that point, telling the optimizer not to reorder instructions across that boundary. Since the implementation is not from the compiler vendor, maybe that's left out? I've never checked.

Since 'a' is global, I would generally think the compiler would be more careful with it. But, VS10 doesn't know about threads so it won't consider that other threads will use it. Since the optimizer grasps the entire loop execution, it knows that functions called from within the loop won't touch 'a' and that's enough for it.

I'm not sure what the new standard says about thread visibility of global variables other than volatile. That is, is there a rule that would prevent that optimization (even though the function can be grasped all the way down so it knows other functions don't use the global, must it assume that other threads can) ?

I suggest trying the newer compiler with the compiler-provided std::mutex, and checking what the C++ standard and current drafts say about that. I think the above should help you know what to look for.

—John


Almost a month later, Microsoft still hasn't responded to the bug in MSDN Connect.

To summarize the above comments (and some further tests), apparently it happens in VS2013 professional as well, but the bug only happens when building for Win32, not for x64. The generated assembly code in x64 doesn't have this problem.So it appears that it is a bug in the optimizer, and that there's no race condition in this code.

Apparently this bug also happens in GCC 4.8.1, but not in GCC 4.9.(Thanks to Voo, nosid and Chris Dodd for all their testing).

It was suggested to mark a as volatile. This indeed prevents the bug, but only because it prevents the optimizer from performing the loop register optimization.

I found another solution: Add another local variable b, and if needed (and under lock) do the following:

  1. Copy a into b
  2. Increment b in the loop
  3. Copy back to a if needed

The optimizer replaces the local variable with a register, so the code is still optimized, but the copies from and to a are done only if needed, and under lock.

Here's the new main() code, with arrows marking the changed lines.

int main(int argc, char* argv[]){    bool work = (argc == 1);    int b = 0;          // <----    if (work)    {        m.lock();        b = a;          // <----    }    std::thread th(other);    for (int i = 0; i < 100000000; ++i)    {        if (i % 7 == 3)        {            if (work)            {                ++b;    // <----            }        }    }    if (work)    {        a = b;          // <----        std::cout << a << "\n";        m.unlock();    }    th.join();}

And this is what the assembly code looks like (&a == 0x000744b0, b replaced with edi):

    21:     int b = 0;00071473  xor         edi,edi      22:     23:     if (work)00071475  test        bl,bl  00071477  je          main+5Bh (07149Bh)      24:     {    25:         m.lock();         ........00071492  add         esp,4      26:         b = a;00071495  mov         edi,dword ptr ds:[744B0h]      27:     }    28:          ........    33:         {    34:             if (work)00071504  test        bl,bl  00071506  je          main+0C9h (071509h)      35:             {    36:                 ++b;00071508  inc         edi      30:     for (int i = 0; i < 100000000; ++i)00071509  inc         esi  0007150A  cmp         esi,5F5E100h  00071510  jl          main+0A0h (0714E0h)      37:             }    38:         }    39:     }    40:     41:     if (work)00071512  test        bl,bl  00071514  je          main+10Ch (07154Ch)      42:     {    43:         a = b;    44:        std::cout << a << "\n";00071516  mov         ecx,dword ptr ds:[73084h]  0007151C  push        edi  0007151D  mov         dword ptr ds:[744B0h],edi  00071523  call        dword ptr ds:[73070h]  00071529  mov         ecx,eax  0007152B  call        std::operator<<<std::char_traits<char> > (071A80h)       ........

This keeps the optimization and solves (or works around) the problem.