Modifying a Python dictionary from different threads


Does the same apply to dictionaries? Or is a dictionary a collection of variables?

Let's be more general:

What does "atomic operation" mean?

From Wikipedia:

In concurrent programming, an operation (or set of operations) is atomic, linearizable, indivisible or uninterruptible if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes.

Now what does this mean in Python?

This means that each bytecode instruction is atomic (at least for Python <3.2, before the new GIL).

Why is that???

Because CPython uses a Global Interpreter Lock (GIL). The interpreter uses this lock to make sure that only one thread runs Python bytecode at a time, and uses a "check interval" (see sys.getcheckinterval()) to decide how many bytecode instructions to execute before switching between threads (100 by default).
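
On Python 3.2 and later, the instruction-count check interval was replaced by a time-based switch interval. A minimal sketch, assuming Python 3, of how to inspect the current value (the variable name interval is only for illustration):

```python
import sys

# Python 3.2+ measures the thread switch interval in seconds
# instead of counting bytecode instructions.
interval = sys.getswitchinterval()
print("GIL switch interval: %.4f s" % interval)
```

Either way, the interpreter may switch threads between any two bytecode instructions, so the reasoning about atomicity stays the same.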

So now what does this mean??

It means that operations that can be represented by only one bytecode instruction are atomic. For example, incrementing a variable is not atomic, because it takes several bytecode instructions: loading the operands, the in-place add, and the store:

    >>> import dis
    >>> def f(a):
    ...     a += 1
    ...
    >>> dis.dis(f)
      2           0 LOAD_FAST                0 (a)
                  3 LOAD_CONST               1 (1)      <<<<<<<<<<<< Operation 1 Load
                  6 INPLACE_ADD                         <<<<<<<<<<<< Operation 2 iadd
                  7 STORE_FAST               0 (a)      <<<<<<<<<<<< Operation 3 store
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE
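
The same check can be done programmatically. This is a small sketch assuming Python 3.4 or later, where dis.get_instructions() is available. The exact opcode names vary between interpreter versions, so the code only lists them and does not rely on any particular name:

```python
import dis

def f(a):
    a += 1

# Collect the opcode names for f. Names differ between versions
# (for example INPLACE_ADD on older interpreters, BINARY_OP on newer ones),
# but the increment always needs more than one instruction.
opnames = [ins.opname for ins in dis.get_instructions(f)]
print(opnames)
print("instructions:", len(opnames))
```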

So what about dictionaries??

Some operations are atomic; for example, each of these operations is atomic:

    d[x] = y
    d.update(d2)
    d.keys()

See for yourself:

    >>> def f(d):
    ...     x = 1
    ...     y = 1
    ...     d[x] = y
    ...
    >>> dis.dis(f)
      2           0 LOAD_CONST               1 (1)
                  3 STORE_FAST               1 (x)

      3           6 LOAD_CONST               1 (1)
                  9 STORE_FAST               2 (y)

      4          12 LOAD_FAST                2 (y)
                 15 LOAD_FAST                0 (d)
                 18 LOAD_FAST                1 (x)
                 21 STORE_SUBSCR                      <<<<<<<<<<< One operation
                 22 LOAD_CONST               0 (None)
                 25 RETURN_VALUE

See the documentation of the dis module to understand what STORE_SUBSCR does.

But as you see, it is not totally true, because this operation:

                 ...
      4          12 LOAD_FAST                2 (y)
                 15 LOAD_FAST                0 (d)
                 18 LOAD_FAST                1 (x)
                 ...

can make the entire operation non-atomic. Why? Suppose the variable x can also be changed by another thread, or that another thread clears your dictionary in between. There are many ways it can go wrong, so it is complicated! And so here we apply Murphy's Law: "Anything that can go wrong, will go wrong".

So what now?

If you still want to share variables between threads, use a lock:

    import threading

    mylock = threading.RLock()

    def atomic_operation():
        # Only one thread at a time can run the body of this block.
        with mylock:
            print("operations are now atomic")
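
To make the lock concrete, here is a minimal sketch, assuming Python 3, of several threads incrementing one slot of a shared dictionary. The names counts, counts_lock, add_hits and run are hypothetical and exist only for this illustration:

```python
import threading

counts = {"hits": 0}            # shared dictionary (illustrative data)
counts_lock = threading.Lock()  # guards every access to counts

def add_hits(n):
    for _ in range(n):
        # counts["hits"] += 1 is a read-modify-write spanning several
        # bytecode instructions, so the whole step runs under the lock.
        with counts_lock:
            counts["hits"] += 1

def run(num_threads=4, per_thread=10000):
    counts["hits"] = 0
    threads = [threading.Thread(target=add_hits, args=(per_thread,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["hits"]
```

With the lock held around each increment, run() returns num_threads * per_thread. Without it, updates may occasionally be lost.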


I think you misunderstood this whole thread-safety thing. It's not so much about variables (or variable variables; those are terrible anyway, and just as pointless, not to say harmful, here as in every other case). There are many nasty ways threading can go wrong, and they all come from accessing something mutable from more than one thread at overlapping times. For example:

  • thread N gets data from source (some place in memory or on disk - a variable, a slot in a dictionary, a file, pretty much anything mutable)
  • thread M gets data from source
  • thread N modifies the data
  • thread M modifies the data
  • thread N overwrites source with modified data
  • thread M overwrites source with modified data
  • Result: thread N's modifications are lost/the new shared value doesn't take thread N's modifications into account
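
The sequence above can be replayed deterministically in a single thread to see the lost update. This is only a sketch: the dictionary source and the amounts are made up, and the two "threads" are simulated by two local copies:

```python
source = {"balance": 100}   # hypothetical shared slot

# Both threads read the same starting value.
n_copy = source["balance"]
m_copy = source["balance"]

# Each modifies its own private copy.
n_copy += 10
m_copy += 5

# Each writes its copy back; the later write wins.
source["balance"] = n_copy
source["balance"] = m_copy

print(source["balance"])  # 105: thread N's +10 is lost (115 was intended)
```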

And it applies to dictionaries and variable variables (which are just a horrible, horrible language-level implementation of dicts with string-only keys) as well. The only solutions are not using shared state to begin with (functional languages do this by discouraging or even completely disallowing mutability, and it works well for them) or adding some sort of locking to everything shared (hard to get right, but if you get it right, at least it works correctly). If no two threads ever share anything in that dictionary, you're fine, but you should separate everything to be (a bit more) sure that they really don't share anything.
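
One way to avoid shared state, sketched here under the assumption of Python 3 and its concurrent.futures module, is to let each worker build its own private dictionary and merge the results in a single thread afterwards. The functions count_words and merged_counts are hypothetical examples, not part of the original question:

```python
from concurrent.futures import ThreadPoolExecutor

def count_words(lines):
    # Each call builds and returns its own private dictionary,
    # so no other thread ever sees it while it is being modified.
    local = {}
    for line in lines:
        for word in line.split():
            local[word] = local.get(word, 0) + 1
    return local

def merged_counts(chunks):
    total = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Partial results are merged only here, in the calling thread.
        for partial in pool.map(count_words, chunks):
            for word, n in partial.items():
                total[word] = total.get(word, 0) + n
    return total

print(merged_counts([["a b"], ["b c", "a"]]))
```

No lock is needed because the only dictionary that receives data from several workers, total, is touched by one thread.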


What you need to do is not allow the threads direct access to the shared data structure, but instead wrap access to it with something that guarantees mutual exclusion, like a mutex.

Making access to the original structure look the same (shared[id] = value) takes some more work, but not that much.
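
A minimal sketch of such a wrapper, assuming a single threading.RLock guarding a plain dict. The class name LockedDict and the helper update_with are hypothetical, chosen only for this illustration:

```python
import threading

class LockedDict(object):
    """Hypothetical wrapper: every access goes through one lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.RLock()

    def __getitem__(self, key):
        with self._lock:
            return self._data[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._data[key] = value

    def __delitem__(self, key):
        with self._lock:
            del self._data[key]

    def update_with(self, key, func, default=None):
        # Read-modify-write performed as one locked step.
        with self._lock:
            self._data[key] = func(self._data.get(key, default))
            return self._data[key]

shared = LockedDict()
shared["job-1"] = "done"      # same syntax as a plain dict
print(shared["job-1"])
```

Note that shared[key] += 1 would still be a separate locked read followed by a separate locked write, so another thread could slip in between. That is why the sketch adds update_with for compound updates.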