ArrayList and Multithreading in Java ArrayList and Multithreading in Java multithreading multithreading

ArrayList and Multithreading in Java


There are three aspects of what might go wrong if you use an ArrayList (for example) without adequate synchronization.

The first scenario is that if two threads happen to update the ArrayList at the same time, then it may get corrupted. For instance, the logic of appending to a list goes something like this:

public void add(T element) {    if (!haveSpace(size + 1)) {        expand(size + 1);    }    elements[size] = element;    // HERE    size++;}

Now suppose that we have one processor / core and two threads executing this code on the same list at the "same time". Suppose that the first thread gets to the point labeled HERE and is preempted. The second thread comes along, and overwrites the slot in elements that the first thread just updated with its own element, and then increments size. When the first thread finally gets control, it updates size. The end result is that we've added the second thread's element and not the first thread's element, and most likely also added a null to the list. (This is just illustrative. In reality, the native code compiler may have reordered the code, and so on. But the point is that bad things can happen if updates happen simultaneously.)

The second scenario arises due to the caching of main memory contents in the CPU's cache memory. Suppose that we have two threads, one adding elements to the list and the second one reading the list's size. When on thread adds an element, it will update the list's size attribute. However, since size is not volatile, the new value of size may not immediately be written out to main memory. Instead, it could sit in the cache until a synchronization point where the Java memory model requires that cached writes get flushed. In the meantime, the second thread could call size() on the list and get a stale value of size. In the worst case, the second thread (calling get(int) for example) might see inconsistent values of size and the elements array, resulting in unexpected exceptions. (Note that kind of problem can happen even when there is only one core and no memory caching. The JIT compiler is free to use CPU registers to cache memory contents, and those registers don't get flushed / refreshed with respect to their memory locations when a thread context switch occurs.)

The third scenario arises when you synchronize operations on the ArrayList; e.g. by wrapping it as a SynchronizedList.

    List list = Collections.synchronizedList(new ArrayList());    // Thread 1    List list2 = ...    for (Object element : list2) {        list.add(element);    }    // Thread 2    List list3 = ...    for (Object element : list) {        list3.add(element);    }

If thread2's list is an ArrayList or LinkedList and the two threads run simultaneously, thread 2 will fail with a ConcurrentModificationException. If it is some other (home brew) list, then the results are unpredictable. The problem is that making list a synchronized list is NOT SUFFICIENT to make it thread-safe with respect to a sequence of list operations performed by different threads. To get that, the application would typically need to synchronize at a higher level / coarser grain.


Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU).

Correct. If there is only one core available to run the application, obviously only one thread gets to run at a time. This makes some of the hazards impossible and others become much less likely likely to occur. However, it is possible for the OS to switch from one thread to another thread at any point in the code, and at any time.

If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?

Yup. That's possible. The probability of it happening is very small1 but that just makes this kind of problem more insidious.


1 - This is because thread time-slicing events are extremely infrequent, when measured on the timescale of hardware clock cycles.


A practical example. At the end list should contain 40 items, but for me it usually shows between 30 and 35. Guess why?

static class ListTester implements Runnable {    private List<Integer> a;    public ListTester(List<Integer> a) {        this.a = a;    }    public void run() {        try {            for (int i = 0; i < 20; ++i) {                a.add(i);                Thread.sleep(10);            }        } catch (InterruptedException e) {        }    }}public static void main(String[] args) throws Exception {    ArrayList<Integer> a = new ArrayList<Integer>();    Thread t1 = new Thread(new ListTester(a));    Thread t2 = new Thread(new ListTester(a));    t1.start();    t2.start();    t1.join();    t2.join();    System.out.println(a.size());    for (int i = 0; i < a.size(); ++i) {        System.out.println(i + "  " + a.get(i));    }}

edit
There're more comprehensive explanations around (for example, Stephen C's post), but I'll make a little comment since mfukar asked. (should've done it right away, when posting answer)

This is the famous problem of incrementing integer from two different threads. There's a nice explanation in Sun's Java tutorial on concurrency. Only in that example they have --i and ++i and we have ++size twice. (++size is part of ArrayList#add implementation.)


I don't really see an instance where the string is half modified, I am on the right track here?

That won't happen. However, what could happen is that only one of the strings gets added. Or that an exception occurs during the call to add.

can someone please give me an example where an ArrayList causes a problem and a Vector solves it?

If you want to access a collection from multiple threads, you need to synchronize this access. However, just using a Vector does not really solve the problem. You will not get the issues described above, but the following pattern will still not work:

 // broken, even though vector is "thread-safe" if (vector.isEmpty())    vector.add(1);

The Vector itself will not get corrupted, but that does not mean that it cannot get into states that your business logic would not want to have.You need to synchronize in your application code (and then there is no need to use Vector).

synchronized(list){   if (list.isEmpty())     list.add(1);}

The concurrency utility packages also has a number of collections that provide atomic operations necessary for thread-safe queues and such.