Must access to scala.collection.immutable.List and Vector be synchronized?


It depends on where you share them:

  • it's not safe to share them with code inside scala-library itself
  • it's not safe to share them with Java code or with code that uses reflection

Simply put, these collections are less protected than objects that have only final fields. At the JVM level the two are the same (leaving aside optimizations like ldc): both are fields holding some address, so you can change either of them with the putfield bytecode instruction. Still, a var is less protected by the compiler than Java's final, Scala's final val, or val.

However, it's still fine to use them in most cases, because their behaviour is logically immutable: all mutating operations are encapsulated (for Scala code). Let's look at Vector. It needs mutable fields to implement its append algorithm:

    private var dirty = false

    // from VectorPointer
    private[immutable] var depth: Int = _
    private[immutable] var display0: Array[AnyRef] = _
    private[immutable] var display1: Array[AnyRef] = _
    private[immutable] var display2: Array[AnyRef] = _
    private[immutable] var display3: Array[AnyRef] = _
    private[immutable] var display4: Array[AnyRef] = _
    private[immutable] var display5: Array[AnyRef] = _

which is implemented like:

    val s = new Vector(startIndex, endIndex + 1, blockIndex)
    s.initFrom(this) // uses displayN and depth
    s.gotoPos(startIndex, startIndex ^ focus) // uses displayN
    s.gotoPosWritable // uses dirty
    ...
    s.dirty = dirty

And s reaches the user only after the method returns it. So this isn't even a matter of happens-before guarantees: all mutating operations are performed in the same thread (the thread where you call :+, +: or updated); it's just a kind of initialization. The only problem is that private[somePackage] members are directly accessible from Java code and from scala-library itself, so if you pass the collection to some Java method, that method could modify them.
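To make the point above concrete, here is a minimal sketch of several tasks reading one Vector without any locking. The object name SharedVectorDemo and the helper derive are made up for this illustration, and the global ExecutionContext is simply an assumption about how the tasks are run.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical demo object, not part of scala-library.
object SharedVectorDemo {
  // One Vector instance, read by every task below without synchronization.
  val shared: Vector[Int] = Vector(1, 2, 3)

  // Each task builds its own new Vector with :+ ; `shared` is never modified.
  def derive(): Seq[Vector[Int]] = {
    val tasks = (1 to 8).map(i => Future(shared :+ i))
    Await.result(Future.sequence(tasks), 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    val results = derive()
    println(results.map(_.last)) // the element each task appended
    println(shared)              // the original vector
  }
}
```

All the mutable bookkeeping (dirty, displayN) happens on the fresh instance inside :+ before that instance is returned, so the tasks never observe a half-built vector through this API.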

I don't think you should worry about the thread-safety of, say, the cons operator. It also has a mutable field:

    final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B] {
      override def tail : List[B] = tl
      override def isEmpty: Boolean = false
    }

But it is used only inside library methods (within a single thread), without any explicit sharing or thread creation, and those methods always return a new collection. Consider take as an example:

    override def take(n: Int): List[A] = if (isEmpty || n <= 0) Nil else {
      val h = new ::(head, Nil)
      var t = h
      var rest = tail
      var i = 1
      while ({if (rest.isEmpty) return this; i < n}) {
        i += 1
        val nx = new ::(rest.head, Nil)
        t.tl = nx // here is the mutation of t's field
        t = nx
        rest = rest.tail
      }
      h
    }

So here t.tl = nx doesn't differ much from t = nx in terms of thread-safety. Both are referenced only from a single stack (take's stack). However, if I added, say, someActor ! t (or any other async operation), someField = t or someFunctionWithExternalSideEffect(t) right inside the while loop, I could break this contract.
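The same discipline can be sketched outside the library. The types below (IntChain, Link, End) and the object LocalMutationSketch are hypothetical stand-ins invented for this example, not the real List implementation; they only mirror the shape of take: cells are mutated while they are reachable from one stack frame, and the chain escapes only through the return value.

```scala
// Hypothetical miniature of the List/:: design, for illustration only.
object LocalMutationSketch {
  sealed trait IntChain
  case object End extends IntChain
  // `next` is a var, but only code inside this object can assign it.
  final class Link(val head: Int, private[LocalMutationSketch] var next: IntChain) extends IntChain {
    def tail: IntChain = next
  }

  def take(xs: IntChain, n: Int): IntChain = xs match {
    case first: Link if n > 0 =>
      val h = new Link(first.head, End) // reachable only from this stack frame
      var t = h
      var rest = first.tail
      var i = 1
      while (i < n) {
        rest match {
          case l: Link =>
            val nx = new Link(l.head, End)
            t.next = nx                 // mutates a cell no other thread can see yet
            t = nx
            rest = l.tail
            i += 1
          case End =>
            i = n                       // source exhausted: leave the loop
        }
      }
      h                                 // the chain escapes only here, fully built
    case _ => End
  }

  // Helpers for building and inspecting chains in the example.
  def fromList(xs: List[Int]): IntChain =
    xs.foldRight(End: IntChain)((x, acc) => new Link(x, acc))

  def toList(xs: IntChain): List[Int] = xs match {
    case l: Link => l.head :: toList(l.tail)
    case End     => Nil
  }
}
```

Passing t to another thread, a field, or a side-effecting function inside the loop would be the point where this sketch stops being safe, exactly as described for take.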


A small addition here about the relation to JSR-133:

1) new ::(head, Nil) creates a new object on the heap and puts its address (let's say 0x100500) onto the stack (val h =)

2) as long as this address is only on the stack, it's known only to the current thread

3) Other threads can get involved only after this address is shared by putting it into some field; in the case of take, it has to flush any caches (to restore the stack and registers) before calling areturn (return h), so the returned object will be consistent.

So all operations on the 0x100500 object are out of scope of JSR-133 as long as 0x100500 lives only on the stack (not in the heap, not on other threads' stacks). Some fields of the 0x100500 object may point to shared objects (which might be in scope of JSR-133), but that's not the case here (as these objects are immutable from the outside).


I think (hope) the author meant logical synchronization guarantees for the library's developers - you still need to be careful with these things if you're developing scala-library, as these vars are private[scala] or private[immutable], so it's possible to write code that mutates them from different threads. From a scala-library developer's perspective, it usually means that all mutations on a single instance should be applied in a single thread and only on a collection that is not (yet) visible to a user. Or, simply put: don't expose mutable fields to outside users in any way.

P.S. Scala has had several unexpected issues with synchronization, which made some parts of the library surprisingly not thread-safe, so I wouldn't be surprised if something were wrong (and that would be a bug), but in, let's say, 99% of cases for 99% of methods, immutable collections are thread-safe. In the worst case you might have to avoid some broken method, or just (it might not be "just" in some cases) clone the collection for every thread.

Anyway, immutability is still a good way to achieve thread-safety.

P.S.2 An exotic case that might break immutable collections' thread-safety is using reflection to access their non-final fields.
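As a rough sketch of that exotic case, the code below overwrites the tail of a cons cell through Java reflection. The object and method names are invented for this example. It assumes the cell stores its tail in a single non-final field of type List; the field is looked up by type because its name is an internal detail that differs between Scala versions.

```scala
// Hypothetical demo; relies on scala-library internals, which may change.
object ReflectionBreaksImmutability {
  // Replaces the tail of the first cons cell of `xs` behind the compiler's back.
  def overwriteTail(xs: List[Int], newTail: List[Int]): Unit = {
    // Assumption: the cons cell has exactly one field whose declared type is List.
    val tailField = xs.getClass.getDeclaredFields
      .find(f => f.getType == classOf[List[_]])
      .getOrElse(sys.error("no List-typed field found on " + xs.getClass))
    tailField.setAccessible(true)
    tailField.set(xs, newTail)
  }

  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3)
    overwriteTail(xs, List(42))
    println(xs) // no longer the list that was constructed above
  }
}
```

Nothing in the type system prevents this, which is why a list shared with reflective code cannot be treated as immutable.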


A small addition about another exotic but really terrifying way, as pointed out in the comments by @Steve Waldman and @axel22 (the author). If you share an immutable collection as a member of some object shared between threads && the collection's constructor gets physically inlined (by the JIT; it's not logically inlined by default) && your JIT implementation allows reordering the inlined code with the surrounding code - then you have to synchronize it (usually @volatile is enough). However, IMHO, I don't believe that the last condition is correct behaviour - but for now I can neither prove nor disprove that.
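If you want to be on the safe side in that scenario, a minimal sketch of the @volatile approach could look like this. Holder, publish, snapshot and demo are names made up for the illustration; whether the annotation is strictly required for immutable collections is exactly the open question above.

```scala
// Hypothetical holder shared between threads.
object VolatilePublication {
  final class Holder {
    // A volatile write of the reference happens-before any later volatile read of it.
    @volatile private var current: Vector[Int] = Vector.empty
    def publish(v: Vector[Int]): Unit = { current = v }
    def snapshot: Vector[Int] = current
  }

  def demo(): Vector[Int] = {
    val holder = new Holder
    val writer = new Thread(new Runnable {
      // The vector is fully built in the writer thread before it is published.
      def run(): Unit = holder.publish(Vector(1, 2, 3) :+ 4)
    })
    writer.start()
    // Reader side: spin until the published vector becomes visible.
    var seen = holder.snapshot
    while (seen.isEmpty) seen = holder.snapshot
    writer.join()
    seen
  }
}
```

The vector is still never mutated after publication; the annotation only orders the publication of the reference relative to the construction that preceded it.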


In your question you are asking for an authoritative statement. I found the following in "Programming in Scala" by Martin Odersky et al.: "Third, there is no way for two threads concurrently accessing an immutable to corrupt its state once it has been properly constructed, because no thread can change the state of an immutable"

If you look at the implementation of Vector, for example, you see that it follows this rule; see below.

There are some fields inside Vector which are not final and could lead to data races. But since they are changed only inside a method that creates a new instance, and since you need a synchronization action to access the newly created instance from different threads anyway, everything is fine.

The pattern used here is to create and modify an object, then make it visible to other threads, for example by assigning the instance to a volatile static or a static final field, and after that make sure it is not changed anymore.

As an example, the creation of two vectors:

    val vector = Vector(4,5,5)
    val vector2 = vector.updated(1, 2)

The method updated uses the var field dirty inside:

    private[immutable] def updateAt[B >: A](index: Int, elem: B): Vector[B] = {
      val idx = checkRangeConvert(index)
      val s = new Vector[B](startIndex, endIndex, idx)
      s.initFrom(this)
      s.dirty = dirty
      s.gotoPosWritable(focus, idx, focus ^ idx)  // if dirty commit changes; go to new pos and prepare for writing
      s.display0(idx & 0x1f) = elem.asInstanceOf[AnyRef]
      s
    }

But after vector2 is created, it is assigned to a final field. Bytecode of the field declaration:

private final scala.collection.immutable.Vector vector2;

Bytecode of the constructor:

    61  invokevirtual scala.collection.immutable.Vector.updated(int, java.lang.Object, scala.collection.generic.CanBuildFrom) : java.lang.Object [52]
    64  checkcast scala.collection.immutable.Vector [48]
    67  putfield trace.agent.test.scala.TestVector$.vector2 : scala.collection.immutable.Vector [22]

Everything is o.k.