Any reason not to use '+' to concatenate two strings?


There is nothing wrong with concatenating two strings with +. Indeed, it's easier to read than ''.join([a, b]).

You are right, though, that concatenating more than two strings with + is an O(n^2) operation (compared to O(n) for join) and thus becomes inefficient. However, this has nothing to do with using a loop. Even a + b + c + ... is O(n^2), because each concatenation produces a new string that copies everything built so far.
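To make the quadratic-versus-linear claim concrete, here is a rough cost model. It is an assumption for illustration, not a measurement: it only counts how many characters get copied, and ignores allocation costs and CPython's in-place optimization. The function names are hypothetical.

```python
def chars_copied_with_plus(piece_lengths):
    # Model of a + b + c + ...: each "+" creates a new string and copies
    # the whole left operand plus the right operand into it.
    copied = 0
    prefix = piece_lengths[0] if piece_lengths else 0
    for length in piece_lengths[1:]:
        prefix += length   # length of the new intermediate string
        copied += prefix   # every character of it is copied once
    return copied


def chars_copied_with_join(piece_lengths):
    # Model of "".join(pieces): the result is sized once and each piece
    # is copied into it exactly once.
    return sum(piece_lengths)


for n in (10, 100, 1000):
    pieces = [10] * n  # n strings of 10 characters each
    print(n, chars_copied_with_plus(pieces), chars_copied_with_join(pieces))
# 10 540 100
# 100 50490 1000
# 1000 5004990 10000
```

Multiplying the number of pieces by 10 multiplies the modelled + cost by roughly 100, while the join cost grows only tenfold.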

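CPython 2.4 and later try to mitigate this quadratic behavior (by resizing the string in place for statements like s += t or s = s + t when it can), but it's still advisable to use join when concatenating more than two strings.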
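A minimal sketch of the usual idiom, assuming the goal is to build one string from many pieces: collect the pieces in a list and call join once at the end. The helper names build_with_plus and build_with_join are hypothetical. Both produce the same string, but only the join version avoids relying on the interpreter optimization mentioned above.

```python
def build_with_plus(n):
    # Repeated "+=": each step may create a new, longer string. CPython can
    # sometimes resize s in place, but that is an implementation detail.
    s = ""
    for i in range(n):
        s += str(i)
    return s


def build_with_join(n):
    # Collect the pieces first, then let join() build the result in one pass.
    parts = []
    for i in range(n):
        parts.append(str(i))
    return "".join(parts)


print(build_with_join(5))  # → 01234
```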


The plus operator is a perfectly fine way to concatenate two Python strings. But if you keep adding more strings (as a rough rule of thumb, n > 25), you might want to think of something else.

The ''.join([a, b, c]) trick is a performance optimization.
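A short illustration, assuming Python 3: for a fixed handful of strings, + and join give the same result, while join also accepts any iterable, which is what makes it convenient once the number of pieces grows. As a side note, Python's built-in sum() refuses a string start value, and its error message suggests ''.join() instead.

```python
a, b, c = "foo", "bar", "baz"

# For a small, fixed number of strings, both spellings build the same value.
assert a + b + c == "".join([a, b, c]) == "foobarbaz"

# join() accepts any iterable of strings, so a large or computed number of
# pieces can be combined in one call rather than a long chain of "+".
digits = "".join(str(i) for i in range(30))
print(digits[:10])  # → 0123456789

# sum() deliberately rejects strings and points to ''.join() instead.
try:
    sum([a, b, c], "")
    sum_accepts_strings = True
except TypeError:
    sum_accepts_strings = False
print(sum_accepts_strings)  # → False
```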


The assumption that one should never, ever use + for string concatenation, and should always use ''.join instead, may be a myth. It is true that + creates unnecessary temporary copies of immutable string objects, but the less often quoted fact is that calling join in a loop generally adds the overhead of a function call. Let's take your example.

Create two lists, one from the linked SO question and another, bigger, fabricated one:

>>> import random
>>> myl1 = ['A','B','C','D','E','F']
>>> myl2 = [chr(random.randint(65,90)) for i in range(0,10000)]

Let's create two functions, UsePlus and UseJoin, that use + and join respectively:

>>> def UsePlus():
...     return [myl[i] + myl[i + 1] for i in range(0, len(myl), 2)]
...
>>> def UseJoin():
...     return [''.join((myl[i], myl[i + 1])) for i in range(0, len(myl), 2)]
...
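One way to see where the extra function call comes from, assuming CPython (bytecode is an implementation detail, and opcode names change between versions): disassemble a two-string + and a two-string join. The + compiles to a single binary-operator instruction, while join needs a method lookup followed by a call instruction. concat_plus and concat_join are hypothetical stand-ins for the loop bodies of UsePlus and UseJoin.

```python
import dis


def concat_plus(a, b):
    return a + b


def concat_join(a, b):
    return "".join((a, b))


def opnames(func):
    # Names of the bytecode instructions CPython compiled for func.
    return [ins.opname for ins in dis.get_instructions(func)]


def makes_call(func):
    # Call opcodes are named differently across versions (CALL_FUNCTION,
    # CALL_METHOD, CALL), so look for any name starting with "CALL".
    return any(name.startswith("CALL") for name in opnames(func))


print(makes_call(concat_plus))  # → False
print(makes_call(concat_join))  # → True
```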

Let's run timeit with the first list:

>>> import timeit
>>> myl = myl1
>>> t1 = timeit.Timer("UsePlus()", "from __main__ import UsePlus")
>>> t2 = timeit.Timer("UseJoin()", "from __main__ import UseJoin")
>>> print("%.2f usec/pass" % (1000000 * t1.timeit(number=100000) / 100000))
2.48 usec/pass
>>> print("%.2f usec/pass" % (1000000 * t2.timeit(number=100000) / 100000))
2.61 usec/pass

They have almost the same runtime.

Now let's use cProfile with the second, larger list:

>>> import cProfile
>>> myl = myl2
>>> cProfile.run("UsePlus()")
         5 function calls in 0.001 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 <pyshell#1376>:1(UsePlus)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {range}

>>> cProfile.run("UseJoin()")
         5005 function calls in 0.029 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.015    0.015    0.029    0.029 <pyshell#1388>:1(UseJoin)
        1    0.000    0.000    0.029    0.029 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     5000    0.014    0.000    0.014    0.000 {method 'join' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {range}

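So using join results in 5000 extra method calls (one per pair), and those calls add to the overhead. Keep in mind that cProfile adds its own cost to every call it records, so the 0.029 vs 0.001 seconds exaggerates the real difference.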
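Because cProfile instruments every call (and the Python profiling docs recommend timeit, not the profilers, for benchmarking), a fairer check of the same workload is to time it directly. This is a sketch in Python 3 syntax. use_plus and use_join are renamed versions of UsePlus and UseJoin that take the list as a parameter instead of reading a global. No numbers are shown because they depend on the machine and interpreter.

```python
import random
import timeit

# Same setup as above: 10000 random uppercase letters.
data = [chr(random.randint(65, 90)) for _ in range(10000)]


def use_plus(seq):
    # Concatenate adjacent pairs with "+".
    return [seq[i] + seq[i + 1] for i in range(0, len(seq), 2)]


def use_join(seq):
    # Concatenate adjacent pairs with str.join (one method call per pair).
    return ["".join((seq[i], seq[i + 1])) for i in range(0, len(seq), 2)]


# Both approaches must produce identical results before timing them.
assert use_plus(data) == use_join(data)

plus_seconds = timeit.timeit(lambda: use_plus(data), number=100)
join_seconds = timeit.timeit(lambda: use_join(data), number=100)
print("use_plus: %.2f usec/pass" % (1e6 * plus_seconds / 100))
print("use_join: %.2f usec/pass" % (1e6 * join_seconds / 100))
```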

Now, coming back to the question: should one discourage the use of + in favor of join in all cases?

I believe not. Two things should be taken into consideration:

  1. The length of the strings in question
  2. The number of concatenation operations
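A minimal sketch for checking both factors on your own interpreter, assuming Python 3: the hypothetical helper compare() times folding num_pieces strings of piece_len characters together with + against a single join. Results will vary, and CPython's in-place optimization for s = s + p can make the + version look better than the worst case.

```python
import timeit


def compare(piece_len, num_pieces, number=20):
    # Hypothetical helper: returns (seconds with "+", seconds with join)
    # for building one string out of num_pieces pieces of piece_len chars.
    pieces = ["x" * piece_len] * num_pieces

    def with_plus():
        s = ""
        for p in pieces:
            s = s + p
        return s

    def with_join():
        return "".join(pieces)

    assert with_plus() == with_join()
    t_plus = timeit.timeit(with_plus, number=number)
    t_join = timeit.timeit(with_join, number=number)
    return t_plus, t_join


# Vary string length and number of concatenations independently.
for piece_len, num_pieces in [(1, 2), (1, 500), (100, 2), (100, 500)]:
    t_plus, t_join = compare(piece_len, num_pieces)
    print("len=%4d n=%4d  plus=%.5fs  join=%.5fs"
          % (piece_len, num_pieces, t_plus, t_join))
```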

And of course, in development, premature optimization is evil.