Create a list with initial capacity in Python Create a list with initial capacity in Python python python

Create a list with initial capacity in Python


def doAppend( size=10000 ):    result = []    for i in range(size):        message= "some unique object %d" % ( i, )        result.append(message)    return resultdef doAllocate( size=10000 ):    result=size*[None]    for i in range(size):        message= "some unique object %d" % ( i, )        result[i]= message    return result

Results. (evaluate each function 144 times and average the duration)

simple append 0.0102pre-allocate  0.0098

Conclusion. It barely matters.

Premature optimization is the root of all evil.


Python lists have no built-in pre-allocation. If you really need to make a list, and need to avoid the overhead of appending (and you should verify that you do), you can do this:

l = [None] * 1000 # Make a list of 1000 None'sfor i in xrange(1000):    # baz    l[i] = bar    # qux

Perhaps you could avoid the list by using a generator instead:

def my_things():    while foo:        #baz        yield bar        #quxfor thing in my_things():    # do something with thing

This way, the list isn't every stored all in memory at all, merely generated as needed.


Short version: use

pre_allocated_list = [None] * size

to preallocate a list (that is, to be able to address 'size' elements of the list instead of gradually forming the list by appending). This operation is very fast, even on big lists. Allocating new objects that will be later assigned to list elements will take much longer and will be the bottleneck in your program, performance-wise.

Long version:

I think that initialization time should be taken into account.

Since in Python everything is a reference, it doesn't matter whether you set each element into None or some string - either way it's only a reference. Though it will take longer if you want to create a new object for each element to reference.

For Python 3.2:

import timeimport copydef print_timing (func):  def wrapper (*arg):    t1 = time.time()    res = func (*arg)    t2 = time.time ()    print ("{} took {} ms".format (func.__name__, (t2 - t1) * 1000.0))    return res  return wrapper@print_timingdef prealloc_array (size, init = None, cp = True, cpmethod = copy.deepcopy, cpargs = (), use_num = False):  result = [None] * size  if init is not None:    if cp:      for i in range (size):          result[i] = init    else:      if use_num:        for i in range (size):            result[i] = cpmethod (i)      else:        for i in range (size):            result[i] = cpmethod (cpargs)  return result@print_timingdef prealloc_array_by_appending (size):  result = []  for i in range (size):    result.append (None)  return result@print_timingdef prealloc_array_by_extending (size):  result = []  none_list = [None]  for i in range (size):    result.extend (none_list)  return resultdef main ():  n = 1000000  x = prealloc_array_by_appending(n)  y = prealloc_array_by_extending(n)  a = prealloc_array(n, None)  b = prealloc_array(n, "content", True)  c = prealloc_array(n, "content", False, "some object {}".format, ("blah"), False)  d = prealloc_array(n, "content", False, "some object {}".format, None, True)  e = prealloc_array(n, "content", False, copy.deepcopy, "a", False)  f = prealloc_array(n, "content", False, copy.deepcopy, (), False)  g = prealloc_array(n, "content", False, copy.deepcopy, [], False)  print ("x[5] = {}".format (x[5]))  print ("y[5] = {}".format (y[5]))  print ("a[5] = {}".format (a[5]))  print ("b[5] = {}".format (b[5]))  print ("c[5] = {}".format (c[5]))  print ("d[5] = {}".format (d[5]))  print ("e[5] = {}".format (e[5]))  print ("f[5] = {}".format (f[5]))  print ("g[5] = {}".format (g[5]))if __name__ == '__main__':  main()

Evaluation:

prealloc_array_by_appending took 118.00003051757812 msprealloc_array_by_extending took 102.99992561340332 msprealloc_array took 3.000020980834961 msprealloc_array took 49.00002479553223 msprealloc_array took 316.9999122619629 msprealloc_array took 473.00004959106445 msprealloc_array took 1677.9999732971191 msprealloc_array took 2729.999780654907 msprealloc_array took 3001.999855041504 msx[5] = Noney[5] = Nonea[5] = Noneb[5] = contentc[5] = some object blahd[5] = some object 5e[5] = af[5] = []g[5] = ()

As you can see, just making a big list of references to the same None object takes very little time.

Prepending or extending takes longer (I didn't average anything, but after running this a few times I can tell you that extending and appending take roughly the same time).

Allocating new object for each element - that is what takes the most time. And S.Lott's answer does that - formats a new string every time. Which is not strictly required - if you want to preallocate some space, just make a list of None, then assign data to list elements at will. Either way it takes more time to generate data than to append/extend a list, whether you generate it while creating the list, or after that. But if you want a sparsely-populated list, then starting with a list of None is definitely faster.