
deepcopy() is extremely slow


Actually, deepcopy is very slow, but we can use json, ujson, or cPickle instead: dump the object to a string and load it back later to get a copy. This is my test:

```
Total time: 3.46068 s
File: test_deepcopy.py
Function: test at line 15

Line #   Hits        Time  Per Hit   % Time  Line Contents
==============================================================
    15                                       @profile
    16                                       def test():
    17      100      957585   9575.9    27.7     b = deepcopy(a)
    18      100         862      8.6     0.0     c = copy(a)
    19      100       42295    422.9     1.2     d = ujson.loads(ujson.dumps(a))
    20      100       85040    850.4     2.5     e = json.loads(json.dumps(a))
    21      100     2323465  23234.7    67.1     f = pickle.loads(pickle.dumps(a, -1))
    22      100       51434    514.3     1.5     g = cPickle.loads(cPickle.dumps(a, -1))
```

As we can see, json/ujson/cPickle are all faster than deepcopy, but pickle is even slower.
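To make the trick concrete, here is a minimal sketch of copying via a json round-trip versus deepcopy. The nested dict `a` below is made up for illustration (the original object used in the profile above isn't shown); note that the json round-trip only works for JSON-serializable data and silently changes some types (tuples become lists, non-string dict keys become strings):

```python
import copy
import json

# Hypothetical nested data, standing in for the `a` used in the profile above.
a = {"numbers": [1, 2, 3], "nested": {"flag": True, "name": "x"}}

# deepcopy: general-purpose, handles any Python object graph.
b = copy.deepcopy(a)

# json round-trip: often faster for plain dict/list/str/int data,
# but it fails outright on objects json can't serialize.
c = json.loads(json.dumps(a))

# Both produce an independent copy: mutating the copy's nested
# structures does not affect the original.
assert c == a and c is not a
assert c["nested"] is not a["nested"]
```

The same round-trip works with `ujson.dumps`/`ujson.loads` if ujson is installed, with the same type caveats.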


If you create your own class to hold these objects, you can define your own methods that work with copy and deepcopy: http://www.rafekettler.com/magicmethods.html#copying (broken link)

New link to a GitHub repository: https://github.com/RafeKettler/magicmethods

```python
import copy

class MyClass:
    def __init__(self, value=None):
        self.value = value

    def __copy__(self):
        copy_object = MyClass()
        copy_object.value = self.value
        return copy_object

    def __deepcopy__(self, memodict={}):
        copy_object = MyClass()
        copy_object.value = copy.deepcopy(self.value, memodict)
        return copy_object

if __name__ == "__main__":
    my_inst = MyClass()
    print(copy.deepcopy(my_inst))
```

Here is a similar description from the previous broken link.

Copying

Sometimes, particularly when dealing with mutable objects, you want to be able to copy an object and make changes without affecting what you copied from. This is where Python's copy comes into play. However (fortunately), Python modules are not sentient, so we don't have to worry about a Linux-based robot uprising, but we do have to tell Python how to efficiently copy things.

__copy__(self)

Defines behavior for copy.copy() for instances of your class. copy.copy() returns a shallow copy of your object -- this means that, while the instance itself is a new instance, all of its data is referenced -- i.e., the object itself is copied, but its data is still referenced (and hence changes to data in a shallow copy may cause changes in the original).
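A short illustration of the shallow-copy behavior described above: the copied container is new, but the objects it holds are shared with the original.

```python
import copy

original = {"data": [1, 2, 3]}
shallow = copy.copy(original)

# The top-level dict is a new object...
assert shallow is not original
# ...but the list inside is shared by reference,
# so mutating it through the copy affects the original too.
shallow["data"].append(4)
assert original["data"] == [1, 2, 3, 4]
```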

__deepcopy__(self, memodict={})

Defines behavior for copy.deepcopy() for instances of your class. copy.deepcopy() returns a deep copy of your object -- the object and its data are both copied. memodict is a cache of previously copied objects -- this optimizes copying and prevents infinite recursion when copying recursive data structures. When you want to deep copy an individual attribute, call copy.deepcopy() on that attribute with memodict as the first argument.

What are some use cases for these magic methods? As always, in any case where you need more fine-grained control than what the default behavior gives you. For instance, if you are attempting to copy an object that stores a cache as a dictionary (which might be large), it might not make sense to copy the cache as well -- if the cache can be shared in memory between instances, then it should be.
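A minimal sketch of that cache-sharing use case, using a hypothetical `CachedThing` class: the real data is deep-copied (passing `memodict` along, as described above), while the large cache is shared by reference.

```python
import copy

class CachedThing:
    """Hypothetical class: deep-copies its data but shares its cache."""
    def __init__(self, data):
        self.data = data
        self._cache = {}          # potentially large; safe to share

    def __deepcopy__(self, memodict={}):
        # Deep-copy the real data, passing memodict through
        # so recursive structures are handled correctly.
        new = CachedThing(copy.deepcopy(self.data, memodict))
        new._cache = self._cache  # shared by reference, not copied
        return new

thing = CachedThing({"rows": [1, 2, 3]})
thing._cache["expensive"] = "precomputed result"

clone = copy.deepcopy(thing)
assert clone.data == thing.data and clone.data is not thing.data
assert clone._cache is thing._cache   # cache shared, as intended
```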


I've run a quick experiment comparing deepcopy/json/ujson for several cases, and my results contradict @cherish's in certain cases. Posting the little experiment here:

```python
import timeit
import json
import random
import string
import copy
import ujson
import sys

def random_string(N):
    return ''.join(random.choice(string.ascii_uppercase + string.digits)
                   for _ in range(N))

def random_json(width=5, height=5, levels=1):
    dct = {}
    lst = [random_string(4) for i in range(width)]
    lst2 = [random.randint(0, 10000) for i in range(width)]
    lst3 = [bool(random.randint(0, 1)) for i in range(width)]
    for j in range(height):
        dct[str(j)] = lst
        dct[str(width + j)] = lst2
        dct[str(2 * width + j)] = lst3
    for i in range(levels):
        new_dct = {}
        for j in range(height):
            new_dct[str(j)] = dct
        dct = json.loads(json.dumps(new_dct))
    return new_dct

if __name__ == "__main__":
    print(sys.version)
    levels = 3
    for i in range(15):
        dataset = random_json(i, i, levels)
        print("Comparing deepcopy/ujson/json using random dataset({},{},{}), length {}".format(
            i, i, levels, len(json.dumps(dataset))))
        print(timeit.timeit('copy.deepcopy(dataset)',
                            setup='from __main__ import copy, dataset', number=10))
        print(timeit.timeit('ujson.loads(ujson.dumps(dataset))',
                            setup='from __main__ import ujson, dataset', number=10))
        print(timeit.timeit('json.loads(json.dumps(dataset))',
                            setup='from __main__ import json, dataset', number=10))
        print()
```

And the results would be:

```
3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit (Intel)]
Comparing deepcopy/ujson/json using random dataset(0,0,3), length 2
2.6842977659931844e-05
0.00012039864979822371
7.776568527950847e-05
Comparing deepcopy/ujson/json using random dataset(1,1,3), length 63
0.0002731667726569534
3.552747043226263e-05
0.00012987264191349377
Comparing deepcopy/ujson/json using random dataset(2,2,3), length 1106
0.0011858280130946362
0.00034974820892205325
0.0007093651596308467
Comparing deepcopy/ujson/json using random dataset(3,3,3), length 6834
0.0042218477363672215
0.0021178319874343293
0.003378267688436718
Comparing deepcopy/ujson/json using random dataset(4,4,3), length 26572
0.011379054029782284
0.006288757016181971
0.009920059244030693
Comparing deepcopy/ujson/json using random dataset(5,5,3), length 79210
0.028879491215043435
0.027906433274870912
0.029595961868760734
Comparing deepcopy/ujson/json using random dataset(6,6,3), length 183678
0.047142979515255284
0.04682125853300759
0.06791747047568517
Comparing deepcopy/ujson/json using random dataset(7,7,3), length 395528
0.08239215142913198
0.09871347134571351
0.15347433002098887
Comparing deepcopy/ujson/json using random dataset(8,8,3), length 764920
0.1351954464835896
0.19448842613700734
0.3020533693660834
Comparing deepcopy/ujson/json using random dataset(9,9,3), length 1356570
0.24560258734724671
0.44074906118659407
0.5705849913806413
Comparing deepcopy/ujson/json using random dataset(10,10,3), length 2287770
0.3237815755327835
0.61104051671153
0.8698565598118777
Comparing deepcopy/ujson/json using random dataset(11,11,3), length 3598750
0.4958284828467452
0.9472223636741877
1.2514314609961668
Comparing deepcopy/ujson/json using random dataset(12,12,3), length 5636414
0.6261448233909714
1.4066722957969802
1.8636325417418167
Comparing deepcopy/ujson/json using random dataset(13,13,3), length 8220800
0.8396582099444547
2.061675688670409
2.755659427352441
Comparing deepcopy/ujson/json using random dataset(14,14,3), length 12018290
1.0951926990258762
2.96703050743886
4.088875914783021
```

The conclusions from this little experiment are:

  • When dictionaries are small: time(ujson) < time(json) < time(deepcopy)
  • When dictionaries are big: time(deepcopy) < time(ujson) < time(json)

So depending on how many copies per second you're making and which type of dictionary you're dealing with, you may prefer to switch between deepcopy and ujson.