deepcopy() is extremely slow
Yes, deepcopy is very slow, but we can use json, ujson, or cPickle instead: dump the object to a string and load it back later. This is my test (profiled with line_profiler):
```
Total time: 3.46068 s
File: test_deepcopy.py
Function: test at line 15

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    15                                           @profile
    16                                           def test():
    17       100       957585   9575.9     27.7      b = deepcopy(a)
    18       100          862      8.6      0.0      c = copy(a)
    19       100        42295    422.9      1.2      d = ujson.loads(ujson.dumps(a))
    20       100        85040    850.4      2.5      e = json.loads(json.dumps(a))
    21       100      2323465  23234.7     67.1      f = pickle.loads(pickle.dumps(a, -1))
    22       100        51434    514.3      1.5      g = cPickle.loads(cPickle.dumps(a, -1))
```
As we can see, json/ujson/cPickle are faster than deepcopy, but pickle is much slower.
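Note that a serialization round trip only behaves like deepcopy for data the serializer can represent; with json, tuples come back as lists and non-string dict keys are coerced to strings. A minimal sketch of the idea (`pickle_copy` and `json_copy` are made-up helper names):

```python
import json
import pickle

def pickle_copy(obj):
    # Round-trip through pickle; protocol -1 (the highest available)
    # matches what the timings above use. Works for most picklable objects.
    return pickle.loads(pickle.dumps(obj, -1))

def json_copy(obj):
    # Round-trip through json; only valid for JSON-serializable data.
    # Caveats: tuples become lists, dict keys are coerced to strings.
    return json.loads(json.dumps(obj))

a = {"xs": [1, 2, 3], "flag": True}
b = json_copy(a)
b["xs"].append(4)
print(a["xs"])  # original is unchanged: [1, 2, 3]
```

The copies are fully independent, but always check that your data survives the round trip unchanged before swapping this in for deepcopy.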
If you create your own class to hold these objects, you can define your own methods that work with copy and deepcopy. http://www.rafekettler.com/magicmethods.html#copying (Broken Link)
New Link for a github repository https://github.com/RafeKettler/magicmethods
```python
import copy

class MyClass:
    def __init__(self):
        self.value = 0  # without an attribute here, __deepcopy__ would fail

    def __copy__(self):
        copy_object = MyClass()
        return copy_object

    def __deepcopy__(self, memodict={}):
        copy_object = MyClass()
        copy_object.value = self.value
        return copy_object

if __name__ == "__main__":
    my_inst = MyClass()
    print(copy.deepcopy(my_inst))
```
Here is a similar description from the previous broken link.
Copying
Sometimes, particularly when dealing with mutable objects, you want to be able to copy an object and make changes without affecting what you copied from. This is where Python's copy comes into play. However (fortunately), Python modules are not sentient, so we don't have to worry about a Linux-based robot uprising, but we do have to tell Python how to efficiently copy things.
__copy__(self)
Defines behavior for copy.copy() for instances of your class. copy.copy() returns a shallow copy of your object: the instance itself is a new instance, but all of its data is still referenced, so changes to the data in a shallow copy may show up in the original.
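To see that sharing concretely, here is a small sketch: mutating nested data through a shallow copy is visible in the original, while rebinding a top-level key is not.

```python
import copy

original = {"nums": [1, 2, 3]}
shallow = copy.copy(original)

shallow["nums"].append(4)   # mutates the list shared with the original
print(original["nums"])     # [1, 2, 3, 4] -- the change leaks through

shallow["nums"] = []        # rebinding a key only affects the copy
print(original["nums"])     # still [1, 2, 3, 4]
```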
__deepcopy__(self, memodict={})
Defines behavior for copy.deepcopy() for instances of your class. copy.deepcopy() returns a deep copy of your object -- the object and its data are both copied. memodict is a cache of previously copied objects -- this optimizes copying and prevents infinite recursion when copying recursive data structures. When you want to deep copy an individual attribute, call copy.deepcopy() on that attribute, passing memodict along (e.g. copy.deepcopy(self.attr, memodict)).

What are some use cases for these magic methods? As always, in any case where you need more fine-grained control than what the default behavior gives you. For instance, if you are attempting to copy an object that stores a cache as a dictionary (which might be large), it might not make sense to copy the cache as well -- if the cache can be shared in memory between instances, then it should be.
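The shared-cache use case could be sketched like this (`CachedThing` and its attributes are made-up names for illustration):

```python
import copy

class CachedThing:
    def __init__(self, data):
        self.data = data    # per-instance state: deep-copied below
        self.cache = {}     # potentially large cache: shared, not copied

    def __deepcopy__(self, memodict={}):
        # Deep-copy the real state, passing memodict along as recommended.
        new = CachedThing(copy.deepcopy(self.data, memodict))
        new.cache = self.cache  # share the cache between instances
        return new

thing = CachedThing({"x": [1, 2]})
thing.cache["expensive"] = "result"
clone = copy.deepcopy(thing)
print(clone.cache is thing.cache)  # True: the cache is shared
print(clone.data is thing.data)    # False: the data was copied
```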
I've run a quick experiment comparing deepcopy/json/ujson for several cases, and my results contradict @cherish's in certain cases. Posting the little experiment here:
```python
import copy
import json
import random
import string
import sys
import timeit

import ujson


def random_string(N):
    return ''.join(random.choice(string.ascii_uppercase + string.digits)
                   for _ in range(N))


def random_json(width=5, height=5, levels=1):
    dct = {}
    lst = [random_string(4) for i in range(width)]
    lst2 = [random.randint(0, 10000) for i in range(width)]
    lst3 = [bool(random.randint(0, 1)) for i in range(width)]

    for j in range(height):
        dct[str(j)] = lst
        dct[str(width + j)] = lst2
        dct[str(2 * width + j)] = lst3

    for i in range(levels):
        new_dct = {}
        for j in range(height):
            new_dct[str(j)] = dct
        dct = json.loads(json.dumps(new_dct))

    return new_dct


if __name__ == "__main__":
    print(sys.version)

    levels = 3
    for i in range(15):
        dataset = random_json(i, i, levels)
        print("Comparing deepcopy/ujson/json using random dataset({},{},{}), length {}".format(
            i, i, levels, len(json.dumps(dataset))))
        print(timeit.timeit('copy.deepcopy(dataset)',
                            setup='from __main__ import copy, dataset', number=10))
        print(timeit.timeit('ujson.loads(ujson.dumps(dataset))',
                            setup='from __main__ import ujson, dataset', number=10))
        print(timeit.timeit('json.loads(json.dumps(dataset))',
                            setup='from __main__ import json, dataset', number=10))
        print()
```
And the results would be:
```
3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit (Intel)]
Comparing deepcopy/ujson/json using random dataset(0,0,3), length 2
2.6842977659931844e-05
0.00012039864979822371
7.776568527950847e-05

Comparing deepcopy/ujson/json using random dataset(1,1,3), length 63
0.0002731667726569534
3.552747043226263e-05
0.00012987264191349377

Comparing deepcopy/ujson/json using random dataset(2,2,3), length 1106
0.0011858280130946362
0.00034974820892205325
0.0007093651596308467

Comparing deepcopy/ujson/json using random dataset(3,3,3), length 6834
0.0042218477363672215
0.0021178319874343293
0.003378267688436718

Comparing deepcopy/ujson/json using random dataset(4,4,3), length 26572
0.011379054029782284
0.006288757016181971
0.009920059244030693

Comparing deepcopy/ujson/json using random dataset(5,5,3), length 79210
0.028879491215043435
0.027906433274870912
0.029595961868760734

Comparing deepcopy/ujson/json using random dataset(6,6,3), length 183678
0.047142979515255284
0.04682125853300759
0.06791747047568517

Comparing deepcopy/ujson/json using random dataset(7,7,3), length 395528
0.08239215142913198
0.09871347134571351
0.15347433002098887

Comparing deepcopy/ujson/json using random dataset(8,8,3), length 764920
0.1351954464835896
0.19448842613700734
0.3020533693660834

Comparing deepcopy/ujson/json using random dataset(9,9,3), length 1356570
0.24560258734724671
0.44074906118659407
0.5705849913806413

Comparing deepcopy/ujson/json using random dataset(10,10,3), length 2287770
0.3237815755327835
0.61104051671153
0.8698565598118777

Comparing deepcopy/ujson/json using random dataset(11,11,3), length 3598750
0.4958284828467452
0.9472223636741877
1.2514314609961668

Comparing deepcopy/ujson/json using random dataset(12,12,3), length 5636414
0.6261448233909714
1.4066722957969802
1.8636325417418167

Comparing deepcopy/ujson/json using random dataset(13,13,3), length 8220800
0.8396582099444547
2.061675688670409
2.755659427352441

Comparing deepcopy/ujson/json using random dataset(14,14,3), length 12018290
1.0951926990258762
2.96703050743886
4.088875914783021
```
Conclusion from this little experiment is:
- When dictionaries are small: time(ujson) < time(json) < time(deepcopy)
- When dictionaries are big: time(deepcopy) < time(ujson) < time(json)
So depending on how many copies per second you're making and what kind of dictionary you're dealing with, you may prefer either deepcopy or ujson.