Memory not freed after Python's multiprocessing Pool is finished
The generation threshold may be getting in the way; take a look at gc.get_threshold(). Try including gc.disable().
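For reference, here is a minimal sketch of how the collector state can be inspected with the standard gc module. It only shows the calls mentioned above plus gc.get_count() and gc.collect(); whether changing any of this affects the memory behaviour in the question is not something this snippet establishes.

```python
import gc

# Collection thresholds for generations 0, 1 and 2.
thresholds = gc.get_threshold()
print("thresholds:", thresholds)

# Current per-generation counters.
print("counts:", gc.get_count())

# Whether automatic collection is currently enabled.
print("enabled:", gc.isenabled())

# Force a full collection; the return value is the number of
# unreachable objects that were found.
unreachable = gc.collect()
print("unreachable objects found:", unreachable)
```

Calling gc.collect() right after the pool has finished is a cheap way to check whether pending cyclic garbage accounts for any of the retained memory.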
Indeed, there is a leak, but it does not appear for some "magic" parameter values. I could not work out why, but the leak can be reduced by passing a list to pool.map instead of an ndarray:

    images_converted = pool.map(rgb2hsv, [i for i in imgs])

This consistently reduces the memory leak in my tests.
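A self-contained sketch of that change follows. The convert function is a hypothetical placeholder for rgb2hsv (it just reverses the channel order) so that the example runs without scikit-image, and the helper names and pool size are assumptions made for illustration. It shows only how the array is split into a list before being handed to pool.map; it does not by itself demonstrate the reduction in leaked memory.

```python
import multiprocessing

import numpy as np


def convert(img):
    # Hypothetical placeholder for rgb2hsv: reverses the channel order.
    return img[..., ::-1].copy()


def split_into_list(imgs):
    # Iterating an ndarray yields one sub-array per index of the first
    # axis, so this is equivalent to [i for i in imgs].
    return list(imgs)


def convert_all(imgs, processes=2):
    items = split_into_list(imgs)
    # The context manager shuts the pool down once map() has returned.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(convert, items)


if __name__ == "__main__":
    batch = np.random.rand(8, 4, 4, 3)
    converted = convert_all(batch)
    print(len(converted), converted[0].shape)
```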
OLD ANSWER:
There does not seem to be a problem in the pool. You should not expect "del pool" on line 31 to free your memory, because the memory is occupied by the variables "imgs" and "images_converted". These live in the scope of the function "parallel_convert_all_to_hsv", not in the scope of "rgb2hsv", so "del pool" has no effect on them.
The memory is correctly released after deleting "images" and "images_converted" in lines 56 and 59.
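The point that memory follows the names that reference the data can be illustrated with the standard tracemalloc module. In this sketch a 50 MiB bytearray is an assumed stand-in for the image arrays. Note that tracemalloc reports Python-level allocations, which is not the same figure as the process memory shown by memory_profiler.

```python
import tracemalloc

SIZE = 50 * 1024 * 1024  # 50 MiB stand-in for the image data


def measure():
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()

    data = bytearray(SIZE)  # plays the role of "imgs"
    held, _ = tracemalloc.get_traced_memory()

    # Dropping the last reference is what releases the buffer.
    del data
    released, _ = tracemalloc.get_traced_memory()

    tracemalloc.stop()
    return baseline, held, released


baseline, held, released = measure()
print("while referenced:", held - baseline, "bytes")
print("after del:", released - baseline, "bytes")
```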
As multiprocessing.Pool is not able to free around 1 GB of memory, I also tried replacing it with ThreadPool, but that was no better. I am still wondering about this memory leak inside pools.
This may not be the best solution, but it can serve as a workaround.
Instead of using ThreadPool or ProcessPool, I create threads or processes manually and give each one an image to convert to HSV. I have commented out the line p = multiprocessing.Process(target=do_hsv, args=(imgs[j], shared_list)) because it would spawn a new process for each image conversion, which I think is overkill and much more expensive than threads. Creating threads manually does take longer (9 sec, without the memory leak) than ThreadPool (4 sec, but with the memory leak), but as you can see, memory usage stays almost flat.
Here is my code:
    import multiprocessing
    import os
    import threading
    import time

    from memory_profiler import profile
    import numpy as np
    from skimage.color import rgb2hsv


    def do_hsv(img, shared_list):
        # Convert one image and append the result to the shared list.
        shared_list.append(rgb2hsv(img))
        # print("Converted by process {} having parent process {}".format(os.getpid(), os.getppid()))


    @profile
    def parallel_convert_all_to_hsv(imgs, shared_list):
        cores = os.cpu_count()
        starttime = time.time()

        # Process the images in batches of `cores` threads at a time.
        for i in range(0, len(imgs), cores):
            # print("i :", i)
            jobs = []; pipes = []
            # The last batch may be shorter; in that case `end` equals len(imgs).
            end = i + cores if (i + cores) <= len(imgs) else i + len(imgs[i : -1]) + 1
            # print("end :", end)

            for j in range(i, end):
                # print("j :", j)
                # p = multiprocessing.Process(target=do_hsv, args=(imgs[j], shared_list))
                p = threading.Thread(target=do_hsv, args=(imgs[j], shared_list))
                jobs.append(p)

            for p in jobs: p.start()

            # Wait for the whole batch before starting the next one.
            for proc in jobs:
                proc.join()

        # Note: the operands are reversed (start - now), which is why the
        # output below reports a negative duration.
        print("Took {} seconds to complete ".format(starttime - time.time()))
        return 1


    @profile
    def doit():
        print("create random images")
        max_images = 700
        images = np.random.rand(max_images, 300, 300, 3)

        # images = [x for x in range(0, 10000)]
        manager = multiprocessing.Manager()
        shared_list = manager.list()

        parallel_convert_all_to_hsv(images, shared_list)

        del images
        del shared_list
        print()


    doit()
Here is the Output:
    create random images
    Took -9.085552453994751 seconds to complete
    Filename: MemoryNotFreed.py

    Line #    Mem usage    Increment   Line Contents
    ================================================
        15   1549.1 MiB   1549.1 MiB   @profile
        16                             def parallel_convert_all_to_hsv(imgs, shared_list):
        17
        18   1549.1 MiB      0.0 MiB       cores = os.cpu_count()
        19
        20   1549.1 MiB      0.0 MiB       starttime = time.time()
        21
        22   1566.4 MiB      0.0 MiB       for i in range(0, len(imgs), cores):
        23
        24                                     # print("i :", i)
        25
        26   1566.4 MiB      0.0 MiB           jobs = []; pipes = []
        27
        28   1566.4 MiB      0.0 MiB           end = i + cores if (i + cores) <= len(imgs) else i + len(imgs[i : -1]) + 1
        29
        30                                     # print("end :", end)
        31
        32   1566.4 MiB      0.0 MiB           for j in range(i, end):
        33                                         # print("j :", j)
        34
        35                                         # p = multiprocessing.Process(target=do_hsv, args=(imgs[j], shared_list))
        36   1566.4 MiB      0.0 MiB               p = threading.Thread(target= do_hsv, args=(imgs[j], shared_list))
        37
        38   1566.4 MiB      0.0 MiB               jobs.append(p)
        39
        40   1566.4 MiB      0.8 MiB           for p in jobs: p.start()
        41
        42   1574.9 MiB      1.0 MiB           for proc in jobs:
        43   1574.9 MiB     13.5 MiB               proc.join()
        44
        45   1563.5 MiB      0.0 MiB       print("Took {} seconds to complete ".format(starttime - time.time()))
        46   1563.5 MiB      0.0 MiB       return 1

    Filename: MemoryNotFreed.py

    Line #    Mem usage    Increment   Line Contents
    ================================================
        48    106.6 MiB    106.6 MiB   @profile
        49                             def doit():
        50
        51    106.6 MiB      0.0 MiB       print("create random images")
        52
        53    106.6 MiB      0.0 MiB       max_images = 700
        54
        55   1548.7 MiB   1442.1 MiB       images = np.random.rand(max_images, 300, 300,3)
        56
        57                                 # images = [x for x in range(0, 10000)]
        58   1549.0 MiB      0.3 MiB       manager = multiprocessing.Manager()
        59   1549.1 MiB      0.0 MiB       shared_list = manager.list()
        60
        61   1563.5 MiB     14.5 MiB       parallel_convert_all_to_hsv(images, shared_list)
        62
        63    121.6 MiB      0.0 MiB       del images
        64
        65    121.6 MiB      0.0 MiB       del shared_list
        66
        67    121.6 MiB      0.0 MiB       print()
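The batching loop in the code above can also be written more compactly. The sketch below is a variant under stated assumptions, not the code that produced the profile: run_in_batches and its parameters are hypothetical names, the batch end is computed with min(), results are stored by index so they keep the input order, and the elapsed time is measured as end minus start so it comes out positive. It takes any callable, so rgb2hsv could be passed as func.

```python
import os
import threading
import time


def run_in_batches(items, func, batch_size=None):
    """Apply func to every item using manually created threads,
    at most batch_size threads at a time."""
    batch_size = batch_size or os.cpu_count() or 1
    results = [None] * len(items)

    def worker(index):
        # Each thread writes only to its own slot.
        results[index] = func(items[index])

    start = time.perf_counter()
    for begin in range(0, len(items), batch_size):
        # The last batch may be shorter than batch_size.
        end = min(begin + batch_size, len(items))
        threads = [
            threading.Thread(target=worker, args=(j,))
            for j in range(begin, end)
        ]
        for t in threads:
            t.start()
        # Wait for the whole batch before starting the next one.
        for t in threads:
            t.join()
    elapsed = time.perf_counter() - start
    return results, elapsed


squares, seconds = run_in_batches(list(range(10)), lambda x: x * x, batch_size=3)
print(squares, "in", seconds, "seconds")
```

Writing into a preallocated list avoids the Manager proxy used above, which is only needed when the workers are separate processes.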