What is the fastest way to generate image thumbnails in Python?


I fancied some fun, so I benchmarked the various methods suggested in the other answers, plus a few ideas of my own.

I collected 1000 high-resolution 12MP iPhone 6S images, each 4032x3024 pixels, and used an 8-core iMac.

Here are the techniques and results - each in its own section.
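
For orientation, the timings reported in the sections below can be turned into speedup factors. This is only a small arithmetic sketch: the numbers are the ones quoted in this answer, while the dictionary labels and the `speedups` helper are names made up for illustration.

```python
# Wall-clock timings in seconds, as reported in the sections below.
timings = {
    "1 Sequential ImageMagick": 170,
    "2 ImageMagick, single load": 76,
    "3 GNU Parallel + ImageMagick": 18,
    "4 GNU Parallel + vips": 8,
    "5 Sequential PIL": 38,
    "6 PIL, single load": 27,
    "7 Parallel PIL": 6,
    "8 Parallel OpenCV": 5,
}


def speedups(times, baseline):
    """Return {name: baseline_time / time}, rounded to one decimal place."""
    ref = times[baseline]
    return {name: round(ref / t, 1) for name, t in times.items()}


# Compare every method against the slowest one (Method 1).
for name, factor in speedups(timings, "1 Sequential ImageMagick").items():
    print(f"Method {name}: {factor}x")
```

Passing a different baseline, for example `"5 Sequential PIL"`, compares the Python-only variants with each other.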


Method 1 - Sequential ImageMagick

This is simplistic, unoptimised code. Each image is read and a thumbnail is produced. Then it is read again and a different sized thumbnail is produced.

    #!/bin/bash
    start=$SECONDS
    # Loop over all files
    for f in image*.jpg; do
       # Loop over all sizes
       for s in 1600 720 120; do
          echo Reducing $f to ${s}x${s}
          convert "$f" -resize ${s}x${s} t-$f-$s.jpg
       done
    done
    echo Time: $((SECONDS-start))

Result: 170 seconds


Method 2 - Sequential ImageMagick with single load and successive resizing

This is still sequential but slightly smarter. Each image is read only once; the loaded image is then resized down three times and saved at three resolutions, instead of being read again for every size.

    #!/bin/bash
    start=$SECONDS
    # Loop over all files
    N=1
    for f in image*.jpg; do
       echo Resizing $f
       # Load once and successively scale down
       convert "$f"                              \
          -resize 1600x1600 -write t-$N-1600.jpg \
          -resize 720x720   -write t-$N-720.jpg  \
          -resize 120x120          t-$N-120.jpg
       ((N=N+1))
    done
    echo Time: $((SECONDS-start))

Result: 76 seconds


Method 3 - GNU Parallel + ImageMagick

This builds on the previous method, by using GNU Parallel to process N images in parallel, where N is the number of CPU cores on your machine.

    #!/bin/bash
    start=$SECONDS

    doit() {
       file=$1
       index=$2
       convert "$file"                               \
          -resize 1600x1600 -write t-$index-1600.jpg \
          -resize 720x720   -write t-$index-720.jpg  \
          -resize 120x120          t-$index-120.jpg
    }

    # Export doit() to subshells for GNU Parallel
    export -f doit

    # Use GNU Parallel to do them all in parallel
    # (image*.jpg rather than *.jpg, so earlier t-*.jpg outputs are not picked up)
    parallel doit {} {#} ::: image*.jpg

    echo Time: $((SECONDS-start))

Result: 18 seconds
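
Since the question is about Python, the same load-once `convert` pipeline could also be driven from Python rather than from GNU Parallel. The following is an untimed sketch, not one of the benchmarked methods: it assumes ImageMagick's `convert` is on the PATH, and `convert_command` and `run_all` are hypothetical helper names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def convert_command(src, index):
    """Build the single-load, three-output convert command used in Methods 2 and 3."""
    return [
        "convert", src,
        "-resize", "1600x1600", "-write", f"t-{index}-1600.jpg",
        "-resize", "720x720", "-write", f"t-{index}-720.jpg",
        "-resize", "120x120", f"t-{index}-120.jpg",
    ]


def run_all(files, workers=8):
    """Run one convert process per file, at most `workers` at a time.

    Returns the list of exit codes, in the same order as `files`.
    """
    def run(job):
        index, src = job
        return subprocess.run(convert_command(src, index)).returncode

    # Threads are enough here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, enumerate(files, start=1)))
```

A call such as `run_all(sorted(glob.glob('image*.jpg')))` would then play the role of the `parallel doit {} {#}` line, with the index starting at 1 as `{#}` does.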


Method 4 - GNU Parallel + vips

This is the same as the previous method, but it uses vips at the command-line instead of ImageMagick.

    #!/bin/bash
    start=$SECONDS

    doit() {
       file=$1
       index=$2
       r0=t-$index-1600.jpg
       r1=t-$index-720.jpg
       r2=t-$index-120.jpg
       # Each smaller thumbnail is made from the previous, larger one
       vipsthumbnail "$file"  -s 1600 -o "$r0"
       vipsthumbnail "$r0"    -s 720  -o "$r1"
       vipsthumbnail "$r1"    -s 120  -o "$r2"
    }

    # Export doit() to subshells for GNU Parallel
    export -f doit

    # Use GNU Parallel to do them all in parallel
    # (image*.jpg rather than *.jpg, so earlier t-*.jpg outputs are not picked up)
    parallel doit {} {#} ::: image*.jpg

    echo Time: $((SECONDS-start))

Result: 8 seconds


Method 5 - Sequential PIL

This is intended to correspond to Jakob's answer.

    #!/usr/local/bin/python3
    import glob
    from PIL import Image

    sizes = [(120,120), (720,720), (1600,1600)]
    files = glob.glob('image*.jpg')

    N = 0
    for image in files:
        for size in sizes:
            # Re-open the file for every size (deliberately unoptimised)
            im = Image.open(image)
            im.thumbnail(size)
            im.save("t-%d-%s.jpg" % (N, size[0]))
        N = N + 1

Result: 38 seconds
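
The bash scripts time themselves with `$SECONDS`, but the Python listings carry no timing code. One way to time them from inside Python is sketched below; this is an assumption about how it could be done, not necessarily how the figures in this answer were measured, and `timed` is a hypothetical helper name.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Measure the wall-clock time spent inside the with-block.

    Yields a dict that receives a "seconds" entry when the block exits.
    """
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        print(f"{label}: {result['seconds']:.1f} seconds")


with timed("Demo") as t:
    sum(range(1_000_000))  # stand-in for the thumbnail loop
```

Wrapping the `for image in files:` loop in `with timed("Method 5"):` would report a figure comparable to the bash `Time:` line.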


Method 6 - Sequential PIL with single load & successive resize

This is intended as an improvement to Jakob's answer, wherein the image is loaded just once and then resized down three times instead of re-loading each time to produce each new resolution.

    #!/usr/local/bin/python3
    import glob
    from PIL import Image

    files = glob.glob('image*.jpg')

    N = 0
    for image in files:
        # Load just once, then successively scale down
        im = Image.open(image)
        im.thumbnail((1600,1600))
        im.save("t-%d-1600.jpg" % (N))
        im.thumbnail((720,720))
        im.save("t-%d-720.jpg"  % (N))
        im.thumbnail((120,120))
        im.save("t-%d-120.jpg"  % (N))
        N = N + 1

Result: 27 seconds


Method 7 - Parallel PIL

This is intended to correspond to Audionautics' answer, insofar as it uses Python's multiprocessing. It also obviates the need to re-load the image for each thumbnail size.

    #!/usr/local/bin/python3
    import glob
    from PIL import Image
    from multiprocessing import Pool

    def thumbnail(params):
        filename, N = params
        try:
            # Load just once, then successively scale down
            im = Image.open(filename)
            im.thumbnail((1600,1600))
            im.save("t-%d-1600.jpg" % (N))
            im.thumbnail((720,720))
            im.save("t-%d-720.jpg"  % (N))
            im.thumbnail((120,120))
            im.save("t-%d-120.jpg"  % (N))
            return 'OK'
        except Exception as e:
            return e

    # The guard is required when worker processes are spawned
    # (the default on macOS since Python 3.8, and on Windows)
    if __name__ == '__main__':
        files = glob.glob('image*.jpg')
        with Pool(8) as pool:
            results = pool.map(thumbnail, zip(files, range(len(files))))

Result: 6 seconds


Method 8 - Parallel OpenCV

This is intended to be an improvement on bcattle's answer, insofar as it uses OpenCV but it also obviates the need to re-load the image to generate each new resolution output.

    #!/usr/local/bin/python3
    import cv2
    import glob
    from multiprocessing import Pool

    def thumbnail(params):
        filename, N = params
        try:
            # Load just once, then successively scale down.
            # Note: cv2.resize() takes an exact (width, height), so these
            # square targets do not preserve the 4:3 aspect ratio.
            im = cv2.imread(filename)
            im = cv2.resize(im, (1600,1600))
            cv2.imwrite("t-%d-1600.jpg" % N, im)
            im = cv2.resize(im, (720,720))
            cv2.imwrite("t-%d-720.jpg" % N, im)
            im = cv2.resize(im, (120,120))
            cv2.imwrite("t-%d-120.jpg" % N, im)
            return 'OK'
        except Exception as e:
            return e

    # The guard is required when worker processes are spawned
    # (the default on macOS since Python 3.8, and on Windows)
    if __name__ == '__main__':
        files = glob.glob('image*.jpg')
        with Pool(8) as pool:
            results = pool.map(thumbnail, zip(files, range(len(files))))

Result: 5 seconds
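
One caveat when comparing this with the PIL methods: `Image.thumbnail()` keeps the aspect ratio, whereas `cv2.resize()` scales to exactly the `(width, height)` it is given, so the OpenCV listing squashes the 4:3 photos into squares. A minimal sketch of computing an aspect-preserving target size follows; `fit_within` is a hypothetical helper, not part of OpenCV, and this variant was not part of the benchmark.

```python
def fit_within(width, height, max_side):
    """Return (w, h) scaled to fit a max_side x max_side box, keeping aspect ratio.

    Images that already fit inside the box keep their original size.
    """
    scale = min(max_side / width, max_side / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Target sizes for the 4032x3024 test images used in this benchmark
for side in (1600, 720, 120):
    print(side, fit_within(4032, 3024, side))
```

With OpenCV the result would be passed as the target size, for example `h, w = im.shape[:2]` followed by `cv2.resize(im, fit_within(w, h, 1600))`.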


You want PIL; it does this with ease.

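    import os
    from PIL import Image

    sizes = [(120,120), (720,720), (1600,1600)]
    files = ['a.jpg','b.jpg','c.jpg']

    for image in files:
        for size in sizes:
            im = Image.open(image)
            im.thumbnail(size)
            # Keep the extension last so PIL can infer the output format;
            # the size values are ints, so convert them before joining
            stem, ext = os.path.splitext(image)
            im.save("thumbnail_%s_%s%s" % (stem, "_".join(map(str, size)), ext))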

If you desperately need speed, then thread it, multiprocess it or get another language.


A little late to the question (only a year!), but I'll piggyback on the "multiprocess it" part of @JakobBowyer's answer.

This is a good example of an embarrassingly parallel problem, as the main bit of code doesn't mutate any state external to itself. It simply reads an input, performs its computation and saves the result.

Python is actually pretty good at these kinds of problems thanks to the map function provided by multiprocessing.Pool.

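    import os
    from itertools import product
    from multiprocessing import Pool, cpu_count
    from PIL import Image

    def thumbnail(image_details):
        size, filename = image_details
        try:
            im = Image.open(filename)
            im.thumbnail(size)
            # Put the size in the name so the outputs don't overwrite each other
            stem, ext = os.path.splitext(filename)
            im.save("thumbnail_%s_%d%s" % (stem, size[0], ext))
            return 'OK'
        except Exception as e:
            return e

    sizes = [(120,120), (720,720), (1600,1600)]
    files = ['a.jpg','b.jpg','c.jpg']

    if __name__ == '__main__':
        number_of_cores_to_use = cpu_count()
        with Pool(number_of_cores_to_use) as pool:
            # product() yields every (size, file) pair; zip() would only
            # pair the first size with the first file, and so on
            results = pool.map(thumbnail, product(sizes, files))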

The core of the code is exactly the same as @JakobBowyer's, but instead of running it in a loop in a single thread, we wrapped it in a function and spread it out across multiple cores via the multiprocessing map function.
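
Because `thumbnail()` returns the exception instead of raising it, nothing is reported unless `results` is inspected afterwards. `pool.map` returns results in the same order as its inputs, so failures can be matched back to their tasks. A small sketch with stand-in data; `failures` is a hypothetical helper name.

```python
def failures(tasks, results):
    """Pair each task with its result and keep the ones that did not return 'OK'."""
    return [(task, res) for task, res in zip(tasks, results) if res != 'OK']


# Stand-in data shaped like the (size, filename) tasks and thumbnail() results
tasks = [((120, 120), 'a.jpg'), ((120, 120), 'b.jpg'), ((120, 120), 'c.jpg')]
results = ['OK', OSError('cannot identify image file'), 'OK']

for (size, filename), err in failures(tasks, results):
    print(f"{filename} at {size}: {err!r}")
```

To use it with the real pool, keep the task list in a variable, pass that same list to `pool.map`, and then to `failures`.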