How to unit test a Python function that draws PDF graphics?

python unit-testing pdf-generation imagemagick cairo

(See also update below!)

I'm doing the same thing using a shell script on Linux that wraps:

- ImageMagick's compare command
- the pdftk utility
- Ghostscript (optionally)

(It would be rather easy to port this to a .bat Batch file for DOS/Windows.)

I have a few reference PDFs created by my application which are "known good". Newly generated PDFs after code changes are compared to these reference PDFs. The comparison is done pixel by pixel and is saved as a new PDF. In this PDF, all unchanged pixels are painted in white, while all differing pixels are painted in red.

Here are the building blocks:

pdftk

Use this command to split multipage PDF files into multiple singlepage PDFs:

    pdftk  reference.pdf  burst  output  somewhere/reference_page_%03d.pdf
    pdftk  comparison.pdf burst  output  somewhere/comparison_page_%03d.pdf
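If the test harness is written in Python, the same split can be driven through subprocess. This is only a minimal sketch: the helper names burst_command and burst are made up for illustration, and it assumes pdftk is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def burst_command(pdf, out_dir, prefix):
    """Build the pdftk argument list that splits `pdf` into single pages.

    `prefix` is a hypothetical naming choice, e.g. "reference" or "comparison".
    """
    pattern = str(Path(out_dir) / f"{prefix}_page_%03d.pdf")
    return ["pdftk", str(pdf), "burst", "output", pattern]


def burst(pdf, out_dir, prefix):
    """Run pdftk and return the single-page PDFs it wrote, sorted by name."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if pdftk exits with an error.
    subprocess.run(burst_command(pdf, out_dir, prefix), check=True)
    return sorted(Path(out_dir).glob(f"{prefix}_page_*.pdf"))
```

Calling burst("reference.pdf", "somewhere", "reference") and burst("comparison.pdf", "somewhere", "comparison") would then give two page lists that can be paired up for the comparison step.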

compare

Use this command to create a "diff" PDF page for each of the pages:

    compare \
           -verbose \
           -debug coder -log "%u %m:%l %e" \
            somewhere/reference_page_001.pdf \
            somewhere/comparison_page_001.pdf \
           -compose src \
            somewhereelse/reference_diff_page_001.pdf

Ghostscript

Because of automatically inserted metadata (such as the current date and time), PDF output does not work well for MD5-hash-based file comparisons.

If you want to automatically discover all cases which consist of purely white pages, you could also convert to a metadata-free bitmap format using the bmp256 output device. You can do that for the original PDFs (reference and comparison), or for the diff-PDF pages:

    gs \
      -o reference_diff_page_001.bmp \
      -r72 \
      -g595x842 \
      -sDEVICE=bmp256 \
       reference_diff_page_001.pdf

    md5sum reference_diff_page_001.bmp

If the MD5sum is what you expect for an all-white page of 595x842 PostScript points, then your unit test passed.
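From Python, the MD5 check can be done with hashlib instead of md5sum. A minimal sketch follows; the function names are hypothetical, and the expected digest is something you would have to record yourself from a bitmap you know to be all white (it is not given here).

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large bitmaps are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_expected_page(bmp_path, expected_md5):
    """True if the bitmap's digest matches the recorded 'all white' digest."""
    return md5_of_file(bmp_path) == expected_md5.lower()
```

In a unit test this would be used along the lines of self.assertTrue(is_expected_page("reference_diff_page_001.bmp", KNOWN_WHITE_MD5)), where KNOWN_WHITE_MD5 is a constant you define.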

Update:

I don't know why I didn't previously think of generating a histogram output from the ImageMagick compare...

The following is a command pipeline chaining 2 different commands:

- The first one is the same as the above compare which generates the 'white pixels are equal, red pixels are differences' format, only it outputs the ImageMagick internal miff format. It doesn't write to a file, but to stdout.
- The second one uses convert to read stdin, generate a histogram and output the result in text form. There will be two lines:
  - one indicating the number of white pixels
  - the other one indicating the number of red pixels.

Here it goes:

    compare \
       reference.pdf \
       current.pdf \
      -compose src \
       miff:- \
    | \
    convert \
       - \
      -define histogram:unique-colors=true \
      -format %c \
       histogram:info:-
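The same pipeline can be started from Python by connecting two processes with a pipe. This is a rough sketch with made-up helper names, assuming the ImageMagick compare and convert programs are on the PATH. The exit status of compare is deliberately not checked, on the assumption that it may be non-zero when the two inputs differ.

```python
import subprocess


def compare_command(reference, current):
    """Argument list for the compare step, writing MIFF to stdout."""
    return ["compare", str(reference), str(current),
            "-compose", "src", "miff:-"]


def histogram_command():
    """Argument list for the convert step, reading MIFF from stdin."""
    return ["convert", "-",
            "-define", "histogram:unique-colors=true",
            "-format", "%c",
            "histogram:info:-"]


def diff_histogram(reference, current):
    """Run compare | convert and return the histogram text."""
    cmp_proc = subprocess.Popen(compare_command(reference, current),
                                stdout=subprocess.PIPE)
    conv = subprocess.run(histogram_command(), stdin=cmp_proc.stdout,
                          stdout=subprocess.PIPE, text=True, check=True)
    cmp_proc.stdout.close()
    # Not checked: compare may report differing images via its exit status.
    cmp_proc.wait()
    return conv.stdout
```

diff_histogram("reference.pdf", "current.pdf") would return the text that the shell pipeline prints.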

Sample output:

     56934: (61937,    0, 7710,52428) #F1F100001E1ECCCC srgba(241,0,30,0.8)
    444056: (65535,65535,65535,52428) #FFFFFFFFFFFFCCCC srgba(255,255,255,0.8)

(Sample output was generated by using these reference.pdf and current.pdf files.)

I think this type of output is really well suited for automatic unit testing. If you evaluate the two numbers, you can easily compute the "red pixel" percentage and you could even decide to return PASSED or FAILED based on a certain threshold (if you don't necessarily need "zero red" for some reason).
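Evaluating the two numbers could look roughly like the sketch below. It parses histogram text in the format of the sample output, treats an entry as "unchanged" when its first three channel values are all 65535 or all 255 (an assumption about how white is reported), and counts everything else as differing. The function names and the threshold parameter are hypothetical.

```python
import re

# Matches e.g. " 56934: (61937,    0, 7710,52428) #F1F1..."
HISTOGRAM_LINE = re.compile(r"^\s*(\d+):\s*\(([^)]*)\)")


def parse_histogram(text):
    """Return a list of (pixel_count, channel_values) entries."""
    entries = []
    for line in text.splitlines():
        match = HISTOGRAM_LINE.match(line)
        if match:
            count = int(match.group(1))
            channels = tuple(float(v) for v in match.group(2).split(","))
            entries.append((count, channels))
    return entries


def is_white(channels):
    """Assumption: white means R, G and B are all at full scale."""
    rgb = channels[:3]
    return rgb in ((65535.0,) * 3, (255.0,) * 3)


def differing_fraction(entries):
    """Fraction of pixels that are not white, between 0.0 and 1.0."""
    total = sum(count for count, _ in entries)
    if total == 0:
        raise ValueError("histogram contains no pixels")
    white = sum(count for count, channels in entries if is_white(channels))
    return (total - white) / total


def verdict(histogram_text, threshold=0.0):
    """Return "PASSED" if the differing fraction is within `threshold`."""
    fraction = differing_fraction(parse_histogram(histogram_text))
    return "PASSED" if fraction <= threshold else "FAILED"
```

For the sample output above, the two counts add up to 500990 pixels (595 x 842), of which 56934 are red, so a strict threshold of 0.0 would report FAILED.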


You could capture the PDF as a bitmap (or at least a losslessly-compressed) image, and then compare the image generated by each test with a reference image of what it's supposed to look like. Any differences would be flagged as an error for the test.
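As a very rough sketch of that idea, assuming the PDF has already been rendered to an uncompressed bitmap file by some other tool, the comparison can be a strict byte-for-byte check. That is a simplification of "comparing images": any difference at all, even in file headers, fails the test. The helper names are made up for illustration.

```python
import filecmp
import unittest


def bitmaps_identical(generated, reference):
    """True if the two files have exactly the same bytes."""
    # shallow=False forces a content comparison instead of a stat() check.
    return filecmp.cmp(generated, reference, shallow=False)


def assert_bitmaps_equal(test_case, generated, reference):
    """Fail `test_case` (a unittest.TestCase) if the bitmaps differ."""
    test_case.assertTrue(
        bitmaps_identical(generated, reference),
        f"{generated} differs from reference image {reference}",
    )
```

Inside a test method this would be called as assert_bitmaps_equal(self, generated_path, reference_path), with both paths supplied by your own test setup.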


The first idea that pops into my head is to use a diff utility. These are generally used to compare the text of documents, but they might also be able to compare the layout of a PDF. With one, you can compare the expected output with the output actually produced.

The first result Google gives me is this. Although it is commercial, there might be other free/open source alternatives.
