Extract a page from a pdf as a jpeg
The pdf2image library can be used.
You can install it simply using,
pip install pdf2image
Once installed you can use following code to get images.
from pdf2image import convert_from_pathpages = convert_from_path('pdf_file', 500)
Saving pages in jpeg format
for page in pages: page.save('out.jpg', 'JPEG')
Edit: the Github repo pdf2image also mentions that it uses pdftoppm
and that it requires other installations:
pdftoppm is the piece of software that does the actual magic. It is distributed as part of a greater package called poppler. Windows users will have to install poppler for Windows. Mac users will have to install poppler for Mac. Linux users will have pdftoppm pre-installed with the distro (Tested on Ubuntu and Archlinux) if it's not, run
sudo apt install poppler-utils
.
You can install the latest version under Windows using anaconda by doing:
conda install -c conda-forge poppler
note: Windows versions upto 0.67 are available at http://blog.alivate.com.au/poppler-windows/ but note that 0.68 was released in Aug 2018 so you'll not be getting the latest features or bug fixes.
I found this simple solution, PyMuPDF, output to png file. Note the library is imported as "fitz", a historical name for the rendering engine it uses.
import fitzpdffile = "infile.pdf"doc = fitz.open(pdffile)page = doc.loadPage(0) # number of pagepix = page.getPixmap()output = "outfile.png"pix.writePNG(output)
The Python library pdf2image
(used in the other answer) in fact doesn't do much more than just launching pdttoppm
with subprocess.Popen
, so here is a short version doing it directly:
PDFTOPPMPATH = r"D:\Documents\software\____PORTABLE\poppler-0.51\bin\pdftoppm.exe"PDFFILE = "SKM_28718052212190.pdf"import subprocesssubprocess.Popen('"%s" -png "%s" out' % (PDFTOPPMPATH, PDFFILE))
Here is the Windows installation link for pdftoppm
(contained in a package named poppler): http://blog.alivate.com.au/poppler-windows/