Cropping pages of a .pdf file
pyPdf does what I expect in this area. Using the following script:
    #!/usr/bin/python
    #
    from pyPdf import PdfFileWriter, PdfFileReader

    with open("in.pdf", "rb") as in_f:
        input1 = PdfFileReader(in_f)
        output = PdfFileWriter()

        numPages = input1.getNumPages()
        print "document has %s pages." % numPages

        for i in range(numPages):
            page = input1.getPage(i)
            print page.mediaBox.getUpperRight_x(), page.mediaBox.getUpperRight_y()
            page.trimBox.lowerLeft = (25, 25)
            page.trimBox.upperRight = (225, 225)
            page.cropBox.lowerLeft = (50, 50)
            page.cropBox.upperRight = (200, 200)
            output.addPage(page)

        with open("out.pdf", "wb") as out_f:
            output.write(out_f)
The resulting document has a trim box that is 200x200 points and starts 25 points in from the lower-left corner of the media box, at (25, 25). The crop box is 25 points inside the trim box on every side.
Here is how my sample document looks in Acrobat Professional after processing with the above code: [screenshot not preserved]
This document will appear blank when loaded in Acrobat Reader.
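As a quick sanity check on the numbers in this answer, the box geometry can be worked out in plain Python without opening a PDF. This is only a minimal sketch: `box_size` and `inset` are hypothetical helper names introduced here for illustration and are not part of pyPdf.

```python
# Boxes as (lower_left_x, lower_left_y, upper_right_x, upper_right_y), in points.
# The values are copied from the script above; the helpers are illustrative only.
trim_box = (25, 25, 225, 225)
crop_box = (50, 50, 200, 200)


def box_size(box):
    """Return (width, height) of a box in points."""
    llx, lly, urx, ury = box
    return (urx - llx, ury - lly)


def inset(outer, inner):
    """Return how far inner sits inside outer as (left, bottom, right, top)."""
    return (
        inner[0] - outer[0],
        inner[1] - outer[1],
        outer[2] - inner[2],
        outer[3] - inner[3],
    )


print(box_size(trim_box))
print(box_size(crop_box))
print(inset(trim_box, crop_box))
```

With these values the trim box comes out at 200x200 points, the crop box at 150x150 points, and the crop box sits 25 points inside the trim box on each side.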
Use this to get the dimensions of a page in the PDF:
    from PyPDF2 import PdfFileWriter, PdfFileReader, PdfFileMerger

    pdf_file = PdfFileReader(open("/Users/user.name/Downloads/sample.pdf", "rb"))
    page = pdf_file.getPage(0)
    print(page.cropBox.getLowerLeft())
    print(page.cropBox.getLowerRight())
    print(page.cropBox.getUpperLeft())
    print(page.cropBox.getUpperRight())
After this, get a reference to the page and apply the crop by setting the corners of its box:
    page.mediaBox.lowerRight = (lower_right_new_x_coordinate, lower_right_new_y_coordinate)
    page.mediaBox.lowerLeft = (lower_left_new_x_coordinate, lower_left_new_y_coordinate)
    page.mediaBox.upperRight = (upper_right_new_x_coordinate, upper_right_new_y_coordinate)
    page.mediaBox.upperLeft = (upper_left_new_x_coordinate, upper_left_new_y_coordinate)

    # for example, my custom coordinates:
    # page.mediaBox.lowerRight = (611, 500)
    # page.mediaBox.lowerLeft = (0, 500)
    # page.mediaBox.upperRight = (611, 700)
    # page.mediaBox.upperLeft = (0, 700)
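Because a page box is an axis-aligned rectangle, the four corner values are determined by just two opposite corners. The following plain-Python sketch shows how the example coordinates relate to each other. The `corners` helper is a hypothetical name used for illustration here, not a PyPDF2 function.

```python
def corners(left, bottom, right, top):
    """Return the four corners of an axis-aligned box, in PDF points.

    Hypothetical helper for illustration; PDF y-coordinates grow upward,
    so "upper" corners have the larger y value.
    """
    return {
        "lowerLeft": (left, bottom),
        "lowerRight": (right, bottom),
        "upperLeft": (left, top),
        "upperRight": (right, top),
    }


# The example values from the answer: a strip 611 points wide, from y=500 to y=700.
example = corners(left=0, bottom=500, right=611, top=700)
for name, point in example.items():
    print(name, point)
```

The four printed corners match the four commented-out example assignments, which suggests that picking a left, bottom, right and top value is enough to describe the crop.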
How do I know the coordinates to crop?
Thanks for all answers above.
Step 1. Run the following code to get (x1, y1), the upper-right corner of the page's crop box in points.
    from PyPDF2 import PdfFileWriter, PdfFileReader

    input = PdfFileReader(open("test.pdf", "rb"))
    page = input.getPage(0)
    print(page.cropBox.getUpperRight())
Step 2. View the pdf file in full screen mode.
Step 3. Capture the screen as an image file screen.jpg.
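Step 4. Open screen.jpg in MS Paint or GIMP. These applications show the pixel coordinates of the cursor.
Step 5. Note the following pixel coordinates: (x2, y2), (x3, y3), (x4, y4) and (x5, y5), where (x4, y4) and (x5, y5) are the upper-left and lower-right corners of the rectangle you want to crop. Judging by the formulas in Step 6, (x2, y2) and (x3, y3) appear to be the upper-left and lower-right corners of the page as it is shown in the screenshot.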
Step 6. Get page.cropBox.upperLeft and page.cropBox.lowerRight by the following formulas. Here is a tool for calculating.
    page.cropBox.upperLeft = (x1*(x4-x2)/(x3-x2), (1-y4/y3)*y1)
    page.cropBox.lowerRight = (x1*(x5-x2)/(x3-x2), (1-y5/y3)*y1)
Step 7. Run the following code to crop the pdf file.
    from PyPDF2 import PdfFileWriter, PdfFileReader

    output = PdfFileWriter()
    input = PdfFileReader(open('test.pdf', 'rb'))

    n = input.getNumPages()

    for i in range(n):
        page = input.getPage(i)
        page.cropBox.upperLeft = (100, 200)
        page.cropBox.lowerRight = (300, 400)
        output.addPage(page)

    outputStream = open('result.pdf', 'wb')
    output.write(outputStream)
    outputStream.close()
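The conversion in Step 6 can be sketched as a small plain-Python function. The `screen_to_pdf` name and its parameters are hypothetical. The reading of x2, x3 and y3 as the page's left, right and bottom edges in the screenshot, with the page's top edge at pixel row 0, is inferred from the formulas rather than stated in the answer. The numbers in the example are made up.

```python
def screen_to_pdf(point, page_left, page_right, page_bottom, pdf_width, pdf_height):
    """Map a screenshot pixel (x, y) to PDF points using the Step 6 formulas.

    Assumptions (inferred from the formulas, not stated in the answer):
    page_left = x2 and page_right = x3 are the page's left and right edges in
    the screenshot, page_bottom = y3 is its bottom edge, and the page's top
    edge sits at pixel row 0. pdf_width = x1 and pdf_height = y1 come from
    Step 1.
    """
    x, y = point
    # Horizontal: rescale pixels across the page width to PDF points.
    pdf_x = pdf_width * (x - page_left) / (page_right - page_left)
    # Vertical: screen y grows downward, PDF y grows upward, so flip it.
    # True division matters here; integer division would truncate to 0.
    pdf_y = (1 - y / page_bottom) * pdf_height
    return (pdf_x, pdf_y)


# Made-up example: a 612x792 point page shown between x=100 and x=712 pixels,
# filling the screen height of 792 pixels.
page = dict(page_left=100, page_right=712, page_bottom=792,
            pdf_width=612, pdf_height=792)
upper_left = screen_to_pdf((200, 100), **page)    # (x4, y4)
lower_right = screen_to_pdf((400, 300), **page)   # (x5, y5)
print(upper_left, lower_right)
```

The two resulting tuples are what would be assigned to page.cropBox.upperLeft and page.cropBox.lowerRight in Step 7.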