How to convert webpage into PDF by using Python


You can also use pdfkit:

Usage

    import pdfkit
    pdfkit.from_url('http://google.com', 'out.pdf')
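pdfkit passes its `options` dictionary through to wkhtmltopdf as command-line flags, so page size and orientation can be set from Python. A minimal sketch, assuming wkhtmltopdf is installed and on PATH; the helper names `default_options` and `url_to_pdf` are made up for illustration:

```python
def default_options():
    """Return wkhtmltopdf flags as a dict (keys are flag names without '--')."""
    return {
        "page-size": "A4",
        "orientation": "Landscape",
        "encoding": "UTF-8",
    }


def url_to_pdf(url, out_path, options=None):
    """Render `url` to `out_path` using pdfkit and the given options."""
    import pdfkit  # imported lazily so default_options() works without pdfkit

    return pdfkit.from_url(url, out_path, options=options or default_options())
```

Calling `url_to_pdf('http://google.com', 'out.pdf')` should then produce a landscape A4 file.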

Install

MacOS: brew install Caskroom/cask/wkhtmltopdf

Debian/Ubuntu: apt-get install wkhtmltopdf

Windows: choco install wkhtmltopdf

See official documentation for MacOS/Ubuntu/other OS: https://github.com/JazzCore/python-pdfkit/wiki/Installing-wkhtmltopdf
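pdfkit fails if it cannot find the wkhtmltopdf binary, so it can help to check for it first and pass its location explicitly through `pdfkit.configuration`. A small sketch; `find_wkhtmltopdf` and `convert` are hypothetical helper names, not part of pdfkit:

```python
import shutil


def find_wkhtmltopdf():
    """Return the full path of wkhtmltopdf if it is on PATH, else None."""
    return shutil.which("wkhtmltopdf")


def convert(url, out_path):
    """Convert `url` to a PDF, failing early with a clear message."""
    binary = find_wkhtmltopdf()
    if binary is None:
        raise RuntimeError("wkhtmltopdf not found; install it first")
    import pdfkit  # lazy import: only needed for the actual conversion

    config = pdfkit.configuration(wkhtmltopdf=binary)
    return pdfkit.from_url(url, out_path, configuration=config)
```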


WeasyPrint

    pip install weasyprint  # No longer supports Python 2.x.

    python
    >>> import weasyprint
    >>> pdf = weasyprint.HTML('http://www.google.com').write_pdf()
    >>> len(pdf)
    92059
    >>> open('google.pdf', 'wb').write(pdf)
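`write_pdf` also accepts a target file name, which saves the separate `open(...).write(...)` step. A sketch assuming a recent WeasyPrint on Python 3; `is_http_url` and `save_page` are illustrative names, and the scheme check is an added precaution rather than something WeasyPrint requires:

```python
from urllib.parse import urlparse


def is_http_url(url):
    """True if `url` has an http(s) scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def save_page(url, out_path):
    """Fetch `url` and write the rendered PDF to `out_path`."""
    if not is_http_url(url):
        raise ValueError("expected an http(s) URL, got %r" % (url,))
    import weasyprint  # lazy import so the URL check is usable on its own

    weasyprint.HTML(url=url).write_pdf(out_path)
```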


Thanks to the posts below, I was able to print the webpage's link address and the current time on every page of the generated PDF, no matter how many pages it has.

Add text to Existing PDF using Python

https://github.com/disflux/django-mtr/blob/master/pdfgen/doc_overlay.py

Here is the script:

    # Python 2 script (uses PyQt4, pyPdf, StringIO and the print statement)
    import sys
    import time
    import StringIO

    from pyPdf import PdfFileWriter, PdfFileReader
    from reportlab.pdfgen import canvas
    from reportlab.lib.pagesizes import letter
    from PyQt4.QtCore import *
    from PyQt4.QtGui import *
    from PyQt4.QtWebKit import *

    url = 'http://www.yahoo.com'
    tem_pdf = "c:\\tem_pdf.pdf"
    final_file = "c:\\younameit.pdf"

    app = QApplication(sys.argv)
    web = QWebView()
    # Read the URL given
    web.load(QUrl(url))
    printer = QPrinter()
    # Set the output format
    printer.setPageSize(QPrinter.A4)
    printer.setOrientation(QPrinter.Landscape)
    printer.setOutputFormat(QPrinter.PdfFormat)
    # Export the file as c:\tem_pdf.pdf
    printer.setOutputFileName(tem_pdf)

    def convertIt():
        # Print the loaded page to the PDF printer, then leave the Qt event loop
        web.print_(printer)
        QApplication.exit()

    QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)
    app.exec_()

    # Below: add the web link as text, plus the current date and time,
    # to the generated PDF
    packet = StringIO.StringIO()
    # Create a new PDF with ReportLab
    can = canvas.Canvas(packet, pagesize=letter)
    can.setFont("Helvetica", 9)
    # Write the new line
    oknow = time.strftime("%a, %d %b %Y %H:%M")
    can.drawString(5, 2, url)
    can.drawString(605, 2, oknow)
    can.save()
    # Move to the beginning of the StringIO buffer
    packet.seek(0)
    new_pdf = PdfFileReader(packet)
    # Read the existing PDF
    existing_pdf = PdfFileReader(file(tem_pdf, "rb"))
    pages = existing_pdf.getNumPages()
    output = PdfFileWriter()
    # Add the "watermark" (the new PDF) to every existing page
    for x in range(0, pages):
        page = existing_pdf.getPage(x)
        page.mergePage(new_pdf.getPage(0))
        output.addPage(page)
    # Finally, write "output" to a real file
    outputStream = file(final_file, "wb")
    output.write(outputStream)
    outputStream.close()
    print final_file, 'is ready.'
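The script above targets Python 2 (pyPdf, `StringIO`, `file()`, the `print` statement). For the stamping half only, a rough Python 3 sketch using pypdf and reportlab might look like the following. It assumes both packages are installed (`pip install pypdf reportlab`); the function names and footer coordinates are illustrative rather than taken from the original script. Unlike the original, it sizes the overlay to each page instead of assuming letter size.

```python
import io
from datetime import datetime


def stamp_text(url, when):
    """Return the two footer strings: the URL and a formatted timestamp."""
    return url, when.strftime("%a, %d %b %Y %H:%M")


def make_overlay(url, when, width, height):
    """Build a one-page in-memory PDF carrying only the footer text."""
    from reportlab.pdfgen import canvas

    left, right = stamp_text(url, when)
    buf = io.BytesIO()
    can = canvas.Canvas(buf, pagesize=(width, height))
    can.setFont("Helvetica", 9)
    can.drawString(5, 2, left)                # bottom-left corner
    can.drawRightString(width - 5, 2, right)  # bottom-right corner
    can.save()
    buf.seek(0)
    return buf


def stamp_pdf(src_path, dst_path, url, when=None):
    """Merge the footer overlay onto every page of src_path."""
    from pypdf import PdfReader, PdfWriter

    when = when or datetime.now()
    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in reader.pages:
        w, h = float(page.mediabox.width), float(page.mediabox.height)
        overlay = PdfReader(make_overlay(url, when, w, h)).pages[0]
        page.merge_page(overlay)
        writer.add_page(page)
    with open(dst_path, "wb") as fh:
        writer.write(fh)
```

With the file names from the script, this would be called as `stamp_pdf(tem_pdf, final_file, url)` once the temporary PDF has been written.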