How can I read pdf in python? [duplicate]


You can use the PyPDF2 package.

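    # Install PyPDF2 (run this in a shell, not in Python):
    #   pip install PyPDF2

    # Import the required module
    import PyPDF2

    # Open the PDF in binary mode; the with-block closes the file afterwards
    with open('example.pdf', 'rb') as file:
        # Create a PDF reader object
        # (PyPDF2 3.x removed PdfFileReader and .numPages in favour of PdfReader and .pages)
        file_reader = PyPDF2.PdfReader(file)
        # Print the number of pages in the PDF file
        print(len(file_reader.pages))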

See the documentation: http://pythonhosted.org/PyPDF2/
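If you want the text itself rather than just the page count, something along these lines may work. It is only a sketch: it assumes a PyPDF2 release (2.x or 3.x) where the reader exposes `.pages` and each page has `extract_text()`, and the names `pages_to_text` and `read_pdf_text` are hypothetical helpers, not part of PyPDF2.

```python
def pages_to_text(reader):
    """Join the text of every page of a reader-like object.

    `reader` is expected to expose `.pages`, each page offering
    `.extract_text()`, as PyPDF2.PdfReader does in PyPDF2 2.x/3.x.
    """
    # extract_text() can return None or '' for pages with no text layer
    # (e.g. scanned images), so fall back to an empty string.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def read_pdf_text(path):
    """Open the PDF at `path` and return its text (hypothetical helper)."""
    # Imported here so pages_to_text() stays usable without PyPDF2 installed.
    import PyPDF2

    with open(path, "rb") as f:
        return pages_to_text(PyPDF2.PdfReader(f))
```

You would then call `print(read_pdf_text('example.pdf'))`. Text extraction quality depends heavily on how the PDF was produced, so expect odd spacing or missing text on some files.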


You can use the textract module in Python.

To install it:

    pip install textract

To read a PDF:

    import textract

    text = textract.process('path/to/pdf/file', method='pdfminer')
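As far as I know, `textract.process` hands back a byte string rather than a `str`, so you will probably want to decode it before working with the text. A minimal sketch under that assumption (`pdf_to_str` is a hypothetical helper name, and UTF-8 is assumed as the encoding):

```python
def pdf_to_str(path, encoding="utf-8"):
    """Return the text of the PDF at `path` as a str (hypothetical helper)."""
    # Imported here so the function can be defined without textract installed.
    import textract

    raw = textract.process(path, method="pdfminer")
    # Assumption: `raw` is bytes. Decode it, replacing undecodable bytes;
    # if a str ever comes back instead, return it unchanged.
    if isinstance(raw, bytes):
        return raw.decode(encoding, errors="replace")
    return raw
```

Usage would be `print(pdf_to_str('path/to/pdf/file'))`.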

For details, see the Textract documentation.