Converting a PDF file to Base64 to index into Elasticsearch

The encoding snippet is incorrect: it opens the PDF file in text mode, but a PDF is binary data and must be opened in binary mode ("rb").

Depending on the file size, you could simply open the file in binary mode and use the string's encode method (the "base64" string codec is Python 2 only). Example:

    def pdf_encode(pdf_filename):
        # Read the whole file as bytes ("rb"), then Base64-encode it (Python 2 codec).
        with open(pdf_filename, "rb") as f:
            return f.read().encode("base64")
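The `.encode("base64")` string codec exists only in Python 2. On Python 3, a rough equivalent might use the standard `base64` module instead. This is only a sketch, and it assumes the file fits comfortably in memory:

```python
import base64

def pdf_encode(pdf_filename):
    """Return the Base64 text for the whole file (reads it fully into memory)."""
    # "rb" is essential: text mode may alter or fail to decode the raw PDF bytes.
    with open(pdf_filename, "rb") as f:
        data = f.read()
    # b64encode returns bytes; decode to str so it can go into a JSON document.
    return base64.b64encode(data).decode("ascii")
```

Unlike the Python 2 codec, `base64.b64encode` does not insert line breaks, so the result is a single unbroken string.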

If the file is large, you may have to break the encoding into chunks. I did not look into whether there is a module that does this, but it could be as simple as the example below:

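    import base64

    def chunk_24_read(pdf_filename):
        # Yield the file 3 bytes (24 bits) at a time; 3 bytes map to exactly 4 Base64 characters.
        with open(pdf_filename, "rb") as f:
            byte = f.read(3)
            while byte:
                yield byte
                byte = f.read(3)

    def pdf_encode(pdf_filename):
        encoded = ""
        length = 0
        for data in chunk_24_read(pdf_filename):
            # Decode to text so this also works on Python 3, where b64encode returns bytes.
            for char in base64.b64encode(data).decode("ascii"):
                # Start a new line after every 76 output characters.
                if length and length % 76 == 0:
                    encoded += "\n"
                    length = 0
                encoded += char
                length += 1
        return encoded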
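As for an existing module: the standard library's `base64.encode(input, output)` already encodes from one binary file object to another in small blocks, breaking the output into 76-character lines. A minimal sketch, where `pdf_encode_to_file` and `out_filename` are names made up for illustration:

```python
import base64

def pdf_encode_to_file(pdf_filename, out_filename):
    """Stream-encode pdf_filename into out_filename without loading it all into memory."""
    # Both files must be opened in binary mode. base64.encode reads the input
    # in small blocks and writes lines of at most 76 characters, each ending
    # in a newline.
    with open(pdf_filename, "rb") as src, open(out_filename, "wb") as dst:
        base64.encode(src, dst)
```

This writes the encoded text to a second file rather than building one big string, which may or may not suit how the document is later sent to Elasticsearch.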

