Convert PDF to DOC (Python/Bash)


If you have LibreOffice installed, you can convert from the command line:

lowriter --invisible --convert-to doc '/your/file.pdf'

If you want to use Python for this:

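import os
import subprocess

# Walk the folder tree and convert every PDF with LibreOffice Writer
for top, dirs, files in os.walk('/my/pdf/folder'):
    for filename in files:
        if filename.endswith('.pdf'):
            abspath = os.path.join(top, filename)
            # Pass the arguments as a list (no shell=True) so spaces or quotes
            # in file names need no escaping; --outdir writes the .doc next to
            # the PDF instead of into the current working directory
            subprocess.call(['lowriter', '--invisible', '--convert-to', 'doc',
                             '--outdir', top, abspath])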
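A slightly more defensive variant of the loop above, as a sketch rather than a tested recipe. It assumes `lowriter` is on your PATH; the helper names `build_convert_command` and `convert_folder` are hypothetical. It adds `--infilter=writer_pdf_import`, which, as far as I know, asks LibreOffice to open the PDF with its Writer PDF import filter (by default PDFs open in Draw, which may have no .doc export), and uses `subprocess.run(check=True)` so a non-zero exit status raises instead of passing silently.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(pdf: Path, outdir: Path, fmt: str = "doc") -> List[str]:
    """Build the LibreOffice command line for one PDF (no shell involved)."""
    return [
        "lowriter",
        "--headless",
        # Assumption: force the Writer PDF import filter instead of Draw
        "--infilter=writer_pdf_import",
        "--convert-to", fmt,
        "--outdir", str(outdir),
        str(pdf),
    ]


def convert_folder(folder: str, fmt: str = "doc") -> None:
    """Convert every *.pdf under `folder`, writing each result next to its source."""
    for pdf in sorted(Path(folder).rglob("*.pdf")):
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(build_convert_command(pdf, pdf.parent, fmt), check=True)
```

Usage: `convert_folder('/my/pdf/folder')`, or `convert_folder('/my/pdf/folder', 'docx')` if you want .docx instead. Even with the Writer import filter, expect text to arrive in positioned frames rather than flowing paragraphs.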


This is difficult because PDFs are presentation-oriented, while Word documents are content-oriented. I have tested both of the following projects and can recommend them:

  1. PyPDF2
  2. PDFMiner

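However, both only extract content from the PDF, so you still have to write the Word file yourself, and you are almost certainly going to lose presentational aspects (layout, fonts, images) in the conversion.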
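To make the "text only" caveat concrete, here is a minimal sketch assuming the `pdfminer.six` fork of PDFMiner and the `python-docx` package are installed (`pip install pdfminer.six python-docx`). It uses `pdfminer.high_level.extract_text` rather than PyPDF2; the helpers `split_paragraphs` and `pdf_to_docx_text_only` are hypothetical names for illustration. Treating blank lines and page breaks as paragraph boundaries is a rough heuristic, not something either library guarantees.

```python
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Turn extracted PDF text into paragraphs: blank lines and page breaks
    (form feeds) separate paragraphs, and the lines inside one are joined."""
    paragraphs = []
    for block in text.replace("\f", "\n\n").split("\n\n"):
        joined = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if joined:
            paragraphs.append(joined)
    return paragraphs


def pdf_to_docx_text_only(pdf_path: str, docx_path: str) -> None:
    """Text-only conversion: layout, fonts and images are NOT carried over."""
    from pdfminer.high_level import extract_text  # pip install pdfminer.six
    import docx  # pip install python-docx

    document = docx.Document()
    for paragraph in split_paragraphs(extract_text(pdf_path)):
        document.add_paragraph(paragraph)
    document.save(docx_path)
```

Usage: `pdf_to_docx_text_only('report.pdf', 'report.docx')`. The result is a plain sequence of paragraphs, which is exactly the loss of presentation described above.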


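If you want to convert a PDF to an MS Word format such as .docx, I came across this approach. It needs Windows with Microsoft Word installed (Word 2013 or later can open PDFs) and the pywin32 package, which provides `win32com`.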

Ahsin Shabbir wrote:

import glob
import os
import win32com.client

word = win32com.client.Dispatch("Word.Application")
word.Visible = False

pdfs_path = ""  # folder where the .pdf files are stored
try:
    for doc in glob.iglob(os.path.join(pdfs_path, "*.pdf")):
        in_file = os.path.abspath(doc)
        print(in_file)
        # Word converts the PDF into an editable document when opening it
        wb = word.Documents.Open(in_file)
        # Save next to the source PDF, swapping the extension for .docx
        out_file = os.path.splitext(in_file)[0] + ".docx"
        print("outfile\n", out_file)
        wb.SaveAs2(out_file, FileFormat=16)  # 16 = wdFormatDocumentDefault (.docx)
        print("success...")
        wb.Close()
finally:
    word.Quit()  # always shut down the Word process, even after an error

This worked like a charm for me: it converted a 500-page PDF, keeping the formatting and images.
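If you want the .docx files in a separate output folder and fewer interactive prompts, here is a sketch along the same lines. It is untested here and assumes Windows, Microsoft Word and pywin32; the helpers `docx_path_for` and `convert_with_word` are hypothetical names. `ConfirmConversions=False` on `Documents.Open` and `DisplayAlerts = 0` (wdAlertsNone) should, in my understanding, suppress most dialogs, though Word may still show its PDF conversion notice.

```python
import glob
import os


def docx_path_for(pdf_path: str, out_dir: str) -> str:
    """Map e.g. /in/report.pdf to <out_dir>/report.docx."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return os.path.join(out_dir, stem + ".docx")


def convert_with_word(pdf_dir: str, out_dir: str) -> None:
    """Open each PDF in Word (which converts it) and save it as .docx."""
    import win32com.client  # pywin32; Windows + Microsoft Word only

    os.makedirs(out_dir, exist_ok=True)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    word.DisplayAlerts = 0  # wdAlertsNone: suppress alerts where possible
    try:
        for pdf in glob.glob(os.path.join(pdf_dir, "*.pdf")):
            doc = word.Documents.Open(os.path.abspath(pdf), ConfirmConversions=False)
            try:
                # 16 = wdFormatDocumentDefault (.docx)
                doc.SaveAs2(os.path.abspath(docx_path_for(pdf, out_dir)), FileFormat=16)
            finally:
                doc.Close()
    finally:
        word.Quit()  # make sure no orphaned WINWORD.EXE is left running
```

Usage: `convert_with_word(r"C:\pdfs", r"C:\pdfs\converted")`. Importing `win32com` inside the function keeps the path helper usable (and testable) on machines without Word.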