Convert PDF to DOC (Python/Bash)
If you have LibreOffice installed:
lowriter --invisible --convert-to doc '/your/file.pdf'
If you want to use Python for this:
import os
import subprocess

# Walk the folder tree and convert every PDF found.
for top, dirs, files in os.walk('/my/pdf/folder'):
    for filename in files:
        if filename.endswith('.pdf'):
            abspath = os.path.join(top, filename)
            # Pass the arguments as a list so paths with spaces or quotes are safe.
            subprocess.call(['lowriter', '--invisible', '--convert-to', 'doc', abspath])
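A variation on the loop above, as a minimal sketch: it separates finding the PDFs from building the command, matches the extension case-insensitively, and sends the output to a chosen folder with LibreOffice's --outdir option. The function names find_pdfs, build_command and convert_all are made up for illustration, and the folder paths are placeholders.

```python
import subprocess
from pathlib import Path


def find_pdfs(root):
    """Return every PDF under root (extension matched case-insensitively), sorted."""
    return sorted(
        p for p in Path(root).rglob('*')
        if p.is_file() and p.suffix.lower() == '.pdf'
    )


def build_command(pdf_path, out_dir):
    """Build the lowriter argument list; a list avoids shell quoting problems."""
    return ['lowriter', '--invisible', '--convert-to', 'doc',
            '--outdir', str(out_dir), str(pdf_path)]


def convert_all(root, out_dir):
    """Convert each PDF under root and return the exit codes from lowriter."""
    return [subprocess.call(build_command(pdf, out_dir)) for pdf in find_pdfs(root)]
```

Calling convert_all('/my/pdf/folder', '/my/doc/folder') would then run one lowriter process per PDF. A non-zero exit code in the returned list marks a file that did not convert.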
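If you want to convert PDF to an MS Word format such as docx, I came across this. It drives Microsoft Word through COM, so it needs Windows with Word installed and the pywin32 package.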
Ahsin Shabbir wrote:
import glob
import os
import win32com.client

word = win32com.client.Dispatch("Word.Application")
word.Visible = 0

pdfs_path = ""  # folder where the .pdf files are stored
reqs_path = ""  # folder for the .docx output (not defined in the original snippet)

for doc in glob.iglob(os.path.join(pdfs_path, "*.pdf")):
    print(doc)
    filename = os.path.basename(doc)
    in_file = os.path.abspath(doc)
    print(in_file)
    wb = word.Documents.Open(in_file)
    # Drop the ".pdf" extension and write a .docx with the same base name.
    out_file = os.path.abspath(os.path.join(reqs_path, filename[0:-4] + ".docx"))
    print("outfile\n", out_file)
    wb.SaveAs2(out_file, FileFormat=16)  # file format for docx
    print("success...")
    wb.Close()

word.Quit()
This worked like a charm for me: it converted a 500-page PDF with formatting and images.
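The output name in that script is built by slicing off the last four characters of the file name. A minimal sketch of the same mapping with pathlib follows; docx_path_for is a hypothetical helper name, not part of the quoted script, and PureWindowsPath is used only so the example behaves the same on any platform.

```python
from pathlib import PureWindowsPath


def docx_path_for(pdf_path, out_dir):
    """Map a PDF path to a .docx path with the same base name inside out_dir."""
    # with_suffix swaps only the final extension, so "a.b.pdf" becomes "a.b.docx".
    new_name = PureWindowsPath(pdf_path).with_suffix('.docx').name
    return str(PureWindowsPath(out_dir) / new_name)
```

The result could be passed to SaveAs2 in place of out_file. Unlike the slice, it does not assume the extension is exactly four characters long.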