.doc to pdf using python


A simple example using comtypes that converts a single file, with the input and output filenames given as command-line arguments:

    import sys
    import os
    import comtypes.client

    wdFormatPDF = 17

    in_file = os.path.abspath(sys.argv[1])
    out_file = os.path.abspath(sys.argv[2])

    word = comtypes.client.CreateObject('Word.Application')
    doc = word.Documents.Open(in_file)
    doc.SaveAs(out_file, FileFormat=wdFormatPDF)
    doc.Close()
    word.Quit()
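If the conversion raises partway through, the script above can leave a hidden Word process running. A minimal sketch of the same steps wrapped in `try`/`finally` follows. The function name `convert_to_pdf` and the `create_word` parameter are hypothetical, added here only so the cleanup logic can be exercised without Word installed; the COM calls are the ones from the example above.

```python
import os

WD_FORMAT_PDF = 17  # same value as wdFormatPDF in the example above


def convert_to_pdf(in_file, out_file, create_word=None):
    """Convert one document to PDF, always closing the document and Word.

    create_word is a hypothetical hook: a zero-argument callable returning
    a Word.Application-like object. By default it uses comtypes.
    """
    if create_word is None:
        # Imported lazily because comtypes only works on Windows.
        import comtypes.client

        def create_word():
            return comtypes.client.CreateObject('Word.Application')

    word = create_word()
    try:
        doc = word.Documents.Open(os.path.abspath(in_file))
        try:
            doc.SaveAs(os.path.abspath(out_file), FileFormat=WD_FORMAT_PDF)
        finally:
            doc.Close()  # runs even if SaveAs raised
    finally:
        word.Quit()  # runs even if Open or SaveAs raised
```

Calling `convert_to_pdf(sys.argv[1], sys.argv[2])` would then replace the last five lines of the script.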

You could also use pywin32, which would be the same except for:

    import win32com.client

and then:

    word = win32com.client.Dispatch('Word.Application')


You can use the docx2pdf Python package to bulk convert docx to pdf. It can be used as both a CLI and a Python library. It requires Microsoft Office to be installed and uses COM on Windows and AppleScript (JXA) on macOS.

    from docx2pdf import convert

    convert("input.docx")
    convert("input.docx", "output.pdf")
    convert("my_docx_folder/")

Installation and CLI usage:

    pip install docx2pdf
    docx2pdf input.docx output.pdf

Disclaimer: I wrote the docx2pdf package. https://github.com/AlJohri/docx2pdf


I worked on this problem for half a day, so I want to share my experience. Steven's answer is right, but it fails on my computer. Two things fixed it:

(1) The first time I create the 'Word.Application' object, I have to make the Word app visible before opening any documents. (I cannot explain why this works. If I skip it on my computer, the program crashes when I try to open a document in invisible mode, and the 'Word.Application' object is then deleted by the OS.)

(2) After doing (1), the program sometimes works but still fails often. The error "COMError: (-2147418111, 'Call was rejected by callee.', (None, None, None, 0, None))" means that the COM server may not be able to respond quickly enough. So I added a delay before trying to open a document.

With these two changes, the program runs without failures for me. The demo code is below. If you hit the same problems, try these two steps. Hope it helps.

    import os
    import comtypes.client
    import time

    wdFormatPDF = 17

    # absolute paths are needed
    # be careful with the backslash '\': use '\\', '/', or a raw string r"..."
    in_file = r'absolute path of input docx file 1'
    out_file = r'absolute path of output pdf file 1'

    in_file2 = r'absolute path of input docx file 2'
    out_file2 = r'absolute path of output pdf file 2'

    # print out filenames
    print(in_file)
    print(out_file)
    print(in_file2)
    print(out_file2)

    # create COM object
    word = comtypes.client.CreateObject('Word.Application')
    # key point 1: make Word visible before opening a new document
    word.Visible = True
    # key point 2: wait for the COM server to be ready
    time.sleep(3)

    # convert docx file 1 to pdf file 1
    doc = word.Documents.Open(in_file)  # open docx file 1
    doc.SaveAs(out_file, FileFormat=wdFormatPDF)  # conversion
    doc.Close()  # close docx file 1
    word.Visible = False

    # convert docx file 2 to pdf file 2
    doc = word.Documents.Open(in_file2)  # open docx file 2
    doc.SaveAs(out_file2, FileFormat=wdFormatPDF)  # conversion
    doc.Close()  # close docx file 2

    word.Quit()  # close the Word application
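A fixed `time.sleep(3)` may be too short on a slow machine and wastes time on a fast one. One possible alternative, not part of the original answer, is to retry the call only when Word rejects it. The sketch below uses hypothetical helper names (`is_call_rejected`, `call_with_retry`) and assumes the exception exposes the HRESULT as `.hresult` or as its first argument, which I believe is how comtypes' `COMError` behaves.

```python
import time

RPC_E_CALL_REJECTED = -2147418111  # the HRESULT in the error message quoted above


def is_call_rejected(exc):
    """Return True if exc looks like 'Call was rejected by callee'."""
    # Assumption: the HRESULT is available as .hresult or as args[0].
    code = getattr(exc, "hresult", None)
    if code is None and exc.args:
        code = exc.args[0]
    return code == RPC_E_CALL_REJECTED


def call_with_retry(func, *args, attempts=5, delay=1.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), pausing and retrying while the call is rejected.

    Any other exception, or a rejection on the last attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            if attempt == attempts or not is_call_rejected(exc):
                raise
            sleep(delay)  # give the COM server time to become ready
```

In the demo above, `doc = call_with_retry(word.Documents.Open, in_file)` would then stand in for the `time.sleep(3)` followed by the plain `Open` call. I have not verified that this also removes the need for `word.Visible = True`.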