How can I search a word in a Word 2007 .docx file? [python]

How can I search a word in a Word 2007 .docx file?


After reading your post above, I made a 100% native Python docx module to solve this specific problem.

    # Import the module
    from docx import *

    # Open the .docx file
    document = opendocx('A document.docx')

    # search() returns True if the string is found
    search(document, 'your search string')

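The docx module is documented at https://python-docx.readthedocs.org/en/latest/. Note that later releases of python-docx replaced these functions with a Document class, so opendocx() and search() apply only to the original version of the module.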


More precisely, a .docx document is a Zip archive in OpenXML format, so you first have to uncompress it.
I downloaded a sample (Google: some search term filetype:docx) and after unzipping I found some folders. The word folder contains the document itself, in file document.xml.
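
If you would rather not unzip the file by hand, the standard zipfile module can list what is inside the archive. This is only a minimal sketch: list_docx_parts is a name made up for illustration, and "sample.docx" is a placeholder for whatever file you downloaded.

```python
import zipfile


def list_docx_parts(docx_file):
    """Return the names of all files stored in the .docx Zip archive.

    `docx_file` may be a path or a binary file-like object.
    (Hypothetical helper, not part of any library.)
    """
    with zipfile.ZipFile(docx_file) as archive:
        return archive.namelist()


# Usage (placeholder file name):
# for name in list_docx_parts("sample.docx"):
#     print(name)
```

The entry to look for in the listing is word/document.xml, which holds the main body of the document.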


In this example, "Course Outline.docx" is a Word 2007 document, which does contain the word "Windows", and does not contain the phrase "random other string".

    >>> import zipfile
    >>> z = zipfile.ZipFile("Course Outline.docx")
    >>> # z.read() returns bytes in Python 3, so decode before searching
    >>> xml = z.read("word/document.xml").decode("utf-8")
    >>> "Windows" in xml
    True
    >>> "random other string" in xml
    False
    >>> z.close()

Basically, you just open the docx file (which is a zip archive) using zipfile, and find the content in the 'document.xml' file in the 'word' folder. If you wanted to be more sophisticated, you could then parse the XML, but if you're just looking for a phrase (which you know won't be a tag), then you can just look in the XML for the string.
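
For the "more sophisticated" route, here is a rough sketch that parses the XML with the standard library instead of searching the raw markup. One reason to bother: Word can split a phrase across several text runs (for example when part of it is formatted differently), in which case the plain substring test on the XML would miss it. The function name docx_contains is an assumption for illustration; the namespace URI is the standard WordprocessingML one used in word/document.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_contains(docx_file, phrase):
    """Return True if `phrase` occurs in the text of any paragraph.

    Hypothetical helper: only looks at the main document body,
    not headers, footers, footnotes, or comments.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    for paragraph in root.iter(W_NS + "p"):
        # Join all text runs (w:t) so a phrase split across runs still matches
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        if phrase in text:
            return True
    return False


# Usage (file name taken from the example above):
# docx_contains("Course Outline.docx", "Windows")
```

This matches within a single paragraph only, which is usually what you want for a word or short phrase. For a quick check, the simple substring test is still less code.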