How can I search a word in a Word 2007 .docx file?
After reading your post above, I made a 100% native Python docx module to solve this specific problem.
    # Import the module
    from docx import *

    # Open the .docx file
    document = opendocx('A document.docx')

    # Search returns True if found
    search(document, 'your search string')
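The docx module is at https://python-docx.readthedocs.org/en/latest/. Note that python-docx 0.3.0 and later are not API-compatible with earlier releases: opendocx() and search() belong to the older API, while the current documentation is built around the Document class.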
In this example, "Course Outline.docx" is a Word 2007 document that contains the word "Windows" but does not contain the phrase "random other string".
    >>> import zipfile
    >>> z = zipfile.ZipFile("Course Outline.docx")
    >>> # z.read() returns bytes, so search for a bytes literal (required on Python 3)
    >>> b"Windows" in z.read("word/document.xml")
    True
    >>> b"random other string" in z.read("word/document.xml")
    False
    >>> z.close()
Basically, a .docx file is a zip archive, so you can open it with zipfile and read the main document content from 'word/document.xml'. If you wanted to be more sophisticated, you could then parse the XML, but if you're just looking for a phrase (which you know won't be a tag name), you can search the raw XML for the string. Be aware that Word sometimes splits a phrase across several XML elements, in which case a raw search will miss it.
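If you do want the "more sophisticated" route of parsing the XML, here is a minimal sketch using only the standard library. The function names docx_paragraphs and docx_contains are made up for this illustration, and it assumes the text you care about lives in the w:t elements of word/document.xml (it ignores headers, footers, footnotes, and the like). Joining the text of each paragraph before searching also catches phrases that Word has split across several runs.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Yield the text of each paragraph in a .docx file (hypothetical helper)."""
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    for para in root.iter(W_NS + "p"):
        # A paragraph's text may be split across several <w:t> elements,
        # so join them before searching.
        yield "".join(t.text or "" for t in para.iter(W_NS + "t"))


def docx_contains(path, needle):
    """Return True if `needle` occurs inside any single paragraph."""
    return any(needle in text for text in docx_paragraphs(path))
```

With the document from the example above, docx_contains("Course Outline.docx", "Windows") should return True. A phrase that spans two paragraphs will not match, since each paragraph is searched on its own.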