How to extract text from an existing docx file using python-docx
You can try this:
    import docx

    def getText(filename):
        doc = docx.Document(filename)
        fullText = []
        for para in doc.paragraphs:
            fullText.append(para.text)
        return '\n'.join(fullText)
You can use python-docx2txt (installed with pip install docx2txt), which is adapted from python-docx but can also extract text from links, headers, and footers. It can also extract images.
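A minimal sketch of how that might look, assuming docx2txt is installed and that its process() function takes the document path plus an optional image directory. The wrapper name extract_text and its image_dir parameter are just illustrative, not part of the library:

```python
from pathlib import Path


def extract_text(docx_path, image_dir=None):
    """Return the text of a .docx file using docx2txt.

    extract_text and image_dir are hypothetical names for this sketch;
    only docx2txt.process() comes from the library itself.
    """
    # Imported here so the dependency is only needed when the function runs.
    import docx2txt  # third-party: pip install docx2txt

    if image_dir is None:
        # Text only: body, headers, footers and hyperlink text.
        return docx2txt.process(docx_path)

    # Create the target folder first, then let docx2txt write the
    # embedded images into it while it extracts the text.
    Path(image_dir).mkdir(parents=True, exist_ok=True)
    return docx2txt.process(docx_path, str(image_dir))
```

Calling extract_text('demo.docx') should return the text as one string, and extract_text('demo.docx', 'images') should additionally save any embedded images into the images folder.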
You can also try this:
    from docx import Document

    document = Document('demo.docx')
    for para in document.paragraphs:
        print(para.text)