Parsing of table from .docx file [closed]
You can use the snippet below to parse your document into a list where each row is a dictionary mapping the table header value to the column value.
from docx.api import Document# Load the first table from your document. In your example file,# there is only one table, so I just grab the first one.document = Document('Books.docx')table = document.tables[0]# Data will be a list of rows represented as dictionaries# containing each row's data.data = []keys = Nonefor i, row in enumerate(table.rows): text = (cell.text for cell in row.cells) # Establish the mapping based on the first row # headers; these will become the keys of our dictionary if i == 0: keys = tuple(text) continue # Construct a dictionary for this row, mapping # keys to values for this row row_data = dict(zip(keys, text)) data.append(row_data)
This will give you:
data = [ {u'Pub.': u'Penguin Books', u'Auther': u'Edward de BONO', u'Sr. No.': u'1', u'Name of Book': u'Six Thinking Hats' }, ...]
If you'd just want a tuple for each row, you should instead of creating a dictionary just set row_data
to the tuple value of text
, so in the loop instead of constructing the dict
, do:
# Construct a tuple for this rowrow_data = tuple(text)data.append(row_data)
Now, data
would hold something like this instead:
data = [ (u'1', u'Six Thinking Hats', u'Edward de BONO', u'Penguin Books' ), ...]
Then you can skip constructing keys
, obviously (but still skip the first row!).