Parsing of table from .docx file [closed]

python xml parsing docx python-docx

You can use the snippet below to parse your document into a list where each row is a dictionary mapping the table header value to the column value.

from docx.api import Document# Load the first table from your document. In your example file,# there is only one table, so I just grab the first one.document = Document('Books.docx')table = document.tables[0]# Data will be a list of rows represented as dictionaries# containing each row's data.data = []keys = Nonefor i, row in enumerate(table.rows):    text = (cell.text for cell in row.cells)    # Establish the mapping based on the first row    # headers; these will become the keys of our dictionary    if i == 0:        keys = tuple(text)        continue    # Construct a dictionary for this row, mapping    # keys to values for this row    row_data = dict(zip(keys, text))    data.append(row_data)

This will give you:

data = [  {u'Pub.': u'Penguin Books',   u'Auther': u'Edward de BONO',   u'Sr. No.': u'1',   u'Name of Book': u'Six Thinking Hats'  },  ...]

If you'd just want a tuple for each row, you should instead of creating a dictionary just set row_data to the tuple value of text, so in the loop instead of constructing the dict, do:

# Construct a tuple for this rowrow_data = tuple(text)data.append(row_data)

Now, data would hold something like this instead:

data = [  (u'1',   u'Six Thinking Hats',   u'Edward de BONO',   u'Penguin Books'  ), ...]

Then you can skip constructing keys, obviously (but still skip the first row!).

CodeHunter

Parsing of table from .docx file [closed]

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last