Reading Excel File using Python, how do I get the values of a specific column with indicated column name? Reading Excel File using Python, how do I get the values of a specific column with indicated column name? python python

Reading Excel File using Python, how do I get the values of a specific column with indicated column name?


A somewhat late answer, but with pandas, it is possible to get directly a column of an excel file:

import pandasdf = pandas.read_excel('sample.xls')#print the column namesprint df.columns#get the values for a given columnvalues = df['Arm_id'].values#get a data frame with selected columnsFORMAT = ['Arm_id', 'DSPName', 'Pincode']df_selected = df[FORMAT]

Make sure you have installed xlrd and pandas:

pip install pandas xlrd


This is one approach:

from xlrd import open_workbookclass Arm(object):    def __init__(self, id, dsp_name, dsp_code, hub_code, pin_code, pptl):        self.id = id        self.dsp_name = dsp_name        self.dsp_code = dsp_code        self.hub_code = hub_code        self.pin_code = pin_code        self.pptl = pptl    def __str__(self):        return("Arm object:\n"               "  Arm_id = {0}\n"               "  DSPName = {1}\n"               "  DSPCode = {2}\n"               "  HubCode = {3}\n"               "  PinCode = {4} \n"               "  PPTL = {5}"               .format(self.id, self.dsp_name, self.dsp_code,                       self.hub_code, self.pin_code, self.pptl))wb = open_workbook('sample.xls')for sheet in wb.sheets():    number_of_rows = sheet.nrows    number_of_columns = sheet.ncols    items = []    rows = []    for row in range(1, number_of_rows):        values = []        for col in range(number_of_columns):            value  = (sheet.cell(row,col).value)            try:                value = str(int(value))            except ValueError:                pass            finally:                values.append(value)        item = Arm(*values)        items.append(item)for item in items:    print item    print("Accessing one single value (eg. DSPName): {0}".format(item.dsp_name))    print

You don't have to use a custom class, you can simply take a dict(). If you use a class however, you can access all values via dot-notation, as you see above.

Here is the output of the script above:

Arm object:  Arm_id = 1  DSPName = JaVAS  DSPCode = 1  HubCode = AGR  PinCode = 282001   PPTL = 1Accessing one single value (eg. DSPName): JaVASArm object:  Arm_id = 2  DSPName = JaVAS  DSPCode = 1  HubCode = AGR  PinCode = 282002   PPTL = 3Accessing one single value (eg. DSPName): JaVASArm object:  Arm_id = 3  DSPName = JaVAS  DSPCode = 1  HubCode = AGR  PinCode = 282003   PPTL = 5Accessing one single value (eg. DSPName): JaVAS


So the key parts are to grab the header ( col_names = s.row(0) ) and when iterating through the rows, to skip the first row which isn't needed for row in range(1, s.nrows) - done by using range from 1 onwards (not the implicit 0). You then use zip to step through the rows holding 'name' as the header of the column.

from xlrd import open_workbookwb = open_workbook('Book2.xls')values = []for s in wb.sheets():    #print 'Sheet:',s.name    for row in range(1, s.nrows):        col_names = s.row(0)        col_value = []        for name, col in zip(col_names, range(s.ncols)):            value  = (s.cell(row,col).value)            try : value = str(int(value))            except : pass            col_value.append((name.value, value))        values.append(col_value)print values