Python BeautifulSoup, iterating through tags and attributes
You can match both td and th cells in one call by passing a list of tag names to find_all. To get all of an element's attributes, use its .attrs attribute, which is a dictionary:
    rows = bs_table.find_all('tr')
    for row in rows:
        cells = row.find_all(['td', 'th'])
        for cell in cells:
            print(cell.name, cell.attrs)
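For a version that runs on its own, here is a minimal sketch. The sample markup and the cell_info list are made up for illustration, and html.parser is used only because it ships with Python; any installed parser should behave the same way for this input.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original question.
html = """
<table id="demo">
  <tr><th scope="col">Name</th><th>Value</th></tr>
  <tr><td data-id="1">alpha</td><td title="first">10</td></tr>
</table>
"""

bs_table = BeautifulSoup(html, "html.parser").find("table")

cell_info = []
for row in bs_table.find_all("tr"):
    # A list of names matches any of the listed tags, in document order.
    for cell in row.find_all(["td", "th"]):
        # .attrs is a dict mapping attribute names to their values;
        # it is empty for a tag that has no attributes.
        cell_info.append((cell.name, cell.attrs))

for name, attrs in cell_info:
    print(name, attrs)
```

Because .attrs is an ordinary dictionary, a single attribute can also be read with cell.get('title'), which returns None when the attribute is missing.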
Alternative looping (action is at the bottom):
    from bs4 import BeautifulSoup

    html = '''<table id="myBSTable">
      <tr>
        <th>Column A1</th>
        <th>Column B1</th>
        <th>Column C1</th>
        <th>Column D1</th>
        <th>Column E1</th>
      </tr>
      <tr>
        <td data="First Column Data"></td>
        <td data="Second Column Data"></td>
        <td title="Title of the First Row">Value of Row 1</td>
        <td>Beautiful 1</td>
        <td>Soup 1</td>
      </tr>
      <tr>
        <td></td>
        <td data-g="Second Column Data"></td>
        <td title="Title of the Second Row">Value of Row 2</td>
        <td>Selenium 1</td>
        <td>Rocks 1</td>
      </tr>
      <tr>
        <td></td>
        <td></td>
        <td title="Title of the Third Row">Value of Row 3</td>
        <td>Python 1</td>
        <td>Boulder 1</td>
      </tr>
      <tr>
        <th>Column A2</th>
        <th>Column B2</th>
        <th>Column C2</th>
        <th>Column D2</th>
        <th>Column E2</th>
      </tr>
      <tr>
        <td data="First Column Data"></td>
        <td data="Second Column Data"></td>
        <td title="Title of the First Row">Value of Row 1</td>
        <td>Beautiful 2</td>
        <td>Soup 2</td>
      </tr>
      <tr>
        <td></td>
        <td data-g="Second Column Data"></td>
        <td title="Title of the Second Row">Value of Row 2</td>
        <td>Selenium 2</td>
        <td>Rocks 2</td>
      </tr>
      <tr>
        <td></td>
        <td></td>
        <td title="Title of the Third Row">Value of Row 3 2</td>
        <td>Python 2</td>
        <td>Boulder 2</td>
      </tr>
    </table>'''

    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(html, 'html.parser')
    rows = soup.find_all('tr')

    for tr in rows:
        # .children also yields the whitespace text nodes between cells;
        # their .name is None, so neither branch below matches them.
        for z in tr.children:
            if z.name == 'td':
                pass  # do stuff1: handle a data cell
            elif z.name == 'th':
                pass  # do stuff2: handle a header cell
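One detail about looping over tr.children: it yields every direct child, including the whitespace text between cells, which is why the check on z.name matters. The sketch below shows two ways to get only the cell tags. The markup and variable names are hypothetical, and html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical one-row table with whitespace between the cells.
html = "<table><tr>\n <th>Head</th>\n <td title='t'>Cell</td>\n</tr></table>"
soup = BeautifulSoup(html, "html.parser")
tr = soup.find("tr")

# All direct children: text nodes (whitespace) as well as tags.
children = list(tr.children)

# Option 1: keep only real tags.
tag_children = [c for c in children if isinstance(c, Tag)]
names = [c.name for c in tag_children]

# Option 2: let find_all do the filtering, limited to direct children.
direct_cells = tr.find_all(["td", "th"], recursive=False)

print(len(children), names, [c.name for c in direct_cells])
```

Passing recursive=False keeps the search to direct children, so cells of a table nested inside a cell would not be picked up by the outer row.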