Convert HTML table to CSV in Python
Using the csv module and Selenium selectors would probably be more convenient here:
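import csv

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("http://example.com/")

# Locate the table by its id (the find_element_by_* helpers were removed in Selenium 4)
table = driver.find_element(By.CSS_SELECTOR, "#tableid")

with open('eggs.csv', 'w', newline='') as csvfile:
    wr = csv.writer(csvfile)
    for row in table.find_elements(By.CSS_SELECTOR, 'tr'):
        # One CSV row per <tr>, one field per <td>
        wr.writerow([d.text for d in row.find_elements(By.CSS_SELECTOR, 'td')])

driver.quit()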
Without access to the table you're actually trying to scrape, I used this example:
<table>
  <thead>
    <tr> <td>Header1</td> <td>Header2</td> <td>Header3</td> </tr>
  </thead>
  <tr> <td>Row 11</td> <td>Row 12</td> <td>Row 13</td> </tr>
  <tr> <td>Row 21</td> <td>Row 22</td> <td>Row 23</td> </tr>
  <tr> <td>Row 31</td> <td>Row 32</td> <td>Row 33</td> </tr>
</table>
and scraped it using:
from bs4 import BeautifulSoup as BS

content = "..."  # contents of that table, as an HTML string
soup = BS(content, 'html5lib')
# One list of <td> tags per <tr> row
rows = [tr.find_all('td') for tr in soup.find_all('tr')]
This rows object is a list of lists:
[
  [<td>Header1</td>, <td>Header2</td>, <td>Header3</td>],
  [<td>Row 11</td>, <td>Row 12</td>, <td>Row 13</td>],
  [<td>Row 21</td>, <td>Row 22</td>, <td>Row 23</td>],
  [<td>Row 31</td>, <td>Row 32</td>, <td>Row 33</td>]
]
...and you can write it to a file:
with open('result.csv', 'a') as f:
    for it in rows:
        # Strip the <td> tags from each cell and join the row with commas
        f.write(", ".join(str(e).replace('<td>','').replace('</td>','') for e in it) + '\n')
which looks like this:
Header1, Header2, Header3
Row 11, Row 12, Row 13
Row 21, Row 22, Row 23
Row 31, Row 32, Row 33
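One caveat with joining cells by hand: a cell that itself contains a comma or a double quote produces a line that CSV readers will split incorrectly. Below is a minimal sketch of how csv.writer handles that case, assuming the cell text has already been extracted as plain strings (with BeautifulSoup that would typically be each cell's get_text()). The sample rows and the rows_to_csv helper are made up for illustration and are not part of the answers above.

```python
import csv
import io

# Hypothetical cell text, standing in for what the scraper would extract.
rows = [
    ["Header1", "Header2", "Header3"],
    ["Row 11", "Smith, John", 'say "hi"'],
]


def rows_to_csv(rows):
    """Serialize rows of strings to CSV text, quoting cells where needed."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps the in-memory output easy to compare;
    # csv.writer defaults to "\r\n".
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

The cell containing a comma is wrapped in double quotes and embedded quotes are doubled, so reading the text back with csv.reader yields the original rows. To write to a file instead of a string, pass a file opened with open('result.csv', 'w', newline='') to csv.writer.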