Convert HTML table to CSV in Python

python selenium pandas web-scraping beautifulsoup

Using the csv module together with Selenium selectors would probably be more convenient here:

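    import csv

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get("http://example.com/")

    # Selenium 4.3+ removed the find_element_by_* helpers; use find_element(By..., ...)
    table = driver.find_element(By.CSS_SELECTOR, "#tableid")
    with open('eggs.csv', 'w', newline='') as csvfile:
        wr = csv.writer(csvfile)
        # One CSV row per <tr>, one field per <td>
        for row in table.find_elements(By.CSS_SELECTOR, 'tr'):
            wr.writerow([d.text for d in row.find_elements(By.CSS_SELECTOR, 'td')])

    driver.quit()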
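One reason the csv module is convenient is that csv.writer takes care of quoting. The following is a minimal sketch, independent of Selenium, that shows how a cell containing a comma or a double quote is escaped. The cell values and the in-memory buffer are assumptions made up for illustration; they stand in for the text returned by d.text and for the open CSV file.

```python
import csv
import io

# Hypothetical cell text, standing in for what d.text would return.
rows = [
    ["Name", "Comment"],
    ["Smith, John", 'said "hi"'],
]

buffer = io.StringIO()  # in-memory stand-in for the open CSV file
writer = csv.writer(buffer)
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back with csv.reader recovers the original cells.
round_trip = list(csv.reader(io.StringIO(csv_text)))
```

Fields that contain the delimiter or a quote character are wrapped in double quotes, and embedded quotes are doubled, so a CSV reader can split the line back into the same cells.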


Without access to the table you're actually trying to scrape, I used this example:

    <table>
      <thead>
        <tr>
          <td>Header1</td>
          <td>Header2</td>
          <td>Header3</td>
        </tr>
      </thead>
      <tr>
        <td>Row 11</td>
        <td>Row 12</td>
        <td>Row 13</td>
      </tr>
      <tr>
        <td>Row 21</td>
        <td>Row 22</td>
        <td>Row 23</td>
      </tr>
      <tr>
        <td>Row 31</td>
        <td>Row 32</td>
        <td>Row 33</td>
      </tr>
    </table>

and scraped it using:

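    from bs4 import BeautifulSoup as BS

    content = "<table>...</table>"  # placeholder: the HTML of the table above
    soup = BS(content, 'html5lib')  # requires the html5lib package
    # One list of <td> tags per <tr> row
    rows = [tr.find_all('td') for tr in soup.find_all('tr')]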

This rows object is a list of lists:

    [
        [<td>Header1</td>, <td>Header2</td>, <td>Header3</td>],
        [<td>Row 11</td>, <td>Row 12</td>, <td>Row 13</td>],
        [<td>Row 21</td>, <td>Row 22</td>, <td>Row 23</td>],
        [<td>Row 31</td>, <td>Row 32</td>, <td>Row 33</td>]
    ]

...and you can write it to a file:

    # Open the file once in append mode, then write one line per table row
    with open('result.csv', 'a') as f:
        for it in rows:
            f.write(", ".join(str(e).replace('<td>', '').replace('</td>', '') for e in it) + '\n')

which looks like this:

    Header1, Header2, Header3
    Row 11, Row 12, Row 13
    Row 21, Row 22, Row 23
    Row 31, Row 32, Row 33
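Joining cells by hand with ", " works for this sample, but it would produce a broken file if a cell contained a comma, and the string replacement only strips bare <td> tags without attributes. As a hedged alternative (not the method used above), here is a sketch that relies only on the standard library: html.parser collects the cell text and csv.writer does the quoting. The names TableParser and html_table_to_csv are hypothetical, and the sample is a shortened variant of the table above with a comma added to one cell. It does not handle nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


sample = """
<table>
  <thead><tr><td>Header1</td><td>Header2</td></tr></thead>
  <tr><td>Row 11</td><td>Row, 12</td></tr>
</table>
"""
result = html_table_to_csv(sample)
print(result)
```

To write to disk instead, the same rows could be passed to a csv.writer created on a file opened with newline='', as in the Selenium example.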
