python BeautifulSoup parsing table

python beautifulsoup

Here you go:

    data = []
    table = soup.find('table', attrs={'class': 'lineItemsTable'})
    table_body = table.find('tbody')

    rows = table_body.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])  # Get rid of empty values

This gives you:

    [[u'1359711259', u'SRF', u'08/05/2013', u'5310 4 AVE', u'K', u'19', u'125.00', u'$'],
     [u'7086775850', u'PAS', u'12/14/2013', u'3908 6th Ave', u'K', u'40', u'125.00', u'$'],
     [u'7355010165', u'OMT', u'12/14/2013', u'3908 6th Ave', u'K', u'40', u'145.00', u'$'],
     [u'4002488755', u'OMT', u'02/12/2014', u'NB 1ST AVE @ E 23RD ST', u'5', u'115.00', u'$'],
     [u'7913806837', u'OMT', u'03/03/2014', u'5015 4th Ave', u'K', u'46', u'115.00', u'$'],
     [u'5080015366', u'OMT', u'03/10/2014', u'EB 65TH ST @ 16TH AV E', u'7', u'50.00', u'$'],
     [u'7208770670', u'OMT', u'04/08/2014', u'333 15th St', u'K', u'70', u'65.00', u'$'],
     [u'$0.00\n\n\nPayment Amount:']]

A couple of things to note:

The last row in the output above, the Payment Amount, is not part of the line-item data; it appears only because of how the table is laid out. You can filter it out by checking whether the length of the list is less than 7.
The last column of every row has to be handled separately, since it is an input text box.
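The length check described above can be sketched as follows. This is a minimal illustration using three rows copied from the output above (the u'' prefixes are dropped, which makes no difference to the logic). The names line_items and cleaned are introduced here for illustration, and dropping the trailing '$' assumes it is only the label text next to the input box.

```python
# Sample rows copied from the output above.
data = [
    ['1359711259', 'SRF', '08/05/2013', '5310 4 AVE', 'K', '19', '125.00', '$'],
    ['4002488755', 'OMT', '02/12/2014', 'NB 1ST AVE @ E 23RD ST', '5', '115.00', '$'],
    ['$0.00\n\n\nPayment Amount:'],
]

# Keep only real line items: drop any row with fewer than 7 values.
line_items = [row for row in data if len(row) >= 7]

# Assumption: the trailing '$' is just the label beside the input box, so drop it.
cleaned = [row[:-1] for row in line_items]

for row in cleaned:
    print(row[0], row[-1])  # summons number and amount
```

Because some rows are missing a value (the fourth row in the output has 7 entries rather than 8), indexing from the end of the row is safer than indexing from the start for the amount column.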

Updated Answer

If you only need to parse a table from a webpage, you can use the pandas function pandas.read_html.

Let's say we want to extract the GDP data table from the website: https://worldpopulationreview.com/countries/countries-by-gdp/#worldCountries

The following code does the job (no need to call BeautifulSoup yourself or walk the HTML by hand):

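    from io import StringIO

    import pandas as pd
    import requests

    url = "https://worldpopulationreview.com/countries/countries-by-gdp/#worldCountries"
    r = requests.get(url)
    # Parse every table on the page into a list of DataFrames.
    # Wrapping the text in StringIO avoids the deprecation warning that
    # recent pandas versions emit for literal HTML strings.
    df_list = pd.read_html(StringIO(r.text))
    df = df_list[0]
    df.head()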
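When a page holds several tables, read_html can also be pointed at one of them through its attrs parameter. The sketch below is only an illustration: the inline HTML is made up for this example (reusing the lineItemsTable class from the question), and it assumes a parser backend such as lxml, or bs4 with html5lib, is installed alongside pandas.

```python
from io import StringIO

import pandas as pd

# Hypothetical page fragment, invented for this example.
html = """
<p>Other page content</p>
<table class="lineItemsTable">
  <thead><tr><th>Summons</th><th>Date</th><th>Amount</th></tr></thead>
  <tbody>
    <tr><td>1359711259</td><td>08/05/2013</td><td>125.00</td></tr>
    <tr><td>7086775850</td><td>12/14/2013</td><td>125.00</td></tr>
  </tbody>
</table>
"""

# attrs restricts the search to tables whose attributes match.
tables = pd.read_html(StringIO(html), attrs={"class": "lineItemsTable"})
df = tables[0]
print(df)
```

Note that pandas infers column types, so numeric-looking columns such as the summons number come back as numbers rather than strings.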

Solved; this is how you parse their HTML results:

    table = soup.find("table", {"class": "lineItemsTable"})
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) == 9:  # process only rows with exactly nine cells
            summons = cells[1].find(text=True)
            plateType = cells[2].find(text=True)
            vDate = cells[3].find(text=True)
            location = cells[4].find(text=True)
            borough = cells[5].find(text=True)
            vCode = cells[6].find(text=True)
            amount = cells[7].find(text=True)
            print(amount)
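To try the same cell-count approach without the original page, here is a minimal, self-contained sketch. The markup is hypothetical: it assumes nine cells per line item with a text input in the last one, which may not match the real page exactly. The records list and the dictionary keys are names chosen for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
html = """
<table class="lineItemsTable">
  <tr>
    <td><input type="checkbox"></td>
    <td>1359711259</td><td>SRF</td><td>08/05/2013</td><td>5310 4 AVE</td>
    <td>K</td><td>19</td><td>125.00</td>
    <td>$<input type="text" name="payment" value="125.00"></td>
  </tr>
  <tr><td colspan="9">Payment Amount: $0.00</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="lineItemsTable")

records = []
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) != 9:
        continue  # skip the total row and any other layout rows
    # The last cell holds an input box, so read its value attribute
    # instead of the cell text.
    payment_box = cells[8].find("input")
    records.append({
        "summons": cells[1].get_text(strip=True),
        "date": cells[3].get_text(strip=True),
        "amount": cells[7].get_text(strip=True),
        "payment": payment_box.get("value") if payment_box else None,
    })

print(records)
```

Collecting each row into a dictionary keeps the values together, which is easier to work with than separate variables that are overwritten on every iteration.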
