Using pandas to read downloaded html file


I think you are on the right track by using an HTML parser like Beautiful Soup. pandas.read_html() reads HTML tables, not the rest of an HTML page.

You would want to do something like this...

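    from io import StringIO
    from bs4 import BeautifulSoup
    import pandas as pd

    # Parse the downloaded file and grab the first <table> element
    with open('C:/age0.html', 'r') as f:
        table = BeautifulSoup(f.read(), 'html.parser').find('table')
    # read_html() needs markup text, not a BeautifulSoup object, and returns a list
    df = pd.read_html(StringIO(str(table)))[0]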
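If the page holds several tables, parsing it with Beautiful Soup first lets you pick the one you want before handing it to pandas. Here is a minimal sketch of that idea; the sample markup and the `id="ages"` attribute are made up for illustration and are not taken from the original file.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page with two tables; only the second one holds data we want
PAGE = """
<html><body>
<table id="nav"><tr><td>Home</td><td>About</td></tr></table>
<table id="ages">
  <tr><th>name</th><th>age</th></tr>
  <tr><td>Ann</td><td>31</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(PAGE, "html.parser")

# Select the table by its (assumed) id attribute
table = soup.find("table", id="ages")

# Convert the tag back to markup text and let pandas parse it
ages = pd.read_html(StringIO(str(table)))[0]
print(ages)
```

pandas can also filter on its own: read_html() takes an `attrs` argument, for example `attrs={'id': 'ages'}`, which may be enough when the table has an identifying attribute.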


  1. First, install the packages below, which pandas uses for HTML parsing:

    • pip install beautifulsoup4
    • pip install lxml
    • pip install html5lib
  2. Then use 'read_html' to read the HTML tables on any HTML page. It returns a list of DataFrames, one per table.


    import pandas as pds

    pds_df = pds.read_html('C:/age0.html')
    pds_df[0]
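To try this without the original file, here is a small self-contained sketch. The sample HTML, the column names, and the `read_first_table` helper are all hypothetical; a temporary file stands in for C:/age0.html.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample page standing in for the downloaded file
SAMPLE_HTML = """
<html><body>
<p>Text outside any table is ignored.</p>
<table>
  <tr><th>name</th><th>age</th></tr>
  <tr><td>Ann</td><td>31</td></tr>
  <tr><td>Bob</td><td>45</td></tr>
</table>
</body></html>
"""


def read_first_table(path):
    """Return the first table in the HTML file at `path` as a DataFrame."""
    tables = pd.read_html(path)  # list of DataFrames, one per <table>
    return tables[0]


# Write the sample to a temporary file so the example runs anywhere
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
    f.write(SAMPLE_HTML)
    sample_path = f.name

df = read_first_table(sample_path)
os.remove(sample_path)
print(df)
```

If the page has no table at all, read_html() raises a ValueError, so it is worth wrapping the call in try/except when the input is not known in advance.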

I hope this will help.

Good Luck!!