How can I parse a website using Selenium and BeautifulSoup in Python? [closed]

python selenium beautifulsoup

Assuming you are on the page you want to parse, Selenium stores the source HTML in the driver's page_source attribute. You would then load the page_source into BeautifulSoup as follows:

    In [8]: from bs4 import BeautifulSoup
    In [9]: from selenium import webdriver
    In [10]: driver = webdriver.Firefox()
    In [11]: driver.get('http://news.ycombinator.com')
    In [12]: html = driver.page_source
    In [13]: soup = BeautifulSoup(html, 'html.parser')
    In [14]: for tag in soup.find_all('title'):
       ....:     print(tag.text)
       ....:
    Hacker News
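
The same steps can be wrapped into two small functions, which also makes sure the browser is closed afterwards. This is only a minimal sketch: the names `fetch_rendered_html` and `page_titles` are made up for illustration, running Firefox headless is an assumption, and it presumes Firefox and geckodriver are installed.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Open url in headless Firefox and return the rendered page source."""
    # Imported lazily so the parsing helper below also works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("--headless")  # assumption: no visible window is needed
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails


def page_titles(html):
    """Return the text of every <title> tag found in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text() for tag in soup.find_all("title")]
```

Calling `page_titles(fetch_rendered_html('http://news.ycombinator.com'))` should then give the same title as the session above. Keeping the parsing in its own function means it can be tried on any HTML string without starting a browser.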

As your question isn't particularly concrete, here's a simple example. To do something more useful, read the BeautifulSoup docs. You will also find plenty of examples of Selenium (and BeautifulSoup) usage here on SO.

    from selenium import webdriver
    from bs4 import BeautifulSoup

    browser = webdriver.Firefox()
    browser.get('http://webpage.com')
    soup = BeautifulSoup(browser.page_source, 'html.parser')

    # do something useful
    # prints all the links with corresponding text
    for link in soup.find_all('a'):
        print(link.get('href', None), link.get_text())
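
Links on a page are often relative, so a possible refinement of the loop above is to resolve each `href` against the page URL and skip anchors that have no `href` at all. The sketch below assumes a helper called `extract_links` (a hypothetical name) and uses a hand-written HTML string in place of `browser.page_source`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return (absolute_url, link_text) pairs for every <a> that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True makes find_all skip anchors that carry no href attribute.
    for a in soup.find_all("a", href=True):
        links.append((urljoin(base_url, a["href"]), a.get_text(strip=True)))
    return links


# Stand-in for browser.page_source, made up for illustration only.
sample = (
    '<p><a href="/item?id=1">First</a> '
    '<a name="top">no href</a> '
    '<a href="http://example.com/x"> Second </a></p>'
)

for url, text in extract_links(sample, "http://webpage.com"):
    print(url, text)
```

With a live browser, `extract_links(browser.page_source, browser.current_url)` would apply the same logic to the page Selenium has loaded.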

Are you sure you want to use Selenium? For this purpose I used PyQt4; it's very powerful, and you can do whatever you want.

Here is some sample code that I just wrote; just change the URL and you're good to go:

    #! /usr/bin/env python2.7
    from PyQt4.QtCore import *
    from PyQt4.QtGui import *
    from PyQt4.QtWebKit import *
    from bs4 import BeautifulSoup
    import sys, signal

    class Browser(QWebView):
        def __init__(self):
            QWebView.__init__(self)
            self.loadProgress.connect(self._progress)
            self.loadFinished.connect(self._loadFinished)
            self.frame = self.page().currentFrame()

        def _progress(self, progress):
            print str(progress) + "%"

        def _loadFinished(self):
            print "Load Finished"
            html = unicode(self.frame.toHtml()).encode('utf-8')
            soup = BeautifulSoup(html)
            print soup.prettify()
            self.close()

    if __name__ == "__main__":
        app = QApplication(sys.argv)
        br = Browser()
        url = QUrl('http://web site that can contain javascript.com')
        br.load(url)
        br.show()
        if signal.signal(signal.SIGINT, signal.SIG_DFL):
            sys.exit(app.exec_())
        app.exec_()
