BeautifulSoup: extract text from anchor tag


This prints the link URL, the anchor text, and the image source for each div:

    from bs4 import BeautifulSoup

    data = '''
    <div class="image">
      <a href="http://www.example.com/eg1">Content1<img src="http://image.example.com/img1.jpg" /></a>
    </div>
    <div class="image">
      <a href="http://www.example.com/eg2">Content2<img src="http://image.example.com/img2.jpg" /> </a>
    </div>
    '''

    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(data, "html.parser")

    for div in soup.find_all('div', attrs={'class': 'image'}):
        print(div.find('a')['href'])       # link target
        print(div.find('a').contents[0])   # first child of <a>: the text before <img>
        print(div.find('img')['src'])      # image URL
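The loop above takes `contents[0]`, which only works when the text is the first child of the anchor. As a minimal sketch of a more tolerant variant, `get_text(strip=True)` collects the anchor's text wherever it sits. The helper name `link_text_pairs` and the second sample anchor (image placed before the text) are assumptions made for illustration, not part of the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical variation of the sample: the second anchor puts the image first
SAMPLE = '''
<div class="image">
  <a href="http://www.example.com/eg1">Content1<img src="http://image.example.com/img1.jpg" /></a>
</div>
<div class="image">
  <a href="http://www.example.com/eg2"><img src="http://image.example.com/img2.jpg" /> Content2</a>
</div>
'''


def link_text_pairs(html):
    """Return (href, text) for the first anchor inside each div.image."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for div in soup.find_all("div", class_="image"):
        a = div.find("a")
        if a is None:
            continue  # div without a link: skip it
        # get_text(strip=True) gathers all text in the anchor and trims whitespace
        pairs.append((a.get("href"), a.get_text(strip=True)))
    return pairs


print(link_text_pairs(SAMPLE))
```

With the second anchor, `contents[0]` would be the `<img>` tag rather than the text, which is why this sketch avoids indexing into `contents`.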

If you are scraping Amazon product pages, you should use the official API instead. At least one Python package wraps it, which spares you the scraping problems and keeps your activity within the terms of use.


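In my case, this worked (updated here for Python 3 and bs4; the original used the old BeautifulSoup 3 package):

    from urllib.request import urlopen

    from bs4 import BeautifulSoup as bs

    url = "http://blabla.com"
    soup = bs(urlopen(url), "html.parser")

    for link in soup.find_all('a'):
        # .string is None when the anchor has more than one child (e.g. text plus <img>)
        print(link.string)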

Hope it helps!
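One caveat with `link.string`: it returns `None` when the anchor holds more than one child, such as text followed by an `<img>`, which is the case in the question. A small sketch of a fallback follows; the helper name `link_label` and the inline HTML are made up for illustration, and the HTML is parsed from a string so the example runs without a network request.

```python
from bs4 import BeautifulSoup


def link_label(tag):
    """Return the anchor's text, falling back to get_text() when .string is None."""
    if tag.string is not None:
        return tag.string.strip()
    # Mixed content (text plus other tags): join whatever text is present
    return tag.get_text(" ", strip=True)


html = '<a href="/a">Plain</a><a href="/b">Mixed<img src="x.png"/></a>'
soup = BeautifulSoup(html, "html.parser")

raw = [a.string for a in soup.find_all("a")]           # second entry is None
labels = [link_label(a) for a in soup.find_all("a")]   # both entries are text

print(raw)
print(labels)
```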


I would suggest going the lxml route and using xpath.

    from lxml import etree

    # data is the variable containing the HTML
    tree = etree.HTML(data)

    # xpath() returns a list with the text of every <a class="title">
    anchor = tree.xpath('//a[@class="title"]/text()')
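For comparison, here is a sketch of the same XPath idea applied to the `div.image` markup from the first answer. The function name `anchor_texts_xpath` is an assumption for illustration. Note that `text()` also returns whitespace-only nodes (for example the space after an `<img>`), so the sketch filters those out.

```python
from lxml import etree

SAMPLE = '''
<div class="image">
  <a href="http://www.example.com/eg1">Content1<img src="http://image.example.com/img1.jpg" /></a>
</div>
<div class="image">
  <a href="http://www.example.com/eg2">Content2<img src="http://image.example.com/img2.jpg" /> </a>
</div>
'''


def anchor_texts_xpath(html):
    """Return the non-empty text nodes directly inside each div.image anchor."""
    tree = etree.HTML(html)
    texts = tree.xpath('//div[@class="image"]/a/text()')
    # Drop whitespace-only nodes and trim the rest
    return [t.strip() for t in texts if t.strip()]


print(anchor_texts_xpath(SAMPLE))
```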