BeautifulSoup: extract text from anchor tag

This will help:

    from bs4 import BeautifulSoup

    data = '''<div class="image">
            <a href="http://www.example.com/eg1">Content1<img
              src="http://image.example.com/img1.jpg" /></a>
            </div>
            <div class="image">
            <a href="http://www.example.com/eg2">Content2<img
              src="http://image.example.com/img2.jpg" /> </a>
            </div>'''

    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(data, 'html.parser')

    for div in soup.find_all('div', attrs={'class': 'image'}):
        print(div.find('a')['href'])       # link target
        print(div.find('a').contents[0])   # first child of <a>: the text node
        print(div.find('img')['src'])      # image URL
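If some blocks might be missing the link or the image, the loop above would raise on them. Here is a minimal sketch of a more defensive variant. The function name `extract_image_links` and the sample markup are made up for illustration, and `get_text(strip=True)` is swapped in for `contents[0]`:

```python
from bs4 import BeautifulSoup


def extract_image_links(html):
    """Return one dict per <div class="image"> that contains a link."""
    soup = BeautifulSoup(html, 'html.parser')
    records = []
    for div in soup.find_all('div', class_='image'):
        a = div.find('a')
        if a is None:  # skip blocks without a link
            continue
        img = a.find('img')
        records.append({
            'href': a.get('href'),           # None if the attribute is absent
            'text': a.get_text(strip=True),  # all text inside <a>, trimmed
            'src': img.get('src') if img is not None else None,
        })
    return records


# Hypothetical sample: the second block has no <a> and is skipped.
sample = '''<div class="image"><a href="http://www.example.com/eg1">Content1<img src="http://image.example.com/img1.jpg" /></a></div>
<div class="image"><p>no link here</p></div>'''

records = extract_image_links(sample)
print(records)
```

Returning a list of dicts rather than printing inside the loop makes the result easier to reuse or test.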

If you are scraping Amazon product pages, you should be using the official API instead. There is at least one Python package for it, which will ease your scraping issues and keep your activity within the terms of use.

python html beautifulsoup tags scraper

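In my case, this worked (updated here for Python 3 and bs4):

    from urllib.request import urlopen

    from bs4 import BeautifulSoup as bs

    url = "http://blabla.com"
    soup = bs(urlopen(url), 'html.parser')

    for link in soup.find_all('a'):
        # .string is None if the <a> contains more than one child node
        print(link.string)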

Hope it helps!
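One caveat with `.string`: it only returns text when the tag has a single child. For an anchor that wraps both text and an `<img>`, as in the question's markup, it gives `None`, and `get_text()` is the safer choice. A small sketch, with sample HTML made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first link holds text plus an <img>, the second only text.
html = '''<div class="image">
<a href="http://www.example.com/eg1">Content1<img src="http://image.example.com/img1.jpg" /></a>
<a href="http://www.example.com/eg2">Content2</a>
</div>'''

soup = BeautifulSoup(html, 'html.parser')
first, second = soup.find_all('a')

mixed_string = first.string              # None: <a> has two children
mixed_text = first.get_text(strip=True)  # text of all descendants, trimmed
plain_string = second.string             # the single text child

print(mixed_string, mixed_text, plain_string)
```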


I would suggest going the lxml route and using xpath.

    from lxml import etree

    # data is the variable containing the HTML
    tree = etree.HTML(data)

    # xpath() returns a list with the text of every <a class="title">
    anchor = tree.xpath('//a[@class="title"]/text()')
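For a self-contained run of the same idea, here is a sketch with inline markup. The sample HTML and the `class="title"` attribute on the links are assumptions chosen to match the XPath expression; adjust the predicate to whatever your page actually uses:

```python
from lxml import etree

# Hypothetical markup: two links carry class="title", one does not.
html = '''<div>
<a class="title" href="http://www.example.com/eg1">Content1</a>
<a href="http://www.example.com/other">Other</a>
<a class="title" href="http://www.example.com/eg2">Content2</a>
</div>'''

tree = etree.HTML(html)

# text() selects the text nodes, @href selects the attribute values.
titles = tree.xpath('//a[@class="title"]/text()')
hrefs = tree.xpath('//a[@class="title"]/@href')

print(titles)
print(hrefs)
```

Note that `@class="title"` is an exact string match, so a link with `class="title big"` would not be selected.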

CodeHunter
