How to extract JSON from a script tag using Beautiful Soup in Python?

python html json web-scraping beautifulsoup

This should work, though I am sure there is a more elegant approach:

    import json
    from bs4 import BeautifulSoup

    html = '''<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}</script>'''

    soup = BeautifulSoup(html, 'html.parser')
    res = soup.find('script')
    json_object = json.loads(res.contents[0])

    for language in json_object['languages']:
        print('{}: {}'.format(language['displayName'], language['reviewCount']))

output:

    Toutes les langues: 573
    français: 567
    English: 6
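As one possible tidier variant of the snippet above, the lookup can be wrapped in a small function that targets the tag by its attributes and fails clearly when the tag is missing. The function name `extract_json_script` and its `state_name` parameter are made up for this sketch; `find`, `.string`, and `json.loads` are the usual Beautiful Soup and standard-library calls.

```python
import json
from bs4 import BeautifulSoup


def extract_json_script(html, state_name):
    """Return the parsed JSON from the matching <script> tag.

    `extract_json_script` and `state_name` are hypothetical names
    introduced for this sketch, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # Match on both attributes so unrelated <script> tags are skipped.
    tag = soup.find(
        'script',
        attrs={'type': 'application/json', 'data-initial-state': state_name},
    )
    if tag is None or tag.string is None:
        raise ValueError('no matching <script> tag with content found')
    return json.loads(tag.string)
```

Calling `extract_json_script(html, 'review-filter')` on the HTML from the question should return the same dictionary as `json_object` above.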


Import json, parse the text of the script tag with json.loads, and then iterate over the languages to get each reviewCount.

    import json
    from bs4 import BeautifulSoup  # this import was missing from the snippet

    html = '''<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}</script>'''

    soup = BeautifulSoup(html, "html.parser")
    # Select the script tag by its data attribute and take its text content
    item = soup.select_one('script[data-initial-state="review-filter"]').text
    jsondata = json.loads(item)

    for language in jsondata['languages']:
        print(language['reviewCount'])

Output:

    573
    567
    6
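Note that `reviewCount` is stored as a string in this JSON. If numeric counts are needed, one possible follow-up step is to convert them with `int` and key them by `isoCode`. The snippet below is only a sketch: it starts from the already-extracted JSON text (shortened here to the `languages` part), and the variable names `raw` and `counts` are illustrative.

```python
import json

# JSON text as it would come out of the script tag (languages part only).
raw = (
    '{"languages":['
    '{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},'
    '{"isoCode":"fr","displayName":"français","reviewCount":"567"},'
    '{"isoCode":"en","displayName":"English","reviewCount":"6"}]}'
)

jsondata = json.loads(raw)

# Map each language code to its review count as an integer.
counts = {lang['isoCode']: int(lang['reviewCount']) for lang in jsondata['languages']}
print(counts)  # {'all': 573, 'fr': 567, 'en': 6}
```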

    import re

    html = '''<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}</script>'''

    match = [item.group(1) for item in re.finditer('reviewCount":"(.+?)"', html)]
    print(match)

Output:

['573', '567', '6']
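The regular expression above depends on the exact key and quote layout of the JSON text. If Beautiful Soup is not available, one hedged alternative is to use a regex only to isolate the body of the script tag and let `json.loads` do the actual parsing. The pattern and the helper name `script_json` are assumptions for this sketch, and matching HTML with regular expressions remains brittle if the markup changes.

```python
import json
import re

# Assumed pattern: captures everything between the opening tag that carries
# data-initial-state="review-filter" and the next closing </script>.
SCRIPT_RE = re.compile(
    r'<script[^>]*data-initial-state="review-filter"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def script_json(html):
    """Hypothetical helper: parse the JSON body of the review-filter script tag."""
    match = SCRIPT_RE.search(html)
    if match is None:
        raise ValueError('review-filter script tag not found')
    return json.loads(match.group(1))


html = '''<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}</script>'''

review_counts = [lang['reviewCount'] for lang in script_json(html)['languages']]
print(review_counts)  # ['573', '567', '6']
```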

