How to extract JSON from a script tag using Beautiful Soup in Python?
This should work, though I am sure there is a more elegant approach:
import json
from bs4 import BeautifulSoup

html = '''<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}</script>'''

soup = BeautifulSoup(html, 'html.parser')
res = soup.find('script')  # first <script> tag in the document
json_object = json.loads(res.contents[0])  # the tag's only child is the JSON text
for language in json_object['languages']:
    print('{}: {}'.format(language['displayName'], language['reviewCount']))
Output:
Toutes les langues: 573
français: 567
English: 6
Import json, parse the script tag's text with json.loads, and then iterate over the languages to get every reviewCount.
import json
from bs4 import BeautifulSoup  # missing from the original snippet

html = '''<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}</script>'''

soup = BeautifulSoup(html, "html.parser")
# Select the script tag by its data attribute rather than taking the first one.
item = soup.select_one('script[data-initial-state="review-filter"]').text
jsondata = json.loads(item)
for language in jsondata['languages']:
    print(language['reviewCount'])
Output:
573
567
6
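On a real page the tag may be missing or may not contain valid JSON, in which case select_one returns None and the snippet above raises an AttributeError. Here is a minimal sketch of a guarded version; the helper name extract_json_state and the shortened sample string are my own for illustration, not part of the question:

```python
import json
from bs4 import BeautifulSoup


def extract_json_state(html, state_name):
    """Return the parsed JSON from the matching script tag, or None.

    Hypothetical helper: looks for
    <script type="application/json" data-initial-state="<state_name>">.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(
        "script",
        attrs={"type": "application/json", "data-initial-state": state_name},
    )
    # No such tag, or the tag is empty.
    if tag is None or not tag.string:
        return None
    try:
        return json.loads(tag.string)
    except json.JSONDecodeError:
        # The tag exists but its body is not valid JSON.
        return None


# Shortened sample input, made up for this example.
sample = (
    '<script type="application/json" data-initial-state="review-filter">'
    '{"languages":[{"isoCode":"en","reviewCount":"6"}]}</script>'
)
data = extract_json_state(sample, "review-filter")
# reviewCount is stored as a string in the JSON, so convert if you need numbers.
counts = [int(lang["reviewCount"]) for lang in data["languages"]]
```

Returning None keeps the caller in charge of what to do when the page layout changes; raising a custom exception would be an equally reasonable choice.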
import re

html = '''<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}</script>'''

# Capture the quoted value that follows each "reviewCount" key.
match = [item.group(1) for item in re.finditer(r'reviewCount":"(.+?)"', html)]
print(match)
Output:
['573', '567', '6']
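Matching individual fields with a regex depends on the exact spacing and quoting of the JSON. A possible middle ground, if you want to avoid bs4, is to use the regex only to isolate the script body and let json.loads do the parsing. This is a sketch that assumes the data-initial-state attribute appears exactly as in the question and that the JSON itself contains no literal "</script>":

```python
import json
import re

html = '''<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}</script>'''

# Capture everything between the opening tag and the closing </script>.
pattern = re.compile(
    r'<script[^>]*data-initial-state="review-filter"[^>]*>(.*?)</script>',
    re.DOTALL,
)
m = pattern.search(html)
# Fall back to an empty dict when the tag is not found.
state = json.loads(m.group(1)) if m else {}
review_counts = [lang["reviewCount"] for lang in state.get("languages", [])]
print(review_counts)
```

This gives you the whole parsed object, so displayName, isoCode and the other keys are available without writing more patterns.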