
How to parse ld+json using python


You should parse the JSON text with json.loads, which converts it into a Python dictionary.

import json
import requests
from bs4 import BeautifulSoup

def get_ld_json(url: str) -> dict:
    parser = "html.parser"
    req = requests.get(url)
    soup = BeautifulSoup(req.text, parser)
    return json.loads("".join(soup.find("script", {"type":"application/ld+json"}).contents))

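The join / contents combination removes the script tags: .contents returns the text inside the tag as a list, and "".join(...) merges it into a single string that json.loads can parse.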
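To try the join / contents idea without hitting a real site, here is a minimal sketch that runs against an inline HTML string. The sample page, the product data, and the parse_ld_json helper name are all made up for illustration; it also loops over find_all on the assumption that a page may carry more than one ld+json block.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical sample page; the product data is invented for this example.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget"}
</script>
</head><body></body></html>
"""

def parse_ld_json(html: str) -> list:
    """Return every ld+json block in the page as a parsed Python object."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tag in soup.find_all("script", {"type": "application/ld+json"}):
        # .contents holds the tag's children, i.e. the raw JSON text
        # without the surrounding <script> tags.
        raw = "".join(tag.contents)
        results.append(json.loads(raw))
    return results

data = parse_ld_json(SAMPLE_HTML)
print(data[0]["name"])
```

If the page has no ld+json script at all, this version returns an empty list, whereas the single find call would return None and fail on .contents.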


You should read the HTML first, then parse it:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
p = soup.find('script', {'type':'application/ld+json'})
print(p.contents)


The comments above didn't help (thanks though)

In the end I used:

p = str(soup.find('script', {'type':'application/ld+json'}))

I forced it into a string, which isn't really pretty, but it did the job. I know there's probably a better way out there, but this worked for me.
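For comparison, here is a small sketch of what str() on the tag gives you versus the tag's .string attribute. The inline HTML and the person data are invented for the example. Calling str() on the tag returns the whole element, including the script tags, so it cannot go straight into json.loads; .string returns only the text inside.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical one-line page used only for this demonstration.
html = '<script type="application/ld+json">{"@type": "Person", "name": "Jane"}</script>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("script", {"type": "application/ld+json"})

# str(tag) serialises the whole element, <script> tags included.
as_markup = str(tag)

# tag.string is just the text inside the element, which is valid JSON.
as_json_text = tag.string
person = json.loads(as_json_text)

print(as_markup)
print(person["name"])
```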