Parsing variable data out of a javascript tag using python


If you use BeautifulSoup to get the contents of the <script> tag, the json module can do the rest with a bit of string magic:

    jsonValue = '{%s}' % (textValue.partition('{')[2].rpartition('}')[0],)
    value = json.loads(jsonValue)

The .partition() and .rpartition() combination above splits the text on the first { and on the last } in the JavaScript text block, which should delimit your object definition. Adding the braces back gives a string that we can feed to json.loads() to get a Python structure.
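To make the two splitting steps easier to follow, here is a minimal sketch that wraps them in a function and reports the case where no braces are found. The name parse_js_object and the sample string are made up for illustration; they are not part of the original answer.

```python
import json


def parse_js_object(js_text):
    """Parse the span from the first '{' to the last '}' in js_text as JSON."""
    # partition('{') returns (before, '{', after); the separator is '' if absent.
    _, open_brace, rest = js_text.partition('{')
    # rpartition('}') returns (before, '}', after); keep what precedes the last '}'.
    body, close_brace, _ = rest.rpartition('}')
    if not open_brace or not close_brace:
        raise ValueError("no {...} block found in the script text")
    # Put the braces back so the text is a complete JSON object again.
    return json.loads('{%s}' % body)


# Hypothetical script text with a nested object.
sample = 'var page_data = {"name": "Paints", "nested": {"id": 1234}};'
result = parse_js_object(sample)
print(result)
```

Because the split uses the first { and the last }, a script that contains anything else in braces after the object (a second object, a function body) will produce text that json.loads() rejects.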

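This works because JSON is essentially the JavaScript literal syntax for objects, arrays, strings, numbers, booleans and nulls. It only works as long as the object in the page is also valid JSON: keys and strings in double quotes, no trailing commas, and no comments or function calls.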
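A short sketch of where that equivalence holds and where it stops. Both input strings are invented for illustration: the first is valid JSON, the second is a legal JavaScript object literal that is not JSON.

```python
import json

# Valid JSON: double-quoted keys and strings, JavaScript-style true and null.
valid_json = '{"name": "Paints", "canFavorite": true, "tags": ["a", "b"], "parent": null}'
parsed = json.loads(valid_json)
print(parsed)  # true and null come back as Python True and None

# Legal JavaScript, but not JSON: unquoted key, single quotes, trailing comma.
js_only = "{name: 'Paints', canFavorite: true,}"
try:
    json.loads(js_only)
    rejected = False
except json.JSONDecodeError as exc:
    rejected = True
    print("not valid JSON:", exc.msg)
```

If the page uses the second style, the string trick alone is not enough and the text would need a more tolerant parser.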

Demonstration:

    >>> import json
    >>> text = '''
    ... var page_data = {
    ...    "default_sku" : "SKU12345",
    ...    "get_together" : {
    ...       "imageLargeURL" : "http://null.null/pictures/large.jpg",
    ...       "URL" : "http://null.null/index.tmpl",
    ...       "name" : "Paints",
    ...       "description" : "Here is a description and it works pretty well",
    ...       "canFavorite" : 1,
    ...       "id" : 1234,
    ...       "type" : 2,
    ...       "category" : "faded",
    ...       "imageThumbnailURL" : "http://null.null/small9.jpg"
    ...    }
    ... };
    ... '''
    >>> json_text = '{%s}' % (text.partition('{')[2].rpartition('}')[0],)
    >>> value = json.loads(json_text)
    >>> value
    {'default_sku': 'SKU12345', 'get_together': {'imageLargeURL': 'http://null.null/pictures/large.jpg', 'URL': 'http://null.null/index.tmpl', 'name': 'Paints', 'description': 'Here is a description and it works pretty well', 'canFavorite': 1, 'id': 1234, 'type': 2, 'category': 'faded', 'imageThumbnailURL': 'http://null.null/small9.jpg'}}
    >>> import pprint
    >>> pprint.pprint(value)
    {'default_sku': 'SKU12345',
     'get_together': {'URL': 'http://null.null/index.tmpl',
                      'canFavorite': 1,
                      'category': 'faded',
                      'description': 'Here is a description and it works pretty '
                                     'well',
                      'id': 1234,
                      'imageLargeURL': 'http://null.null/pictures/large.jpg',
                      'imageThumbnailURL': 'http://null.null/small9.jpg',
                      'name': 'Paints',
                      'type': 2}}
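
The demonstration starts from a string that already holds the script text. As a rough sketch of the step before that, the following pulls the text out of a <script> element and applies the same partition trick. It uses the standard library's html.parser as a stand-in for BeautifulSoup so that it runs without extra packages; the ScriptTextCollector class, the sample HTML, and the page_data name are all illustrative assumptions.

```python
import json
from html.parser import HTMLParser


class ScriptTextCollector(HTMLParser):
    """Collect the text content of every <script> element (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the current entry.
        if self._in_script:
            self.scripts[-1] += data


# Hypothetical page; a real one would come from an HTTP response.
html = """
<html><head>
<script type="text/javascript">
var page_data = {"default_sku": "SKU12345", "id": 1234};
</script>
</head><body></body></html>
"""

collector = ScriptTextCollector()
collector.feed(html)
collector.close()

# Pick the script block that mentions the variable we care about.
textValue = next(s for s in collector.scripts if "page_data" in s)

# Same string trick as in the answer: first '{' to last '}', braces restored.
jsonValue = '{%s}' % (textValue.partition('{')[2].rpartition('}')[0],)
value = json.loads(jsonValue)
print(value["default_sku"])
```

With BeautifulSoup, the collector class would be replaced by a lookup of the relevant script tag and its text; the partition and json.loads() steps stay the same.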