
Convert string to dict, then access key:values??? How to access data in a <class 'dict'> for Python?


My first instinct is to use json.loads to parse the strings into dicts. However, the example you've posted is not valid JSON: it uses single quotes instead of double quotes, and the strings carry Python 2 style u prefixes. So you have to convert the strings first.
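To see why the conversion is needed, here is a minimal standard-library sketch using one of the sample strings. The variable names (`raw`, `cleaned`, `parsed`) are just illustrative, and the quote replacement assumes no value contains an apostrophe or ends in the letter "u".

```python
import json

raw = "{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}"

# The raw string is a Python 2 repr, not JSON, so json.loads rejects it.
try:
    json.loads(raw)
    raw_is_valid_json = True
except json.JSONDecodeError:
    raw_is_valid_json = False

# Swap single quotes for double quotes, then drop the u prefixes.
# Assumption: no value contains an apostrophe or ends in the letter "u".
cleaned = raw.replace("'", '"').replace('u"', '"')
parsed = json.loads(cleaned)

print(raw_is_valid_json)      # False
print(parsed["coordinates"])  # [-43.30175, 123.45]
```

If your real data can contain apostrophes inside values, this replacement will corrupt them, so treat it as a quick fix rather than a general parser.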

A second option is to parse the strings with a regex. If the dict strings in your actual DataFrame do not exactly match my examples, I expect the regex method to be more robust, since lat/long coordinate pairs have a fairly standard format.

    import json
    import re

    import pandas as pd

    df = pd.DataFrame(data={
        'Coordinates': [
            "{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}",
            "{u'type': u'Point', u'coordinates': [-51.17913, 123.45]}",
        ],
        'idx': [130, 278],
    })

    ##
    # Solution 1 - use json.loads
    ##
    def string_to_dict(dict_string):
        # Convert to proper json format
        dict_string = dict_string.replace("'", '"').replace('u"', '"')
        return json.loads(dict_string)

    # Use bracket assignment: df.CoordDicts = ... would set an attribute, not add a column
    df['CoordDicts'] = df.Coordinates.apply(string_to_dict)
    df.CoordDicts[0]['coordinates']
    #>>> [-43.30175, 123.45]

    ##
    # Solution 2 - use regex
    ##
    def get_lat_lon(dict_string):
        # Get the coordinates string with regex
        rs = re.search(r"(-?\d+(\.\d+)?),\s*(-?\d+(\.\d+)?)", dict_string).group()
        # Cast to floats
        coords = [float(x) for x in rs.split(',')]
        return coords

    df['Coords'] = df.Coordinates.apply(get_lat_lon)
    df.Coords[0]
    #>>> [-43.30175, 123.45]
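As a variation on the regex idea, pandas' own `Series.str.extract` can apply the pattern to the whole column at once and return one column per capture group. This is only a sketch: the group names `first` and `second` are placeholders I made up, since which number is latitude and which is longitude depends on your data.

```python
import pandas as pd

df = pd.DataFrame({
    "Coordinates": [
        "{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}",
        "{u'type': u'Point', u'coordinates': [-51.17913, 123.45]}",
    ],
    "idx": [130, 278],
})

# Two named groups, one per number in the "[a, b]" pair.
# The names "first" and "second" are placeholders, not lat/lon labels.
pattern = r"(?P<first>-?\d+(?:\.\d+)?),\s*(?P<second>-?\d+(?:\.\d+)?)"

# str.extract returns one string column per named group; cast them to float.
pairs = df["Coordinates"].str.extract(pattern).astype(float)

print(pairs["first"].tolist())   # [-43.30175, -51.17913]
print(pairs["second"].tolist())  # [123.45, 123.45]
```

Rows where the pattern does not match come back as NaN instead of raising, which may or may not be what you want.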


Just ran into this problem. My solution:

    import ast
    import pandas as pd

    df = pd.DataFrame(["{u'type': u'Point', u'coordinates': [-43,144]}",
                       "{u'type': u'Point', u'coordinates': [-34,34]}",
                       "{u'type': u'Point', u'coordinates': [-102,344]}"],
                      columns=["Coordinates"])
    df = df["Coordinates"].astype('str')
    df = df.apply(lambda x: ast.literal_eval(x))
    df = df.apply(pd.Series)
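For the "how do I access the key:values" part of the question, here is a small sketch of what you get after `ast.literal_eval`: each element is an ordinary Python dict, so normal indexing works. The names `strings`, `dicts`, and `coords` are mine, not from the snippet above.

```python
import ast

import pandas as pd

strings = pd.Series([
    "{u'type': u'Point', u'coordinates': [-43,144]}",
    "{u'type': u'Point', u'coordinates': [-34,34]}",
])

# literal_eval only accepts Python literals, so it does not execute code.
dicts = strings.apply(ast.literal_eval)

first = dicts[0]                # a plain dict
print(type(first).__name__)     # dict
print(first["type"])            # Point
print(first["coordinates"][1])  # 144

# Pull one key out of every row.
coords = dicts.apply(lambda d: d["coordinates"])
print(coords.tolist())          # [[-43, 144], [-34, 34]]
```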


Assuming you start with a Series of dicts, you can use the .tolist() method to create a list of dicts and use this as input for a DataFrame. This approach will map each distinct key to a column.

You can filter by keys on creation by setting the columns argument in pd.DataFrame(), giving you the neat one-liner below. Hope that helps.

    import pandas as pd

    # Starting assumption:
    data = ["{'coordinates': [-43.301755, -22.990065], 'type': 'Point', 'elevation': 1000}",
            "{'coordinates': [-51.17913026, -30.01201896], 'type': 'Point'}"]
    # Note: eval executes arbitrary code; ast.literal_eval is the safer choice for untrusted strings
    s = pd.Series(data).apply(eval)

    # Create a DataFrame with a list of dicts with a selection of columns
    pd.DataFrame(s.tolist(), columns=['coordinates'])

    Out[1]:
                        coordinates
    0      [-43.301755, -22.990065]
    1  [-51.17913026, -30.01201896]
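To illustrate the "each distinct key becomes a column" behaviour, here is a sketch of the same data without the columns filter. I've swapped in `ast.literal_eval` for `eval` as a safer parser; `records` and `full` are just illustrative names.

```python
import ast

import pandas as pd

data = [
    "{'coordinates': [-43.301755, -22.990065], 'type': 'Point', 'elevation': 1000}",
    "{'coordinates': [-51.17913026, -30.01201896], 'type': 'Point'}",
]

# Parse each string into a dict without executing arbitrary code.
records = [ast.literal_eval(item) for item in data]

# No columns argument: every key seen in any dict becomes a column.
full = pd.DataFrame(records)

print(sorted(full.columns))               # ['coordinates', 'elevation', 'type']
print(pd.isna(full.loc[1, "elevation"]))  # True
```

The second record has no 'elevation' key, so that cell is filled with NaN rather than raising an error.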