Combine List Into Dataframe


Pandas accepts a list of dictionaries directly. Don't fight this; simply extract i['metadata'] for each item in your list.

Your only task thereafter is to rename and reorder the columns.

import pandas as pd

r = [{'metadata': {'name': 'mike', 'CountryId': 1, 'StateId': 4, 'Income': 20000}},
     {'metadata': {'name': 'mary', 'CountryId': 2, 'StateId': 5, 'Income': 30000}},
     {'metadata': {'name': 'jane', 'CountryId': 3, 'StateId': 6, 'Income': 40000}}]

df = pd.DataFrame([i['metadata'] for i in r])\
       .rename(columns={'CountryId': 'id_a', 'StateId': 'id_b', 'Income': 'income'})\
       .reindex(['name', 'id_a', 'id_b', 'income'], axis=1)

print(df)

   name  id_a  id_b  income
0  mike     1     4   20000
1  mary     2     5   30000
2  jane     3     6   40000


You can create a variable person_info outside the loop and append a tuple in each iteration:

person_info = list()
for r in rows:
    person_info.append((r['metadata']['name'], r['metadata']['CountryId'], r['metadata']['StateId'], r['metadata']['Income']))

Solution with list comprehension:

person_info = [(r['metadata']['name'], r['metadata']['CountryId'], r['metadata']['StateId'], r['metadata']['Income']) for r in rows]

df = pd.DataFrame(person_info, columns=["name", "id_a", "id_b", "income"]) 
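As a quick check, here is a minimal, self-contained sketch of the tuple approach. The sample rows list is an assumption that mirrors the structure used in the first answer, and the helper name rows_to_frame is hypothetical, not part of pandas.

```python
import pandas as pd

# Sample input (assumed shape: each item wraps its fields in a 'metadata' dict)
rows = [
    {'metadata': {'name': 'mike', 'CountryId': 1, 'StateId': 4, 'Income': 20000}},
    {'metadata': {'name': 'mary', 'CountryId': 2, 'StateId': 5, 'Income': 30000}},
]


def rows_to_frame(rows):
    """Build a DataFrame from one tuple per row (hypothetical helper)."""
    person_info = [
        (m['name'], m['CountryId'], m['StateId'], m['Income'])
        for m in (r['metadata'] for r in rows)
    ]
    # Column names are assigned positionally, so the tuple order must match
    return pd.DataFrame(person_info, columns=['name', 'id_a', 'id_b', 'income'])


df = rows_to_frame(rows)
print(df)
```

Because the columns are assigned by position, no separate rename step is needed here.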

Another possible solution, if the input is JSON, is to use json_normalize:

import json
import pandas as pd

with open('myJson.json') as data_file:
    data = json.load(data_file)

# pandas 1.0+ exposes json_normalize at the top level.
# Nested keys are flattened into columns such as 'metadata.name'.
df = pd.json_normalize(data)
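To see what json_normalize produces for this kind of nested input, here is a small sketch on in-memory data instead of a file. It assumes pandas 1.0 or later (where the function is available as pd.json_normalize) and sample data shaped like the list in the first answer; the rename mapping is only an illustration.

```python
import pandas as pd

# Assumed sample data with the same shape as the parsed JSON
data = [
    {'metadata': {'name': 'mike', 'CountryId': 1, 'StateId': 4, 'Income': 20000}},
    {'metadata': {'name': 'mary', 'CountryId': 2, 'StateId': 5, 'Income': 30000}},
]

# Nested dicts are flattened into dotted column names such as 'metadata.name'
flat = pd.json_normalize(data)

# Map the flattened names to the target column names
flat = flat.rename(columns={
    'metadata.name': 'name',
    'metadata.CountryId': 'id_a',
    'metadata.StateId': 'id_b',
    'metadata.Income': 'income',
})
print(flat)
```

The dotted prefix comes from the nesting, so a rename step is still needed to get the short column names.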


You can also collect the values in a defaultdict and use it to create the DataFrame:

from collections import defaultdict
import pandas as pd

person_info = defaultdict(list)
for r in rows:
    person_info['name'].append(r['metadata']['name'])
    person_info['id_a'].append(r['metadata']['CountryId'])
    person_info['id_b'].append(r['metadata']['StateId'])
    person_info['income'].append(r['metadata']['Income'])

Then create the DataFrame:

df = pd.DataFrame(person_info)
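The same defaultdict idea can be written with a key mapping so the four append lines collapse into one loop. This is only a sketch: the sample rows and the column_map name are assumptions for illustration.

```python
from collections import defaultdict

import pandas as pd

# Sample input (assumed shape, mirroring the list in the first answer)
rows = [
    {'metadata': {'name': 'mike', 'CountryId': 1, 'StateId': 4, 'Income': 20000}},
    {'metadata': {'name': 'mary', 'CountryId': 2, 'StateId': 5, 'Income': 30000}},
]

# Hypothetical mapping from source keys to target column names
column_map = {'name': 'name', 'CountryId': 'id_a', 'StateId': 'id_b', 'Income': 'income'}

# Collect one list of values per target column
person_info = defaultdict(list)
for r in rows:
    for source_key, column in column_map.items():
        person_info[column].append(r['metadata'][source_key])

# A defaultdict is a dict subclass, so it can be passed to DataFrame directly
df = pd.DataFrame(person_info)
print(df)
```

Column order follows the insertion order of column_map, so reordering the mapping reorders the resulting columns.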