Combine List Into Dataframe
Pandas accepts a list of dictionaries directly, so there is no need to fight it: simply extract i['metadata'] for each item in your list.
Your only task after that is to rename and reorder the columns.
import pandas as pd

r = [{'metadata': {'name': 'mike', 'CountryId': 1, 'StateId': 4, 'Income': 20000}},
     {'metadata': {'name': 'mary', 'CountryId': 2, 'StateId': 5, 'Income': 30000}},
     {'metadata': {'name': 'jane', 'CountryId': 3, 'StateId': 6, 'Income': 40000}}]

# Build the frame from the inner dicts, then rename and reorder the columns.
df = pd.DataFrame([i['metadata'] for i in r])\
       .rename(columns={'CountryId': 'id_a', 'StateId': 'id_b', 'Income': 'income'})\
       .reindex(['name', 'id_a', 'id_b', 'income'], axis=1)

print(df)

   name  id_a  id_b  income
0  mike     1     4   20000
1  mary     2     5   30000
2  jane     3     6   40000

You can create the variable person_info outside of the loop and append a tuple in each iteration:
person_info = list()
for r in rows:
    person_info.append((r['metadata']['name'],
                        r['metadata']['CountryId'],
                        r['metadata']['StateId'],
                        r['metadata']['Income']))

Solution with a list comprehension:
person_info = [(r['metadata']['name'], r['metadata']['CountryId'], r['metadata']['StateId'], r['metadata']['Income']) for r in rows]
df = pd.DataFrame(person_info, columns=["name", "id_a", "id_b", "income"])
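
The snippets above leave rows undefined, so here is a minimal self-contained sketch of the tuple approach. It assumes rows has the same shape as the sample list r from the first answer; the keys list is a name introduced here purely for illustration.

```python
import pandas as pd

# Assumption: rows has the same shape as the sample list r shown earlier.
rows = [
    {'metadata': {'name': 'mike', 'CountryId': 1, 'StateId': 4, 'Income': 20000}},
    {'metadata': {'name': 'mary', 'CountryId': 2, 'StateId': 5, 'Income': 30000}},
    {'metadata': {'name': 'jane', 'CountryId': 3, 'StateId': 6, 'Income': 40000}},
]

# Hypothetical helper: keys to pull from each 'metadata' dict, in column order.
keys = ['name', 'CountryId', 'StateId', 'Income']

# One tuple per row, holding the values in the order given by keys.
person_info = [tuple(r['metadata'][k] for k in keys) for r in rows]

# The column names are assigned positionally to the tuple fields.
df = pd.DataFrame(person_info, columns=['name', 'id_a', 'id_b', 'income'])
print(df)
```

Because the columns are assigned by position, the order of keys and the order of the column names have to match.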
Another possible solution, if the input is JSON, is to use json_normalize:
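import json
import pandas as pd

with open('myJson.json') as data_file:
    data = json.load(data_file)

# pd.json_normalize (pandas >= 1.0) replaces the deprecated pandas.io.json.json_normalize.
# Nested keys are flattened into dotted column names such as 'metadata.name'.
df = pd.json_normalize(data)
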
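
To see what json_normalize produces without needing a file, here is a minimal sketch on in-memory data. It assumes pandas 1.0 or later (where the function is exposed as pd.json_normalize) and reuses the sample records from the first answer; the variable names flat and df are illustrative.

```python
import pandas as pd

# Assumption: the parsed JSON has the same shape as the sample list r shown earlier.
data = [
    {'metadata': {'name': 'mike', 'CountryId': 1, 'StateId': 4, 'Income': 20000}},
    {'metadata': {'name': 'mary', 'CountryId': 2, 'StateId': 5, 'Income': 30000}},
    {'metadata': {'name': 'jane', 'CountryId': 3, 'StateId': 6, 'Income': 40000}},
]

# Nested dicts are flattened into dotted column names, e.g. 'metadata.name'.
flat = pd.json_normalize(data)

# Drop the 'metadata.' prefix, then rename and reorder the columns.
df = (flat.rename(columns=lambda c: c.replace('metadata.', '', 1))
          .rename(columns={'CountryId': 'id_a', 'StateId': 'id_b', 'Income': 'income'})
          .reindex(['name', 'id_a', 'id_b', 'income'], axis=1))
print(df)
```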
You can also collect the values in a defaultdict and use it to create the dataframe:
from collections import defaultdict
import pandas as pd

# Each missing key starts out as an empty list, so values can be appended directly.
person_info = defaultdict(list)
for r in rows:
    person_info['name'].append(r['metadata']['name'])
    person_info['id_a'].append(r['metadata']['CountryId'])
    person_info['id_b'].append(r['metadata']['StateId'])
    person_info['income'].append(r['metadata']['Income'])

Then create the dataframe:
df = pd.DataFrame(person_info)
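
Put together, the defaultdict approach can be sketched end to end as follows. It assumes rows has the same shape as the sample list r from the first answer; the mapping dict is a hypothetical helper introduced here to avoid repeating the append line for every column.

```python
from collections import defaultdict

import pandas as pd

# Assumption: rows has the same shape as the sample list r shown earlier.
rows = [
    {'metadata': {'name': 'mike', 'CountryId': 1, 'StateId': 4, 'Income': 20000}},
    {'metadata': {'name': 'mary', 'CountryId': 2, 'StateId': 5, 'Income': 30000}},
    {'metadata': {'name': 'jane', 'CountryId': 3, 'StateId': 6, 'Income': 40000}},
]

# Hypothetical helper: output column name -> key inside 'metadata'.
mapping = {'name': 'name', 'id_a': 'CountryId', 'id_b': 'StateId', 'income': 'Income'}

# Build one list per output column; missing keys start as empty lists.
person_info = defaultdict(list)
for r in rows:
    for column, key in mapping.items():
        person_info[column].append(r['metadata'][key])

# A defaultdict is a dict subclass, so DataFrame treats each key as a column.
df = pd.DataFrame(person_info)
print(df)
```

The columns come out in the order the keys were first inserted, so no reindex step is needed here.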