How can I load data from a MongoDB collection into a pandas DataFrame?


Convert the cursor you get from MongoDB into a list before passing it to DataFrame:

    import pandas as pd

    df = pd.DataFrame(list(tweets.find()))
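The list() call pulls every document out of the cursor and into memory before pandas sees it. Here is a minimal sketch of that step, using a plain generator as a stand-in for tweets.find() so it runs without a database. The helper name cursor_to_dataframe and its drop_id flag are illustrative assumptions, not part of pandas or PyMongo.

```python
import pandas as pd


def cursor_to_dataframe(cursor, drop_id=True):
    """Materialize an iterable of documents (e.g. a PyMongo cursor) into a DataFrame."""
    # list() consumes the cursor and loads every document into memory.
    documents = list(cursor)
    df = pd.DataFrame(documents)
    # MongoDB adds an "_id" field to every document; drop it if it is not needed.
    if drop_id and "_id" in df.columns:
        df = df.drop(columns="_id")
    return df


# Stand-in for tweets.find(): a generator that yields dict documents.
fake_cursor = (
    {"_id": i, "user": user, "text": text}
    for i, (user, text) in enumerate([("ann", "hello"), ("bob", "world")])
)

df = cursor_to_dataframe(fake_cursor)
print(df)
```

For large collections this can use a lot of memory, so it may be worth narrowing the query first.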


If you have data in MongoDB like this:

    [
        {
            "name": "Adam",
            "age": 27,
            "address": {
                "number": 4,
                "street": "Main Road",
                "city": "Oxford"
            }
        },
        {
            "name": "Steve",
            "age": 32,
            "address": {
                "number": 78,
                "street": "High Street",
                "city": "Cambridge"
            }
        }
    ]

You can put the data straight into a dataframe like this:

    from pandas import DataFrame

    df = DataFrame(list(db.collection_name.find({})))

And you will get this output (the _id column that MongoDB adds to each document is omitted here):

    df.head()

|    | name  | age | address                                                      |
|----|-------|-----|--------------------------------------------------------------|
| 0  | Adam  | 27  | {"number": 4, "street": "Main Road", "city": "Oxford"}       |
| 1  | Steve | 32  | {"number": 78, "street": "High Street", "city": "Cambridge"} |

However, each subdocument appears as a single dict inside the address cell. If you want to flatten the objects so that subdocument properties are shown as individual columns, you can use json_normalize without any parameters.

    # pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
    from pandas import json_normalize

    datapoints = list(db.collection_name.find({}))
    df = json_normalize(datapoints)
    df.head()

This will give the dataframe in this format:

|    | name  | age | address.number | address.street | address.city |
|----|-------|-----|----------------|----------------|--------------|
| 0  | Adam  | 27  | 4              | Main Road      | Oxford       |
| 1  | Steve | 32  | 78             | High Street    | Cambridge    |
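To try the flattening step without a running database, the same two documents can be written as plain Python dicts. This is only a sketch: it assumes pandas 1.0 or later (for pd.json_normalize), and the variable names are illustrative. It also shows the sep parameter, which changes the separator used in the flattened column names.

```python
import pandas as pd

# The sample documents from above, as they would come back from find()
# (without the _id field).
documents = [
    {"name": "Adam", "age": 27,
     "address": {"number": 4, "street": "Main Road", "city": "Oxford"}},
    {"name": "Steve", "age": 32,
     "address": {"number": 78, "street": "High Street", "city": "Cambridge"}},
]

# Default: nested keys are joined with "." (address.city, address.street, ...).
flat = pd.json_normalize(documents)

# sep="_" yields names such as address_city, which also work as attributes.
flat_underscore = pd.json_normalize(documents, sep="_")

print(sorted(flat.columns))
print(flat_underscore.address_city.tolist())
```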


You can load your MongoDB data into a pandas DataFrame using this code. It works for me, and I hope it works for you too.

    import pandas as pd
    from pymongo import MongoClient  # Connection was removed in PyMongo 3.0

    client = MongoClient()  # connects to localhost:27017 by default
    db = client.database_name
    input_data = db.collection_name
    data = pd.DataFrame(list(input_data.find()))
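If you only need part of the collection, find() also accepts a filter and a projection, so less data has to be copied into the DataFrame. The sketch below wraps that in a hypothetical load_collection helper. InMemoryCollection is likewise a made-up stand-in that imitates only a small part of find(), so that the example runs without a MongoDB server.

```python
import pandas as pd


def load_collection(collection, query=None, projection=None):
    """Run collection.find(query, projection) and return the result as a DataFrame."""
    # An empty filter matches every document, as in find({}).
    cursor = collection.find(query or {}, projection)
    return pd.DataFrame(list(cursor))


class InMemoryCollection:
    """Hypothetical stand-in that mimics a small part of a PyMongo collection's find()."""

    def __init__(self, documents):
        self._documents = documents

    def find(self, filter=None, projection=None):
        # Only exclusion projections such as {"_id": 0} are supported in this sketch.
        excluded = {k for k, v in (projection or {}).items() if not v}
        for doc in self._documents:
            # Only equality filters are supported in this sketch.
            if all(doc.get(k) == v for k, v in (filter or {}).items()):
                yield {k: v for k, v in doc.items() if k not in excluded}


collection = InMemoryCollection([
    {"_id": 1, "name": "Adam", "age": 27},
    {"_id": 2, "name": "Steve", "age": 32},
])

# Keep only Steve's document and leave out the _id field.
data = load_collection(collection, query={"name": "Steve"}, projection={"_id": 0})
print(data)
```

With a real server you would pass a PyMongo collection instead, for example MongoClient()["database_name"]["collection_name"], assuming MongoDB is reachable on the default host and port.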