
How can I subclass a Pandas DataFrame?


There is now an official guide on how to subclass Pandas data structures, which includes DataFrame as well as Series.

The guide is available here: https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extending-subclassing-pandas

The guide points to the GeoDataFrame class from the GeoPandas project as a good example: https://github.com/geopandas/geopandas/blob/master/geopandas/geodataframe.py

As in HYRY's answer, it seems there are two things you're trying to accomplish:

  1. When calling methods on an instance of your class, return instances of the correct type (your type). For this, you can just add the _constructor property, which should return your type.
  2. Add attributes that will be attached to copies of your object. To do this, store the names of these attributes in a list assigned to the special _metadata class attribute.

Here's an example:

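    from pandas import DataFrame

    class SubclassedDataFrame(DataFrame):
        # Attribute names listed in _metadata are carried over to objects
        # derived from an instance (slices, copies, ...)
        _metadata = ['added_property']
        added_property = 1  # class-level default; values set on an instance are passed to copies

        @property
        def _constructor(self):
            # Operations that produce a new DataFrame return this subclass
            return SubclassedDataFrame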
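To see requirement 2 in action, the following sketch sets added_property on an instance and checks that column selection and copy() both carry the value over. It assumes a reasonably recent pandas, where derived objects pass through the __finalize__ hook that copies the names listed in _metadata; the column names and the value "custom" are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd


class SubclassedDataFrame(pd.DataFrame):
    # Names listed here are copied onto derived objects by __finalize__
    _metadata = ["added_property"]
    added_property = 1  # class-level default

    @property
    def _constructor(self):
        return SubclassedDataFrame


df = SubclassedDataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))
df.added_property = "custom"  # set on the instance, not on the class

sub = df[["A", "C"]]  # column selection
cp = df.copy()        # explicit copy

print(type(sub).__name__, sub.added_property)  # SubclassedDataFrame custom
print(type(cp).__name__, cp.added_property)    # SubclassedDataFrame custom
```

Without the _metadata entry, sub and cp would still be SubclassedDataFrame instances, but reading added_property on them would fall back to the class default of 1, because the per-instance value would not be copied.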


For Requirement 1, just define _constructor:

    import pandas as pd
    import numpy as np

    class MyDF(pd.DataFrame):
        @property
        def _constructor(self):
            # Operations that produce a new DataFrame return MyDF
            return MyDF

    mydf = MyDF(np.random.randn(3, 4), columns=['A', 'B', 'C', 'D'])
    print(type(mydf))
    mydf_sub = mydf[['A', 'C']]
    print(type(mydf_sub))  # still MyDF, not pd.DataFrame
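With only _constructor defined, selecting a single column such as mydf['A'] still returns a plain pd.Series. The official guide linked above describes two further properties for this case: _constructor_sliced on the DataFrame subclass and _constructor_expanddim on a matching Series subclass. A minimal sketch of that pairing follows; MySeries is a hypothetical name chosen for illustration.

```python
import numpy as np
import pandas as pd


class MySeries(pd.Series):
    @property
    def _constructor(self):
        # Series -> Series results (e.g. arithmetic) come back as MySeries
        return MySeries

    @property
    def _constructor_expanddim(self):
        # Series -> DataFrame results (e.g. to_frame) come back as MyDF
        return MyDF


class MyDF(pd.DataFrame):
    @property
    def _constructor(self):
        # DataFrame -> DataFrame results come back as MyDF
        return MyDF

    @property
    def _constructor_sliced(self):
        # DataFrame -> Series results (e.g. one column) come back as MySeries
        return MySeries


mydf = MyDF(np.random.randn(3, 4), columns=["A", "B", "C", "D"])
col = mydf["A"]

print(type(col).__name__)             # MySeries
print(type(col * 2).__name__)         # MySeries
print(type(col.to_frame()).__name__)  # MyDF
```

MySeries refers to MyDF before MyDF is defined; this works because the property body is only evaluated when pandas calls it.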

I think there is no simple solution for Requirement 2. I think you need to define __init__ or copy, or do something in _constructor, for example:

    import pandas as pd
    import numpy as np

    class MyDF(pd.DataFrame):
        _attributes_ = "myattr1,myattr2"

        def __init__(self, *args, **kw):
            super().__init__(*args, **kw)
            # MyDF(other_mydf): copy the custom attributes from the source
            if len(args) == 1 and isinstance(args[0], MyDF):
                args[0]._copy_attrs(self)

        def _copy_attrs(self, df):
            # Assign through __dict__, bypassing DataFrame.__setattr__
            for attr in self._attributes_.split(","):
                df.__dict__[attr] = getattr(self, attr, None)

        @property
        def _constructor(self):
            # Return a factory that builds a MyDF and copies our attributes onto it
            def f(*args, **kw):
                df = MyDF(*args, **kw)
                self._copy_attrs(df)
                return df
            return f

    mydf = MyDF(np.random.randn(3, 4), columns=['A', 'B', 'C', 'D'])
    print(type(mydf))
    mydf_sub = mydf[['A', 'C']]
    print(type(mydf_sub))
    mydf.myattr1 = 1
    mydf_cp1 = MyDF(mydf)
    mydf_cp2 = mydf.copy()
    print(mydf_cp1.myattr1, mydf_cp2.myattr1)