
Pandas dataframe get first row of each group


>>> df.groupby('id').first()
     value
id
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth

If you need id as a column:

>>> df.groupby('id').first().reset_index()
   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth
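
For a self-contained run, here is a minimal sketch. The sample data is an assumption: the original frame is not shown here, so the values below were picked only so that the results match the output above.

```python
import pandas as pd

# Hypothetical sample data (an assumption; chosen to reproduce the output above).
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'value': ['first', 'second', 'second', 'first', 'second',
              'first', 'third', 'fourth', 'second', 'fifth',
              'first', 'first', 'second', 'third', 'fourth', 'fifth'],
})

# First row per group; 'id' becomes the index of the result.
first_rows = df.groupby('id').first()

# Same result with 'id' restored as a regular column.
first_rows_flat = df.groupby('id').first().reset_index()

print(first_rows)
print(first_rows_flat)
```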

To get the first n records of each group, use head():

>>> df.groupby('id').head(2).reset_index(drop=True)
    id   value
0    1   first
1    1  second
2    2   first
3    2  second
4    3   first
5    3   third
6    4  second
7    4   fifth
8    5   first
9    6   first
10   6  second
11   7  fourth
12   7   fifth
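
A small sketch of why reset_index(drop=True) is useful here, on hypothetical data (the frame and the variable names are assumptions for illustration). head(n) returns rows with their original index, so the index has gaps until it is reset.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not from the original post).
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 3, 3],
    'value': ['first', 'second', 'third', 'first', 'first', 'second'],
})

n = 2
# head(n) keeps up to n rows per group, in original order, with the original index.
top_n = df.groupby('id').head(n)

# drop=True discards the old index instead of inserting it as a new column.
top_n_flat = top_n.reset_index(drop=True)

print(top_n)
print(top_n_flat)
```

Groups with fewer than n rows (id 2 here) simply contribute all the rows they have.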


This will give you the second row of each group (nth is zero-indexed, so nth(0) selects the first row):

df.groupby('id').nth(1) 

Documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group
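
A minimal sketch of nth(1) on hypothetical data (the frame is an assumption for illustration). As far as I know, the layout of the result depends on the pandas version: older releases index it by the group key, while pandas 2.0 and later return the selected rows with their original index. The sketch only reads the value column, which should work either way.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not from the original post).
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 3, 3],
    'value': ['first', 'second', 'third', 'first', 'first', 'second'],
})

# Second row of each group; groups with fewer than two rows yield nothing.
second_rows = df.groupby('id').nth(1)
second_values = second_rows['value'].tolist()

print(second_rows)
print(second_values)
```

Here id 2 has a single row, so it does not appear in the result at all.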


I'd suggest using .nth(0) rather than .first() if you need the first row.

The difference is in how they handle NaNs: .nth(0) returns the first row of each group regardless of the values in that row, while .first() returns the first non-NaN value in each column.

For example, if your dataset is:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4],
...                    'value': ["first","second","third", np.nan,
...                              "second","first","second","third",
...                              "fourth","first","second"]})
>>> df.groupby('id').nth(0)
    value
id
1   first
2     NaN
3   first
4   first

And

>>> df.groupby('id').first()
     value
id
1    first
2   second
3    first
4    first
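
The contrast can be checked side by side with a short sketch. The smaller frame and the variable names are assumptions for illustration; head(1) is included because it also returns whole rows without skipping NaNs.

```python
import numpy as np
import pandas as pd

# Hypothetical reduced dataset (an assumption): group 2 starts with a NaN.
df = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'value': ['first', 'second', np.nan, 'second'],
})

grouped = df.groupby('id')

# nth(0) takes the first row of each group as-is, so the NaN is kept.
nth_values = grouped.nth(0)['value'].tolist()

# first() takes the first non-NaN value per column, so the NaN is skipped.
first_values = grouped.first()['value'].tolist()

# head(1) also returns the first row as-is, keeping the original index.
head_values = grouped.head(1)['value'].tolist()

print(nth_values)
print(first_values)
print(head_values)
```

If the goal is "the first physical row of each group", nth(0) or head(1) is the safer choice; first() is better read as "the first available value per column".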