Pandas dataframe get first row of each group
>>> df.groupby('id').first()
     value
id
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth
If you need id as a column:
>>> df.groupby('id').first().reset_index()
   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth
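A self-contained sketch of the two calls above. The sample frame is hypothetical, since the original data is not shown in full, and `as_index=False` is an alternative to `reset_index()` that the answer itself does not use.

```python
import pandas as pd

# Hypothetical sample data (not the frame used above).
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "value": ["first", "second", "first", "second", "third", "first"],
})

# One row per group, with id as the index.
by_index = df.groupby("id").first()

# Same rows, with id moved back into a regular column.
by_column = df.groupby("id").first().reset_index()

# Alternative: ask groupby not to use id as the index in the first place.
by_column_alt = df.groupby("id", as_index=False).first()
```

Both `by_column` and `by_column_alt` should hold the same rows; which one to use is mostly a matter of taste.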
To get the first n records of each group, you can use head():
>>> df.groupby('id').head(2).reset_index(drop=True)
    id   value
0    1   first
1    1  second
2    2   first
3    2  second
4    3   first
5    3   third
6    4  second
7    4   fifth
8    5   first
9    6   first
10   6  second
11   7  fourth
12   7   fifth
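To see why `reset_index(drop=True)` is used here, a small sketch may help. The data is made up for illustration, with the groups deliberately interleaved so that the original row labels are visible in the result.

```python
import pandas as pd

# Hypothetical sample data (not the frame used above).
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 2, 3],
    "value": ["a", "b", "c", "d", "e", "f"],
})

# head(2) keeps up to two rows per group, in original row order,
# and leaves the original row labels in place.
top_two = df.groupby("id").head(2)

# Renumber the rows from 0 once the original labels are no longer needed.
top_two_clean = top_two.reset_index(drop=True)
```

Here the third row of group 2 (label 4) is dropped, so `top_two` has a gap in its index that `reset_index(drop=True)` closes.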
I'd suggest using .nth(0) rather than .first() if you need the first row of each group.
The difference between them is how they handle NaNs: .nth(0) returns the first row of each group regardless of the values in that row, while .first() returns the first non-NaN value in each column, so its result may combine values from different rows.
For example, if your dataset is:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4],
...                    'value': ["first","second","third", np.nan,
...                              "second","first","second","third",
...                              "fourth","first","second"]})
>>> df.groupby('id').nth(0)
    value
id
1   first
2     NaN
3   first
4   first
And:
>>> df.groupby('id').first()
     value
id
1    first
2   second
3    first
4    first
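The same comparison as a runnable sketch, using the dataset from the example. Depending on the pandas version, the frame returned by nth(0) may be indexed by id or may keep the original row labels (as far as I know this changed in pandas 2.0), so the sketch only looks at the value column, which should behave the same either way. The head(1) line is an addition of mine, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4],
    "value": ["first", "second", "third", np.nan, "second",
              "first", "second", "third", "fourth", "first", "second"],
})

grouped = df.groupby("id")

# nth(0) takes the first row of each group as-is, NaN included.
nth_values = grouped.nth(0)["value"].tolist()

# first() takes the first non-null entry of each column instead.
first_values = grouped.first()["value"].tolist()

# head(1) also takes the first row as-is and keeps id as a regular column.
head_values = grouped.head(1)["value"].tolist()
```

For group 2, `nth_values` and `head_values` hold NaN, while `first_values` holds "second".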