SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe
you can also use sort_values()
, groupby()
and finally cumcount() + 1
:
df['RN'] = df.sort_values(['data1','data2'], ascending=[True,False]) \ .groupby(['key1']) \ .cumcount() + 1print(df)
yields:
data1 data2 key1 RN0 1 1 a 11 2 10 a 22 2 2 a 33 3 3 b 14 3 30 a 4
PS tested with pandas 0.18
You can do this by using groupby
twice along with the rank
method:
In [11]: g = df.groupby('key1')
Use the min method argument to give values which share the same data1 the same RN:
In [12]: g['data1'].rank(method='min')Out[12]:0 11 22 23 14 4dtype: float64In [13]: df['RN'] = g['data1'].rank(method='min')
And then groupby these results and add the rank with respect to data2:
In [14]: g1 = df.groupby(['key1', 'RN'])In [15]: g1['data2'].rank(ascending=False) - 1Out[15]:0 01 02 13 04 0dtype: float64In [16]: df['RN'] += g1['data2'].rank(ascending=False) - 1In [17]: dfOut[17]: data1 data2 key1 RN0 1 1 a 11 2 10 a 22 2 2 a 33 3 3 b 14 3 30 a 4
It feels like there ought to be a native way to do this (there may well be!...).