SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe python python

SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe


you can also use sort_values(), groupby() and finally cumcount() + 1:

df['RN'] = df.sort_values(['data1','data2'], ascending=[True,False]) \             .groupby(['key1']) \             .cumcount() + 1print(df)

yields:

   data1  data2 key1  RN0      1      1    a   11      2     10    a   22      2      2    a   33      3      3    b   14      3     30    a   4

PS tested with pandas 0.18


You can do this by using groupby twice along with the rank method:

In [11]: g = df.groupby('key1')

Use the min method argument to give values which share the same data1 the same RN:

In [12]: g['data1'].rank(method='min')Out[12]:0    11    22    23    14    4dtype: float64In [13]: df['RN'] = g['data1'].rank(method='min')

And then groupby these results and add the rank with respect to data2:

In [14]: g1 = df.groupby(['key1', 'RN'])In [15]: g1['data2'].rank(ascending=False) - 1Out[15]:0    01    02    13    04    0dtype: float64In [16]: df['RN'] += g1['data2'].rank(ascending=False) - 1In [17]: dfOut[17]:   data1  data2 key1  RN0      1      1    a   11      2     10    a   22      2      2    a   33      3      3    b   14      3     30    a   4

It feels like there ought to be a native way to do this (there may well be!...).


Use groupby.rank function.Here the working example.

df = pd.DataFrame({'C1':['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})dfC1 C2a  1a  2a  3b  4b  5df["RANK"] = df.groupby("C1")["C2"].rank(method="first", ascending=True)dfC1 C2 RANKa  1  1a  2  2a  3  3b  4  1b  5  2