
Pandas DENSE RANK


Use pd.Series.rank with method='dense':

df['Rank'] = df.Year.rank(method='dense').astype(int)
df
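For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is an assumption: it is rebuilt from the Year and Value columns shown in the output further down this thread.

```python
import pandas as pd

# Sample data reconstructed from the example output in this thread (assumed).
df = pd.DataFrame({"Year": [2012, 2013, 2013, 2014],
                   "Value": [10, 20, 25, 30]})

# method='dense' gives tied values the same rank and leaves no gaps
# between ranks; rank() returns floats, so cast to int afterwards.
df["Rank"] = df["Year"].rank(method="dense").astype(int)
print(df)
```

The two 2013 rows share rank 2 and 2014 gets rank 3, which is what SQL's DENSE_RANK would produce for an ascending ORDER BY on Year.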



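The fastest solution is pd.factorize. It numbers values in order of first appearance, so it matches the dense rank when Year is already sorted, as in your example (otherwise pass sort=True):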

df['Rank'] = pd.factorize(df.Year)[0] + 1
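One thing to watch with factorize is row order. The sketch below uses a hypothetical unsorted Series (not from the question) to show where the plain call diverges from a dense rank and how sort=True brings it back in line.

```python
import pandas as pd

# Hypothetical unsorted input, chosen only to illustrate the difference.
years = pd.Series([2014, 2012, 2013, 2013])

# Default: codes follow the order of first appearance.
by_appearance = pd.factorize(years)[0] + 1

# sort=True: codes follow the sorted order of the unique values.
by_value = pd.factorize(years, sort=True)[0] + 1

# Reference result from rank(method='dense').
dense = years.rank(method="dense").astype(int).to_numpy()

print(by_appearance)
print(by_value)
print(dense)
```

Here by_value agrees with dense, while by_appearance simply labels 2014 as 1 because it comes first.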

Timings:

# len(df) = 40k
df = pd.concat([df]*10000).reset_index(drop=True)

In [13]: %timeit df['Rank'] = df.Year.rank(method='dense').astype(int)
1000 loops, best of 3: 1.55 ms per loop

In [14]: %timeit df['Rank1'] = df.Year.astype('category').cat.codes + 1
1000 loops, best of 3: 1.22 ms per loop

In [15]: %timeit df['Rank2'] = pd.factorize(df.Year)[0] + 1
1000 loops, best of 3: 737 µs per loop
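If you are not in IPython, a rough equivalent with the standard timeit module might look like the sketch below. The base frame and the repeat counts are assumptions, and the numbers will differ by machine and pandas version, so treat it as a way to rerun the comparison rather than a reproduction of the figures above.

```python
import timeit

import pandas as pd

# Assumed base frame (Year column from the example), repeated to 40,000 rows.
base = pd.DataFrame({"Year": [2012, 2013, 2013, 2014]})
df = pd.concat([base] * 10000).reset_index(drop=True)

candidates = {
    "rank": lambda: df["Year"].rank(method="dense").astype(int),
    "category": lambda: df["Year"].astype("category").cat.codes + 1,
    "factorize": lambda: pd.factorize(df["Year"])[0] + 1,
}

# Sanity check: on this data all three approaches yield the same ranks.
results = {name: list(map(int, fn())) for name, fn in candidates.items()}

# Best-of-3 average time per call, in milliseconds.
for name, fn in candidates.items():
    seconds = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```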


You can convert Year to a categorical and then take its codes, adding one because the codes are zero-based and your example starts at one.

df['Rank'] = df.Year.astype('category').cat.codes + 1
>>> df
   Year  Value  Rank
0  2012     10     1
1  2013     20     2
2  2013     25     2
3  2014     30     3