Pandas Rolling second Highest Value based on another column


It is not the most elegant solution, but I would do the following:

1- Load the dataset:

    import numpy as np
    import pandas as pd

    data = {'Person': ['a','a','a','a','a','b','b','b','b','b','b'],
            'Sales': ['50','60','90','30','33','100','600','80','90','400','550'],
            'Price': ['10','12','8','10','12','10','13','16','14','12','10']}
    data = pd.DataFrame(data)
    data['Sales'] = data['Sales'].astype(float)

2- Use groupby and expanding together to get the running second-highest sale per person:

    data['2nd_sales'] = data.groupby('Person')['Sales'].expanding(min_periods=2) \
                                      .apply(lambda x: x.nlargest(2).values[-1]).values
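One thing to watch in step 2: assigning with `.values` drops the index, so it only lines up when the rows are already ordered by `Person`, as they are in the dataset above. A minimal sketch of an index-aligned variant, on small made-up data (the frame `df` and its values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical data, deliberately NOT sorted by Person.
df = pd.DataFrame({
    'Person': ['b', 'a', 'b', 'a', 'a'],
    'Sales': [100.0, 50.0, 600.0, 60.0, 90.0],
})

second = (
    df.groupby('Person')['Sales']
      .expanding(min_periods=2)
      .apply(lambda x: x.nlargest(2).iloc[-1])  # 2nd-largest value seen so far
      .reset_index(level=0, drop=True)          # drop the 'Person' level, keep the row index
)

# Assignment aligns on the original row index instead of on position.
df['2nd_sales'] = second
print(df)
```

The first row of each person stays NaN because `min_periods=2` requires at least two sales before a second-highest value exists.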

3- Calculate the Second_Highest_Price:

    data['Second_Highest_Price'] = np.where((data['Sales'].shift() == data['2nd_sales']), data['Price'].shift(),
                                   (np.where((data['Sales'] == data['2nd_sales']), data['Price'], np.nan)))
    data['Second_Highest_Price'] = data.groupby('Person')['Second_Highest_Price'].ffill()

Output:

    >>> data['Second_Highest_Price'].values
    array([nan, '10', '12', '12', '12', nan, '10', '10', '10', '12', '10'],
          dtype=object)
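Note that the comparison in step 3 only finds the price when the second-highest sale sits on the current row or on the previous one. That holds for the dataset above, but not in general: with sales of 90, 50, 100, the second highest at the third row is the 90 from the first row. A minimal sketch of an explicit lookup that handles this case, assuming a unique row index and that ties are broken by `nlargest`'s default `keep='first'`; the helper name `second_highest_price` and the sample data are hypothetical:

```python
import numpy as np
import pandas as pd

def second_highest_price(group: pd.DataFrame) -> pd.Series:
    """For each row, the Price on the row holding the 2nd-highest Sales so far."""
    out = []
    for i in range(len(group)):
        seen = group.iloc[: i + 1]          # rows up to and including the current one
        if len(seen) < 2:
            out.append(np.nan)              # no second-highest value yet
        else:
            idx = seen['Sales'].nlargest(2).index[-1]  # label of the 2nd-highest sale
            out.append(seen.loc[idx, 'Price'])
    return pd.Series(out, index=group.index)

# Hypothetical data where the 2nd-highest sale is two rows back for person 'a'.
df = pd.DataFrame({
    'Person': ['a', 'a', 'a', 'b', 'b'],
    'Sales': [90.0, 50.0, 100.0, 10.0, 20.0],
    'Price': [10.0, 12.0, 8.0, 5.0, 6.0],
})

parts = [second_highest_price(g) for _, g in df.groupby('Person')]
df['Second_Highest_Price'] = pd.concat(parts)  # aligned on the row index
print(df)
```

This loop is quadratic in the size of each group, so it trades speed for correctness; for large groups a running top-two tracker would likely be a better fit.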