Filtering out outliers in Pandas dataframe with rolling median

pandas median outliers rolling-computation

Just filter the dataframe

    df['median'] = df['b'].rolling(window).median()
    df['std'] = df['b'].rolling(window).std()

    # filter setup
    df = df[(df.b <= df['median'] + 3 * df['std']) & (df.b >= df['median'] - 3 * df['std'])]
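As a minimal sketch of the filter above on data with one obvious spike, the following wraps the same logic in a helper. The function name `rolling_sigma_filter`, the parameter `k`, and the toy data are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def rolling_sigma_filter(df, col, window, k=3.0):
    """Keep rows whose value lies within k rolling std of the rolling median.

    Hypothetical helper: same logic as the answer above, but without
    adding 'median' and 'std' columns to the input frame.
    """
    med = df[col].rolling(window).median()
    std = df[col].rolling(window).std()
    # Comparisons against NaN are False, so the first window-1 rows
    # (incomplete windows) are dropped as well.
    keep = (df[col] >= med - k * std) & (df[col] <= med + k * std)
    return df[keep]

# Toy data (assumed): values cycle through 10, 11, 12 with one spike at row 30.
values = [10 + (i % 3) for i in range(40)]
values[30] = 500
demo = pd.DataFrame({"b": values})

filtered = rolling_sigma_filter(demo, "b", window=20)
print(filtered.index.tolist())
```

Note that the spike is part of its own window, so it inflates the rolling std it is compared against. For a single spike in otherwise near-constant data, its distance from the median is only about sqrt(window) standard deviations, so with a 3-sigma threshold a small window (the sketch uses 20 for this reason) may never flag it.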

There may well be a more idiomatic pandas way to do this. This is a bit of a hack that manually maps the original df's index to each rolling window (I picked a window size of 6). The rows up to and including row 6 are associated with the first window, row 7 with the second window, and so on.

    import numpy as np
    import pandas as pd

    n = 100
    df = pd.DataFrame(np.random.randint(0, n, size=(n, 2)), columns=['a', 'b'])

    ## set window size
    window = 6
    std = 1  # I set it at just 1; with real data and larger windows, can be larger

    ## create df with rolling stats, upper and lower bounds
    bounds = pd.DataFrame({'median': df['b'].rolling(window).median(),
                           'std': df['b'].rolling(window).std()})
    bounds['upper'] = bounds['median'] + bounds['std'] * std
    bounds['lower'] = bounds['median'] - bounds['std'] * std

    ## here, we set an identifier for each window which maps to the original df
    ## the first six rows are the first window; then each additional row is a new window
    bounds['window_id'] = np.append(np.zeros(window), np.arange(1, n - window + 1))

    ## then we can assign the original 'b' value back to the bounds df
    bounds['b'] = df['b']

    ## and finally, keep only rows where b falls within the desired bounds
    bounds.loc[bounds.eval("lower<b<upper")]

This is my take on creating a median filter:

    def median_filter(num_std=3):
        def _median_filter(x):
            _median = np.median(x)
            _std = np.std(x)
            s = x[-1]
            return s if s >= _median - num_std * _std and s <= _median + num_std * _std else np.nan
        return _median_filter

    df.y.rolling(window).apply(median_filter(num_std=3), raw=True)
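A short usage sketch for this median filter, assuming a toy Series with one spike (the names `demo`, `masked`, and `cleaned` and the data are made up for illustration). The filter returns NaN for rejected points rather than removing them, so a `dropna()` is one way to actually drop the outliers afterwards.

```python
import numpy as np
import pandas as pd

def median_filter(num_std=3):
    # Same closure as in the answer above: returns the last value of the
    # window if it lies within num_std standard deviations of the window
    # median, otherwise NaN.
    def _median_filter(x):
        _median = np.median(x)
        _std = np.std(x)  # population std (ddof=0), unlike pandas' rolling std
        s = x[-1]
        return s if _median - num_std * _std <= s <= _median + num_std * _std else np.nan
    return _median_filter

# Toy data (assumed): values cycle through 10, 11, 12 with one spike at row 30.
values = [10 + (i % 3) for i in range(40)]
values[30] = 500
demo = pd.Series(values, name="y")

window = 20
# raw=True hands each window to the function as a NumPy array, so x[-1] works.
masked = demo.rolling(window).apply(median_filter(num_std=3), raw=True)

# Rows with incomplete windows and rejected points are NaN; drop them.
cleaned = masked.dropna()
print(cleaned.index.tolist())
```

One difference from the other answers worth keeping in mind: `np.std` defaults to `ddof=0`, while pandas' `rolling(...).std()` uses `ddof=1`, so the bounds here are slightly tighter for the same window.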
