Find when the values of a pandas.Series change by at least x
I don't know if I am understanding you correctly, but here is how I interpreted the problem:
    import pandas as pd
    import numpy as np

    # Our series of data.
    data = pd.DataFrame(np.random.rand(10), columns = ['value'])

    # The threshold.
    threshold = .33

    # For each point t, grab t - 1.
    data['value_shifted'] = data['value'].shift(1)

    # Absolute difference of t and t - 1.
    data['abs_change'] = abs(data['value'] - data['value_shifted'])

    # Test against the threshold.
    data['change_exceeds_threshold'] = np.where(data['abs_change'] > threshold, 1, 0)

    print(data)
Giving:
          value  value_shifted  abs_change  change_exceeds_threshold
    0  0.005382            NaN         NaN                         0
    1  0.060954       0.005382    0.055573                         0
    2  0.090456       0.060954    0.029502                         0
    3  0.603118       0.090456    0.512661                         1
    4  0.178681       0.603118    0.424436                         1
    5  0.597814       0.178681    0.419133                         1
    6  0.976092       0.597814    0.378278                         1
    7  0.660010       0.976092    0.316082                         0
    8  0.805768       0.660010    0.145758                         0
    9  0.698369       0.805768    0.107400                         0
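As a more compact take on the same interpretation, here is a minimal sketch that uses `Series.diff()` instead of an explicit shifted column. The helper name `flag_step_changes` and the sample values are made up for illustration; it keeps the strict `>` comparison used above, so switch to `>=` if "at least x" should include the threshold itself.

```python
import pandas as pd


def flag_step_changes(values: pd.Series, threshold: float) -> pd.Series:
    """Return True where the absolute change from the previous element exceeds threshold."""
    # diff() gives NaN for the first element, and NaN > threshold evaluates to False,
    # so the first row is never flagged.
    return values.diff().abs() > threshold


# Hypothetical sample data, chosen so the changes are easy to check by hand.
values = pd.Series([0.0, 0.1, 0.6, 0.2, 0.25])
flags = flag_step_changes(values, threshold=0.33)

# Boolean indexing keeps only the rows where the change exceeded the threshold.
print(values[flags])
```

Note that this compares each element only with its immediate predecessor, not with the last value that was flagged.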
I don't think the pseudocode can be vectorized, because the next state of s*
depends on the previous state. Here is a pure Python solution that makes a single pass over the data:
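    import random
    import pandas as pd

    s = [random.randint(0, 100) for _ in range(100)]
    res = []    # record changes
    thres = 20

    # ss is the last recorded value; it starts at the first element.
    ss = s[0]
    for i in range(len(s)):
        if abs(s[i] - ss) > thres:
            ss = s[i]
            res.append([i, s[i]])

    # Each record has two fields (position and value), so name both columns.
    df = pd.DataFrame(res, columns=['index', 'value'])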
I don't think anything faster than O(N) is possible here, since every element has to be inspected at least once.
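For reuse, the single-pass idea can be wrapped in a function. This is only a sketch under the same reading of the problem: the name `changes_from_reference`, the column names, and the sample input are assumptions for illustration, not part of the question.

```python
import pandas as pd


def changes_from_reference(values, threshold):
    """Record every element that differs from the last recorded value by more than threshold."""
    values = list(values)
    rows = []
    if values:
        # The reference starts at the first element and only moves when a change is recorded.
        reference = values[0]
        for position, value in enumerate(values):
            if abs(value - reference) > threshold:
                reference = value
                rows.append((position, value))
    return pd.DataFrame(rows, columns=["position", "value"])


# Hypothetical input: 35 is 25 away from 10, then 10 is 25 away from 35.
changes = changes_from_reference([10, 15, 35, 40, 10, 12], threshold=20)
print(changes)
```

Unlike a plain difference between neighbours, the comparison here is always against the last recorded value, which is why each step depends on the one before it.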