How should I generate outliers randomly?


Generate the three parts of the data independently: first the non-outliers, then the lower and upper outliers. Then merge them and shuffle the result:

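```python
import numpy as np

def generate(median=630, err=12, outlier_err=100, size=80, outlier_size=10):
    # Non-outliers: within +/- err of the median
    errs = err * np.random.rand(size) * np.random.choice((-1, 1), size)
    data = median + errs

    # Lower outliers: between err and err + outlier_err below the median
    lower_errs = outlier_err * np.random.rand(outlier_size)
    lower_outliers = median - err - lower_errs

    # Upper outliers: between err and err + outlier_err above the median
    upper_errs = outlier_err * np.random.rand(outlier_size)
    upper_outliers = median + err + upper_errs

    # Merge the three parts and shuffle in place
    data = np.concatenate((data, lower_outliers, upper_outliers))
    np.random.shuffle(data)
    return data
```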

You'll get something like this:

```python
>>> data = generate()
>>> data.shape
(100,)
>>> data.min()
518.1635764484727
>>> data.max()
729.9467630423616
>>> np.median(data)
629.9427184256936
```
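If you need the same data on every run, one option is to use NumPy's `Generator` API with a seed. The following is only a sketch of that idea, not part of the answer above: the name `generate_seeded` and the `seed` parameter are hypothetical, and `rng.uniform` is swapped in for the `rand`-based arithmetic, which should give essentially the same ranges for each part.

```python
import numpy as np

def generate_seeded(median=630, err=12, outlier_err=100, size=80,
                    outlier_size=10, seed=None):
    # `seed` is a hypothetical extra parameter; seed=None gives fresh randomness
    rng = np.random.default_rng(seed)

    # Non-outliers: uniform within +/- err of the median
    data = rng.uniform(median - err, median + err, size)

    # Lower and upper outliers: uniform in a band of width outlier_err
    # just outside the non-outlier range
    lower_outliers = rng.uniform(median - err - outlier_err, median - err,
                                 outlier_size)
    upper_outliers = rng.uniform(median + err, median + err + outlier_err,
                                 outlier_size)

    # Merge the three parts and shuffle in place
    data = np.concatenate((data, lower_outliers, upper_outliers))
    rng.shuffle(data)
    return data
```

Calling `generate_seeded(seed=0)` twice should return identical arrays, which is handy when testing outlier-detection code against this data.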