Using Holt-Winters for forecasting in Python
I tried generating random data until I got interesting results. Here I fed in all positive numbers and got negative forecasts:
    y = [0.92, 0.78, 0.92, 0.61, 0.47, 0.4, 0.59, 0.13, 0.27, 0.31, 0.24, 0.01]
    holtwinters(y, 0.2, 0.1, 0.05, 4)
    ...
    forecast: -0.104857182966
    forecast: -0.197407475203
    forecast: -0.463988558577
    forecast: -0.258023593197
but note that the forecast fits the negative slope of the data.
This might be the orders of magnitude you were talking about:
    y = [0.1, 0.68, 0.15, 0.08, 0.94, 0.58, 0.35, 0.38, 0.7, 0.74, 0.93, 0.87]
    holtwinters(y, 0.2, 0.1, 0.05, 4)
    ...
    forecast: 1.93777559066
    forecast: 3.11109138055
    forecast: 0.910967977635
    forecast: 0.684668348397
But I'm not sure how you'd deem it wildly inaccurate or judge that it "should be" lower.
Whenever you're extrapolating data, you're going to have somewhat surprising results. Are you concerned more that the implementation might be incorrect or that the output doesn't have good properties for your specific usage?
First of all, if you're unsure about your implementation of the algorithm, I recommend creating some test cases for it. Take another implementation that you know works (MATLAB, for example). Generate some inputs, feed them to both the reference and your implementation, and the outputs should be identical up to floating-point rounding. I have translated and verified some algorithms from MATLAB that way. scipy.io.loadmat is great for that.
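A minimal sketch of such a check, assuming the reference forecasts have already been exported to a list or array. The helper name `assert_matches_reference`, the tolerances, the file name and the "forecast" key are all placeholders I made up, not part of any library:

```python
import math

def assert_matches_reference(candidate, reference, rel_tol=1e-9, abs_tol=1e-12):
    """Raise AssertionError at the first index where the two series diverge."""
    if len(candidate) != len(reference):
        raise AssertionError(
            f"length mismatch: {len(candidate)} vs {len(reference)}"
        )
    for i, (got, want) in enumerate(zip(candidate, reference)):
        # Compare with a tolerance: two correct implementations rarely agree
        # bit for bit because of floating-point rounding.
        if not math.isclose(got, want, rel_tol=rel_tol, abs_tol=abs_tol):
            raise AssertionError(f"index {i}: got {got!r}, reference {want!r}")

# Hypothetical usage (file name and variable key are assumptions):
# from scipy.io import loadmat
# reference = loadmat("reference.mat")["forecast"].ravel()
# assert_matches_reference(my_forecast, reference)
```

Reporting the first diverging index is deliberate: in a recursive smoother like Holt-Winters an early difference usually points at the initialization, a late one at the update equations.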
About your usage of the algorithm: you're talking about periodicities of days and weeks, yet you feed in data on a minutes timescale. I don't know whether this specific algorithm handles that well, but in any case I'd suggest trying some low-pass filtering and then feeding the data in hourly, or at an even coarser resolution. Nearly 700 timesteps per period could simply be too many for the seasonality to be recognized. The data you feed in should also contain at least two complete periods of your time series. If your algorithm supports periodicity, you also have to provide the data in a form that lets it actually see that periodicity. The fact that you get these extreme values could be a hint that the algorithm only has data for a steady trend in one direction.
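As a rough sketch of that preprocessing, assuming plain block averaging is an acceptable low-pass filter for your data (it is a crude one; a proper filter may do better). The function names and the synthetic per-minute series are mine, purely for illustration:

```python
def downsample_mean(values, factor):
    """Average consecutive blocks of `factor` samples; an incomplete tail is dropped."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    n_blocks = len(values) // factor
    return [
        sum(values[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]

def has_two_full_periods(values, period):
    """Check the minimum amount of history suggested above."""
    return len(values) >= 2 * period

# Synthetic example: three days of per-minute samples reduced to hourly means.
minutes = [float(i % 1440) for i in range(3 * 1440)]
hourly = downsample_mean(minutes, 60)

print(len(hourly))                        # 72 hourly values
print(has_two_full_periods(hourly, 24))   # True: enough for a daily season
print(has_two_full_periods(hourly, 168))  # False: not enough for a weekly season
```

With hourly data a daily season is 24 steps and a weekly one 168 steps, which is far easier for the smoother to pick up than a season of several hundred or thousand minute-level steps.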
Maybe you also want to separate your forecasts, with one optimized for weekly prediction and the other for intraday, and then combine them again at the end.
I think the problem with this method is how they calculate the initial values. They seem to be using a linear model, about which [1] says:
This is a very poor method that should not be used as the trend will be biased by the seasonal pattern. Imagine a seasonal pattern, for example, where the last period of the year is always the largest value for the year. Then the trend will be biased upwards. Unfortunately, Bowerman, O’Connell and Koehler (2005) are not alone in recommending bad methods. I’ve seen similar, and worse, procedures recommended in other books. [1]
A better method is to decompose the time series into trend and seasonality [1].
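One way to do that, sketched here under the assumption of an additive seasonal pattern, is a classical decomposition: estimate the trend with a centered moving average over one period, subtract it, and average what is left per seasonal position. This is my own illustration with made-up function names, not code from [1]:

```python
def centered_moving_average(y, period):
    """Return {t: trend estimate} for every t that has a full window around it."""
    half = period // 2
    trend = {}
    for t in range(half, len(y) - half):
        if period % 2 == 0:
            # 2 x m moving average: the two end points get half weight so the
            # window stays centered on t even though the period is even.
            window_sum = (0.5 * y[t - half]
                          + sum(y[t - half + 1:t + half])
                          + 0.5 * y[t + half])
        else:
            window_sum = sum(y[t - half:t + half + 1])
        trend[t] = window_sum / period
    return trend

def initial_seasonal_indices(y, period):
    """Additive seasonal indices from detrended data, normalized to sum to zero."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full periods")
    trend = centered_moving_average(y, period)
    buckets = [[] for _ in range(period)]
    for t, level in trend.items():
        buckets[t % period].append(y[t] - level)
    raw = [sum(b) / len(b) for b in buckets]
    mean = sum(raw) / period
    return [r - mean for r in raw]

# Synthetic check: linear trend plus a known seasonal pattern of period 4.
pattern = [1.0, -2.0, 3.0, -2.0]
y = [10 + 0.5 * t + pattern[t % 4] for t in range(12)]
seasonal = initial_seasonal_indices(y, 4)  # recovers `pattern` up to rounding
```

Because the moving average spans exactly one period, the seasonal pattern cancels inside each window, so the trend estimate is not pulled up or down by a large value at the end of the period. The initial level and slope can then be taken from the deseasonalized series `y[t] - seasonal[t % period]`.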