install.packages(c("ggplot2", "forecast", "fpp2", "fable"))
A moving average of order m can be written as:
\[\bar{T}_{t} = \frac{1}{m} \sum_{j=-k}^{k} y_{t+j}, \qquad m = 2k + 1.\]
That is, the estimate of the trend-cycle at time t is obtained by averaging the values of the time series within k periods of t. Observations that are nearby in time are also likely to be close in value. Therefore, the average eliminates some of the randomness in the data, leaving a smooth trend-cycle component. We call this an m-MA, meaning a moving average of order m.
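To make the definition concrete, here is a minimal base R sketch of a centred moving average of odd order. The helper name `centred_ma` and the toy vector `y` are assumptions made for illustration only; they do not come from any package used in this post.

```r
# Hypothetical helper (not from any package): centred moving average of odd order m.
centred_ma <- function(y, m) {
  stopifnot(m %% 2 == 1)            # this sketch handles odd orders only
  k <- (m - 1) / 2                  # half-width of the window, so m = 2k + 1
  n <- length(y)
  out <- rep(NA_real_, n)           # the first and last k positions stay NA
  if (n >= m) {
    for (i in (k + 1):(n - k)) {
      out[i] <- mean(y[(i - k):(i + k)])   # average of y[i-k], ..., y[i+k]
    }
  }
  out
}

y <- c(2, 4, 6, 8, 10)              # toy data, chosen only for illustration
centred_ma(y, 3)
## [1] NA  4  6  8 NA
```

With m = 3 the window reaches one period to each side, so the first and last positions cannot be estimated and are returned as NA.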
Using R, let's first plot Sales against Year.
The data below are the annual electricity sales for 20 years (1989–2008). Using the moving average method, let's smooth the series to estimate its trend-cycle, which is the starting point for forecasting electricity sales in the coming years.
Year <- c(1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008)
Sales <- c(2354.34,2379.71,2318.52,2468.99,2386.09,2569.47,2575.72,2762.72,2844.50,3000.70,3108.10,3357.50,3075.70,3180.60,3221.60,3176.20,3430.60,3527.48,3637.89,3655.00)
df <- data.frame(Year, Sales)
print(df)
## Year Sales
## 1 1989 2354.34
## 2 1990 2379.71
## 3 1991 2318.52
## 4 1992 2468.99
## 5 1993 2386.09
## 6 1994 2569.47
## 7 1995 2575.72
## 8 1996 2762.72
## 9 1997 2844.50
## 10 1998 3000.70
## 11 1999 3108.10
## 12 2000 3357.50
## 13 2001 3075.70
## 14 2002 3180.60
## 15 2003 3221.60
## 16 2004 3176.20
## 17 2005 3430.60
## 18 2006 3527.48
## 19 2007 3637.89
## 20 2008 3655.00
library(ggplot2)
library(forecast)
library(fpp2) # fpp2 includes the built-in dataset "elecsales"
library(fable)
data(elecsales)
elecsales
## Time Series:
## Start = 1989
## End = 2008
## Frequency = 1
## [1] 2354.34 2379.71 2318.52 2468.99 2386.09 2569.47 2575.72 2762.72 2844.50
## [10] 3000.70 3108.10 3357.50 3075.70 3180.60 3221.60 3176.20 3430.60 3527.48
## [19] 3637.89 3655.00
The graph in RStudio is as below:
autoplot(elecsales) + xlab("Year") + ylab("GWh") +
ggtitle("Annual electricity sales: South Australia") +
theme( plot.title = element_text(hjust = 0.5) )
Now we will calculate moving averages in order to smooth the data. The last four columns of the table show moving averages of order 3, 5, 7 and 9, each providing an estimate of the trend-cycle. The higher the order of the moving average, the smoother the trend line.
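The table of moving averages can be rebuilt with a short base R sketch. It uses `stats::filter()` with equal weights and `sides = 2`, which for odd orders gives the same centred average as `ma()`; the names `ma_base` and `ma_table` are assumptions introduced here for illustration.

```r
# Annual electricity sales (GWh), 1989-2008, copied from the data above.
year  <- 1989:2008
sales <- c(2354.34, 2379.71, 2318.52, 2468.99, 2386.09, 2569.47, 2575.72,
           2762.72, 2844.50, 3000.70, 3108.10, 3357.50, 3075.70, 3180.60,
           3221.60, 3176.20, 3430.60, 3527.48, 3637.89, 3655.00)

# Hypothetical helper: centred moving average of odd order m with equal weights.
ma_base <- function(y, m) {
  as.numeric(stats::filter(y, rep(1 / m, m), sides = 2))
}

# One column per order; years too close to either end are NA.
ma_table <- data.frame(
  Year  = year,
  Sales = sales,
  MA3   = round(ma_base(sales, 3), 3),
  MA5   = round(ma_base(sales, 5), 3),
  MA7   = round(ma_base(sales, 7), 3),
  MA9   = round(ma_base(sales, 9), 3)
)
print(ma_table)
```

Each MA column has NA entries at both ends, and the number of NA entries grows with the order.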
The first value in the 3-MA column is the average of the first three observations (1989–1991); the second value in the 3-MA column is the average of the values for 1990–1992; and so on. Each value in the 3-MA column is the average of the observations in the three-year window centred on the corresponding year, so the first and last years have no value (NA).
ma(elecsales, 3)
## Time Series:
## Start = 1989
## End = 2008
## Frequency = 1
## [1] NA 2350.857 2389.073 2391.200 2474.850 2510.427 2635.970 2727.647
## [9] 2869.307 2984.433 3155.433 3180.433 3204.600 3159.300 3192.800 3276.133
## [17] 3378.093 3531.990 3606.790 NA
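As a quick check of the first two non-missing values in this output, the arithmetic can be done by hand. This is only a sketch; the variable names are assumptions chosen for illustration.

```r
# Sales for 1989-1992, copied from the data above.
sales_1989_1992 <- c(2354.34, 2379.71, 2318.52, 2468.99)

# 3-MA centred on 1990: average of 1989, 1990 and 1991.
ma3_1990 <- mean(sales_1989_1992[1:3])
# 3-MA centred on 1991: average of 1990, 1991 and 1992.
ma3_1991 <- mean(sales_1989_1992[2:4])

round(c(ma3_1990, ma3_1991), 3)
## [1] 2350.857 2389.073
```

Both values agree with the second and third entries returned by `ma(elecsales, 3)`.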
autoplot(elecsales, series="Data") +
autolayer(ma(elecsales,3), series="3-MA") +
xlab("Year") + ylab("GWh") +
ggtitle("Annual electricity sales: South Australia",
subtitle = "(3 Moving Average)") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5, face = "italic", color = "red") ) +
scale_colour_manual(values=c("Data"="grey50","3-MA"="red"),
breaks=c("Data","3-MA"))
## Warning: Removed 2 row(s) containing missing values (geom_path).
The first value in the 5-MA column is the average of the first five observations (1989–1993); the second value in the 5-MA column is the average of the values for 1990–1994; and so on. Each value in the 5-MA column is the average of the observations in the five-year window centred on the corresponding year.
ma(elecsales, 5)
## Time Series:
## Start = 1989
## End = 2008
## Frequency = 1
## [1] NA NA 2381.530 2424.556 2463.758 2552.598 2627.700 2750.622
## [9] 2858.348 3014.704 3077.300 3144.520 3188.700 3202.320 3216.940 3307.296
## [17] 3398.754 3485.434 NA NA
autoplot(elecsales, series="Data") +
autolayer(ma(elecsales,5), series="5-MA") +
xlab("Year") + ylab("GWh") +
ggtitle("Annual electricity sales: South Australia",
subtitle = "(5 Moving Average)") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5, face = "italic", color = "red") ) +
scale_colour_manual(values=c("Data"="grey50","5-MA"="red"),
breaks=c("Data","5-MA"))
## Warning: Removed 4 row(s) containing missing values (geom_path).
The first value in the 7-MA column is the average of the first seven observations (1989–1995); the second value in the 7-MA column is the average of the values for 1990–1996; and so on. Each value in the 7-MA column is the average of the observations in the seven-year window centred on the corresponding year.
ma(elecsales, 7)
## Time Series:
## Start = 1989
## End = 2008
## Frequency = 1
## [1] NA NA NA 2436.120 2494.460 2560.859 2658.313 2749.614
## [9] 2888.387 2960.706 3047.117 3112.671 3160.057 3221.471 3281.383 3321.439
## [17] 3404.196 NA NA NA
autoplot(elecsales, series="Data") +
autolayer(ma(elecsales,7), series="7-MA") +
xlab("Year") + ylab("GWh") +
ggtitle("Annual electricity sales: South Australia",
subtitle = "(7 Moving Average)") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5, face = "italic", color = "red") ) +
scale_colour_manual(values=c("Data"="grey50","7-MA"="red"),
breaks=c("Data","7-MA"))
## Warning: Removed 6 row(s) containing missing values (geom_path).
The first value in the 9-MA column is the average of the first nine observations (1989–1997); the second value in the 9-MA column is the average of the values for 1990–1998; and so on. Each value in the 9-MA column is the average of the observations in the nine-year window centred on the corresponding year.
ma(elecsales, 9)
## Time Series:
## Start = 1989
## End = 2008
## Frequency = 1
## [1] NA NA NA NA 2517.784 2589.602 2670.534 2785.977
## [9] 2853.389 2941.668 3014.127 3080.847 3155.056 3230.942 3301.741 3362.508
## [17] NA NA NA NA
autoplot(elecsales, series="Data") +
autolayer(ma(elecsales,9), series="9-MA") +
xlab("Year") + ylab("GWh") +
ggtitle("Annual electricity sales: South Australia",
subtitle = "(9 Moving Average)") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5, face = "italic", color = "red") ) +
scale_colour_manual(values=c("Data"="grey50","9-MA"="red"),
breaks=c("Data","9-MA"))
## Warning: Removed 8 row(s) containing missing values (geom_path).
It can be seen that the more observations included in the moving average (i.e., the larger the value of k), the smoother the resulting trend-cycle. However, even with a 5-MA, the fitted trend-cycle is still too rough. A much smoother curve, without the little bumps and wiggles, would be a more reasonable estimate; that would require a moving average of higher order.
Determining the appropriate length of a moving average is an important task in decomposition methods. As a rule, a larger number of terms in the moving average increases the likelihood that randomness will be eliminated. That argues for using as long a length as possible. However, the longer the moving average, the more terms (and information) are lost in the process of averaging, since m data values are required for an m-term average. Also, longer-term moving average smoothers tend to smooth out the genuine bumps or cycles that are of interest.
In applying an m-term moving average, k = (m − 1)/2 neighboring points are needed on either side of the observation. Therefore, it is not possible to estimate the trend-cycle close to the beginning and end of the series. The k terms lost at the beginning of the data are usually of little consequence, but the k terms lost at the end are critical, since they are the starting point for forecasting the cycle. Not only must the cyclical values for periods t + 1, t + 2, and so on, be estimated, but the values for periods t, t − 1, t − 2, . . . , t − k + 1 must also be estimated.
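The trade-off described above can be put into numbers with a small base R sketch. For each order it counts the values lost at the ends and computes a rough smoothness measure. The helper `ma_base` and the choice of "standard deviation of year-to-year changes" as a roughness measure are assumptions made here for illustration, not part of the method itself.

```r
# Annual electricity sales (GWh), 1989-2008, copied from the data above.
sales <- c(2354.34, 2379.71, 2318.52, 2468.99, 2386.09, 2569.47, 2575.72,
           2762.72, 2844.50, 3000.70, 3108.10, 3357.50, 3075.70, 3180.60,
           3221.60, 3176.20, 3430.60, 3527.48, 3637.89, 3655.00)

# Hypothetical helper: centred moving average of odd order m with equal weights.
ma_base <- function(y, m) {
  as.numeric(stats::filter(y, rep(1 / m, m), sides = 2))
}

orders <- c(3, 5, 7, 9)

# Total number of years with no estimate: (order - 1) / 2 at each end.
lost_total <- sapply(orders, function(m) sum(is.na(ma_base(sales, m))))

# Assumed roughness measure: spread of the year-to-year changes in the smoothed series.
roughness <- sapply(orders, function(m) sd(diff(ma_base(sales, m)), na.rm = TRUE))

data.frame(order = orders,
           lost_each_end = lost_total / 2,
           lost_total = lost_total,
           roughness = round(roughness, 1))
```

The `lost_total` column is 2, 4, 6 and 8 for orders 3, 5, 7 and 9, matching the number of rows removed in the plot warnings above. The `roughness` column gives one way to compare how smooth each fitted trend-cycle is.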