install.packages(c("ggplot2", "forecast", "fpp2", "fable"))

Project Objective: To forecast the annual electricity sales to residential customers in South Australia, using annual data for 1989–2008.

Moving Average Smoothing

A moving average of order m can be written as:

\[\bar{T}_{t} = \frac{1}{m} \sum_{j=-k}^{k} y_{t+j}\]

where m = 2k + 1.

That is, the estimate of the trend-cycle at time t is obtained by averaging the values of the time series within k periods of t. Observations that are nearby in time are also likely to be close in value. Therefore, the average eliminates some of the randomness in the data, leaving a smooth trend-cycle component. We call this an m-MA, meaning a moving average of order m.
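The definition can be sketched in base R without any extra packages. This is a minimal illustration, not the implementation used by `forecast::ma()`: the helper name `centred_ma` is hypothetical, it handles odd orders only, and the data vector is copied from the sales figures used in this post.

```r
# Annual residential electricity sales (GWh), 1989-2008, as listed in this post
sales <- ts(c(2354.34, 2379.71, 2318.52, 2468.99, 2386.09, 2569.47, 2575.72,
              2762.72, 2844.50, 3000.70, 3108.10, 3357.50, 3075.70, 3180.60,
              3221.60, 3176.20, 3430.60, 3527.48, 3637.89, 3655.00),
            start = 1989)

# Hypothetical helper (not part of any package): centred m-MA for odd m.
# Each output value is the equally weighted mean of the m observations
# from t - k to t + k, where k = (m - 1) / 2.
centred_ma <- function(y, m) {
  stopifnot(m %% 2 == 1)
  stats::filter(y, rep(1 / m, m), sides = 2)
}

ma5 <- centred_ma(sales, 5)
round(ma5, 3)  # the first two and last two entries are NA (no full window)
```

`stats::filter()` with `sides = 2` centres the equal weights on each time point, which is exactly the sum from j = -k to k in the formula.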

Using R, let's first plot sales against year.

Residential electricity sales during years 1989 - 2008

The data below give the annual electricity sales for 20 years. Using the moving-average method, let's estimate the underlying trend-cycle, which is the starting point for forecasting sales in the coming years.

Year <- c(1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008)

Sales <- c(2354.34,2379.71,2318.52,2468.99,2386.09,2569.47,2575.72,2762.72,2844.50,3000.70,3108.10,3357.50,3075.70,3180.60,3221.60,3176.20,3430.60,3527.48,3637.89,3655.00)

df <- data.frame(Year, Sales)

print(df)
##    Year   Sales
## 1  1989 2354.34
## 2  1990 2379.71
## 3  1991 2318.52
## 4  1992 2468.99
## 5  1993 2386.09
## 6  1994 2569.47
## 7  1995 2575.72
## 8  1996 2762.72
## 9  1997 2844.50
## 10 1998 3000.70
## 11 1999 3108.10
## 12 2000 3357.50
## 13 2001 3075.70
## 14 2002 3180.60
## 15 2003 3221.60
## 16 2004 3176.20
## 17 2005 3430.60
## 18 2006 3527.48
## 19 2007 3637.89
## 20 2008 3655.00
library(ggplot2)
library(forecast)
library(fpp2)       # fpp2 provides the built-in dataset "elecsales"
library(fable)

data(elecsales)

elecsales
## Time Series:
## Start = 1989 
## End = 2008 
## Frequency = 1 
##  [1] 2354.34 2379.71 2318.52 2468.99 2386.09 2569.47 2575.72 2762.72 2844.50
## [10] 3000.70 3108.10 3357.50 3075.70 3180.60 3221.60 3176.20 3430.60 3527.48
## [19] 3637.89 3655.00

The plot produced in RStudio is shown below:

autoplot(elecsales) + xlab("Year") + ylab("GWh") +
   ggtitle("Annual electricity sales: South Australia") + 
    theme( plot.title = element_text(hjust = 0.5) )

Now we calculate moving averages in order to smooth the data. Moving averages of order 3, 5, 7 and 9 are computed below, each providing an estimate of the trend-cycle. The higher the order of the moving average, the smoother the trend line.

Calculation of 3-MA:

The first non-missing value of the 3-MA (for 1990) is the average of the first three observations (1989–1991); the second (for 1991) is the average of the values for 1990–1992; and so on. Each value of the 3-MA is the average of the observations in the three-year window centred on the corresponding year. No value is available for 1989 or 2008, because no full window exists there.
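The sliding window can be checked by hand. The sketch below uses only base R and the sales figures from this post; the variable names `ma3_1990` and `ma3_1991` are illustrative. Each window is three consecutive years, and it moves forward one year at a time.

```r
# Annual residential electricity sales (GWh), 1989-2008, as listed in this post
sales <- ts(c(2354.34, 2379.71, 2318.52, 2468.99, 2386.09, 2569.47, 2575.72,
              2762.72, 2844.50, 3000.70, 3108.10, 3357.50, 3075.70, 3180.60,
              3221.60, 3176.20, 3430.60, 3527.48, 3637.89, 3655.00),
            start = 1989)

# 3-MA value for 1990: mean of the 1989-1991 window
ma3_1990 <- mean(window(sales, start = 1989, end = 1991))

# 3-MA value for 1991: the window slides forward by one year (1990-1992)
ma3_1991 <- mean(window(sales, start = 1990, end = 1992))

round(c(ma3_1990, ma3_1991), 3)  # 2350.857 2389.073
```

These agree with the second and third entries printed by `ma(elecsales, 3)`.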

Residential electricity sales (grey) along with the 3-MA estimate of the trend-cycle (red).

ma(elecsales, 3)
## Time Series:
## Start = 1989 
## End = 2008 
## Frequency = 1 
##  [1]       NA 2350.857 2389.073 2391.200 2474.850 2510.427 2635.970 2727.647
##  [9] 2869.307 2984.433 3155.433 3180.433 3204.600 3159.300 3192.800 3276.133
## [17] 3378.093 3531.990 3606.790       NA
autoplot(elecsales, series="Data") +
  autolayer(ma(elecsales,3), series="3-MA") +
  xlab("Year") + ylab("GWh") +
  ggtitle("Annual electricity sales: South Australia", 
          subtitle = "(3 Moving Average)") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5, face = "italic", color = "red") ) + 
  scale_colour_manual(values=c("Data"="grey50","3-MA"="red"),
                      breaks=c("Data","3-MA"))
## Warning: Removed 2 row(s) containing missing values (geom_path).

Calculation of 5-MA:

The first non-missing value of the 5-MA (for 1991) is the average of the first five observations (1989–1993); the second (for 1992) is the average of the values for 1990–1994; and so on. Each value of the 5-MA is the average of the observations in the five-year window centred on the corresponding year.

Residential electricity sales (grey) along with the 5-MA estimate of the trend-cycle (red).

ma(elecsales, 5)
## Time Series:
## Start = 1989 
## End = 2008 
## Frequency = 1 
##  [1]       NA       NA 2381.530 2424.556 2463.758 2552.598 2627.700 2750.622
##  [9] 2858.348 3014.704 3077.300 3144.520 3188.700 3202.320 3216.940 3307.296
## [17] 3398.754 3485.434       NA       NA
autoplot(elecsales, series="Data") +
  autolayer(ma(elecsales,5), series="5-MA") +
  xlab("Year") + ylab("GWh") +
  ggtitle("Annual electricity sales: South Australia", 
          subtitle = "(5 Moving Average)") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5, face = "italic", color = "red") ) +
  scale_colour_manual(values=c("Data"="grey50","5-MA"="red"),
                      breaks=c("Data","5-MA"))
## Warning: Removed 4 row(s) containing missing values (geom_path).

Calculation of 7-MA:

The first non-missing value of the 7-MA (for 1992) is the average of the first seven observations (1989–1995); the second (for 1993) is the average of the values for 1990–1996; and so on. Each value of the 7-MA is the average of the observations in the seven-year window centred on the corresponding year.

Residential electricity sales (grey) along with the 7-MA estimate of the trend-cycle (red).

ma(elecsales, 7)
## Time Series:
## Start = 1989 
## End = 2008 
## Frequency = 1 
##  [1]       NA       NA       NA 2436.120 2494.460 2560.859 2658.313 2749.614
##  [9] 2888.387 2960.706 3047.117 3112.671 3160.057 3221.471 3281.383 3321.439
## [17] 3404.196       NA       NA       NA
autoplot(elecsales, series="Data") +
  autolayer(ma(elecsales,7), series="7-MA") +
  xlab("Year") + ylab("GWh") +
  ggtitle("Annual electricity sales: South Australia", 
          subtitle = "(7 Moving Average)") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5, face = "italic", color = "red") ) + 
  scale_colour_manual(values=c("Data"="grey50","7-MA"="red"),
                      breaks=c("Data","7-MA"))
## Warning: Removed 6 row(s) containing missing values (geom_path).

Calculation of 9-MA:

The first non-missing value of the 9-MA (for 1993) is the average of the first nine observations (1989–1997); the second (for 1994) is the average of the values for 1990–1998; and so on. Each value of the 9-MA is the average of the observations in the nine-year window centred on the corresponding year.

Residential electricity sales (grey) along with the 9-MA estimate of the trend-cycle (red).

ma(elecsales, 9)
## Time Series:
## Start = 1989 
## End = 2008 
## Frequency = 1 
##  [1]       NA       NA       NA       NA 2517.784 2589.602 2670.534 2785.977
##  [9] 2853.389 2941.668 3014.127 3080.847 3155.056 3230.942 3301.741 3362.508
## [17]       NA       NA       NA       NA
autoplot(elecsales, series="Data") +
  autolayer(ma(elecsales,9), series="9-MA") +
  xlab("Year") + ylab("GWh") +
  ggtitle("Annual electricity sales: South Australia", 
          subtitle = "(9 Moving Average)") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5, face = "italic", color = "red") ) + 
  scale_colour_manual(values=c("Data"="grey50","9-MA"="red"),
                      breaks=c("Data","9-MA"))
## Warning: Removed 8 row(s) containing missing values (geom_path).

CONCLUSION :

It can be seen that the more observations included in the moving average (i.e., the larger the value of m), the smoother the resulting trend-cycle. However, even with a 5-MA, the fitted trend-cycle is still too rough. A much smoother curve, without the little bumps and wiggles, would be a more reasonable estimate; that would require a moving average of higher order.

Determining the appropriate length of a moving average is an important task in decomposition methods. As a rule, a larger number of terms in the moving average increases the likelihood that randomness will be eliminated. That argues for using as long a length as possible. However, the longer the moving average, the more terms (and information) are lost in the process of averaging, since m data values are required for an m-term average. Also, longer-term moving average smoothers tend to smooth out the genuine bumps or cycles that are of interest.

In applying an m-term moving average, k = (m − 1)/2 neighbouring points are needed on either side of the observation. Therefore, it is not possible to estimate the trend-cycle close to the beginning and end of the series. The k terms lost at the beginning of the data are usually of little consequence, but the k lost at the end are critical, since they are the starting point for forecasting the cycle. Not only must the cyclical values for periods t+1, t+2, and so on, be estimated, but the values for periods t, t − 1, t − 2, ..., t − k + 1 must also be estimated.
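The cost of a longer average can be counted directly. The following sketch, which assumes only base R and the sales figures from this post, tallies how many trend-cycle estimates are missing for each odd order; the names `orders` and `lost` are illustrative.

```r
# Annual residential electricity sales (GWh), 1989-2008, as listed in this post
sales <- ts(c(2354.34, 2379.71, 2318.52, 2468.99, 2386.09, 2569.47, 2575.72,
              2762.72, 2844.50, 3000.70, 3108.10, 3357.50, 3075.70, 3180.60,
              3221.60, 3176.20, 3430.60, 3527.48, 3637.89, 3655.00),
            start = 1989)

orders <- c(3, 5, 7, 9)

# For each order, smooth with equal weights centred on each year and
# count the years for which no full window exists (returned as NA).
lost <- vapply(orders, function(m) {
  smoothed <- stats::filter(sales, rep(1 / m, m), sides = 2)
  sum(is.na(smoothed))
}, integer(1))

data.frame(order = orders, lost_total = lost, lost_each_end = lost / 2)
```

The totals come out as 2, 4, 6 and 8, matching the "Removed ... row(s)" warnings printed under the plots. With only 20 annual observations, the 9-MA leaves no trend-cycle estimate for the last four years, which are the ones that matter most for forecasting.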