ARIMA stands for AutoRegressive Integrated Moving Average, a widely used statistical method for analyzing and forecasting time series data. It models a series using its own past values and past forecast errors, after differencing the series as many times as needed to make it stationary.

An ARIMA model is typically written as ARIMA(p, d, q) where:

| Component | Meaning | Purpose |
|---|---|---|
| p | AutoRegressive (AR) term | Regression on past values |
| d | Integrated (I) term | Number of differences to make data stationary |
| q | Moving Average (MA) term | Regression on past forecast errors |
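The three components combine into one equation. This uses backshift notation, where $B y_t = y_{t-1}$. Here $c$ is a constant and $\varepsilon_t$ is white noise. This is one common convention; sign conventions for the coefficients differ between textbooks and software:

$$
(1 - \phi_1 B - \cdots - \phi_p B^p)\,(1 - B)^d\, y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\,\varepsilon_t
$$

Here $\phi_1,\dots,\phi_p$ are the AR coefficients, $(1-B)^d$ applies $d$ differences, and $\theta_1,\dots,\theta_q$ are the MA coefficients. For example, ARIMA(1,1,0) reduces to $y'_t = c + \phi_1 y'_{t-1} + \varepsilon_t$, where $y'_t = y_t - y_{t-1}$.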

Why Use ARIMA?

  • Works well for univariate time series with temporal structure.
  • Flexible: handles trending data through differencing; seasonal cycles are covered by its seasonal extension, SARIMA.
  • Powerful: Balances past values (AR), past errors (MA), and differencing (I).
  • Can outperform naive methods such as simple moving averages, or linear regression on the raw series, especially when observations are strongly autocorrelated.

Example Scenario:

You’re a retailer with monthly sales data for the last 5 years. You want to:

  • Forecast sales for the next 6 months.
  • Understand whether there is a trend or seasonality.

ARIMA helps you:

  • Make the data stationary (by differencing).
  • Fit a model that learns from past values and past forecast errors.
  • Generate forecasts, with prediction intervals, to support decision-making.

ARIMA Modeling Steps

1. Understand and Prepare the Data

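  • Ensure your data is a time series object: ts for the functions used below (tsibble is the equivalent format for the fable package).
  • Identify frequency: yearly, quarterly, monthly, etc.
ts_data <- ts(data_vector, start = c(YYYY, MM), frequency = 12)  # Monthly data; data_vector, YYYY and MM are placeholders for your own values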
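Below is a minimal sketch of step 1 for the retail scenario. It assumes five years of monthly sales starting in January 2019. The values are synthetic (simulated with rnorm()), and the names sales_vec and sales are hypothetical placeholders for your own data.

```r
# Synthetic stand-in for 5 years of monthly sales (hypothetical data)
set.seed(42)
sales_vec <- 200 + 2 * (1:60) + rnorm(60, sd = 10)

# Monthly series starting January 2019: frequency = 12 observations per year
sales <- ts(sales_vec, start = c(2019, 1), frequency = 12)

frequency(sales)  # 12
start(sales)      # 2019 1
end(sales)        # 2023 12
```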

2. Visualize the Time Series

  • Helps to spot trends, seasonality, or anomalies.
plot(ts_data)
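Beyond a raw plot, base R's decompose() splits a series into trend, seasonal and remainder components, which makes trends and seasonality easier to judge. This sketch uses the built-in AirPassengers dataset. The multiplicative type is an assumption that suits series whose seasonal swings grow with the level.

```r
# AirPassengers: monthly airline passengers 1949-1960, built into R
data("AirPassengers")

# Classical decomposition into trend, seasonal and remainder components
dec <- decompose(AirPassengers, type = "multiplicative")
plot(dec)

# Seasonal factors for Jan..Dec (values above 1 mean above-trend months)
round(dec$figure, 2)
```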

3. Check for Stationarity

  • The ARMA part of the model assumes the (differenced) series is stationary: its mean, variance and autocorrelation do not change over time.
  • Use plots and statistical tests:
library(tseries)   # provides adf.test()
adf.test(ts_data)  # Augmented Dickey-Fuller test; H0 = unit root, so p < 0.05 suggests stationarity
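adf.test() comes from the tseries package, not base R. To stay in base R, stats::PP.test() runs the Phillips-Perron unit-root test, which has the same null hypothesis: a unit root, i.e. non-stationarity. This sketch contrasts two synthetic series: simulated white noise and a simulated random walk.

```r
set.seed(1)
white_noise <- rnorm(500)           # stationary by construction
random_walk <- cumsum(rnorm(500))   # has a unit root: non-stationary

# Phillips-Perron test (base R). H0: the series has a unit root.
# suppressWarnings() hides the note that the p-value lies outside the lookup table.
pp_wn <- suppressWarnings(PP.test(white_noise))
pp_rw <- suppressWarnings(PP.test(random_walk))

pp_wn$p.value  # small: reject the unit root, series looks stationary
pp_rw$p.value  # typically large: cannot reject the unit root
```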

If Non-Stationary:

  • Difference the data until it becomes stationary.
  • Keep track of how many times you difference (d in ARIMA(p,d,q)).
ts_diff <- diff(ts_data)
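Here is a small deterministic illustration of what d counts. One difference removes a linear trend, and a second difference removes a quadratic one. diff(x, differences = 2) is equivalent to diff(diff(x)). The toy vectors are for illustration only.

```r
idx       <- 1:10
linear    <- 3 + 2 * idx   # linear trend
quadratic <- idx^2         # quadratic trend

diff(linear)                        # all 2s: d = 1 removes the trend
diff(quadratic)                     # 3 5 7 ... 19: still trending
diff(quadratic, differences = 2)    # all 2s: d = 2 is needed

# Each difference shortens the series by one observation
length(diff(quadratic, differences = 2))  # 8
```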

4. Identify AR and MA Components

  • Use ACF and PACF plots:
    • ACF → Moving Average (MA) order q: an ACF that cuts off after lag q suggests MA(q)
    • PACF → AutoRegressive (AR) order p: a PACF that cuts off after lag p suggests AR(p)
acf(ts_diff)
pacf(ts_diff)
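To see why this rule of thumb works, this sketch simulates an AR(1) series with stats::arima.sim() and reads the ACF and PACF values numerically instead of from plots. The coefficient 0.7 is an arbitrary choice. For an AR(p) process the PACF cuts off after lag p while the ACF tails off; for an MA(q) process the roles are reversed.

```r
set.seed(123)
# Simulated AR(1) process: x[t] = 0.7 * x[t-1] + e[t]
x <- arima.sim(model = list(ar = 0.7), n = 2000)

# plot = FALSE returns the estimates instead of drawing them
ac  <- acf(x, plot = FALSE)$acf[, 1, 1]   # element 1 is lag 0
pac <- pacf(x, plot = FALSE)$acf[, 1, 1]  # element 1 is lag 1

round(ac[2:4], 2)   # tails off; theoretical values 0.7, 0.49, 0.343
round(pac[1:3], 2)  # cuts off; theoretical values 0.7, 0, 0
```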

5. Fit ARIMA Model

  • Automatically:
library(forecast)  # provides auto.arima(), Arima(), checkresiduals() and forecast()
fit <- auto.arima(ts_data)

  • Manually (if you choose p, d and q yourself):

fit <- arima(ts_data, order = c(p, d, q))  # base R; replace p, d, q with your chosen orders
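auto.arima() and Arima() come from the forecast package, while arima() is part of base R (stats). This sketch shows a manual fit on simulated data whose true order is known to be AR(1) with coefficient 0.7, so you can check that the estimate lands nearby.

```r
set.seed(123)
x <- arima.sim(model = list(ar = 0.7), n = 2000)

# Fit ARIMA(1, 0, 0) with base R's stats::arima()
fit_ar1 <- arima(x, order = c(1, 0, 0))

# "ar1" should be close to 0.7; with d = 0, "intercept" is the estimated mean
coef(fit_ar1)
```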

6. Evaluate Model Fit

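  • Check information criteria (AIC, AICc, BIC) and training-set error measures:
summary(fit)  # for forecast-package fits; for stats::arima() fits use print(fit), AIC(fit), BIC(fit)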
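For base-R arima() fits, summary() does not report information criteria, and print() shows AIC but not BIC. AIC() and BIC() extract both directly. This sketch compares three candidate orders on a simulated AR(1) series; the data are synthetic and the orders are chosen for illustration. All candidates share the same d, which keeps the criteria comparable.

```r
set.seed(123)
x <- arima.sim(model = list(ar = 0.7), n = 2000)

# Candidate models fitted to the same (undifferenced) data
fits <- list(
  "AR(1)"     = arima(x, order = c(1, 0, 0)),
  "MA(1)"     = arima(x, order = c(0, 0, 1)),
  "ARMA(1,1)" = arima(x, order = c(1, 0, 1))
)

# Lower is better; BIC penalizes extra parameters more heavily than AIC
ic <- data.frame(
  model = names(fits),
  AIC   = sapply(fits, AIC),
  BIC   = sapply(fits, BIC)
)
ic[order(ic$AIC), ]
```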

7. Diagnostic Checking

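  • Residuals should resemble white noise: no remaining autocorrelation and roughly constant variance.
checkresiduals(fit)  # forecast package: residual plots plus a Ljung-Box test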
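checkresiduals() is part of the forecast package, and among other things it runs a Ljung-Box test. The same test is available in base R as Box.test(). This sketch uses simulated AR(1) data and compares a correctly specified model with a deliberately wrong MA(1) fit. The choice of lag = 10 is arbitrary.

```r
set.seed(123)
x <- arima.sim(model = list(ar = 0.7), n = 2000)

good_fit <- arima(x, order = c(1, 0, 0))  # correct order
bad_fit  <- arima(x, order = c(0, 0, 1))  # deliberately misspecified

# Ljung-Box test on residuals. H0: no autocorrelation up to `lag`.
# fitdf = p + q reduces the degrees of freedom for the estimated ARMA terms.
lb_good <- Box.test(residuals(good_fit), lag = 10, type = "Ljung-Box", fitdf = 1)
lb_bad  <- Box.test(residuals(bad_fit),  lag = 10, type = "Ljung-Box", fitdf = 1)

lb_good$p.value  # usually large: residuals look like white noise
lb_bad$p.value   # tiny: leftover autocorrelation, the model missed structure
```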

8. Forecasting

  • Predict future values:
forecasted <- forecast(fit, h = 12)  # Forecast 12 steps ahead
plot(forecasted)
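forecast() and its plot method come from the forecast package. In base R, predict() on an arima() fit returns point forecasts and their standard errors, and approximate intervals follow from those. This sketch uses a simulated AR(1) series. The 1.96 multiplier assumes normally distributed errors.

```r
set.seed(123)
x <- arima.sim(model = list(ar = 0.7), n = 2000)
fit <- arima(x, order = c(1, 0, 0))

# Base R forecasting: point forecasts ($pred) and standard errors ($se)
pred <- predict(fit, n.ahead = 12)

# Approximate 95% prediction intervals, assuming normal errors
lower <- pred$pred - 1.96 * pred$se
upper <- pred$pred + 1.96 * pred$se

cbind(forecast = pred$pred, lower, upper)
```

The standard errors grow with the horizon, so the intervals widen the further ahead you forecast.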

Rather than choosing (p, d, q) by eye, you can fit a series of ARIMA models with different orders and keep the one with the lowest AIC or BIC. This can be done with nested loops, as below, or automated with auto.arima() from the forecast package.

Here’s how to run a manual grid search over ARIMA orders and compare AIC/BIC scores:

# Load required packages
library(forecast)

# Replace this with your own time series data
data("AirPassengers")     # Example dataset
ts_data <- AirPassengers

# Fix d beforehand (e.g. from ADF tests or forecast::ndiffs()).
# AIC/BIC are not comparable across different d, because differencing
# changes the data on which the likelihood is computed.
d <- 1

# Define ranges for p and q
max_p <- 3
max_q <- 3

# Store results
results <- data.frame(p=integer(), d=integer(), q=integer(), AIC=numeric(), BIC=numeric())

# Loop through combinations of p and q
for (p in 0:max_p) {
  for (q in 0:max_q) {
    # try() skips models that fail to fit
    try({
      fit <- Arima(ts_data, order = c(p, d, q))
      results <- rbind(results, data.frame(
        p = p, d = d, q = q,
        AIC = AIC(fit),
        BIC = BIC(fit)
      ))
    }, silent = TRUE)
  }
}

# Pick the models with the lowest AIC and the lowest BIC
best_aic <- results[which.min(results$AIC), ]
best_bic <- results[which.min(results$BIC), ]

# Print best models
cat("Best model by AIC:\n")
print(best_aic)

cat("\nBest model by BIC:\n")
print(best_bic)

Output

This code will print:

  • The (p,d,q) combination with the lowest AIC
  • The one with the lowest BIC

You can then fit the best model like:

best_model <- Arima(ts_data, order = c(best_aic$p, best_aic$d, best_aic$q))
summary(best_model)

Here is a list of some other widely used time-series models and how they differ from ARIMA.

| Model | Description |
|---|---|
| AR | Uses only past values |
| MA | Uses only past forecast errors |
| ARMA | Combines AR and MA (for stationary data) |
| ARIMA | Adds integration (differencing) to handle non-stationary data |
| SARIMA | ARIMA with seasonality |
| ARIMAX | ARIMA with exogenous variables (predictors) |
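This sketch illustrates the last two rows with base R's arima(). The seasonal argument adds seasonal terms, and xreg adds exogenous predictors. Strictly, arima() with xreg fits a regression with ARMA errors, which is how R implements the ARIMAX idea. The airline-model order for AirPassengers is a conventional choice, not one selected here. The promotion data and the name promo are synthetic and hypothetical.

```r
data("AirPassengers")

# SARIMA: ARIMA(0,1,1)(0,1,1)[12] on log passengers (the classic "airline model")
sarima_fit <- arima(log(AirPassengers), order = c(0, 1, 1),
                    seasonal = list(order = c(0, 1, 1), period = 12))
coef(sarima_fit)  # ma1 (non-seasonal MA) and sma1 (seasonal MA)

# ARIMAX: AR(1) errors plus an exogenous regressor (synthetic data)
set.seed(7)
n <- 500
promo <- rbinom(n, 1, 0.3)                          # hypothetical promotion indicator
noise <- arima.sim(model = list(ar = 0.5), n = n)   # autocorrelated errors
y <- 10 + 5 * promo + noise                         # true promo effect = 5

arimax_fit <- arima(y, order = c(1, 0, 0), xreg = cbind(promo = promo))
coef(arimax_fit)  # ar1, intercept, and the promo coefficient
```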

Discover more from SolutionShala
