ARIMA stands for AutoRegressive Integrated Moving Average, a widely used statistical method for analyzing and forecasting time series data. It models a series from its own past values and past forecast errors, after differencing the series as many times as needed to make it stationary.
An ARIMA model is typically written as ARIMA(p, d, q) where:
| Component | Meaning | Purpose |
|---|---|---|
| p | AutoRegressive (AR) term | Regression on past values |
| d | Integrated (I) term | Number of differences to make data stationary |
| q | Moving Average (MA) term | Regression on past forecast errors |
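For readers who prefer an equation, the three terms can be combined as below. This is the usual textbook notation rather than anything specific to this article, and every symbol is an assumption introduced for illustration: $B$ is the backshift operator ($B y_t = y_{t-1}$), $\varepsilon_t$ is white noise, $c$ is a constant, and $\phi_i$ and $\theta_j$ are the AR and MA coefficients.

```latex
\left(1 - \phi_1 B - \cdots - \phi_p B^p\right)(1 - B)^d \, y_t
  = c + \left(1 + \theta_1 B + \cdots + \theta_q B^q\right)\varepsilon_t
```

As a small worked case, ARIMA(1, 1, 0) reduces to $y_t - y_{t-1} = c + \phi_1 (y_{t-1} - y_{t-2}) + \varepsilon_t$: an AR(1) model applied to the first differences.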
Why Use ARIMA?
- Works well for univariate time series with temporal structure.
- Flexible: Can model data with trends and cycles.
- Powerful: Balances past values (AR), past errors (MA), and differencing (I).
- Often outperforms naive methods like moving averages or linear regression on raw time series data.
Example Scenario:
You’re a retailer with monthly sales data for the last 5 years. You want to:
- Forecast sales for the next 6 months.
- Understand whether there is a trend or seasonality.
ARIMA helps you:
- Clean the data (make it stationary).
- Fit a model that learns from past values and noise.
- Generate accurate forecasts for decision-making.
ARIMA Modeling Steps
1. Understand and Prepare the Data
- Ensure your data is a time series object (`ts` or `tsibble`).
- Identify frequency: yearly, quarterly, monthly, etc.
ts_data <- ts(data_vector, start = c(YYYY, MM), frequency = 12) # Monthly data
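To make the placeholder call above concrete, here is a minimal sketch in base R. The sales figures are simulated and the names `sales_vector`, `sales_ts`, and `sales_2021` are hypothetical; only `ts()`, `frequency()`, `start()`, `end()`, and `window()` are real base R functions.

```r
# Hypothetical data: 60 months of simulated sales (not from a real retailer)
set.seed(42)
sales_vector <- 200 + 2 * (1:60) + rnorm(60, sd = 10)

# Monthly series starting January 2019
sales_ts <- ts(sales_vector, start = c(2019, 1), frequency = 12)

frequency(sales_ts)   # observations per year
start(sales_ts)       # first time point as c(year, month)
end(sales_ts)         # last time point as c(year, month)

# Extract one calendar year with window()
sales_2021 <- window(sales_ts, start = c(2021, 1), end = c(2021, 12))
```

Storing the start date and frequency in the object means later steps (plots, seasonal differencing, forecasts) can label time correctly without extra bookkeeping.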
2. Visualize the Time Series
- Helps to spot trends, seasonality, or anomalies.
plot(ts_data)
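Beyond a raw plot, a classical decomposition can help separate trend from seasonality. The sketch below uses the built-in `AirPassengers` series and `decompose()` from base R; choosing `type = "multiplicative"` is an assumption made here because the seasonal swings in that series grow with its level.

```r
data("AirPassengers")

# Classical decomposition into trend, seasonal and remainder components.
# type = "multiplicative" is an assumption about this particular series.
parts <- decompose(AirPassengers, type = "multiplicative")

plot(parts)             # panels: observed, trend, seasonal, random
round(parts$figure, 2)  # the 12 estimated monthly seasonal factors
```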
3. Check for Stationarity
- The AR and MA terms assume a stationary series (constant mean and variance over time); differencing is how a non-stationary series gets there.
- Use plots and statistical tests:
library(tseries) # provides adf.test()
adf.test(ts_data) # Augmented Dickey-Fuller test; a small p-value suggests stationarity
If Non-Stationary:
- Difference the data until it becomes stationary.
- Keep track of how many times you difference (`d` in ARIMA(p,d,q)).
ts_diff <- diff(ts_data)
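The effect of differencing is easiest to see on a series that is known to be non-stationary. The following sketch simulates a random walk with drift (hypothetical data; `walk`, `d1`, `d2`, and `seasonal_diff` are illustrative names) and uses only base R.

```r
set.seed(1)
# Hypothetical non-stationary series: a random walk with drift
walk <- ts(cumsum(rnorm(200, mean = 0.5)))

d1 <- diff(walk)                   # first difference, d = 1
d2 <- diff(walk, differences = 2)  # second difference, d = 2

# Each round of differencing drops one observation
length(walk); length(d1); length(d2)

# Lag-1 autocorrelation: typically close to 1 for the raw walk
# and much closer to 0 once differenced
acf(walk, plot = FALSE)$acf[2]
acf(d1, plot = FALSE)$acf[2]

# Seasonal differencing for monthly data uses lag = 12
seasonal_diff <- diff(AirPassengers, lag = 12)
```

Differencing more than necessary tends to add noise, so it is usually worth stopping at the smallest `d` that gives a stationary-looking series.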
4. Identify AR and MA Components
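- Use ACF and PACF plots of the differenced series:
  - ACF → Moving Average (MA) part (q): a sharp cutoff after lag q suggests MA(q)
  - PACF → AutoRegressive (AR) part (p): a sharp cutoff after lag p suggests AR(p)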
acf(ts_diff)
pacf(ts_diff)
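Reading these plots takes practice, so it can help to look at a series whose order is known. This sketch simulates an AR(2) process with arbitrary coefficients (an assumption for illustration) and lists the lags whose partial autocorrelation exceeds the approximate 95% bound drawn in the default plots.

```r
set.seed(123)
# Simulate a stationary AR(2) process; the coefficients are arbitrary
x <- arima.sim(model = list(ar = c(0.6, 0.2)), n = 500)

acf_vals  <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]  # drop lag 0
pacf_vals <- pacf(x, lag.max = 10, plot = FALSE)$acf

# Approximate 95% significance bound used in the default plots
bound <- 1.96 / sqrt(length(x))

# Lags whose partial autocorrelation exceeds the bound
which(abs(pacf_vals) > bound)
```

For an AR(2) process one would expect lags 1 and 2 to stand out in the PACF while the ACF decays gradually, although sampling noise can occasionally flag other lags.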
5. Fit ARIMA Model
- Automatically:
library(forecast) # provides auto.arima()
fit <- auto.arima(ts_data)
- Manually (if you decide p, d, q yourself):
fit <- arima(ts_data, order = c(p, d, q))
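As a concrete instance of the manual route, the sketch below fits an AR(1) model with base R's `arima()` to `lh`, a short built-in series chosen here only because it ships with R. The order (1, 0, 0) is an assumption for illustration, not a recommendation.

```r
# lh: a short built-in series (48 observations), used only for illustration
fit_ar1 <- arima(lh, order = c(1, 0, 0))  # AR(1) with a mean term

coef(fit_ar1)              # estimated ar1 coefficient and intercept
sqrt(diag(vcov(fit_ar1)))  # approximate standard errors
AIC(fit_ar1)               # information criterion for model comparison
```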
6. Evaluate Model Fit
- Check summary statistics (AIC, BIC, residuals):
summary(fit)
7. Diagnostic Checking
- Residuals should resemble white noise (no pattern).
checkresiduals(fit)
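If the forecast package is not available, a similar check can be sketched with base R alone. The example refits a hypothetical AR(1) model on the built-in `lh` series and applies a Ljung-Box test via `Box.test()`; the choice of 10 lags is an assumption.

```r
fit_ar1 <- arima(lh, order = c(1, 0, 0))
res <- residuals(fit_ar1)

# Ljung-Box test: the null hypothesis is "no autocorrelation up to lag 10".
# fitdf = 1 accounts for the one estimated AR coefficient.
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)
lb$p.value  # a large p-value is consistent with white-noise residuals

acf(res, main = "Residual ACF")  # bars should mostly stay inside the bounds
```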
8. Forecasting
- Predict future values:
forecasted <- forecast(fit, h = 12) # Forecast 12 steps ahead
plot(forecasted)
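For a model fitted with base R's `arima()`, forecasts can also be produced with `predict()`. The sketch below, again on the built-in `lh` series with an assumed AR(1) order, builds an approximate 95% interval by hand under a normal-errors assumption.

```r
fit_ar1 <- arima(lh, order = c(1, 0, 0))

# predict() returns point forecasts ($pred) and standard errors ($se)
fc <- predict(fit_ar1, n.ahead = 6)

# Approximate 95% prediction interval, assuming normal errors
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

cbind(forecast = fc$pred, lower = lower, upper = upper)
```

The interval widens as the horizon grows, which is a useful reminder that forecasts further ahead carry more uncertainty.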
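We can fit a series of ARIMA models with different combinations of (p, d, q) and select the best one based on AIC or BIC. You can do this in base R using nested loops, or let auto.arima() from the forecast package run the search for you. Keep in mind that AIC and BIC are only directly comparable between models with the same d, because differencing changes the data the likelihood is computed on; treat rankings across different d values with caution.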
Here’s how to manually grid search ARIMA models and compare AIC/BIC scores:
# Load required packages
library(forecast)
# Replace this with your own time series data
data("AirPassengers") # Example dataset
ts_data <- AirPassengers
# Define ranges for p, d, q
max_p <- 3
max_d <- 2
max_q <- 3
# Store results
results <- data.frame(p=integer(), d=integer(), q=integer(), AIC=numeric(), BIC=numeric())
# Loop through combinations
for (p in 0:max_p) {
  for (d in 0:max_d) {
    for (q in 0:max_q) {
      # try() skips models that fail to fit
      try({
        fit <- Arima(ts_data, order = c(p, d, q))
        results <- rbind(results, data.frame(
          p = p, d = d, q = q,
          AIC = AIC(fit),
          BIC = BIC(fit)
        ))
      }, silent = TRUE)
    }
  }
}
# Select the rows with the lowest AIC and BIC
best_aic <- results[which.min(results$AIC), ]
best_bic <- results[which.min(results$BIC), ]
# Print best models
cat("Best model by AIC:\n")
print(best_aic)
cat("\nBest model by BIC:\n")
print(best_bic)
Output
This code will return:
- The (p,d,q) combination with the lowest AIC
- The one with the lowest BIC
You can then fit the best model like:
best_model <- Arima(ts_data, order = c(best_aic$p, best_aic$d, best_aic$q))
summary(best_model)
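One caveat with searching over d: information criteria are computed on the differenced data, so AIC values from models with different d are not directly comparable. A more cautious variant, sketched below with base R's `arima()`, holds d fixed and searches only p and q. The value `d_fixed <- 1` and the 0 to 2 ranges are assumptions for illustration; in practice d would come from the stationarity check.

```r
ts_data <- AirPassengers
d_fixed <- 1  # assumed to come from the stationarity check

grid <- expand.grid(p = 0:2, q = 0:2)
grid$AIC <- NA_real_

for (i in seq_len(nrow(grid))) {
  fit <- tryCatch(
    arima(ts_data, order = c(grid$p[i], d_fixed, grid$q[i]), method = "ML"),
    error = function(e) NULL  # leave AIC as NA if the fit fails
  )
  if (!is.null(fit)) grid$AIC[i] <- AIC(fit)
}

# Rank candidates; all share the same d, so their AIC values are comparable
grid[order(grid$AIC), ]
```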
Here is a list of some other widely used time-series models and their differences with ARIMA.
| Model | Description |
|---|---|
| AR | Uses only past values |
| MA | Uses only past forecast errors |
| ARMA | Combines AR and MA (for stationary data) |
| ARIMA | Adds integration (differencing) to handle non-stationary data |
| SARIMA | ARIMA with additional seasonal AR, differencing, and MA terms |
| ARIMAX | ARIMA with exogenous variables (predictors) |
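To illustrate the SARIMA row, base R's `arima()` accepts a `seasonal` argument. The sketch below fits the so-called airline model, ARIMA(0,1,1)(0,1,1) with period 12, to the log of `AirPassengers`. Both the log transform and the chosen orders are assumptions for illustration, not the result of a model search.

```r
# "Airline model": ARIMA(0,1,1)(0,1,1)[12] on the log scale
fit_sarima <- arima(
  log(AirPassengers),
  order    = c(0, 1, 1),
  seasonal = list(order = c(0, 1, 1), period = 12)
)

coef(fit_sarima)  # ma1 = non-seasonal MA term, sma1 = seasonal MA term

# Forecast 12 months ahead and undo the log transform
fc <- predict(fit_sarima, n.ahead = 12)
exp(fc$pred)
```

Exogenous predictors (the ARIMAX row) can be passed to the same function through its `xreg` argument.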
