
What is Autocorrelation?

Correlation between the error terms arising in time series data is known as autocorrelation or serial correlation. In such cases, the error term e_t at time period t is correlated with error terms at other periods, such as e_{t-1}, e_{t-2}, and so on. When the correlation is positive (neighbouring errors tend to have the same sign), we speak of positive autocorrelation; when it is negative (neighbouring errors tend to alternate in sign), we speak of negative autocorrelation. Such correlation often arises because the error term captures the effect of variables omitted from the model. The correlation between e_t and e_{t-1} is called the first-order autocorrelation; similarly, the correlation between e_t and e_{t-2} is called the second-order autocorrelation.
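For instance, we can estimate the first- and second-order autocorrelations of a residual series directly in Python by correlating the series with its lagged copies. The snippet below is a minimal sketch; the residuals here are simulated, so replace them with the residuals from your own model:

import numpy as np

# Simulated stand-in for a residual series e_1, ..., e_n (replace with actual residuals)
rng = np.random.default_rng(0)
e = rng.normal(size=100)

# First-order autocorrelation: corr(e_t, e_{t-1})
r1 = np.corrcoef(e[1:], e[:-1])[0, 1]
# Second-order autocorrelation: corr(e_t, e_{t-2})
r2 = np.corrcoef(e[2:], e[:-2])[0, 1]

print(f"Lag-1 autocorrelation: {r1:.3f}")
print(f"Lag-2 autocorrelation: {r2:.3f}")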

How can we detect the presence of autocorrelation in our model?

There are many statistical tests that help identify the presence of autocorrelation. We can also identify autocorrelation visually through ACF plots. We will discuss them one by one.

Statistical Tests to identify Autocorrelation

1. Durbin–Watson (DW) Test:

A very well-known test used to identify the presence of autocorrelation is the Durbin–Watson (DW) test. The DW test statistic is expressed as below:

d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²

where e_t is the error term at period t. This statistic takes a value between 0 and 4. A value of 2 indicates no autocorrelation; a value greater than 2 (closer to 4) indicates negative autocorrelation, while a value less than 2 (closer to 0) indicates positive autocorrelation. The null and alternative hypotheses of this test are:

H0 : No first order autocorrelation exists among the residuals.

H1: The residuals are autocorrelated.

Python Implementation of Durbin-Watson test:

import statsmodels.api as sm
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Example: Regression residuals
X = np.random.randn(100, 2)  # Predictors
y = X @ np.array([1.5, -2.0]) + np.random.normal(0, 1, 100)  # True model + noise

model = sm.OLS(y, sm.add_constant(X)).fit()
residuals = model.resid

# Durbin-Watson test
dw_stat = durbin_watson(residuals)
print(f"Durbin-Watson Statistic: {dw_stat:.4f}")

2. Ljung-Box Q Test:

Another very popular test is the Ljung-Box Q test. The null and alternative hypotheses for this test are as follows:

H0: The autocorrelations up to lag k are all 0.

H1: At least one of the autocorrelations up to lag k differs from 0.

For this test, if the resulting p-value is less than the chosen level of significance, we reject the null hypothesis and conclude that there is autocorrelation in the residuals.

Python Implementation of Ljung-Box test:

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

# Example: Residuals from a fitted model
residuals = np.random.normal(0, 1, 100)  # Replace with actual residuals

# Perform Ljung-Box test
lb_test = acorr_ljungbox(residuals, lags=[10], return_df=True)  # Testing up to lag 10
print(lb_test)
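As a short follow-up, assuming the lb_test DataFrame from the snippet above and a 5% significance level, the decision rule can be applied like this:

# Apply the decision rule at the 5% significance level
p_value = lb_test["lb_pvalue"].iloc[0]
if p_value < 0.05:
    print("Reject H0: the residuals are autocorrelated up to lag 10.")
else:
    print("Fail to reject H0: no evidence of autocorrelation up to lag 10.")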

Visual Methods:

ACF Plots:

A plot of the autocorrelation of a time series by lag is called the autocorrelation function (ACF) plot. In an ACF plot, we plot the autocorrelation at each lag together with a confidence band. In simple terms, it describes how strongly the present value of the series is related to its past values.

Python Code for ACF plots:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Generate example data
np.random.seed(42)
data = np.random.normal(size=100)  # White noise (no autocorrelation)
# data = np.cumsum(data)  # Uncomment for trending data

# Plot ACF
fig, ax = plt.subplots(figsize=(10, 5))
plot_acf(data, lags=20, alpha=0.05, ax=ax, title="ACF Plot")
plt.show()
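If we also want the numbers behind the plot, the autocorrelations and 95% confidence intervals that plot_acf draws can be obtained directly; this is a small complementary sketch reusing the data array from above:

from statsmodels.tsa.stattools import acf

# Autocorrelations up to lag 20 with 95% confidence intervals
acf_values, conf_int = acf(data, nlags=20, alpha=0.05)
print(acf_values[:5])  # autocorrelations at lags 0 to 4
print(conf_int[:5])    # corresponding confidence intervals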

What is the impact of autocorrelation or serial correlation on coefficient estimates?

In simpler terms, when we identify the presence of autocorrelation in the residuals of a model, it often suggests that some key variables are missing from the model, i.e. the model might not be correctly specified. The presence of autocorrelation also implies that we cannot rely on the standard errors, and consequently on the p-values. The effects of autocorrelated errors on the least squares (OLS) estimators are:

  • If there are no lagged dependent variables among the explanatory variables in our model, the OLS estimators are still unbiased in the presence of autocorrelation; however, they are no longer efficient (they no longer have the smallest possible variance). A small simulation sketch of this point follows the list.
  • If there are lagged dependent variables included in the model, the least squares estimators may not even be consistent in the presence of autocorrelation, i.e. they need not converge to the true values as n (the sample size) tends to infinity.
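The first point can be illustrated with a small simulation. The sketch below is purely illustrative (the sample size, number of repetitions, AR(1) coefficient and true slope are assumptions): it fits OLS repeatedly on data whose errors follow an AR(1) process and shows that the average slope estimate stays close to the true value even though the errors are autocorrelated:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, reps, rho, true_beta = 200, 500, 0.8, 1.5
estimates = []

for _ in range(reps):
    x = rng.normal(size=n)
    shocks = rng.normal(size=n)
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + shocks[t]  # AR(1) errors
    y = true_beta * x + e
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    estimates.append(fit.params[1])  # slope estimate

print(f"True slope: {true_beta}, average OLS estimate: {np.mean(estimates):.3f}")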

What are the remedies for presence of Autocorrelation?

  1. When autocorrelated error terms are present in our model, we should first investigate whether any key explanatory variable has been omitted. If we cannot identify such a predictor to eliminate the autocorrelation, then we need to transform the variables. One widely used technique is to proceed with a first-difference model (note that this is applicable for AR(1) cases): we simply regress Y_t* = Y_t − Y_{t−1} on X_{t,j}* = X_{t,j} − X_{t−1,j} for j = 1, 2, … using regression through the origin (i.e. without an intercept term).
  2. Generalized Least Squares (GLS) adjusts for autocorrelation. Feasible GLS (FGLS) estimates the autocorrelation and corrects for it.
  3. Cochrane-Orcutt or Prais-Winsten estimation uses iterative methods to correct AR(1) autocorrelation. A code sketch of remedies 1 and 3 follows this list.
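Below is a rough sketch of remedies 1 and 3. The data are simulated, and statsmodels' GLSAR model is used as a Cochrane-Orcutt-style feasible GLS estimator for AR(1) errors; treat the variable names and parameter values as assumptions rather than a definitive recipe:

import numpy as np
import statsmodels.api as sm

# Simulate a regression with AR(1) errors (rho = 0.7)
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 2.0 + 1.5 * x + e
X = sm.add_constant(x)

# Remedy 3: feasible GLS with an AR(1) error structure (iterative Cochrane-Orcutt-style fit)
glsar_model = sm.GLSAR(y, X, rho=1)  # rho=1 means one autoregressive lag
glsar_results = glsar_model.iterative_fit(maxiter=10)
print("Estimated AR(1) coefficient:", glsar_model.rho)
print("GLSAR coefficients:", glsar_results.params)

# Remedy 1: first-difference model, regression through the origin (no intercept)
dy, dx = np.diff(y), np.diff(x)
fd_results = sm.OLS(dy, dx).fit()
print("First-difference slope:", fd_results.params)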

You can also check my posts on testing other assumptions.
