Econometrics has become a cornerstone skill for careers in data science, economics, finance, consulting, and research. Whether you’re interviewing for a research analyst position, an academic program, or a data-driven role in the financial sector, you can expect to face technical econometrics questions.
This guide compiles common and advanced econometrics interview questions, along with clear, structured answers that will help you revise concepts and demonstrate applied knowledge. The format is conversational—just like how you’d respond in an actual interview.
Section 1: Basics of Econometrics
Q1. What are the main types of data that econometric models use?
A.
- Cross-sectional data – one time period, multiple units (e.g., household surveys).
- Time-series data – one unit, multiple periods (e.g., GDP growth, stock returns).
- Panel data – multiple units over multiple periods (e.g., firm performance over 10 years).
Models built on these data are often further classified as:
- Structural models – derived from economic theory, with equations representing behavioral relationships.
- Reduced-form models – focused on correlations or predictive relationships, not necessarily causal interpretation.
Q2. What is the difference between correlation and causation in econometrics?
A. Correlation measures association, but causation requires establishing that changes in one variable directly affect another. Econometrics uses tools like instrumental variables, difference-in-differences, randomized controlled trials, and regression discontinuity to distinguish causal effects from spurious correlations.
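The difference-in-differences logic can be shown with a toy two-period, two-group calculation (the numbers below are purely illustrative): subtracting the control group's change nets out any common time trend.

```python
# Difference-in-differences on made-up group means (illustrative numbers,
# not from any real study).
treat_pre, treat_post = 10.0, 15.0   # treated group outcome means
ctrl_pre, ctrl_post = 9.0, 11.0      # control group outcome means

# Treated change minus control change isolates the treatment effect,
# under the parallel-trends assumption.
did = (treat_post - treat_pre) - (ctrl_post - ctrl_pre)
print(did)  # 3.0
```

The key identifying assumption, worth stating in an interview, is parallel trends: absent treatment, both groups would have moved by the same amount.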
Section 2: Classical Linear Regression (OLS)
Q3. What are the assumptions of the classical linear regression model (CLRM)?
A. The key Gauss–Markov assumptions are:
- Linearity in parameters.
- Random sampling.
- No perfect multicollinearity.
- Zero conditional mean of errors (exogeneity).
- Homoskedasticity (constant error variance).
- No autocorrelation (for time-series).
Under these assumptions, OLS estimators are BLUE (Best Linear Unbiased Estimators).
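Under these assumptions the OLS estimator has the closed form beta_hat = (X'X)^(-1) X'y. A minimal numpy sketch on synthetic data with known coefficients (so the estimate can be checked against the truth):

```python
import numpy as np

# OLS via the normal equations on simulated data with known coefficients.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + regressor
beta_true = np.array([2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)      # small noise

# beta_hat = (X'X)^{-1} X'y, solved as a linear system for stability
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With small noise and n = 500, `beta_hat` lands very close to the true (2.0, 0.5).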
Q4. What happens if the error term is heteroskedastic?
A. OLS remains unbiased, but standard errors are biased, leading to invalid hypothesis tests. Remedies:
- Use robust standard errors (White correction).
- Transform variables (e.g., take logs, which often stabilizes the variance).
- Weighted least squares.
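The White (HC0) robust covariance sandwiches a diagonal matrix of squared residuals between (X'X)^(-1) terms. A sketch on simulated heteroskedastic data (in practice statsmodels does this via `fit(cov_type="HC0")`):

```python
import numpy as np

# White (HC0) robust covariance:
#   Var(beta_hat) = (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)  # error variance grows with x

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * resid[:, None] ** 2)         # X' diag(e^2) X
robust_cov = XtX_inv @ meat @ XtX_inv
robust_se = np.sqrt(np.diag(robust_cov))       # one SE per coefficient
```

The point estimates are identical to plain OLS; only the standard errors change.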
Q5. What is multicollinearity, and why is it a problem?
A. Multicollinearity occurs when independent variables are highly correlated. It inflates the variances of coefficient estimates, making individual estimates unstable and often statistically insignificant even when the regressors are jointly significant. Detection: Variance Inflation Factor (VIF). Remedies: drop redundant variables, combine predictors (PCA), or collect more data.
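VIF can be computed by hand: regress each predictor on the others and take 1 / (1 - R^2). A sketch on simulated data where two predictors are nearly collinear (values above roughly 10 are commonly flagged):

```python
import numpy as np

# Variance Inflation Factor: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes
# from regressing predictor j on all the other predictors.
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)                  # independent
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
    resid = X[:, j] - Z @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    return 1 / (1 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]  # x1, x2 huge; x3 near 1
```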
Q6. What is endogeneity? Give examples.
A. Endogeneity arises when an explanatory variable is correlated with the error term. Causes include:
- Omitted variable bias (leaving out an important factor).
- Measurement error in variables.
- Simultaneity (mutual causation, e.g., price ↔ demand).
Solutions: Instrumental Variables (IV), GMM, panel fixed effects, or natural experiments.
Q7. How can we address endogeneity?
A.
- Instrumental Variables (IV).
- Fixed Effects models.
- Control function approach.
Q8. What is 2SLS (Two-Stage Least Squares)?
A.
- Stage 1: regress endogenous regressor on instruments to get predicted values.
- Stage 2: regress the dependent variable on the predicted values (together with any exogenous regressors).
This produces consistent estimates under valid instruments.
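The two stages can be sketched in numpy on simulated data where the true coefficient is known. (In practice, use a packaged 2SLS routine: running stage 2 by hand gives the right point estimate but the wrong standard errors.)

```python
import numpy as np

# 2SLS sketch: x is endogenous (built from the error u); z is a valid
# instrument (relevant for x, independent of u). True beta = 2.
rng = np.random.default_rng(3)
n = 20000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 1.0 * z + 0.8 * u + rng.normal(size=n)  # endogenous regressor
y = 2.0 * x + u

# Stage 1: fitted values of x from the instrument
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Stage 2: regress y on the fitted values
Xh = np.column_stack([np.ones(n), x_hat])
beta_2sls = np.linalg.lstsq(Xh, y, rcond=None)[0][1]

# Naive OLS for comparison -- biased upward because cov(x, u) > 0
Xo = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(Xo, y, rcond=None)[0][1]
```

Here OLS converges to roughly 2.3 (bias of cov(x,u)/var(x)), while 2SLS recovers the true 2.0.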
Q9. How do you interpret R-squared and Adjusted R-squared?
A.
- R² measures the proportion of variance explained by the model.
- Adjusted R² accounts for the number of predictors, penalizing overfitting.
In interviews, emphasize that a high R² doesn’t imply causality or model validity.
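Both quantities follow directly from the residuals. A short sketch of the formulas R^2 = 1 - SSR/SST and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1):

```python
import numpy as np

# R^2 and adjusted R^2 computed from an OLS fit on simulated data.
rng = np.random.default_rng(4)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3, 0.0]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
ssr = resid @ resid                    # sum of squared residuals
sst = ((y - y.mean()) ** 2).sum()      # total sum of squares

r2 = 1 - ssr / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra regressors
```

Adjusted R^2 is always at most R^2 and can fall when a useless regressor is added.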
Section 3: Hypothesis Testing
Q10. What is the difference between t-test and F-test in regression?
A.
- t-test: tests the significance of individual coefficients.
- F-test: tests joint significance of multiple coefficients (e.g., do all slope coefficients = 0?).
Q11. How do you test for heteroskedasticity?
A.
- Breusch–Pagan test
- White’s test
- Graphical inspection of residuals vs. fitted values.
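The Breusch–Pagan idea is simple enough to sketch: regress the squared OLS residuals on the regressors and compute the LM statistic n * R^2 of that auxiliary regression, compared against a chi-squared distribution (in practice, statsmodels provides `het_breuschpagan`):

```python
import numpy as np

# Breusch-Pagan sketch on data with variance rising in x.
rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=x)  # heteroskedastic errors

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ beta) ** 2                 # squared residuals

# Auxiliary regression of e^2 on the regressors
g, *_ = np.linalg.lstsq(X, e2, rcond=None)
aux_resid = e2 - X @ g
aux_r2 = 1 - aux_resid.var() / e2.var()

lm_stat = n * aux_r2  # compare to chi^2(1); 5% critical value is 3.84
```

With variance this strongly tied to x, the statistic is far above the 3.84 cutoff, so the test rejects homoskedasticity.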
Q12. What are Type I and Type II errors?
A.
- Type I error: rejecting a true null (false positive).
- Type II error: failing to reject a false null (false negative).
Section 4: Time Series Econometrics
Q13. What is stationarity, and why is it important?
A. A stationary series has a constant mean, variance, and autocovariance over time. Stationarity matters because many time-series models (AR, MA, ARIMA, VAR) assume stable distributions to ensure reliable parameter estimation, forecasting, and hypothesis testing. Non-stationary series (like trending GDP) can lead to spurious regressions, where results appear significant but are statistically invalid.
Q14. How do you test for stationarity?
A. Common tests include:
- Augmented Dickey–Fuller (ADF) test → Null: unit root present (non-stationary).
- Phillips–Perron (PP) test → same null as ADF; robust to heteroskedasticity.
- KPSS test → Null: series is stationary.
Using multiple tests together gives more reliable evidence.
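The core of the Dickey–Fuller test is a regression of the differenced series on its lagged level. This sketch omits augmentation lags and proper DF critical values (use statsmodels' `adfuller` in practice), but it shows the mechanic: a strongly negative t-statistic suggests stationarity.

```python
import numpy as np

# Dickey-Fuller regression sketch: Delta Y_t on (constant, Y_{t-1}).
def df_tstat(y):
    dy = np.diff(y)
    ylag = y[:-1]
    X = np.column_stack([np.ones(len(ylag)), ylag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])  # t-stat on the lagged level

rng = np.random.default_rng(6)
e = rng.normal(size=1000)
walk = np.cumsum(e)                      # unit root: t-stat near zero
ar = np.empty(1000)
ar[0] = 0.0
for t in range(1, 1000):                 # stationary AR(1), phi = 0.5
    ar[t] = 0.5 * ar[t - 1] + e[t]

t_walk, t_ar = df_tstat(walk), df_tstat(ar)  # t_ar strongly negative
```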
Q15. What is a unit root?
A. A unit root means the series has a stochastic trend (non-stationary). Example: random walk. If a variable has a unit root, shocks have permanent effects, making forecasting and inference more challenging. Econometricians typically difference the series (ΔYt) or use cointegration techniques to handle unit roots.
Q16. What is cointegration, and why is it important?
A. Cointegration occurs when two or more non-stationary series are linked by a long-run equilibrium relationship, even if they drift individually. For example, stock prices of competitors may each follow a random walk, but their difference is stationary. This allows use of Engle–Granger two-step method or Johansen test to model long-run relationships.
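Step 1 of Engle–Granger can be sketched directly: regress one random walk on the other and look at the residual, which is stationary when the series are cointegrated (step 2 would run a unit-root test on that residual).

```python
import numpy as np

# Engle-Granger step 1 on simulated cointegrated series: x is a random
# walk and y = 2x + stationary noise, so y and x share a long-run link.
rng = np.random.default_rng(7)
n = 2000
x = np.cumsum(rng.normal(size=n))   # non-stationary
y = 2.0 * x + rng.normal(size=n)    # cointegrated with x

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta                # stationary: bounded, mean-reverting
```

A useful interview point: the cointegrating slope here is "superconsistent", converging at rate n rather than sqrt(n), which is why the estimate is so sharp despite x being non-stationary.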
Q17. What is the difference between AR, MA, and ARMA models?
A.
- AR(p) (Autoregressive): current value depends on past values.
- MA(q) (Moving Average): current value depends on past shocks/errors.
- ARMA(p,q): combines AR and MA for stationary series.
If the series is non-stationary, use ARIMA(p,d,q), where d is the number of differences applied.
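An AR(1) process Y_t = phi * Y_{t-1} + e_t is easy to simulate, and for a stationary AR(1) the lag-1 sample autocorrelation should sit near phi:

```python
import numpy as np

# Simulate a stationary AR(1) with phi = 0.8 and check that the lag-1
# sample autocorrelation recovers phi.
rng = np.random.default_rng(8)
n, phi = 5000, 0.8
y = np.empty(n)
y[0] = 0.0
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

yc = y - y.mean()
acf1 = (yc[1:] @ yc[:-1]) / (yc @ yc)  # close to 0.8
```

For an MA(q) process the autocorrelation instead cuts off after lag q, which is the classic way to tell the two apart from correlograms.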
Q18. What is an ARIMA model? When is it used?
A. ARIMA (AutoRegressive Integrated Moving Average) models combine AR and MA components with differencing (I) to handle non-stationary data. Widely used in forecasting macroeconomic indicators, stock prices, demand, etc. Seasonal versions are called SARIMA.
Q19. What is a VAR (Vector Autoregression) model?
A. VAR generalizes AR to multiple time series, allowing each variable to depend on lags of itself and others. It’s useful for capturing dynamic interdependencies (e.g., GDP, inflation, and interest rates together). But it requires stationarity and suffers from overparameterization if too many lags/variables are included.
Q20. What is Granger causality?
A. A variable X “Granger-causes” Y if past values of X contain information that helps predict Y, beyond Y’s own past. It’s not true causality but a statistical test for predictive power. Granger causality is tested within VAR frameworks.
Q21. What is ARCH/GARCH, and why are they important?
A. ARCH (Autoregressive Conditional Heteroskedasticity) and GARCH (Generalized ARCH) models capture time-varying volatility in financial time series (e.g., stock returns). They model conditional variance as a function of past squared errors (ARCH) and past variances (GARCH). Key for risk management, VaR estimation, and option pricing.
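An ARCH(1) process can be simulated in a few lines: the conditional variance is sigma2_t = omega + alpha * e_{t-1}^2, so quiet and turbulent spells cluster and the unconditional distribution has fat tails (sample kurtosis above the Gaussian value of 3). Parameters below are arbitrary illustrative choices.

```python
import numpy as np

# ARCH(1) simulation: volatility clustering produces excess kurtosis.
rng = np.random.default_rng(9)
n, omega, alpha = 20000, 0.2, 0.5
e = np.empty(n)
prev = 0.0
for t in range(n):
    sigma2 = omega + alpha * prev ** 2   # conditional variance
    e[t] = np.sqrt(sigma2) * rng.normal()
    prev = e[t]

# Sample kurtosis; a Gaussian series would give about 3
kurtosis = ((e - e.mean()) ** 4).mean() / e.var() ** 2
```

GARCH(1,1) adds a lagged-variance term, sigma2_t = omega + alpha * e_{t-1}^2 + beta * sigma2_{t-1}, which usually fits financial returns far better with few parameters.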
Q22. What is structural break in time series, and how do you test for it?
A. Structural breaks occur when the underlying data-generating process changes (e.g., a policy change, COVID shock). Ignoring breaks can bias estimates. Tests: Chow test, Bai–Perron test, and rolling regressions.
Q23. What is the difference between short-run and long-run dynamics in time series?
A. Short-run dynamics capture immediate fluctuations (via lags, shocks). Long-run dynamics are captured via cointegration and error correction models (ECM), where deviations from equilibrium gradually adjust over time.
Q24. What is an Error Correction Model (ECM)?
A. ECM is used when variables are cointegrated. It links short-term fluctuations to long-term equilibrium. The error correction term (lagged residual from cointegration equation) measures how quickly deviations from the long-run relationship are corrected.
Q25. How do you forecast with time series models?
A. Forecasting requires:
- Ensuring stationarity (via differencing or transformation).
- Selecting appropriate lag lengths (AIC, BIC, HQIC).
- Estimating the model (ARIMA, VAR, GARCH, etc.).
- Checking residual diagnostics (autocorrelation, normality, stability).
- Generating forecasts and evaluating accuracy (RMSE, MAE, MAPE).
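The lag-selection step above can be sketched for an AR(p): fit several lag lengths and keep the one minimizing AIC (this uses a simple conditional-least-squares AIC, a hedged simplification of what packaged routines report):

```python
import numpy as np

# Pick AR lag length by AIC on data simulated from an AR(2).
rng = np.random.default_rng(10)
n = 2000
y = np.empty(n)
y[:2] = 0.0
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

def aic_ar(y, p):
    Y = y[p:]
    lags = [y[p - i:-i] for i in range(1, p + 1)]   # lag 1..p columns
    X = np.column_stack([np.ones(len(Y))] + lags)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    ssr = ((Y - X @ beta) ** 2).sum()
    return len(Y) * np.log(ssr / len(Y)) + 2 * (p + 1)

# AIC rarely underfits a clear AR(2), though it can pick a few extra lags
best_p = min(range(1, 6), key=lambda p: aic_ar(y, p))
```

BIC penalizes extra lags more heavily (log(n) per parameter instead of 2), so it tends to pick more parsimonious models.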
Section 5: Panel Data Econometrics
Q26. What is the difference between Fixed Effects (FE) and Random Effects (RE) models in panel data?
A.
- Fixed Effects (FE):
  - Controls for time-invariant unobserved heterogeneity (firm culture, geography, industry, etc.) by using within-entity variation.
  - Allows these unobserved characteristics to be correlated with the regressors.
  - Removes bias by demeaning (entity mean subtraction), but cannot estimate coefficients on variables that do not vary over time.
  - Best for causal inference when correlation between individual effects and regressors is likely.
- Random Effects (RE):
  - Treats unobserved heterogeneity as random and uncorrelated with the regressors.
  - Uses both within-entity and between-entity variation, making it more efficient if its assumptions hold.
  - Allows estimation of time-invariant regressors (e.g., region, ownership structure).
  - Requires the Hausman test to check whether RE is consistent; if correlation exists, FE is preferred.
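The within (FE) estimator is just OLS on entity-demeaned data, which wipes out the fixed effects even when they are correlated with the regressors. A sketch where pooled OLS is biased but FE is not:

```python
import numpy as np

# Within (fixed-effects) estimator: demean y and x by entity, then OLS.
# The entity effects a_i are deliberately correlated with x, so pooled
# OLS is biased upward while FE recovers the true beta = 2.
rng = np.random.default_rng(11)
n_entities, n_periods = 200, 10
a = rng.normal(size=n_entities)                             # entity effects
x = a[:, None] + rng.normal(size=(n_entities, n_periods))   # x correlated with a
y = 2.0 * x + a[:, None] + rng.normal(size=(n_entities, n_periods))

# Demeaning removes a_i from both sides
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_fe = (xd * yd).sum() / (xd ** 2).sum()

# Pooled OLS (ignoring entity effects) for comparison
xp, yp = x.ravel() - x.mean(), y.ravel() - y.mean()
beta_pooled = (xp * yp).sum() / (xp ** 2).sum()
```

Note how the demeaned x has no within-entity variation left for any time-invariant variable, which is exactly why FE cannot estimate such coefficients.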
Q27. What is the Hausman test used for?
A. In panel data econometrics, the Hausman test checks whether fixed effects or random effects are appropriate. If the unobserved effect is correlated with regressors, fixed effects are consistent, while random effects become inconsistent.
