In my previous post I described how to run a regression in Excel and generate the output. Here I will focus on what to look for in the output table and how to derive insights from it.

Here is a sample of the output table of a linear regression generated in MS Excel.

Now let’s discuss what we are going to focus on.

Adjusted R sq: First of all, we are going to check the overall model fit. Adjusted R square tells us what percentage of the variation in the dependent variable is explained by the model, after adjusting for the number of predictors. The closer it is to 100%, the better the model fit. Here the adjusted R square is around 94%, which is considered a pretty good fit.
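To see where the adjusted figure comes from, here is a minimal Python sketch of the standard adjusted R square formula. The inputs (R square of 0.95, 30 observations, 5 predictors) are made-up values for illustration, not the numbers from the table above; in Excel you would read them from "R Square" and "Observations" in the Regression Statistics block.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / residual_df


# Hypothetical inputs, not taken from the article's output table.
print(round(adjusted_r_squared(0.95, n_obs=30, n_predictors=5), 4))
```

Because of the (n - 1) / (n - k - 1) factor, adding a predictor that explains very little pulls the adjusted value down, which is why it is a safer measure of fit than plain R square when comparing models with different numbers of predictors.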

Next, look at the statistical significance of the overall model fit, which is captured by the F statistic in the ANOVA table. The p value (Significance F) is less than 0.05, which means that at the 95% confidence level our model fits better than an intercept-only model. In other words, the predictors we have chosen are, taken together, explaining a significant share of the variation. Check my post here to understand the relationship between F statistic and R square.
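As a rough illustration of how the F statistic is tied to R square, the sketch below computes F from R square, the number of observations and the number of predictors, using the usual formula for a regression with an intercept. The input values are hypothetical and do not come from the table above.

```python
def f_statistic_from_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) for a model with an intercept."""
    explained_per_df = r_squared / n_predictors
    unexplained_per_df = (1 - r_squared) / (n_obs - n_predictors - 1)
    return explained_per_df / unexplained_per_df


# Hypothetical inputs: R square 0.95, 30 observations, 5 predictors.
print(round(f_statistic_from_r_squared(0.95, n_obs=30, n_predictors=5), 2))
```

A higher R square for the same sample size and number of predictors gives a larger F, and a larger F gives a smaller Significance F in the ANOVA table.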

After checking the overall model fit, we need to focus on the statistical significance of the individual predictors. So, we will simply check the p value associated with the estimated coefficient of each predictor.

If we look at the p value of the estimated coefficient for the predictor ‘price per week’, it is less than 0.05, the threshold for a 5% level of significance (95% confidence level). So, we can say that this particular predictor is statistically significant. Please keep in mind that the choice of confidence level can vary. The standard practice is to use a 95% confidence level, which allows a 5% chance of wrongly declaring a predictor significant. If we want to be stricter, we can choose a 99% confidence level, and in that case the p value should be less than 0.01.
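The link between the confidence level and the p value cutoff can be written down in a few lines. This is only a sketch: the function name and the example p value of 0.03 are assumptions made for illustration.

```python
def is_significant(p_value: float, confidence_level: float = 0.95) -> bool:
    """A coefficient is significant when its p value is below alpha = 1 - confidence level."""
    alpha = round(1 - confidence_level, 10)  # 0.95 -> 0.05, 0.99 -> 0.01
    return p_value < alpha


# Hypothetical p value, not taken from the article's output table.
p = 0.03
print(is_significant(p, confidence_level=0.95))  # compared against 0.05
print(is_significant(p, confidence_level=0.99))  # compared against 0.01
```

The same coefficient can pass at the 95% level and fail at the 99% level, so it is worth deciding on the level before looking at the output.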

However, if we look at the other variables like ‘population of city’ or ‘monthly income of riders’, the p values are higher than 0.05 or even 0.1, and hence we cannot conclude that they are statistically significant. So we can drop these variables and re-run the regression.
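If you copy the p values out of the Excel output, the screening step can be sketched as below. The p values here are invented for illustration (the real ones are in the coefficients table), and the helper name is hypothetical.

```python
def split_predictors(p_values: dict, alpha: float = 0.05):
    """Return (keep, drop): predictors with p < alpha, and the rest."""
    keep = [name for name, p in p_values.items() if p < alpha]
    drop = [name for name, p in p_values.items() if p >= alpha]
    return keep, drop


# Made-up p values for illustration only.
example_p_values = {
    "price per week": 0.001,
    "population of city": 0.35,
    "monthly income of riders": 0.22,
}

keep, drop = split_predictors(example_p_values, alpha=0.05)
print("keep:", keep)
print("drop:", drop)
```

After re-running the regression with only the kept predictors, check the adjusted R square and the p values again, since they can change once variables are removed.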

Hope you found this article helpful. Happy learning 🙂
