The primary objectives of statistical inference are to:
- estimate population parameters and construct confidence intervals for those estimates
- test statistical significance.
These terms may sound familiar if you have a background in statistics. If you are a beginner, let me try to explain each of the components in detail.
First, let’s try to understand what a statistical hypothesis is.
A statistical hypothesis is a statement about a population, or about the distribution of some characteristic of that population, which we want to verify using the information we have collected from a sample.
Before we test a hypothesis, we need to think about how to frame it. This is why we first state our null hypothesis clearly. The null hypothesis is a statement that reflects the researcher's or statistician's neutral attitude towards the outcome of the experiment or test; it usually says that there is no effect or no difference. Rejecting or not rejecting the null hypothesis is only meaningful when there is a competing hypothesis to compare it with, which is known as the alternative hypothesis. For example, to test whether a coin is fair, the null hypothesis is that the probability of heads is 0.5, and the alternative hypothesis is that it is not 0.5.
The decision to accept or reject the null hypothesis is made on the basis of the information observed in our sample data. However, there is always some probability that the conclusion we draw about the population is wrong. This is where the concepts of Type I error and Type II error come in. But before that, we also need to understand what a critical region is.
Since a sample of n observations can be expressed as a point in n-dimensional space, we specify a region of that space and then check whether our observed sample point lies inside or outside it. In practice the sample is summarised by a test statistic, and the region is described in terms of the values of that statistic. So, basically, we divide the entire sample space into two regions: the acceptance region and the critical region. The null hypothesis is rejected if the observed test statistic falls in the critical region.
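To make the idea of a critical region concrete, here is a minimal sketch for one specific case: a two-sided z-test. The choice of test, the helper names `critical_cutoff` and `in_critical_region`, and the 5% level are all assumptions made for illustration; other tests have differently shaped critical regions.

```python
from statistics import NormalDist

def critical_cutoff(alpha):
    """Cutoff c for a two-sided z-test: the critical region is |z| > c."""
    # Put alpha / 2 probability in each tail of the standard normal.
    return NormalDist().inv_cdf(1 - alpha / 2)

def in_critical_region(z, alpha=0.05):
    """True if the observed statistic z falls in the critical region."""
    return abs(z) > critical_cutoff(alpha)

cutoff = critical_cutoff(0.05)
print(round(cutoff, 2))          # 1.96
print(in_critical_region(2.3))   # True: reject the null hypothesis
print(in_critical_region(1.2))   # False: do not reject
```

Everything between -1.96 and 1.96 is the acceptance region here; everything outside it is the critical region.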
Type I error: The error of rejecting the null hypothesis when it is actually true is called Type I error. A familiar scenario from the pandemic: if the null hypothesis is that the patient is not infected, a COVID test that comes out positive for a patient who is in reality healthy is a Type I error. This type of error is referred to as a false positive and, as you can understand, can have huge implications in any type of experiment. The probability of Type I error is also known as the level of significance, usually denoted by α.
Type II error: The error of accepting (more precisely, failing to reject) the null hypothesis when in reality it is false is known as Type II error. In the same scenario, a COVID test that comes out negative when the patient is actually infected is a Type II error, also referred to as a false negative. The probability of Type II error is usually denoted by β. The power of the test against the alternative hypothesis is given by (1 – probability of Type II error).
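The two error probabilities can be estimated by simulation. The sketch below is only an illustration under assumed settings that are not from the discussion above: a two-sided z-test of a mean with known standard deviation, a sample size of 25, a 5% level of significance, and a true mean of 0.5 under the alternative. The function names are hypothetical.

```python
import random
from statistics import NormalDist

def rejects_h0(sample, mu0, sigma, alpha):
    """Two-sided z-test: True if H0 (mean == mu0) is rejected."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25,
                   alpha=0.05, trials=4000, seed=0):
    """Fraction of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if rejects_h0(sample, mu0, sigma, alpha):
            rejections += 1
    return rejections / trials

# H0 is true (true mean equals mu0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mu=0.0)

# H0 is false (true mean is 0.5): every rejection is a correct decision,
# so the rejection rate estimates the power of the test.
power = rejection_rate(true_mu=0.5)
type_2_rate = 1 - power

print(type_1_rate, type_2_rate, power)
```

The estimated Type I error rate should land close to the chosen level of significance (0.05), while the Type II error rate depends on how far the true mean is from the hypothesised one and on the sample size.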
So let's summarize the steps that are required to solve any hypothesis testing problem:
- First, we need to identify the population parameter that we want to draw a conclusion about.
- Then we need to set up our null hypothesis and alternative hypothesis based on the parameter of interest.
- Next, we need to choose a test statistic, which will help us decide whether the sample gives enough evidence to reject the null hypothesis.
- Then we need to identify the critical region based on our choice of test statistic and the chosen level of significance.
- Finally, we need to compute the test statistic from our sample observations and draw the conclusion of the experiment accordingly.
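The steps above can be sketched end to end in code. This is a minimal sketch, not a general recipe: the measurements are invented for illustration, the population standard deviation is assumed to be known (which is what justifies a z-test), and the hypothesised mean of 50 and the 5% level are arbitrary choices.

```python
from statistics import NormalDist

# Step 1: the parameter of interest is the population mean.
# Hypothetical measurements, invented for illustration only.
sample = [51.2, 49.8, 52.1, 50.9, 51.5, 50.3, 52.4, 51.1, 50.7, 51.9]
sigma = 2.0  # assumed known population standard deviation

# Step 2: H0: mean = 50 versus H1: mean != 50 (two-sided).
mu0 = 50.0

# Step 3: test statistic z = (sample mean - mu0) / (sigma / sqrt(n)).
n = len(sample)
sample_mean = sum(sample) / n
standard_error = sigma / n ** 0.5

# Step 4: critical region |z| > cutoff at the chosen level of significance.
alpha = 0.05
cutoff = NormalDist().inv_cdf(1 - alpha / 2)

# Step 5: compute the statistic from the sample and draw the conclusion.
z = (sample_mean - mu0) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > cutoff

# Matching 95% confidence interval for the population mean.
ci_low = sample_mean - cutoff * standard_error
ci_high = sample_mean + cutoff * standard_error

print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

With these made-up numbers, z is about 1.88, which is below the cutoff of about 1.96, so the null hypothesis is not rejected at the 5% level. The confidence interval tells the same story: it contains the hypothesised value of 50.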
