Correlation denotes the degree of association between two variables. If changes in one variable correspond to changes in the other, the two variables are said to be correlated. When one variable increases as the other increases, the two are positively correlated. On the other hand, if an increase in one variable is accompanied by a decrease in the other, they are negatively correlated.

Correlation can be linear or non-linear. If changes in one variable maintain a constant ratio to the changes in the other variable, the correlation is said to be linear. We will talk about this in detail later.

Pearson’s Correlation:

Suppose we have n observations of two variables, x and y, in our dataset and we want to check whether they are correlated. The formula for the correlation coefficient between the two variables is shown below:

r=cov(x,y)/(σx σy)

where σx and σy are the standard deviations of the variables x and y, and cov(x,y) denotes the covariance of x and y. This formula is known as Pearson’s correlation coefficient formula. The simple correlation coefficient we calculate in MS Excel is actually Pearson’s correlation coefficient.

Here is an example of how to calculate the Pearson correlation coefficient in Excel. We have two variables, X and Y, with 13 observations. Note that to calculate the correlation coefficient we need the same number of observations for both variables. We use the function ‘CORREL’, as shown in the formula bar, to find the correlation between X and Y.

Now we just need to select the range for each variable, separated by a comma. The ranges here include the variable labels; CORREL ignores text cells, so the labels do not affect the result. The formula will be something like this:

Pearson correlation coefficient = CORREL(A1:A13,B1:B13)
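
To see what CORREL is doing, the formula r = cov(x,y)/(σx σy) can be worked through by hand. The sketch below uses only Python's standard library. The function name `pearson_r` and the data are made up for illustration and are not the values from the spreadsheet in this post.

```python
import math

def pearson_r(x, y):
    """Pearson's r = cov(x, y) / (sigma_x * sigma_y)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations. The 1/n factors cancel
    # in the ratio, so dividing by n - 1 everywhere gives the same r.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov_xy / (sigma_x * sigma_y)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Here cov(x,y) = 1.2, σx ≈ 1.414 and σy ≈ 1.095, so r = 1.2 / (1.414 × 1.095) ≈ 0.775.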

Now, let’s focus on some of the properties of the correlation coefficient (r)

  • The correlation coefficient is a pure number with no unit. It is not a percentage: when r is 0.327, as in the above case, it implies that X and Y are positively but fairly weakly correlated, not that they are "33% correlated".
  • r always lies between -1 and +1; its absolute value cannot exceed 1.
  • When two variables are positively correlated, higher values of one variable are associated with higher values of the other. Similarly, when r is negative, high values of one variable are associated with low values of the other.
  • Linear correlation assumes that the points fall roughly along a straight line when we plot the two variables. However, a small value of r does not mean that there is no association between the two variables: the association may be non-linear. That is why we also need to check the scatter plot to see whether the pattern of points is linear.
  • There is also a chance of getting a spurious correlation between two variables. When we observe a very high correlation between two otherwise unrelated variables, chances are that the correlation is spurious. So we always need to apply our judgement before drawing any conclusion.
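
The point about small r and non-linear association can be checked with a tiny experiment. In the sketch below (standard library only, with made-up data and a helper named `pearson_r` chosen for illustration), y is completely determined by x, yet r comes out as zero because the relationship is a curve rather than a line.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from sums of squared deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: x is symmetric around zero.
x = [-3, -2, -1, 0, 1, 2, 3]
y_curved = [v ** 2 for v in x]     # perfect association, but U-shaped
y_linear = [2 * v + 1 for v in x]  # perfect straight-line association

r_curved = pearson_r(x, y_curved)
r_linear = pearson_r(x, y_linear)
print(r_curved, r_linear)
```

A scatter plot of x against y_curved would show the U-shape immediately, which is why looking at the plot matters as much as looking at r.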

Spearman Rank Correlation:

Now, let’s discuss Spearman’s rank correlation. Sometimes we try to find the association between two variables that are not quantitative. Suppose we are trying to determine the extent of association between the intelligence and the efficiency of the employees of a company. These types of variables are qualitative. We can solve this problem by ranking the employees, using the numbers 1, 2, 3, … in order of their merit and efficiency. Now for each employee we have a pair of ranks, and the correlation coefficient between these two sets of ranks is called the rank correlation coefficient. It is expressed by the following formula:

R = 1 – (6∑d²)/(n³ – n), where d represents the difference between the ranks of an individual/observation in the two characters and n is the number of individuals/observations.

  • R lies between -1 and +1 (similar to the Pearson correlation coefficient). When both ranks are equal for every observation, R = 1. R equals -1 when the two rankings are exactly opposite.
  • The rank correlation coefficient R is used as a measure of the degree of association between two attributes where measurements are not available. However, even when exact measurements are available, we can use Spearman rank correlation by simply assigning ranks to the individuals. MS Excel does not have a built-in function for calculating Spearman rank correlation, so we have to follow a two-step process.
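
As a quick check of the rank formula, here is a minimal sketch that applies R = 1 – (6∑d²)/(n³ – n) directly to two sets of ranks. The employee ranks are invented for illustration, the function name `spearman_from_ranks` is our own, and the formula in this form assumes there are no tied ranks.

```python
def spearman_from_ranks(rank_a, rank_b):
    """Spearman's R = 1 - 6 * sum(d^2) / (n^3 - n), assuming no tied ranks."""
    if len(rank_a) != len(rank_b):
        raise ValueError("both rankings must cover the same individuals")
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - (6 * sum_d2) / (n ** 3 - n)

# Hypothetical ranks of five employees on two attributes.
intelligence = [1, 2, 3, 4, 5]
efficiency = [2, 1, 4, 3, 5]

R = spearman_from_ranks(intelligence, efficiency)
R_same = spearman_from_ranks(intelligence, intelligence)            # identical rankings
R_opposite = spearman_from_ranks(intelligence, intelligence[::-1])  # reversed rankings
print(R, R_same, R_opposite)
```

For the first pair, the differences d are -1, 1, -1, 1, 0, so ∑d² = 4 and R = 1 – 24/120 = 0.8. Identical rankings give 1 and exactly opposite rankings give -1, as described above.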

In the above example, we have assigned ranks to both variables X and Y. Both are quantitative; we rank them here only to illustrate the method. Excel’s built-in function RANK.AVG assigns the ranks automatically. The formulas will be similar to these:

RANK_X = IFERROR(RANK.AVG(A2,$A$2:$A$13,1),"")

RANK_Y = IFERROR(RANK.AVG(C2,$C$2:$C$13,1),"")

Now we simply apply the Pearson correlation function ‘CORREL’ to calculate the correlation between these two ranked variables. The formula will be similar to this:

Spearman Rank Correlation (R) = IFERROR(CORREL(B2:B13,D2:D13),0)

If we have qualitative variables we will have to assign ranks for these variables and calculate the correlation between them.
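
The same two-step process (rank first, then take the Pearson correlation of the ranks) can be sketched in plain Python. The helper `average_ranks` below is our own illustration of the tie rule used by RANK.AVG in ascending order, where tied values share the average of their positions. The data are made up and are not taken from the spreadsheet.

```python
import math

def average_ranks(values):
    """Ascending ranks; tied values get the average of the positions they occupy."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # 1-based position of the first tied value
        count = ordered.count(v)       # how many values are tied with v
        ranks.append(first + (count - 1) / 2)
    return ranks

def pearson_r(x, y):
    """Pearson's r computed from sums of squared deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data; x contains a tie (20 appears twice).
x = [10, 20, 20, 30, 40]
y = [1, 3, 2, 5, 4]

rank_x = average_ranks(x)          # step 1: rank each variable
rank_y = average_ranks(y)
rho = pearson_r(rank_x, rank_y)    # step 2: Pearson correlation of the ranks
print(rank_x, rank_y, round(rho, 4))
```

The two tied values of 20 occupy positions 2 and 3, so each receives rank 2.5. Taking the Pearson correlation of the ranks also works when ties are present, which the shortcut formula with ∑d² does not handle exactly.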

Pearson vs. Spearman Correlation: Key Differences

| Feature | Pearson Correlation | Spearman Correlation |
|---|---|---|
| Type of Data | Best for continuous, normally distributed data. | Works with ordinal, non-normal, or non-linear data. |
| Relationship Type | Measures linear relationships. | Measures monotonic (consistently increasing/decreasing) relationships. |

Examples

  1. Pearson Example:
    • Relationship between temperature (℃) and ice cream sales (linear).
    • Result: r=0.9 (strong positive linear correlation).
  2. Spearman Example:
    • Relationship between education level (ordinal: 1=High School, 2=BSc, 3=MSc) and salary.
    • Result: ρ=0.75 (strong monotonic trend, but not necessarily linear).

PYTHON IMPLEMENTATION:
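
The original code for this section is not available here, so what follows is only a minimal sketch using Python's standard library, not a definitive implementation. The function names `pearson`, `rank` and `spearman` and the data are assumptions made for illustration. It compares the two coefficients on a relationship that is monotonic but not linear.

```python
import math

def pearson(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank(values):
    """Ascending ranks, with tied values sharing their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson(rank(x), rank(y))

# Hypothetical data: y always increases with x, but along a curve.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

pearson_xy = pearson(x, y)
spearman_xy = spearman(x, y)
print(f"Pearson:  {pearson_xy:.3f}")
print(f"Spearman: {spearman_xy:.3f}")
```

Spearman comes out as 1 because the ordering of y matches the ordering of x perfectly, while Pearson is below 1 because the points do not lie on a straight line. If SciPy is installed, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` are library alternatives that should give the same coefficients.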

Hope you found this post helpful. Have a nice day 🙂
