
Friday, October 7, 2011

Correlation Coefficient

  • The correlation coefficient (or cross-correlation coefficient) measures the correlation (linear dependence) between two variables X and Y. It takes values from -1 to +1 inclusive.
    • if the value is +1: the two data sets have a perfect positive linear relationship (all points lie on a straight line with positive slope, so when one increases, the other always increases)
    • if the value is -1: the two data sets have a perfect negative linear relationship (all points lie on a straight line with negative slope, so when one increases, the other always decreases)

  • There are several correlation coefficients. Let's look at the most common one, the Pearson product-moment correlation coefficient (PPMCC).
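  • PPMCC is defined as the covariance of the two variables divided by the product of their standard deviations:

        $$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

    This defines the population correlation coefficient.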
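To make the population definition concrete, here is a small sketch for a discrete joint distribution, where each expectation E is a probability-weighted sum. The representation (a list of (x, y, p) triples whose probabilities sum to 1) and the helper name `population_corr` are assumptions made for illustration.

```python
import math

def population_corr(pmf):
    """Population correlation rho for a discrete joint distribution.

    pmf: list of (x, y, p) triples; the probabilities p are assumed to sum to 1.
    (population_corr is a hypothetical helper, not a library function.)
    """
    mu_x = sum(p * x for x, y, p in pmf)  # E(X)
    mu_y = sum(p * y for x, y, p in pmf)  # E(Y)
    cov = sum(p * (x - mu_x) * (y - mu_y) for x, y, p in pmf)  # E[(X - mu_X)(Y - mu_Y)]
    sigma_x = math.sqrt(sum(p * (x - mu_x) ** 2 for x, y, p in pmf))
    sigma_y = math.sqrt(sum(p * (y - mu_y) ** 2 for x, y, p in pmf))
    return cov / (sigma_x * sigma_y)

# Y = 2X with certainty: perfect positive linear relationship
print(population_corr([(0, 0, 0.5), (1, 2, 0.5)]))  # 1.0
# Four equally likely corners of a square: no linear trend
print(population_corr([(-1, -1, 0.25), (-1, 1, 0.25), (1, -1, 0.25), (1, 1, 0.25)]))  # 0.0
```

Both distributions are made-up examples; the second one gives ρ = 0 because X and Y show no linear dependence at all.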

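  • For a sample of n paired values (x_i, y_i), with sample means x̄ = E(X) and ȳ = E(Y), we have the sample correlation coefficient:

        $$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

  • Notice that the factor 1/n from each E operation cancels out: the numerator carries 1/n, and the denominator carries sqrt(1/n) * sqrt(1/n) = 1/n.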
  • Now let us see an example.
    • Let the data sets be X = {10, 20, 30, 40, 50, 60} and Y = {3, 5, 2, 3, 7, 4}
    • E(X) = (10+20+30+40+50+60) / 6 = 35; E(Y) = (3+5+2+3+7+4) / 6 = 4
    • Numerator: (10-35)(3-4)+(20-35)(5-4)+(30-35)(2-4)+(40-35)(3-4)+(50-35)(7-4)+(60-35)(4-4) = 60
    • Denominator: sqrt((10-35)^2 + (20-35)^2 + (30-35)^2 + (40-35)^2 + (50-35)^2 + (60-35)^2) * sqrt((3-4)^2 + (5-4)^2 + (2-4)^2 + (3-4)^2 + (7-4)^2 + (4-4)^2) = sqrt(1750) * sqrt(16) ≈ 41.833 * 4 = 167.332
    • r = 60 / 167.332 ≈ 0.358569
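The sample formula and the worked example above can be reproduced with a short sketch in plain Python. The function name `pearson_r` is a hypothetical helper chosen for illustration; it follows the sample formula term by term (numerator, then the product of the two square roots).

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r of two equal-length sequences.

    (pearson_r is a hypothetical helper.) Raises ZeroDivisionError if either
    sequence is constant, because r is undefined in that case.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n  # E(X)
    mean_y = sum(ys) / n  # E(Y)
    # Numerator: sum of products of deviations (the 1/n factors cancel, so none appear)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the square roots of the sums of squared deviations
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)) * math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return num / den

X = [10, 20, 30, 40, 50, 60]
Y = [3, 5, 2, 3, 7, 4]
r = pearson_r(X, Y)
print(round(r, 6))  # 0.358569
```

On Python 3.10 or later, `statistics.correlation(X, Y)` from the standard library should give the same value and can serve as a cross-check.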

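  • When the two data sets are equal (Y = X), the numerator and the denominator become the same sum, so we have:

        $$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} = 1$$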
    • If you calculate the PPMCC for X = {10, 20, 30, 40, 50} and Y = {1, 2, 3, 4, 5}, you will find that it is also equal to 1.
      • This means that the two data sets do not need to be equal for PPMCC to be 1: it is enough that one is an increasing linear function of the other (here Y = X/10).
    • PPMCC can be used as a matching technique, for example in:
      • map matching
      • image matching
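The two r = 1 cases discussed above (equal data sets, and Y = X/10) can be checked numerically with a small sketch. The helper `pearson_r` is hypothetical; it computes the denominator as sqrt(Sxx * Syy), which is algebraically the same as the product of the two square roots in the sample formula. The decreasing data set {5, 4, 3, 2, 1} is an extra illustrative input showing the mirror-image r = -1 case.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson r (hypothetical helper), using sqrt(Sxx * Syy) in the denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

X = [10, 20, 30, 40, 50]
print(pearson_r(X, X))                # 1.0  (data sets equal)
print(pearson_r(X, [1, 2, 3, 4, 5]))  # 1.0  (Y = X/10: not equal, still perfectly linear)
print(pearson_r(X, [5, 4, 3, 2, 1]))  # -1.0 (decreasing linear relationship)
```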
    Equation credit: Wikipedia
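As a rough illustration of using PPMCC for matching, the sketch below slides a template along a 1-D signal, scores every window by r, and keeps the best one. It is a minimal sketch under stated assumptions: the signal, the template, and the helper names `pearson_r` and `best_match` are all hypothetical, and image matching applies the same idea to 2-D patches (OpenCV's `matchTemplate` with `TM_CCOEFF_NORMED` computes a closely related zero-mean normalized score).

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson r (hypothetical helper); None when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # r is undefined for a flat window or template
    return sxy / math.sqrt(sxx * syy)

def best_match(signal, template):
    """Slide template over signal; return (offset, r) of the highest-scoring window."""
    m = len(template)
    best = None
    for i in range(len(signal) - m + 1):
        r = pearson_r(signal[i:i + m], template)
        if r is not None and (best is None or r > best[1]):
            best = (i, r)
    return best

signal = [1, 3, 2, 8, 9, 7, 2, 1]
template = [16, 18, 14]  # the pattern 8, 9, 7 at twice the amplitude
print(best_match(signal, template))  # (3, 1.0)
```

Because r is unchanged when the template is scaled by a positive factor or shifted by a constant, the doubled template still matches the window 8, 9, 7 perfectly; this is why PPMCC is useful when the two inputs differ in brightness or gain.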