1 00:00:00,570 --> 00:00:03,350 In this segment, we justify some of the properties of 2 00:00:03,350 --> 00:00:05,230 the correlation coefficient that we 3 00:00:05,230 --> 00:00:07,780 claimed a little earlier. 4 00:00:07,780 --> 00:00:10,560 The most important property is that the correlation coefficient 5 00:00:10,560 --> 00:00:13,780 lies between minus 1 and plus 1. 6 00:00:13,780 --> 00:00:16,950 We will prove this property for the special case where we 7 00:00:16,950 --> 00:00:20,950 have random variables with zero means and unit variances. 8 00:00:20,950 --> 00:00:24,670 So standard deviations are also 1, so most of the terms 9 00:00:24,670 --> 00:00:27,560 here disappear and the correlation coefficient is 10 00:00:27,560 --> 00:00:30,690 simply the expected value of X times Y. 11 00:00:30,690 --> 00:00:33,280 We will show that in this special case the expected 12 00:00:33,280 --> 00:00:37,370 value of X times Y lies between minus 1 and 1. 13 00:00:37,370 --> 00:00:42,630 But the proof of this fact remains valid with a little 14 00:00:42,630 --> 00:00:45,790 bit more algebra along similar lines 15 00:00:45,790 --> 00:00:48,501 for the general case. 16 00:00:48,501 --> 00:00:53,350 What we will do is we will consider this quantity here 17 00:00:53,350 --> 00:00:57,200 and expand this quadratic and write it as 18 00:00:57,200 --> 00:00:59,790 expected value of X squared. 19 00:00:59,790 --> 00:01:03,160 Then there's a cross term, which is minus 2 rho times the 20 00:01:03,160 --> 00:01:10,070 expected value of X times Y, plus rho squared times the expected 21 00:01:10,070 --> 00:01:13,560 value of Y squared. 22 00:01:13,560 --> 00:01:17,840 Now since we assume that the random variables have 0 mean, 23 00:01:17,840 --> 00:01:20,370 this is the same as the variance and we assume that 24 00:01:20,370 --> 00:01:25,100 the variance is 1, so this term here is equal to 1.
25 00:01:25,100 --> 00:01:29,050 Now, the expected value of X times Y is the same as the 26 00:01:29,050 --> 00:01:31,170 correlation coefficient in this case. 27 00:01:31,170 --> 00:01:35,190 So we have minus 2 rho squared and from 28 00:01:35,190 --> 00:01:36,870 here we have rho squared. 29 00:01:36,870 --> 00:01:40,030 And by the previous argument, again this quantity, according 30 00:01:40,030 --> 00:01:43,740 to our assumptions, is equal to 1 so we're left with this 31 00:01:43,740 --> 00:01:49,830 expression, which is 1 minus rho squared. 32 00:01:49,830 --> 00:01:52,979 Now, notice that this is the expectation of a non-negative 33 00:01:52,979 --> 00:01:57,210 random variable so this quantity here must be 34 00:01:57,210 --> 00:01:58,560 non-negative. 35 00:01:58,560 --> 00:02:06,470 Therefore, 1 minus rho squared is non-negative, which means 36 00:02:06,470 --> 00:02:11,850 that rho squared is less than or equal to 1. 37 00:02:11,850 --> 00:02:15,230 And that's the same as requiring that rho lie between 38 00:02:15,230 --> 00:02:17,820 minus 1 and plus 1. 39 00:02:17,820 --> 00:02:21,310 And so we have established this important property, at 40 00:02:21,310 --> 00:02:24,150 least for the special case of 0 means and unit variances. 41 00:02:24,150 --> 00:02:28,250 But as I mentioned, it remains valid more generally. 42 00:02:28,250 --> 00:02:32,920 Now let us look at an extreme case, when the absolute value 43 00:02:32,920 --> 00:02:35,410 of rho is equal to 1. 44 00:02:35,410 --> 00:02:36,986 What happens in this case? 45 00:02:36,986 --> 00:02:43,410 In that case, this term is 0 and this implies that the 46 00:02:43,410 --> 00:02:46,870 expected value of the square of this random variable is 47 00:02:46,870 --> 00:02:48,250 equal to 0. 
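[Editor's note] The quantity pointed to on the slide is not part of this transcript. Assuming it is the expected value of (X minus rho Y) squared, which is what the later reference to "X minus rho Y" suggests, the chain of steps described above can be written out as follows. The bold-E notation for expectation is an assumption, not taken from the slide.

```latex
\begin{align*}
0 \;\le\; \mathbf{E}\big[(X-\rho Y)^2\big]
  &= \mathbf{E}[X^2] \;-\; 2\rho\,\mathbf{E}[XY] \;+\; \rho^2\,\mathbf{E}[Y^2] \\
  &= 1 \;-\; 2\rho^2 \;+\; \rho^2
     && \text{since } \mathbf{E}[X^2]=\mathbf{E}[Y^2]=1,\ \mathbf{E}[XY]=\rho \\
  &= 1-\rho^2 .
\end{align*}
% Hence \rho^2 \le 1, that is, -1 \le \rho \le 1.
% When |\rho| = 1 the right-hand side is 0, so \mathbf{E}[(X-\rho Y)^2] = 0.
```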
48 00:02:48,250 --> 00:02:51,770 Now here we have a non-negative random variable, 49 00:02:51,770 --> 00:02:55,390 and its expected value is 0, which means that when we 50 00:02:55,390 --> 00:02:58,470 calculate the expected value of this there will be no 51 00:02:58,470 --> 00:03:02,710 positive contributions and so the only contributions must be 52 00:03:02,710 --> 00:03:04,000 equal to 0. 53 00:03:04,000 --> 00:03:09,100 This means that X minus rho Y has to be equal to 0 with 54 00:03:09,100 --> 00:03:11,860 probability 1. 55 00:03:11,860 --> 00:03:17,260 So X is going to be equal to rho times Y and this will 56 00:03:17,260 --> 00:03:19,700 happen with essential certainty. 57 00:03:19,700 --> 00:03:23,250 Now also because the absolute value of rho is equal to 1, 58 00:03:23,250 --> 00:03:30,490 this means that we have either X equal to Y or X equal to 59 00:03:30,490 --> 00:03:35,210 minus Y, in case rho is equal to minus 1. 60 00:03:35,210 --> 00:03:38,100 So we see that if the correlation coefficient has an 61 00:03:38,100 --> 00:03:42,280 absolute value of 1, then X and Y are related to each 62 00:03:42,280 --> 00:03:47,620 other according to a simple linear relation, and it's an 63 00:03:47,620 --> 00:03:49,579 extreme form of dependence between 64 00:03:49,579 --> 00:03:50,829 the two random variables.
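[Editor's note] The argument in this segment can be checked numerically. The sketch below is not from the lecture: it assumes the quantity on the slide is the expected value of (X minus rho Y) squared, replaces expectations with sample averages, and uses hypothetical helper names (`standardize`, `mean_product`, `check_identity`). After standardizing two samples to zero mean and unit variance, the average of (X minus rho Y) squared should match 1 minus rho squared up to rounding, and an exactly linear relation should give rho of magnitude 1 with X equal to rho times Y.

```python
import math
import random


def standardize(values):
    """Shift and scale a sample so its mean is 0 and its (population) variance is 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def mean_product(xs, ys):
    """Sample average of x*y, standing in for the expected value of X times Y."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)


def check_identity(xs, ys):
    """Return (rho, average of (x - rho*y)^2) computed on the standardized samples."""
    x = standardize(xs)
    y = standardize(ys)
    rho = mean_product(x, y)  # zero means, unit variances: rho is just the mean of x*y
    residual = [a - rho * b for a, b in zip(x, y)]
    return rho, mean_product(residual, residual)


# Illustrative data (assumed, not from the lecture): X depends partly on Y.
rng = random.Random(0)
ys = [rng.gauss(0, 1) for _ in range(1000)]
noise = [rng.gauss(0, 1) for _ in range(1000)]
xs = [0.6 * y + 0.8 * e for y, e in zip(ys, noise)]

rho, second_moment = check_identity(xs, ys)
print("rho:", rho)
print("mean of (x - rho*y)^2:", second_moment)
print("1 - rho^2:", 1 - rho ** 2)

# Extreme case: X is an exact decreasing linear function of Y.
xs_linear = [-2.0 * y + 3.0 for y in ys]
rho_linear, second_moment_linear = check_identity(xs_linear, ys)
print("rho (linear case):", rho_linear)
print("mean of (x - rho*y)^2 (linear case):", second_moment_linear)
```

In the linear case the standardized X coincides with minus the standardized Y, which is the sample version of "X equal to minus Y" for rho equal to minus 1.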