MICHALE FEE: OK, let's go ahead and get started. So today we're turning to a new topic that's basically focused on principal components analysis, which is a very cool way of analyzing high-dimensional data. Along the way, we're going to learn a little bit more linear algebra. So today, I'm going to talk to you about eigenvectors and eigenvalues, which are among the most fundamental concepts in linear algebra. And they're extremely important and widely applicable to a lot of different things.

So eigenvalues and eigenvectors are important for everything from understanding energy levels in quantum mechanics, to understanding the vibrational modes of a musical instrument, to analyzing the dynamics of differential equations of the sort that you find describe neural circuits in the brain, and also for analyzing data and doing dimensionality reduction. So understanding eigenvectors and eigenvalues is very important for doing things like principal components analysis.

So along the way, we're going to talk a little bit more about variance. We're going to extend the notion of variance that we're all familiar with in one dimension, like the width of a Gaussian or the width of a distribution of data, to the case of multivariate Gaussian distributions or multivariate data-- which means it's basically the same thing as high-dimensional data. We're going to talk about how to compute a covariance matrix from data, which describes what the variance of the data in different dimensions is and how those different dimensions are correlated with each other. And finally, we'll go through how to actually implement principal components analysis, which is useful for a huge number of things.

I'll come back to many of the different applications of principal components analysis at the end. But I just want to mention that it's very commonly used in understanding high-dimensional data and neural circuits.
So it's a very important way of describing how the state of the brain evolves as a function of time. So nowadays, you can record from hundreds or even thousands or tens of thousands of neurons simultaneously. And if you just look at all that data, it just looks like a complete mess. But somehow, underneath all of that, the circuitry in the brain is going through discrete trajectories in some low-dimensional space within that high-dimensional mess of data. So our brains have something like 100 billion neurons in them-- about the same as the number of stars in our galaxy-- and yet, somehow, all of those different neurons communicate with each other in a way that constrains the state of the brain to evolve along the low-dimensional trajectories that are our thoughts and perceptions. And so it's important to be able to visualize those trajectories in order to understand how that machine is working.

OK, and then one more comment about principal components analysis: it's often not actually the best way of doing this kind of dimensionality reduction. But the basic idea of how principal components analysis works is so fundamental to all of the other techniques-- it's sort of the base on which all of those other techniques are built conceptually. So that's why we're going to spend a lot of time talking about this.

OK, so let's start with eigenvectors and eigenvalues. So remember, we've been talking about the idea that matrix multiplication performs a transformation. So we can have a vector x that we multiply by a matrix A. It transforms that set of vectors x into some other set of vectors y. And we can go from y back to x by multiplying by A inverse-- if the determinant of that matrix A is not equal to zero. So we've talked about a number of different kinds of matrix transformations by introducing perturbations on the identity matrix. So if we have a diagonal matrix where one of the diagonal elements is slightly larger than 1 and the other diagonal element is equal to 1, you get a stretch of this set of input vectors along the x-axis.
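Just to make that concrete, here is a minimal MATLAB sketch of this kind of transformation (the particular matrix and points are made-up examples, not from the lecture): a diagonal matrix stretches a set of vectors along the x-axis, and because its determinant is nonzero, the inverse transformation recovers the original vectors.

    A = [1.5 0; 0 1];                  % diagonal matrix: stretch by 1.5 along x, leave y alone
    theta = linspace(0, 2*pi, 16);
    X = [cos(theta); sin(theta)];      % columns of X are the input vectors (a ring of unit vectors)
    Y = A * X;                         % transformed vectors: the ring is stretched along the x-axis
    X_back = A \ Y;                    % same as inv(A) * Y; works because det(A) is not zero
    max(abs(X_back(:) - X(:)))         % essentially zero: we recovered the original vectors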
Now, that process of stretching vectors along a particular direction has built into it the idea that there are special directions in this matrix transformation. So what do I mean by that? So most of these vectors here-- each one of these red dots is one of those x's, one of those initial vectors. If you look at the transformation from x to y-- so that's the x that we put into this matrix transformation-- when we multiply it by A to get y, we see that that vector has been stretched along the x direction. So for most of these vectors, that stretch involves a change in the direction of the vector. Going from x to y means that the vector has been rotated. So you can see that the green vector is at a different angle than the red vector. So there's been a rotation, as well as a stretch. So you can see that's true for that vector, that vector, and so on.

So you can see, though, that there are other directions that are not rotated. So here's another-- I just drew that same picture over again. But now, let's look at this particular vector, this particular red vector. You can see that when that red vector is stretched by this matrix, it's not rotated. It's simply scaled. Same for this vector right here. That vector is not rotated. It's just scaled, in this case, by 1.

But let's take a look at this other transformation. So this transformation produces a stretch in the y direction and a compression in the x direction. So I'm just showing you a subset of those vectors now. You can see that, again, this vector is rotated by that transformation. This vector is rotated by that transformation. But other vectors are not rotated. So again, this vector is compressed. It's simply scaled, but it's not rotated. And this vector is stretched. It's scaled but not rotated. Does that make sense?

OK, so these transformations here are given by diagonal matrices where the off-diagonal elements are zero.
And the diagonal elements are just some constant. So for all diagonal matrices, these special directions-- the directions on which vectors are simply scaled but not rotated by that matrix, by that transformation-- it's the vectors along the axes that are scaled and not rotated, along the x-axis or the y-axis.

And you can see that by taking this matrix A, this general diagonal matrix, and multiplying it by a vector along the x-axis. You can see that that is just a constant, lambda 1, times that vector. So we take this times this, plus this times this, which is equal to lambda 1. And this times this, plus this times this, is equal to zero. So you can see that A times that vector in the x direction is simply a scaled version of the vector in the x direction. And the scaling factor is simply the constant that's on the diagonal.

So we can write this in matrix notation as this lambda-- this stretch matrix, this diagonal matrix-- times a unit vector in the x direction. That's the standard basis vector, the first standard basis vector. So lambda times that unit vector in the x direction is equal to lambda 1 times the unit vector in the x direction. And if we do that same multiplication for a vector in the y direction, we see that we get a constant times that vector in the y direction. So we have another equation. So this particular matrix, this diagonal matrix, has two vectors that are in special directions, in the sense that they aren't rotated. They're just stretched.

So diagonal matrices have the property that they map any vector parallel to the standard basis into another vector along the standard basis. So that now is a general n-dimensional diagonal matrix with these lambdas, which are just scalar constants, along the diagonal. And there are n equations that look like this, that say that this matrix times a vector in the direction of a standard basis vector is equal to a constant times that vector in the standard basis direction. Any questions about that?
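Here's a minimal MATLAB sketch of that property (the particular lambda values are just an example): a diagonal matrix times a standard basis vector returns that same basis vector, scaled by the corresponding diagonal element-- that is, lambda times e_i equals lambda_i times e_i.

    Lambda = diag([2, 0.5]);   % example diagonal matrix: lambda_1 = 2, lambda_2 = 0.5
    e1 = [1; 0];               % first standard basis vector
    e2 = [0; 1];               % second standard basis vector
    Lambda * e1                % gives [2; 0], which is 2 * e1
    Lambda * e2                % gives [0; 0.5], which is 0.5 * e2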
Everything else just flows from this very easily. So if you have any questions about that, just ask.

OK, that equation is called the eigenvalue equation. And it describes a property of this matrix lambda. So any vector v that's mapped by a matrix A onto a parallel vector, lambda v, is called an eigenvector of this matrix. So we're going to generalize now from diagonal matrices that look like this to an arbitrary matrix A. So the statement is that any vector v that, when you multiply it by a matrix A, gets transformed into a vector parallel to v is called an eigenvector of A. And the one vector that this is true for that isn't called an eigenvector is the zero vector, because you can see that any matrix times the zero vector is equal to zero. OK, so we exclude the zero vector. We don't call the zero vector an eigenvector.

So typically a matrix, an n-dimensional matrix, has n eigenvectors and n eigenvalues. Oh, and I forgot to say that the scale factor lambda is called the eigenvalue associated with that vector v.

So now, let's take a look at a matrix that's a little more complicated than our diagonal matrix. Let's take one of these rotated stretch matrices. So remember, in the last class, we built a matrix like this that produces a stretch of a factor of 2 along a 45-degree axis. And we built that matrix by basically taking this set of vectors, rotating them, stretching them, and then rotating them back. So we did that by three separate transformations that we applied successively. And we did that by multiplying by phi transpose, then lambda, and then phi.

So let's see what the special directions are for this matrix transformation. So you can see that most of these vectors that we've multiplied by this matrix get rotated. And you can see that even vectors along the standard basis directions get rotated. So what are the special directions for this matrix? Well, they're going to be these vectors right here. So this vector along this 45-degree line gets transformed.
It's not rotated. It gets stretched by a factor of 1. And this vector here gets stretched. OK, so you can see that this matrix has eigenvectors that are along this 45-degree axis and that 45-degree axis.

So in general, let's calculate what the eigenvectors and eigenvalues are for a general rotated transformation matrix. So let's do that. Let's take this matrix A and multiply it by a vector x. And we're going to ask what vectors x satisfy the property that, when they're multiplied by A, they are equal to a constant times x. So we're going to ask, what are the eigenvectors of this matrix A that we've constructed in this form?

So what we're going to do is we're going to replace A with this product of matrices, of three matrices. We're going to multiply this equation on both sides by phi transpose-- on the left side, by phi transpose. OK, so phi transpose times this, phi lambda phi transpose x, is equal to phi transpose times lambda x on the left. What happens here? Remember, phi is a rotation matrix. What is phi transpose phi? Anybody remember? Good. Because for a rotation matrix, the transpose of the rotation matrix is its inverse. And so phi transpose phi is just equal to the identity matrix. So that goes away. And we're left with the diagonal matrix lambda times phi transpose x on one side, equal to the eigenvalue times phi transpose x on the other.

So remember that we just wrote down that if we have a diagonal matrix lambda, the eigenvectors are the standard basis vectors. So what does that mean? If we look at this equation here, and we look at this equation here, it seems like phi transpose x satisfies this eigenvalue equation as long as phi transpose x is equal to one of the standard basis vectors. Does that make sense? So we know this equation is satisfied when phi transpose x is equal to one of the standard basis vectors. Does that make sense?
So if we replace phi transpose x with one of the standard basis vectors, then that solves this equation. So what that means is that the solution to this eigenvalue equation is that the eigenvalues of A are simply the diagonal elements of this lambda here. And the eigenvectors are just x, where x is equal to phi times the standard basis vectors. We just solve for x by multiplying both sides by phi transpose inverse. What's phi transpose inverse? Phi. So we multiply both sides by phi. This becomes the identity matrix. And we have x equals phi times this set of standard basis vectors.

Any questions about that? That probably went by pretty fast. But does everyone believe this? We went through that. We went through both examples of how this equation is true for the case where lambda is a diagonal matrix and the e's are the standard basis vectors. And if we solve for the eigenvectors of this equation, where A has this form of phi lambda phi transpose, you can see that the eigenvectors are given by this matrix times a standard basis vector. So phi times any standard basis vector will give you an eigenvector of this equation here.

Let's push on. And the eigenvalues are just these diagonal elements of this lambda.

What are these? So now, we're going to figure out what these things are, and how to just see what they are. These eigenvectors here are given by phi times a standard basis vector. So phi is a rotation matrix, right? So phi times a standard basis vector is just what? It's just a standard basis vector rotated. So let's just solve for these two x's. We're going to take phi, which was this 45-degree rotation matrix, and we're going to multiply it by the standard basis vector in the x direction. So what is that? Just multiply this out. You'll see that this is just a vector along a 45-degree line.
So this eigenvector, this first eigenvector here, is just a vector on the 45-degree line, 1 over root 2 times (1, 1). It's a unit vector. That's why it's got the 1 over root 2 in it. The second eigenvector is just phi times e2. So it's a rotated version of the y standard basis vector, which is 1 over root 2 times (minus 1, 1). That's this vector.

So the two eigenvectors we derived for this matrix that produces this stretch along a 45-degree line are the 45-degree vector in this quadrant and the 45-degree vector in that quadrant. Notice it's just a rotated basis set. So notice that the eigenvectors are just the columns of our rotation matrix.

So let me recap. If you have a matrix that you've constructed like this, as a matrix that produces a stretch in a rotated frame, the eigenvalues are just the diagonal elements of the lambda matrix that you put in there to build that thing, to build that matrix. And the eigenvectors are just the columns of the rotation matrix.

OK, so let me summarize. A symmetric matrix can always be written like this, where phi is a rotation matrix, and lambda is a diagonal matrix that tells you how much the different axes are stretched. The eigenvectors of this matrix A are the columns of phi. They are the basis vectors, the new basis vectors, in this rotated basis set. So remember, we can write this rotation matrix as a set of basis vectors, as the columns. And that set of basis vectors are the eigenvectors of any matrix that you construct like this. And the eigenvalues are just the diagonal elements of the lambda that you put in there. All right, any questions about that?

For the most part, we're going to be working with matrices that are symmetric, that can be built like this. So eigenvectors are not unique. So if x is an eigenvector of A, then any scaled version of x is also an eigenvector.
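To make that recap concrete, here's a small MATLAB sketch (using a stretch of a factor of 2 along the 45-degree axis, as in the example above) that builds A as phi times lambda times phi transpose and checks that a column of phi is an eigenvector-- and that any scaled version of it is too.

    phi = [1 -1; 1 1] / sqrt(2);   % 45-degree rotation matrix; its columns are the rotated basis vectors
    Lambda = diag([2, 1]);         % stretch by 2 along the first rotated axis, by 1 along the second
    A = phi * Lambda * phi';       % symmetric matrix; this works out to [1.5 0.5; 0.5 1.5]
    v1 = phi(:, 1);                % first column of phi: the unit vector along the 45-degree line
    A * v1                         % equals 2 * v1: v1 is an eigenvector with eigenvalue 2
    A * (3 * v1)                   % equals 2 * (3 * v1): a scaled eigenvector is still an eigenvector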
Remember, an eigenvector is a vector that, when you multiply it by a matrix, just gets stretched and not rotated. What that means is that any vector in that direction will also be stretched and not rotated. So eigenvectors are not unique. Any scaled version of an eigenvector is also an eigenvector. When we write down eigenvectors of a matrix, we usually write down unit vectors to avoid this ambiguity. So we usually write eigenvectors as unit vectors.

For matrices of n dimensions, there are typically n different unit eigenvectors-- n different vectors in different directions that have the special property that they're just stretched and not rotated. So for our two-dimensional matrices that produce a stretch in one direction, the special directions are-- sorry, so here is a two-dimensional, two-by-two matrix that produces a stretch in this direction. There are two eigenvectors, two unit eigenvectors, one in this direction and one in that direction. And notice that, because the eigenvectors are the columns of this rotation matrix, the eigenvectors form a complete orthonormal basis set. And that statement is true only for symmetric matrices that are constructed like this.

So now, let's calculate what the eigenvalues are for a general two-dimensional matrix A. So here's our matrix A. That's an eigenvector-- any vector x that satisfies that equation is called an eigenvector. And that's the eigenvalue associated with that eigenvector. We can rewrite this equation as A times x equals lambda times I times x-- just like if a equals b, then a equals 1 times b. We can subtract that from both sides, and we get A minus lambda I, times x, equals zero. So that is a different way of writing an eigenvalue equation.

Now, what we're going to do is we're going to solve for lambdas that satisfy this equation. And we only want solutions where x is not equal to zero. So this is just a matrix. A minus lambda I is just a matrix.
So how do we know whether this matrix has solutions where x is not equal to zero? Any ideas? [INAUDIBLE]

AUDIENCE: [INAUDIBLE]

MICHALE FEE: Yes-- so what do we need the determinant to do?

AUDIENCE: [INAUDIBLE]

MICHALE FEE: It has to be zero. If the determinant of this matrix is not equal to zero, then the only solution to this equation is x equals zero. OK, so we solve this equation. We ask what values of lambda give us a zero determinant in this matrix.

So let's write down an arbitrary A, an arbitrary two-dimensional matrix A-- 2D, 2 by 2. We can write A minus lambda I like this. Remember, lambda I is just lambdas on the diagonals. The determinant of A minus lambda I is just the product of the diagonal elements minus the product of the off-diagonal elements. And we set that equal to zero. And we solve for lambda. And that just looks like a polynomial.

OK, so the solutions to that polynomial solve what's called the characteristic equation of this matrix A. And those are the eigenvalues of this arbitrary matrix A, this 2D, two-by-two matrix. So there is the characteristic equation. There is the characteristic polynomial. We can solve for lambda just by using the quadratic formula-- lambda equals (a plus d) over 2, plus or minus one half the square root of (a minus d) squared plus 4bc. And those are the eigenvalues of A.

Notice, first of all, there are two of them, given by the two roots of this quadratic equation. And notice that they can be real or complex. They can be complex-- they are complex in general. And they can be real, or imaginary, or have real and imaginary components. And that just depends on this quantity right here. If what's inside this square root is negative, then the eigenvalues will be complex. If what's inside the square root is positive, then the eigenvalues will be real.

So let's find the eigenvalues for a symmetric matrix: a, d on the diagonals and b on the off-diagonals. So let's see what happens. Let's plug these into this equation. The 4bc becomes 4b squared.
And you can see that this thing has to be greater than zero, because (a minus d) squared has to be positive, and b squared has to be positive. And so that quantity has to be greater than zero. And so what we find is that the eigenvalues of a symmetric matrix are always real.

So let's just take this particular example and plug those numbers into this equation. And what we find is that the eigenvalues are 1 plus or minus root 2 over 2. So, two real eigenvalues.

So let's consider a special case of a symmetric matrix. Let's consider a matrix where the diagonal elements are equal, and the off-diagonal elements are equal. So we can update this equation for the case where the diagonal elements are equal-- so a equals d. And what you find is that the eigenvalues are just a plus b and a minus b. And the eigenvectors can be found just by plugging these eigenvalues into the eigenvalue equation and solving for the eigenvectors.

So I'll just go through that real quick: A times x. So we found two eigenvalues, so there are going to be two eigenvectors. We can just plug that first eigenvalue in here-- call it lambda plus. And now, we can solve for the eigenvector associated with that eigenvalue. Just plug that in, solve for x. What you find is that the x associated with that eigenvalue is (1, 1)-- if you just go through the algebra. So that's the eigenvector associated with that eigenvalue. And that is the eigenvector associated with that other eigenvalue.

So I'll just give you a hint. Most of the problems that I'll give you to deal with on an exam, and many of the ones in the problem sets, I think, will have a form like this and will have eigenvectors along a 45-degree axis. So if you see a matrix like that, you don't have to plug it into MATLAB to extract the eigenvalues.
You just know that the eigenvectors are on the 45-degree axis.

So the process of writing a matrix as phi lambda phi transpose is called eigen-decomposition of this matrix A. So if you have a matrix that you can write down like this, that you can write in that form, it's called eigen-decomposition. And the lambdas, the diagonal elements of this lambda matrix, are real-- and they're the eigenvalues. The columns of phi are the eigenvectors, and they form an orthogonal basis set.

And if you take this equation and you multiply it on both sides by phi, you can write down that equation in a slightly different form-- A times phi equals phi lambda. This is a matrix way, a matrix equivalent, of the set of equations that we wrote down earlier. So remember, we wrote down this eigenvalue equation that says that this matrix A times an eigenvector equals lambda times the eigenvector. This is equivalent to writing down this matrix equation. So you'll often see this equation used to describe the form of the eigenvalue equation, rather than this form. Why? Because it's more compact. Any questions about that? We've just piled up all of these different f vectors into the columns of this rotation matrix phi. So if you see an equation like that, you'll know that you're just looking at an eigenvalue equation, just like this.

Now in general, when you want to do eigen-decomposition-- when you have a symmetric matrix that you want to write down in this form-- it's really simple. You don't have to go through all of this stuff with the characteristic equation, and solve for the eigenvalues, and then plug them in here, and solve for the eigenvectors. You can do that if you really want to. But most people don't, because in two dimensions you can do it, but in higher dimensions it's very hard or impossible. So what you typically do is just use the eig function in MATLAB.
If you just use this function eig on a matrix, it will return the eigenvectors and eigenvalues. So here, I'm just constructing a matrix A-- 1.5, 0.5, 0.5, and 1.5, like that. And if you just use the eig function, it returns the eigenvectors as the columns of one matrix and the eigenvalues as the diagonal elements of another matrix. So you have to call it with two output arguments-- [F, V] = eig(A)-- and it returns the eigenvectors and eigenvalues. Any questions about that?

So let's push on toward doing principal components analysis. So this is just the machinery that you use. Oh, and I think I had one more panel here just to show you that if you take F and V, you can reconstruct A. So A is just F times V times F transpose. F is just phi in the previous equation, and V is the lambda. Sorry, I didn't have phi and lambda available as variable names, so I used F and V. And you can see that F V F transpose is just equal to A. Any questions about that? No?

All right, so let's turn to how you use eigenvectors and eigenvalues to describe data. So I'm going to briefly review the notion of variance, what that means in higher dimensions, and how you use a covariance matrix to describe data in high dimensions.

So let's say that we have a bunch of observations of a variable x-- so this is now just a scalar. So we have m different observations; x superscript j is the j-th observation of that data. And you can see that if you make a bunch of measurements of most things in the world, you'll find a distribution of those measurements. Often, they will be distributed in a bump. You can write down the mean of that distribution just as the average value over all observations, by summing together all those observations and dividing by the number of observations.
You can also write down the variance of that distribution by subtracting the mean from all of those observations, squaring that difference from the mean, summing up over all observations, and dividing by m.

So let's say that we now have m different observations of two variables, pressure and temperature. We have a distribution of those quantities. We can describe that observation of x1 and x2 as a vector. And we have m different observations of that vector. You can write down the mean and variance of x1 and x2. So for x1, we can write down the mean as mu1. We can write down the variance of x1. We can write down the mean and variance of x2, of the x2 observation. And sometimes, that will give you a pretty good description of this two-dimensional observation. But sometimes, it won't.

So in many cases, those variables, x1 and x2, are not correlated with each other. They're independent variables. In many cases, though, x1 and x2 are dependent on each other. The observations of x1 and x2 are correlated with each other, so that if x1 is big, x2 also tends to be big. In these two cases, x1 can have the same variance, and x2 can have the same variance. But there's clearly something different here. So we need something more than just describing the variance of x1 and x2 to describe these data.

And that thing is the covariance. It just says, how do x1 and x2 covary? If x1 is big, does x2 also tend to be big? In this case, the covariance is zero. In this case, the covariance is positive. So if a fluctuation of x1 above the mean is associated with a fluctuation of x2 above the mean, then these points will produce a positive contribution to the covariance. And these points here will also produce a positive contribution to the covariance. And the covariance here will be some number greater than zero.
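Here's a small MATLAB sketch of those quantities (the correlated data are simulated here purely for illustration): the diagonal elements of the covariance matrix are the variances of x1 and x2, and the off-diagonal element is the covariance that distinguishes the correlated case from the uncorrelated one.

    m = 1000;
    x1 = randn(m, 1);                    % first variable
    x2 = 0.8 * x1 + 0.6 * randn(m, 1);   % second variable, correlated with the first
    X = [x1, x2];                        % observations in the rows, variables in the columns
    mu = mean(X);                        % the two means, mu1 and mu2
    Xc = X - mu;                         % subtract the mean from every observation
    C = (Xc' * Xc) / m;                  % covariance matrix: variances on the diagonal, covariance off it
    % cov(X, 1) computes the same matrix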
That's closely related to the correlation-- just the Pearson correlation coefficient, which is the covariance divided by the geometric mean of the individual variances. I'm assuming most of you have seen this many times, but just to get us up to speed.

So if you have data, a bunch of observations, you can very easily fit those data to a Gaussian. And you do that simply by measuring the mean and variance of your data. And that turns out to be the best fit to a Gaussian. So if you have a bunch of observations in one dimension, you measure the mean and variance of that set of data. That turns out to be the best fit, in the least-squares sense, to a Gaussian probability distribution defined by a mean and a variance. So this is easy in one dimension.

What we're interested in doing is understanding data in higher dimensions. So how do we describe data in higher dimensions? How do we describe a Gaussian in higher dimensions? So that's what we're going to turn to now. And the reason we're going to do this is not because every time we have data, we're really trying to fit a Gaussian to it. It's just that it's a powerful way of thinking about data, of describing data in terms of variances in different directions. And so we often think of what we're doing, when we're looking at high-dimensional data, as understanding its distribution in different dimensions as kind of a Gaussian cloud that optimally fits the data that we're looking at. And mostly because it just gives us an intuition about how to best represent or think about data in high dimensions.

So we're going to get insights into how to think about high-dimensional data. We're going to develop that description using the vector and matrix notation that we've been developing all along, because vectors and matrices provide a natural way of manipulating data sets, of doing transformations of basis, rotations, and so on. It's very compact.
And those manipulations are really trivial in MATLAB or Python.

So let's build up a Gaussian distribution in two dimensions. So we have, again, our Gaussian random variables, x1 and x2. We have a Gaussian distribution, where the probability distribution is proportional to e to the minus 1/2 of x1 squared. We have a probability distribution for x2-- again, the probability of x2. We can write down the probability of x1 and x2, the joint probability distribution, assuming these are independent. We can write that as the product of the two probability distributions, p of x1 and p of x2. And we have some Gaussian cloud, some Gaussian distribution in two dimensions, that we can write down like this. That's simply the product. So the product of these two distributions is e to the minus 1/2 x1 squared, times e to the minus 1/2 x2 squared. And then there's a constant in front that just normalizes, so that the total area under that curve is just 1.

We can write this as e to the minus 1/2 times (x1 squared plus x2 squared). And that's e to the minus 1/2 times some distance squared from the origin. So it falls off exponentially in a way that depends only on the distance from the origin, or from the mean of the distribution. In this case, we set the mean to be zero.

Now, we can write that distance squared using vector notation. It's just the square magnitude of that vector x. So if we have a vector x sitting out here somewhere, we can measure the distance from the center of the Gaussian as the square magnitude of x, which is just x dot x, or x transpose x. So we're going to use this notation to find the distance of a vector from the center of the Gaussian distribution. So you're going to see a lot of x transpose x's.

So this distribution that we just built is called an isotropic multivariate Gaussian distribution. And that distance d is called the Mahalanobis distance, which I'm going to say as little as possible.
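Here's a minimal MATLAB sketch of that isotropic, unit-variance case (purely illustrative): draw points from the distribution, compute the squared distance x transpose x for one of them, and evaluate the corresponding density.

    m = 500;
    X = randn(2, m);                         % columns are samples from the isotropic, unit-variance Gaussian
    x = X(:, 1);                             % one sample vector
    d2 = x' * x;                             % squared distance from the center: x transpose x
    p = exp(-0.5 * d2) / (2 * pi);           % density of the 2D unit-variance Gaussian at that point
    plot(X(1, :), X(2, :), '.'); axis equal  % a round cloud: the same spread in every direction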
So that distribution now describes how these points-- the probability of finding these different points drawn from that distribution, as a function of their position in this space. So you're going to draw a lot of points here in the middle, and fewer points as you go away to larger distances.

So this particular distribution that I made here has one more word in it. It's an isotropic multivariate Gaussian distribution of unit variance. And what we're going to do now is we're going to build up all possible Gaussian distributions from this distribution by simply doing matrix transformations. So we're going to start by taking that unit-variance Gaussian distribution and building an isotropic Gaussian distribution that has an arbitrary variance-- that means an arbitrary width. We're then going to build a Gaussian distribution that can be stretched arbitrarily along these two axes, y1 and y2. And we're going to do that by using a transformation with a diagonal matrix. And then, what we're going to do is build an arbitrary Gaussian distribution that can be stretched and rotated in any direction, by using a transformation matrix called a covariance matrix, which just tells you how that distribution is stretched in different directions. So we can stretch it in any direction we want. Yes?

AUDIENCE: Why is [INAUDIBLE]?

MICHALE FEE: OK, the distance squared is the squared magnitude. And the squared magnitude is x dot x, the dot product. But remember, we can write down the dot product in matrix notation as x transpose x. So if we have a row vector times a column vector, you get the dot product. Yes, Lina.

AUDIENCE: What does isotropic mean?

MICHALE FEE: OK, isotropic just means the same in all directions. Sorry, I should have defined that.

AUDIENCE: [INAUDIBLE] when you stretched it, it's not isotropic?

MICHALE FEE: Yes, these are non-isotropic distributions, because they're different.
782 00:52:19,290 --> 00:52:23,020 They have different variances in different directions.
783 00:52:23,020 --> 00:52:25,080 So you can see that this has a large variance
784 00:52:25,080 --> 00:52:29,130 in the y1 direction and a small variance in the y2 direction.
785 00:52:29,130 --> 00:52:30,370 So it's non-isotropic.
786 00:52:33,090 --> 00:52:33,840 Yes, [INAUDIBLE].
787 00:52:33,840 --> 00:52:36,230 AUDIENCE: Why do you [INAUDIBLE]?
788 00:52:36,230 --> 00:52:37,230 MICHALE FEE: Right here.
789 00:52:37,230 --> 00:52:39,060 OK, think about this.
790 00:52:39,060 --> 00:52:44,240 Variance, you put into this Gaussian distribution
791 00:52:44,240 --> 00:52:48,030 as the distance squared over the variance.
792 00:52:48,030 --> 00:52:51,830 It's distance squared over a variance, which
793 00:52:51,830 --> 00:52:53,570 is sigma squared.
794 00:52:53,570 --> 00:52:56,870 Here it's distance squared over a variance.
795 00:52:56,870 --> 00:52:59,980 Here it's distance squared over a variance.
796 00:52:59,980 --> 00:53:02,880 Does that make sense?
797 00:53:02,880 --> 00:53:04,610 It's just that in order to describe
798 00:53:04,610 --> 00:53:10,650 this complex stretching and rotation of this Gaussian
799 00:53:10,650 --> 00:53:12,730 distribution in high-dimensional space,
800 00:53:12,730 --> 00:53:15,000 we need a matrix to do that.
801 00:53:18,000 --> 00:53:21,900 And that covariance matrix describes the variances
802 00:53:21,900 --> 00:53:27,390 in the different directions and essentially the rotation.
803 00:53:27,390 --> 00:53:30,630 Remember, this distribution here is just a distribution
804 00:53:30,630 --> 00:53:34,170 that's stretched and rotated.
805 00:53:34,170 --> 00:53:39,360 Well, we learned how to build exactly such a transformation
806 00:53:39,360 --> 00:53:44,370 by taking the product of phi lambda phi transpose.
807 00:53:44,370 --> 00:53:49,890 So we're going to use this to build these arbitrary Gaussian
808 00:53:49,890 --> 00:53:50,890 distributions.
809 00:53:53,710 --> 00:53:55,930 OK, so I'll just go through this quickly.
810 00:53:55,930 --> 00:54:04,520 If we have an isotropic unit variance Gaussian distribution
811 00:54:04,520 --> 00:54:06,920 as a function of this vector x, we
812 00:54:06,920 --> 00:54:09,860 can build a Gaussian distribution
813 00:54:09,860 --> 00:54:13,340 of arbitrary variance by writing down a y that's
814 00:54:13,340 --> 00:54:16,310 simply sigma times x.
815 00:54:16,310 --> 00:54:22,380 We're going to transform x into y,
816 00:54:22,380 --> 00:54:25,850 so that we can write down a distribution that
817 00:54:25,850 --> 00:54:28,410 has an arbitrary variance.
818 00:54:28,410 --> 00:54:29,760 Here this is variance 1.
819 00:54:29,760 --> 00:54:34,020 Here this is sigma squared.
820 00:54:34,020 --> 00:54:40,900 So let's just make a change of variables y equals sigma x.
821 00:54:40,900 --> 00:54:42,930 So now, what's the probability distribution
822 00:54:42,930 --> 00:54:44,460 as a function of y?
823 00:54:44,460 --> 00:54:46,350 Well, there's the probability distribution
824 00:54:46,350 --> 00:54:47,430 as a function of x.
825 00:54:47,430 --> 00:54:50,310 We're simply going to substitute y equals sigma x,
826 00:54:50,310 --> 00:54:54,240 or x equals sigma inverse y.
827 00:54:54,240 --> 00:54:57,020 We're going to substitute this in here.
828 00:54:57,020 --> 00:54:59,370 The Mahalanobis distance is just x
829 00:54:59,370 --> 00:55:03,900 transpose x, which is just sigma inverse y transpose sigma
830 00:55:03,900 --> 00:55:06,420 inverse y.
831 00:55:06,420 --> 00:55:10,540 And when you do that, you find that the distance squared
832 00:55:10,540 --> 00:55:14,030 is just y transpose sigma to the minus 2 y.
833 00:55:17,560 --> 00:55:21,320 So there is our Gaussian distribution
834 00:55:21,320 --> 00:55:25,610 for this distribution.
835 00:55:25,610 --> 00:55:28,010 There's the expression for this Gaussian distribution
836 00:55:28,010 --> 00:55:29,510 with variance sigma squared.
837 00:55:32,830 --> 00:55:35,030 We can rewrite this in different ways.
838 00:55:35,030 --> 00:55:37,540 Now, let's build a Gaussian distribution
839 00:55:37,540 --> 00:55:45,000 that's stretched arbitrarily in different directions, x and y.
840 00:55:45,000 --> 00:55:46,620 We're going to do the same trick.
841 00:55:46,620 --> 00:55:50,520 We're simply going to make a transformation y equals
842 00:55:50,520 --> 00:55:58,650 a diagonal matrix s times x and substitute this
843 00:55:58,650 --> 00:56:03,550 into our expression for a Gaussian.
844 00:56:03,550 --> 00:56:05,530 So x equals s inverse y.
845 00:56:05,530 --> 00:56:09,880 The Mahalanobis distance is given by x transpose x,
846 00:56:09,880 --> 00:56:11,310 which we can just write down here.
847 00:56:11,310 --> 00:56:13,230 Let's do that with this substitution.
848 00:56:16,800 --> 00:56:21,792 And we get an s squared here, s inverse squared,
849 00:56:21,792 --> 00:56:23,875 which we're just going to write as lambda inverse.
850 00:56:30,170 --> 00:56:33,950 And you can see that you have these variances
851 00:56:33,950 --> 00:56:35,480 along the diagonal.
852 00:56:35,480 --> 00:56:39,440 So if that's lambda inverse, then lambda
853 00:56:39,440 --> 00:56:43,970 is just a matrix of variances along the diagonal.
854 00:56:43,970 --> 00:56:49,040 So sigma 1 squared is the variance in this direction.
855 00:56:49,040 --> 00:56:53,460 Sigma 2 squared is the variance in this direction.
856 00:56:53,460 --> 00:56:57,870 I'm just showing you how you make a transformation
857 00:56:57,870 --> 00:57:01,500 of this vector x into another vector y
858 00:57:01,500 --> 00:57:07,170 to build up a representation of this effective distance
859 00:57:07,170 --> 00:57:10,680 from the center of the distribution for different kinds
860 00:57:10,680 --> 00:57:12,180 of Gaussian distributions.
861 00:57:16,210 --> 00:57:19,900 And now finally, let's build up an expression
862 00:57:19,900 --> 00:57:23,560 for a Gaussian distribution with arbitrary variance
863 00:57:23,560 --> 00:57:25,690 and covariance.
864 00:57:25,690 --> 00:57:28,810 So we're going to make a transformation
865 00:57:28,810 --> 00:57:38,140 of x into a new vector y using this rotated stretch matrix.
866 00:57:40,800 --> 00:57:46,600 We're going to substitute this in and calculate the Mahalanobis
867 00:57:46,600 --> 00:57:47,500 distance--
868 00:57:47,500 --> 00:57:50,050 which is now x transpose x.
869 00:57:50,050 --> 00:57:56,340 Substitute this and solve for the Mahalanobis distance.
870 00:57:56,340 --> 00:57:59,430 And what you find is that distance squared
871 00:57:59,430 --> 00:58:05,560 is just y transpose phi lambda inverse phi transpose times y.
872 00:58:05,560 --> 00:58:09,540 And we just write that as y transpose sigma inverse y.
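[Collecting the substitutions described above into one short derivation, with phi the rotation, S the diagonal stretch, and lambda equal to S squared, exactly as in the passage:

\[
\mathbf{y} = \Phi S \mathbf{x} \;\;\Rightarrow\;\; \mathbf{x} = S^{-1}\Phi^{\mathsf T}\mathbf{y},
\]
\[
d^2 = \mathbf{x}^{\mathsf T}\mathbf{x}
    = \mathbf{y}^{\mathsf T}\,\Phi S^{-2}\Phi^{\mathsf T}\,\mathbf{y}
    = \mathbf{y}^{\mathsf T}\,\Phi \Lambda^{-1}\Phi^{\mathsf T}\,\mathbf{y}
    = \mathbf{y}^{\mathsf T}\,\Sigma^{-1}\mathbf{y},
\qquad \Sigma = \Phi\,\Lambda\,\Phi^{\mathsf T}.
\]]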
873 00:58:14,570 --> 00:58:19,640 So that is now an expression for an arbitrary Gaussian
874 00:58:19,640 --> 00:58:23,285 distribution in high-dimensional space.
875 00:58:26,090 --> 00:58:31,010 And that distribution is defined by this matrix
876 00:58:31,010 --> 00:58:35,840 of variances and covariances.
877 00:58:35,840 --> 00:58:39,110 Again, I'm just writing down the definition of sigma inverse
878 00:58:39,110 --> 00:58:40,340 here.
879 00:58:40,340 --> 00:58:44,860 We can take the inverse of that, and we see that
880 00:58:44,860 --> 00:58:49,370 our covariance matrix-- this is called a covariance matrix--
881 00:58:49,370 --> 00:58:54,480 describes the variances and correlations
882 00:58:54,480 --> 00:59:00,810 of those different dimensions as a matrix.
883 00:59:00,810 --> 00:59:05,430 That's just this rotated stretch matrix
884 00:59:05,430 --> 00:59:08,890 that we've been working with.
885 00:59:08,890 --> 00:59:15,810 And that's just the same as this covariance matrix
886 00:59:15,810 --> 00:59:22,385 that we described for the distribution.
887 00:59:22,385 --> 00:59:24,840 I feel like all that didn't come out quite as clearly
888 00:59:24,840 --> 00:59:26,070 as I'd hoped.
889 00:59:26,070 --> 00:59:29,980 But let me just summarize for you.
890 00:59:29,980 --> 00:59:36,800 So we started with an isotropic Gaussian of unit variance.
891 00:59:36,800 --> 00:59:41,510 And we multiplied that vector, we transformed that vector x,
892 00:59:41,510 --> 00:59:45,710 by multiplying it by sigma so that we could write down
893 00:59:45,710 --> 00:59:49,490 a Gaussian distribution of arbitrary variance.
894 00:59:49,490 --> 00:59:54,560 We transformed that vector x with a diagonal
895 00:59:54,560 --> 01:00:01,010 matrix to get arbitrary stretches along the axes.
896 01:00:01,010 --> 01:00:04,220 And then, we made another kind of transformation
897 01:00:04,220 --> 01:00:08,680 with an arbitrary stretch and rotation matrix
898 01:00:08,680 --> 01:00:12,040 so that we can now write down a Gaussian distribution that
899 01:00:12,040 --> 01:00:16,000 has arbitrary stretch and rotation of its variances
900 01:00:16,000 --> 01:00:18,260 in different directions.
901 01:00:18,260 --> 01:00:24,430 So this is the punch line right here--
902 01:00:24,430 --> 01:00:28,300 that you can write down the Gaussian distribution
903 01:00:28,300 --> 01:00:36,080 with arbitrary variances in this form.
904 01:00:36,080 --> 01:00:41,320 And that sigma right there is just the covariance matrix
905 01:00:41,320 --> 01:00:44,980 that describes how wide that distribution is
906 01:00:44,980 --> 01:00:47,860 in the different directions and how correlated
907 01:00:47,860 --> 01:00:49,900 those different directions are.
908 01:00:54,780 --> 01:00:57,400 I think this just summarizes what I've already said.
909 01:01:01,940 --> 01:01:06,470 So now, let's compute the covariance matrix from data.
910 01:01:06,470 --> 01:01:09,710 So now, I've shown you how to represent
911 01:01:09,710 --> 01:01:11,810 Gaussians in high dimensions that
912 01:01:11,810 --> 01:01:15,390 have these arbitrary variances.
913 01:01:15,390 --> 01:01:18,350 Now, let's say that I actually have some data.
914 01:01:18,350 --> 01:01:22,330 How do I fit one of these Gaussians to it?
915 01:01:22,330 --> 01:01:25,010 And it turns out that it's really simple.
916 01:01:25,010 --> 01:01:27,520 It's just a matter of calculating
917 01:01:27,520 --> 01:01:30,160 this covariance matrix.
918 01:01:30,160 --> 01:01:32,630 So let's do that.
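[Here is a minimal MATLAB sketch of the construction just summarized, only to make the transformations concrete. The 45-degree angle and the stretch values are assumed for illustration and are not taken from the slides:

    % Build an arbitrary 2D Gaussian cloud from unit-variance samples.
    m   = 1000;                                 % number of samples
    x   = randn(2, m);                          % isotropic, unit-variance Gaussian samples
    th  = pi/4;                                 % rotation angle (illustrative)
    phi = [cos(th) -sin(th); sin(th) cos(th)];  % rotation matrix
    S   = diag([1.7 0.6]);                      % diagonal stretch matrix (illustrative values)
    y   = phi * S * x;                          % stretched and rotated samples
    Sigma = phi * S^2 * phi';                   % the corresponding covariance matrix, phi*lambda*phi'
]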
919 01:01:32,630 --> 01:01:39,100 So here is some high-dimensional data.
920 01:01:39,100 --> 01:01:44,190 Remember that to fit a Gaussian to a bunch of data,
921 01:01:44,190 --> 01:01:46,290 all we need to do is to find the mean
922 01:01:46,290 --> 01:01:49,740 and variance in one dimension.
923 01:01:49,740 --> 01:01:51,510 For higher dimensions, we just need
924 01:01:51,510 --> 01:01:57,060 to find the mean and the covariance matrix.
925 01:01:57,060 --> 01:01:59,220 So that's simple.
926 01:01:59,220 --> 01:02:01,540 So here's our set of observations.
927 01:02:01,540 --> 01:02:05,070 Now, instead of being scalars, they're vectors.
928 01:02:05,070 --> 01:02:07,170 First thing we do is subtract the mean.
929 01:02:07,170 --> 01:02:09,660 So we calculate the mean by summing
930 01:02:09,660 --> 01:02:13,200 all of those observations
931 01:02:13,200 --> 01:02:14,430 and dividing by m.
932 01:02:14,430 --> 01:02:17,850 So there we find the mean.
933 01:02:17,850 --> 01:02:22,560 We compute a new data set with the mean subtracted.
934 01:02:22,560 --> 01:02:25,220 So from every one of these observations,
935 01:02:25,220 --> 01:02:27,630 we subtract the mean.
936 01:02:27,630 --> 01:02:29,050 And we're going to call that z.
937 01:02:33,580 --> 01:02:35,530 So there is our mean subtracted here.
938 01:02:35,530 --> 01:02:37,450 I've subtracted the mean.
939 01:02:37,450 --> 01:02:40,210 So those are the x's.
940 01:02:40,210 --> 01:02:41,050 Subtract the mean.
941 01:02:41,050 --> 01:02:43,780 Those are now our z's, our mean-subtracted data.
942 01:02:47,556 --> 01:02:50,460 Does that make sense?
943 01:02:50,460 --> 01:02:53,650 Now, we're going to calculate this covariance matrix.
944 01:02:53,650 --> 01:02:58,500 Well, all we do is we find the variance
945 01:02:58,500 --> 01:03:02,930 in each direction and the covariances.
946 01:03:02,930 --> 01:03:06,960 So it's going to be a matrix. For this low-dimensional data,
947 01:03:06,960 --> 01:03:10,610 it's a two-by-two matrix.
948 01:03:10,610 --> 01:03:14,440 So we're going to find the variance in the z1 direction.
949 01:03:14,440 --> 01:03:19,060 It's just z1 times z1, summed over all the observations,
950 01:03:19,060 --> 01:03:19,830 divided by m.
951 01:03:22,970 --> 01:03:27,560 The variance in the z2 direction is just the sum
952 01:03:27,560 --> 01:03:32,390 of z2,j times z2,j divided by m.
953 01:03:32,390 --> 01:03:35,570 The covariance is just the cross terms,
954 01:03:35,570 --> 01:03:39,620 z1 times z2 and z2 times z1.
955 01:03:39,620 --> 01:03:42,000 Of course, those are equal to each other.
956 01:03:42,000 --> 01:03:47,410 So the covariance matrix is symmetric.
957 01:03:47,410 --> 01:03:49,100 So how do we calculate this?
958 01:03:49,100 --> 01:03:53,350 It turns out that in MATLAB, this is super-duper easy.
959 01:03:55,890 --> 01:04:00,540 So if this is our vector, that's our vector, one
960 01:04:00,540 --> 01:04:07,170 of our observations, we can compute the inner product
961 01:04:07,170 --> 01:04:08,550 z transpose z.
962 01:04:08,550 --> 01:04:11,970 So the inner product is just z transpose z,
963 01:04:11,970 --> 01:04:14,370 which is z1, z2 times z1, z2.
964 01:04:14,370 --> 01:04:19,290 That's the square magnitude of z.
965 01:04:19,290 --> 01:04:24,070 There's another kind of product called the outer product.
966 01:04:24,070 --> 01:04:25,570 Remember that.
967 01:04:25,570 --> 01:04:29,640 So the outer product looks like this.
968 01:04:29,640 --> 01:04:31,740 This is a 1 by 2.
969 01:04:31,740 --> 01:04:34,230 That's a row vector, and a row vector times a column
970 01:04:34,230 --> 01:04:36,060 vector is equal to a scalar--
971 01:04:36,060 --> 01:04:41,700 1 by 2 times 2 by 1 equals 1 by 1. But a column vector-- two rows, one column--
972 01:04:41,700 --> 01:04:49,470 times a 1 by 2 gives you a 2 by 2 matrix that looks like this:
973 01:04:49,470 --> 01:04:53,880 z1 times z1, z1 times z2, z2 times z1, z2 times z2.
974 01:04:53,880 --> 01:04:54,630 Why?
975 01:04:54,630 --> 01:04:59,370 It's z1 times z1 equals that,
976 01:04:59,370 --> 01:05:07,050 z1 times z2, z2 times z1, z2 times z2.
977 01:05:07,050 --> 01:05:11,430 So that outer product already gives us
978 01:05:11,430 --> 01:05:16,890 the components to compute the covariance matrix.
979 01:05:16,890 --> 01:05:21,750 So what we do is we just take
980 01:05:21,750 --> 01:05:25,320 the j-th observation of this vector z,
981 01:05:25,320 --> 01:05:29,790 and multiply it by the j-th observation of this vector z
982 01:05:29,790 --> 01:05:30,690 transpose.
983 01:05:30,690 --> 01:05:34,510 And that gives us this matrix.
984 01:05:34,510 --> 01:05:38,250 And we sum over all of these.
985 01:05:38,250 --> 01:05:43,130 And you see that is exactly the covariance matrix.
986 01:05:48,450 --> 01:05:55,080 So if we have m observations of vector z,
987 01:05:55,080 --> 01:05:57,630 we put them in matrix form.
988 01:05:57,630 --> 01:06:00,450 So we have a big, long data matrix.
989 01:06:00,450 --> 01:06:02,550 Like this.
990 01:06:02,550 --> 01:06:06,510 There are m observations of this two-dimensional vector z.
991 01:06:09,320 --> 01:06:14,560 The data vector has dimension 2.
992 01:06:14,560 --> 01:06:16,180 There are m observations.
993 01:06:16,180 --> 01:06:18,570 So m is the number of samples.
994 01:06:18,570 --> 01:06:21,485 So this is an n-by-m matrix.
995 01:06:25,370 --> 01:06:27,690 So if you want to compute the covariance matrix,
996 01:06:27,690 --> 01:06:31,340 you just, in MATLAB, literally take
997 01:06:31,340 --> 01:06:36,850 this big matrix z times that matrix transpose.
998 01:06:36,850 --> 01:06:41,150 And that automatically finds the covariance matrix for you
999 01:06:41,150 --> 01:06:42,920 in one line of MATLAB.
1000 01:06:47,200 --> 01:06:49,480 There's a little trick to subtract the mean easily.
1001 01:06:49,480 --> 01:06:53,970 So remember, your original observations are x.
1002 01:06:53,970 --> 01:06:57,510 You compute the mean across the rows.
1003 01:06:57,510 --> 01:07:02,880 Thus, you're going to sum across columns to give
1004 01:07:02,880 --> 01:07:04,410 you a mean for each row.
1005 01:07:04,410 --> 01:07:10,530 That gives you a mean of that first component of your vector,
1006 01:07:10,530 --> 01:07:12,030 and a mean of the second component.
1007 01:07:12,030 --> 01:07:15,480 That's really easy in MATLAB.
1008 01:07:15,480 --> 01:07:23,490 mu is the mean of x, summing across the second dimension.
1009 01:07:23,490 --> 01:07:25,980 That gives you a mean vector, and then
1010 01:07:25,980 --> 01:07:30,030 you use repmat to fill that mean out in all of the columns
1011 01:07:30,030 --> 01:07:33,420 and [INAUDIBLE] subtract this mean from x
1012 01:07:33,420 --> 01:07:34,980 to get this data matrix z.
1013 01:07:38,590 --> 01:07:42,280 So now, let's apply those tools to actually do
1014 01:07:42,280 --> 01:07:44,200 some principal components analysis.
1015 01:07:47,280 --> 01:07:51,150 So principal components analysis is really amazing.
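[As a rough sketch of what those few lines of MATLAB might look like. The variable names x, mu, z, and Q, and the explicit 1/m normalization, are illustrative here, not copied from the slides; x is assumed to be an n-by-m data matrix with one observation per column:

    % Mean subtraction with repmat, then the covariance matrix in one line.
    [n, m] = size(x);
    mu = mean(x, 2);                 % mean across columns -> one mean per row
    z  = x - repmat(mu, 1, m);       % subtract the mean from every observation
    Q  = z * z' / m;                 % n-by-n covariance matrix (sum of outer products / m)
]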
1016 01:07:51,150 --> 01:07:56,010 If you look at single nucleotide polymorphisms in populations
1017 01:07:56,010 --> 01:07:58,860 of people, there are like hundreds of genes
1018 01:07:58,860 --> 01:07:59,800 that you can look at.
1019 01:07:59,800 --> 01:08:05,220 You can look at different variations of a gene
1020 01:08:05,220 --> 01:08:06,960 across hundreds of genes.
1021 01:08:06,960 --> 01:08:09,220 But it's this enormous data set.
1022 01:08:09,220 --> 01:08:11,940 And you can find out which directions
1023 01:08:11,940 --> 01:08:14,550 in that space of genes give you information
1024 01:08:14,550 --> 01:08:17,229 about the genome of people.
1025 01:08:17,229 --> 01:08:21,390 And for example, if you look at a number of genes
1026 01:08:21,390 --> 01:08:23,640 across people with different backgrounds,
1027 01:08:23,640 --> 01:08:26,310 you can see that there are actually clusters corresponding
1028 01:08:26,310 --> 01:08:29,700 to people with different backgrounds.
1029 01:08:29,700 --> 01:08:31,840 You can do single-cell profiling.
1030 01:08:31,840 --> 01:08:35,790 So you can do the same thing in different cells within a tissue.
1031 01:08:35,790 --> 01:08:39,930 So you look at RNA transcriptional profiling.
1032 01:08:39,930 --> 01:08:44,460 You see what are the genes that are being
1033 01:08:44,460 --> 01:08:46,529 expressed in individual cells.
1034 01:08:46,529 --> 01:08:48,569 You can do principal components analysis
1035 01:08:48,569 --> 01:08:50,460 of those different genes and find
1036 01:08:50,460 --> 01:08:53,955 clusters for different cell types within a tissue.
1037 01:08:53,955 --> 01:09:00,029 This is now being applied very commonly in brain tissue
1038 01:09:00,029 --> 01:09:02,960 to extract different cell types.
1039 01:09:02,960 --> 01:09:07,460 You can use images and find out which components of an image
1040 01:09:07,460 --> 01:09:10,250 actually give you information about different faces.
1041 01:09:10,250 --> 01:09:16,130 So you can take a bunch of different faces,
1042 01:09:16,130 --> 01:09:20,830 find the covariance matrix of those images,
1043 01:09:20,830 --> 01:09:26,439 do an eigendecomposition on that covariance matrix,
1044 01:09:26,439 --> 01:09:29,380 and extract what are called eigenfaces.
1045 01:09:29,380 --> 01:09:34,029 These are dimensions on which the images carry information
1046 01:09:34,029 --> 01:09:37,510 about face identity.
1047 01:09:37,510 --> 01:09:40,359 You can use principal components analysis
1048 01:09:40,359 --> 01:09:45,460 to decompose spike waveforms into different spikes.
1049 01:09:45,460 --> 01:09:47,960 This is a very common way of doing spike sorting.
1050 01:09:47,960 --> 01:09:49,819 So when you stick an electrode in the brain,
1051 01:09:49,819 --> 01:09:51,580 you record from different cells
1052 01:09:51,580 --> 01:09:53,080 at the end of the electrode.
1053 01:09:53,080 --> 01:09:55,750 Each one of those has a different waveform,
1054 01:09:55,750 --> 01:09:59,560 and you can use this method to extract the different waveforms.
1055 01:09:59,560 --> 01:10:01,900 People have even recently used this
1056 01:10:01,900 --> 01:10:07,060 to understand the low-dimensional trajectories
1057 01:10:07,060 --> 01:10:09,940 of movements.
1058 01:10:09,940 --> 01:10:11,905 So if you take a movie--
1059 01:10:11,905 --> 01:10:14,092 SPEAKER: After tracking, a reconstruction
1060 01:10:14,092 --> 01:10:17,140 of the global trajectory can be made from the stepper motor
1061 01:10:17,140 --> 01:10:19,780 movements, while the local shape changes of the worm
1062 01:10:19,780 --> 01:10:20,815 can be seen in detail.
1063 01:10:24,930 --> 01:10:28,000 MICHALE FEE: OK, so here you see a C. elegans,
1064 01:10:28,000 --> 01:10:30,130 a worm, moving along.
1065 01:10:30,130 --> 01:10:33,400 This is an image, so it's very high-dimensional.
1066 01:10:33,400 --> 01:10:36,640 There are 1,000 pixels in this image.
1067 01:10:36,640 --> 01:10:46,030 And you can decompose that image into a trajectory
1068 01:10:46,030 --> 01:10:47,410 in a low-dimensional space.
1069 01:10:47,410 --> 01:10:52,060 And it's been used to describe the movements
1070 01:10:52,060 --> 01:10:54,370 in a low-dimensional space and relate
1071 01:10:54,370 --> 01:10:59,320 that to a representation of the neural activity
1072 01:10:59,320 --> 01:11:00,870 in low dimensions as well.
1073 01:11:00,870 --> 01:11:05,520 OK, so it's a very powerful technique.
1074 01:11:05,520 --> 01:11:10,290 So let me just first demonstrate PCA on just some simple 2D
1075 01:11:10,290 --> 01:11:11,200 data.
1076 01:11:11,200 --> 01:11:13,770 So here's a cloud of points given
1077 01:11:13,770 --> 01:11:17,000 by a Gaussian distribution.
1078 01:11:17,000 --> 01:11:19,220 So those are a bunch of vectors x.
1079 01:11:19,220 --> 01:11:23,630 We can transform those vectors x using phi s phi transpose
1080 01:11:23,630 --> 01:11:29,090 to produce a Gaussian, a cloud of points with a Gaussian
1081 01:11:29,090 --> 01:11:32,090 distribution, rotated at 45 degrees,
1082 01:11:32,090 --> 01:11:38,330 and stretched by 1.7-ish along one axis and compressed by that
1083 01:11:38,330 --> 01:11:41,930 amount along another axis.
1084 01:11:41,930 --> 01:11:46,190 So we can build this rotation matrix, this stretch matrix,
1085 01:11:46,190 --> 01:11:49,340 and build a transformation matrix--
1086 01:11:49,340 --> 01:11:51,551 r, s, r transpose.
1087 01:11:51,551 --> 01:11:52,850 Multiply that by x.
1088 01:11:52,850 --> 01:11:55,530 And that gives us this data set here.
1089 01:11:55,530 --> 01:11:57,070 OK, so we're going to take that data
1090 01:11:57,070 --> 01:12:00,002 set and do principal components analysis on it.
1091 01:12:00,002 --> 01:12:01,460 And what that's going to do is it's
1092 01:12:01,460 --> 01:12:07,130 going to find the dimensions in this data set that
1093 01:12:07,130 --> 01:12:08,900 have the highest variance.
1094 01:12:08,900 --> 01:12:10,970 It's basically going to extract the variance
1095 01:12:10,970 --> 01:12:12,600 in the different dimensions.
1096 01:12:12,600 --> 01:12:14,480 So we take that set of points.
1097 01:12:14,480 --> 01:12:17,990 We just compute the covariance matrix
1098 01:12:17,990 --> 01:12:23,030 by taking z, z transpose, times 1 over m.
1099 01:12:23,030 --> 01:12:25,200 That computes that covariance matrix.
1100 01:12:25,200 --> 01:12:28,370 And then, we're going to use the eig function in MATLAB
1101 01:12:28,370 --> 01:12:31,820 to extract the eigenvectors and eigenvalues
1102 01:12:31,820 --> 01:12:36,945 of the covariance matrix. OK, so q--
1103 01:12:36,945 --> 01:12:40,020 we're going to use q as the variable name
1104 01:12:40,020 --> 01:12:44,550 for the covariance matrix-- it's z z transpose over m.
1105 01:12:44,550 --> 01:12:46,050 Call eig of q.
1106 01:12:48,910 --> 01:12:53,880 That returns the rotation matrix--
1107 01:12:53,880 --> 01:12:57,540 the columns of which
1108 01:12:57,540 --> 01:13:02,130 are the eigenvectors-- and it returns the matrix of eigenvalues,
1109 01:13:02,130 --> 01:13:06,270 where the diagonal elements are the eigenvalues.
1110 01:13:06,270 --> 01:13:09,480 Sometimes, you need to do a flip-left-right
1111 01:13:09,480 --> 01:13:13,800 because eig sometimes returns the lowest eigenvalues first.
1112 01:13:13,800 --> 01:13:18,570 But I generally want to put the largest eigenvalue first.
1113 01:13:18,570 --> 01:13:21,390 So there's the largest one, there's the smallest one.
1114 01:13:23,920 --> 01:13:27,850 And now, what we do, is we simply rotate.
1115 01:13:27,850 --> 01:13:30,070 We [AUDIO OUT] basis.
1116 01:13:30,070 --> 01:13:35,050 We can rotate this data set using the rotation
1117 01:13:35,050 --> 01:13:41,540 matrix that the principal components analysis found.
1118 01:13:41,540 --> 01:13:44,690 OK, so we compute the covariance matrix.
1119 01:13:44,690 --> 01:13:46,910 Find the eigenvectors and eigenvalues
1120 01:13:46,910 --> 01:13:50,180 of the covariance matrix right there.
1121 01:13:50,180 --> 01:13:53,920 And then, we just rotate the data
1122 01:13:53,920 --> 01:13:59,470 set into that new basis of eigenvectors and eigenvalues.
1123 01:14:02,380 --> 01:14:04,270 It's useful for clustering.
1124 01:14:04,270 --> 01:14:09,320 So if we have two clusters, we can take the clusters,
1125 01:14:09,320 --> 01:14:11,630 compute the covariance matrix.
1126 01:14:11,630 --> 01:14:13,610 Find the eigenvectors and eigenvalues
1127 01:14:13,610 --> 01:14:16,810 of that covariance matrix.
1128 01:14:16,810 --> 01:14:22,140 And then, rotate the data set into a basis set
1129 01:14:22,140 --> 01:14:27,460 in which the dimensions of the data with
1130 01:14:27,460 --> 01:14:34,935 the largest variance are along the standard basis vectors.
1131 01:14:40,900 --> 01:14:42,920 Let's look at a problem in the time domain.
1132 01:14:42,920 --> 01:14:48,400 So here we have a couple of time-dependent signals.
1133 01:14:48,400 --> 01:14:53,530 So this is some amplitude as a function of time.
1134 01:14:53,530 --> 01:14:55,910 These are signals that I constructed.
1135 01:14:55,910 --> 01:15:02,240 They're some wiggly function that I added noise to.
1136 01:15:02,240 --> 01:15:06,190 What we do is we take each one of those time series,
1137 01:15:06,190 --> 01:15:08,410 and we stack them up in a bunch of columns.
1138 01:15:08,410 --> 01:15:15,210 So our vector is now a set of 100 time samples.
1139 01:15:15,210 --> 01:15:19,396 So there is a vector of 100 different time points.
1140 01:15:19,396 --> 01:15:21,630 Does that make sense?
1141 01:15:21,630 --> 01:15:28,440 And we have 200 observations of those 100-dimensional vectors.
1142 01:15:28,440 --> 01:15:34,270 So we have a data matrix x whose columns
1143 01:15:34,270 --> 01:15:35,790 are 100-dimensional.
1144 01:15:35,790 --> 01:15:37,950 And we have 200 of those observations.
1145 01:15:37,950 --> 01:15:40,800 So it's a 100-by-200 matrix.
1146 01:15:40,800 --> 01:15:43,330 A 100-by-200 matrix.
1147 01:15:43,330 --> 01:15:47,140 We do the mean subtraction-- we subtract the mean using
1148 01:15:47,140 --> 01:15:50,570 that trick that I showed you.
1149 01:15:50,570 --> 01:15:52,710 Compute the covariance matrix.
1150 01:15:52,710 --> 01:15:54,500 So there we compute the mean.
1151 01:15:54,500 --> 01:15:57,260 We subtract the mean using repmat.
1152 01:15:57,260 --> 01:16:00,110 Subtract the mean from the data to get z.
1153 01:16:00,110 --> 01:16:03,560 Compute the covariance matrix Q. That's
1154 01:16:03,560 --> 01:16:08,610 what the covariance matrix looks like for those data.
1155 01:16:08,610 --> 01:16:14,190 And now, we plug it into eig to extract the eigenvectors
1156 01:16:14,190 --> 01:16:16,450 and eigenvalues.
1157 01:16:16,450 --> 01:16:23,410 OK, so extract F and V. If we look at the eigenvalues,
1158 01:16:23,410 --> 01:16:26,050 you can see that there are 100
1159 01:16:26,050 --> 01:16:30,050 eigenvalues because those data have 100 dimensions.
1160 01:16:30,050 --> 01:16:32,170 So there are 100 eigenvalues.
1161 01:16:32,170 --> 01:16:35,860 You can see that two of those eigenvalues are big,
1162 01:16:35,860 --> 01:16:38,080 and the rest are small.
1163 01:16:38,080 --> 01:16:40,560 This is on a log scale.
1164 01:16:40,560 --> 01:16:44,230 What that says is that almost all
1165 01:16:44,230 --> 01:16:48,350 of the variance in these data exists in just two dimensions.
1166 01:16:48,350 --> 01:16:50,650 It's a 100-dimensional space.
1167 01:16:50,650 --> 01:16:54,280 But the data are living in two dimensions.
1168 01:16:54,280 --> 01:16:55,810 And all the rest is noise.
1169 01:16:58,498 --> 01:16:59,730 Does that make sense?
1170 01:17:03,410 --> 01:17:06,740 So what you'll typically do is take some data,
1171 01:17:06,740 --> 01:17:10,010 compute the covariance matrix, find the eigenvalues,
1172 01:17:10,010 --> 01:17:12,770 and look at the spectrum of eigenvalues.
1173 01:17:12,770 --> 01:17:15,520 And you'll very often see that there
1174 01:17:15,520 --> 01:17:18,945 is a lot of variance in a small subset of eigenvalues.
1175 01:17:18,945 --> 01:17:22,250 That tells you that the data are really
1176 01:17:22,250 --> 01:17:27,320 living in a lower-dimensional subspace
1177 01:17:27,320 --> 01:17:30,800 than the full dimensionality of the data.
1178 01:17:30,800 --> 01:17:32,330 So that's where your signal is.
1179 01:17:32,330 --> 01:17:34,310 And all the rest of that is noise.
1180 01:17:34,310 --> 01:17:36,050 You can plot the cumulative sum of this.
1181 01:17:36,050 --> 01:17:38,120 And you can say that the first two
1182 01:17:38,120 --> 01:17:45,080 components explain over 60% of the total variance in the data.
1183 01:17:45,080 --> 01:17:47,710 So since there are two large eigenvalues,
1184 01:17:47,710 --> 01:17:50,560 let's look at the eigenvectors associated with those.
1185 01:17:50,560 --> 01:17:52,420 And we can find those.
1186 01:17:52,420 --> 01:17:56,380 Those are just the first two columns of this matrix F
1187 01:17:56,380 --> 01:17:58,660 that the eig function returned to us.
1188 01:17:58,660 --> 01:18:02,050 And that's what those two eigenvectors look like.
1189 01:18:04,940 --> 01:18:07,250 That's what the original data looked like.
1190 01:18:07,250 --> 01:18:10,310 The eigenvectors, the columns of the F matrix,
1191 01:18:10,310 --> 01:18:13,330 are an orthogonal basis set.
1192 01:18:13,330 --> 01:18:16,520 A new basis set.
1193 01:18:16,520 --> 01:18:21,300 So those are the first two eigenvectors.
1194 01:18:21,300 --> 01:18:23,540 And you can see that the signal lives
1195 01:18:23,540 --> 01:18:27,320 in this low-dimensional space of these two eigenvectors.
1196 01:18:27,320 --> 01:18:29,750 All of the other eigenvectors are just noise.
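[Here is a minimal sketch of that pipeline for the time-domain example, assuming z is the 100-by-200 mean-subtracted data matrix from the previous step. The names F and V follow the usage above; everything else (lam, cumvar, pc12) is illustrative:

    % Covariance, eigendecomposition, and the eigenvalue spectrum.
    [n, m] = size(z);
    Q      = z * z' / m;             % 100-by-100 covariance matrix
    [F, V] = eig(Q);                 % columns of F: eigenvectors; diag(V): eigenvalues
    lam    = flipud(diag(V));        % eig tends to return ascending order; put largest first
    F      = fliplr(F);              % reorder the eigenvectors to match

    semilogy(lam, 'o');              % eigenvalue spectrum on a log scale
    cumvar = cumsum(lam) / sum(lam); % fraction of variance explained by the first k components
    pc12   = F(:, 1:2);              % the first two eigenvectors: the signal subspace
]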
1197 01:18:34,330 --> 01:18:42,330 So what we can do is project the data into this new basis
1198 01:18:42,330 --> 01:18:43,620 set.
1199 01:18:43,620 --> 01:18:44,920 So let's do that.
1200 01:18:44,920 --> 01:18:49,370 We simply do a change of basis.
1201 01:18:49,370 --> 01:18:52,430 F is a rotation matrix.
1202 01:18:52,430 --> 01:18:56,420 We can project our data z into this new basis set
1203 01:18:56,420 --> 01:18:58,710 and see what it looks like.
1204 01:18:58,710 --> 01:19:00,330 Turns out, that's what it looks like.
1205 01:19:00,330 --> 01:19:06,210 There are two clusters in those data corresponding
1206 01:19:06,210 --> 01:19:09,900 to the two different waveforms that you could see in the data.
1207 01:19:12,317 --> 01:19:14,150 Right there, you can see that there are kind
1208 01:19:14,150 --> 01:19:15,830 of two waveforms in the data.
1209 01:19:18,470 --> 01:19:21,390 If you project the data into this low-dimensional space,
1210 01:19:21,390 --> 01:19:23,330 you can see that there are two clusters there.
1211 01:19:25,940 --> 01:19:32,690 If you project the data onto other projections, you don't see it.
1212 01:19:32,690 --> 01:19:35,330 It's only in this particular projection
1213 01:19:35,330 --> 01:19:37,965 that you have these two very distinct clusters
1214 01:19:37,965 --> 01:19:39,590 corresponding to the two different
1215 01:19:39,590 --> 01:19:42,670 waveforms in the data.
1216 01:19:42,670 --> 01:19:47,050 Now, almost all of the variance is
1217 01:19:47,050 --> 01:19:50,090 in the space of the first two principal components.
1218 01:19:50,090 --> 01:19:52,060 So what you can actually do is, you
1219 01:19:52,060 --> 01:19:56,800 can project the data onto these first two principal components,
1220 01:19:56,800 --> 01:20:00,310 set all of the other principal components to zero,
1221 01:20:00,310 --> 01:20:03,410 and then rotate back to the original basis set.
1222 01:20:03,410 --> 01:20:06,490 That is, you're setting as much of the noise
1223 01:20:06,490 --> 01:20:07,910 to zero as you can.
1224 01:20:07,910 --> 01:20:10,900 You're getting rid of most of the noise.
1225 01:20:10,900 --> 01:20:14,050 And then, when you rotate back to the original basis set,
1226 01:20:14,050 --> 01:20:15,820 you've gotten rid of most of the noise.
1227 01:20:15,820 --> 01:20:19,090 And that's called principal components filtering.
1228 01:20:19,090 --> 01:20:23,110 So here's before filtering and here's after filtering.
1229 01:20:23,110 --> 01:20:27,400 OK, so you've found the low-dimensional space
1230 01:20:27,400 --> 01:20:31,510 in which the signal sits;
1231 01:20:31,510 --> 01:20:34,430 everything outside of that space is noise.
1232 01:20:34,430 --> 01:20:40,030 So you rotate the data into a new basis set.
1233 01:20:40,030 --> 01:20:42,760 You filter out all the other dimensions
1234 01:20:42,760 --> 01:20:44,140 that just have noise.
1235 01:20:44,140 --> 01:20:45,880 You rotate back.
1236 01:20:45,880 --> 01:20:49,040 And you just keep the signal.
1237 01:20:49,040 --> 01:20:49,710 And that's it.
1238 01:20:49,710 --> 01:20:53,400 So that's sort of a brief intro to principal component
1239 01:20:53,400 --> 01:20:54,550 analysis.
1240 01:20:54,550 --> 01:20:57,110 But there are a lot of things you can use it for.
1241 01:20:57,110 --> 01:20:58,180 It's a lot of fun.
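[A minimal sketch of that projection and filtering step, assuming the F and z from the sketch above. The names proj, scores, and z_filtered are illustrative, not from the lecture:

    % Project into the eigenvector basis, view the first two components,
    % then zero everything else and rotate back (principal components filtering).
    proj           = F' * z;                      % data in the eigenvector basis
    scores         = proj(1:2, :);                % coordinates along the first two PCs
    plot(scores(1,:), scores(2,:), '.');          % the two clusters show up in this projection
    proj(3:end, :) = 0;                           % keep only the first two principal components
    z_filtered     = F * proj;                    % rotate back to the original basis
]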
1242 01:20:58,180 --> 01:21:00,450 And it's a great intro to all the other 1243 01:21:00,450 --> 01:21:03,900 amazing dimensionality reduction techniques that there are. 1244 01:21:03,900 --> 01:21:06,530 I apologize for going over.