1
00:00:01,000 --> 00:00:03,540
We have presented
the complete solution
2
00:00:03,540 --> 00:00:06,920
to the linear least mean squares
estimation problem, when
3
00:00:06,920 --> 00:00:10,430
we want to estimate a certain
unknown random variable
4
00:00:10,430 --> 00:00:13,720
on the basis of a
different random variable X
5
00:00:13,720 --> 00:00:15,550
that we get to observe.
6
00:00:15,550 --> 00:00:19,720
But what if we have
multiple observations?
7
00:00:19,720 --> 00:00:23,510
What would be the analogous
formulation of the problem?
8
00:00:23,510 --> 00:00:24,950
Here's the idea.
9
00:00:24,950 --> 00:00:28,320
Once more, we restrict
ourselves to estimators
10
00:00:28,320 --> 00:00:31,970
that are linear functions of
the data, linear functions
11
00:00:31,970 --> 00:00:34,280
of the observations
that we have.
12
00:00:34,280 --> 00:00:37,670
And then we pose the
problem of finding the best
13
00:00:37,670 --> 00:00:42,570
choices of these coefficients
a1 up to a n and b.
14
00:00:42,570 --> 00:00:45,540
What does it mean to
find the best choices?
15
00:00:45,540 --> 00:00:49,010
It means that if we
fix certain choices,
16
00:00:49,010 --> 00:00:52,170
we obtain an estimator,
we look at the difference
17
00:00:52,170 --> 00:00:54,480
between the estimator
and the quantity
18
00:00:54,480 --> 00:00:56,700
we're trying to estimate,
take the square,
19
00:00:56,700 --> 00:00:58,880
and then take the expectation.
20
00:00:58,880 --> 00:01:01,910
So once more, we're looking
at the mean squared error
21
00:01:01,910 --> 00:01:06,970
of our estimator and we try to
make it as small as possible.
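The objective described above can be sketched numerically. A minimal illustration (the model, coefficients, and noise levels below are made up for this example, not from the lecture): fix a candidate estimator a1*X1 + a2*X2 + b and estimate its mean squared error E[(a1*X1 + a2*X2 + b - Theta)^2] by simulation.

```python
import numpy as np

# Hypothetical model: Theta drives two noisy observations X1, X2.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=100_000)
x1 = theta + rng.normal(0.0, 0.5, size=theta.shape)
x2 = theta + rng.normal(0.0, 0.5, size=theta.shape)

# One fixed choice of coefficients (not the optimal one);
# the optimization problem is to make this quantity as small as possible.
a1, a2, b = 0.4, 0.4, 0.0
mse = np.mean((a1 * x1 + a2 * x2 + b - theta) ** 2)
print(mse)
```

Trying different coefficient values and comparing the resulting mean squared errors is exactly the minimization the lecture goes on to discuss.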
22
00:01:06,970 --> 00:01:10,760
So this is a well-defined
optimization problem.
23
00:01:10,760 --> 00:01:15,830
We have a quantity, which is a
function of certain parameters.
24
00:01:15,830 --> 00:01:19,050
And we wish to find the
choices for those parameters,
25
00:01:19,050 --> 00:01:21,420
or those coefficients,
that will make
26
00:01:21,420 --> 00:01:24,930
this quantity as
small as possible.
27
00:01:24,930 --> 00:01:27,820
One first comment, similar
to the case
28
00:01:27,820 --> 00:01:30,920
where we had a single
measurement,
29
00:01:30,920 --> 00:01:32,280
is the following.
30
00:01:32,280 --> 00:01:35,560
If it turns out that the
conditional expectation
31
00:01:35,560 --> 00:01:38,590
of Theta given all
of the data that we
32
00:01:38,590 --> 00:01:44,440
have is linear in the X's, if it is
of this form, then what happens?
33
00:01:44,440 --> 00:01:47,990
We know that this is the
best possible estimator.
34
00:01:47,990 --> 00:01:51,720
If it is also linear, then
it is the best estimator
35
00:01:51,720 --> 00:01:55,470
within the class of
linear estimators as well
36
00:01:55,470 --> 00:01:59,100
and, therefore, the linear
least mean squares estimator
37
00:01:59,100 --> 00:02:03,800
is the same as the general
least mean squares estimator.
38
00:02:03,800 --> 00:02:08,050
So if for some problems it
turns out that this is linear,
39
00:02:08,050 --> 00:02:13,240
then we automatically also have
the optimal linear estimator.
40
00:02:13,240 --> 00:02:15,520
And this is going to
be the case, once more,
41
00:02:15,520 --> 00:02:20,560
for certain normal problems with
a linear structure of the type
42
00:02:20,560 --> 00:02:22,520
that we have studied earlier.
43
00:02:25,740 --> 00:02:28,870
Now, let us look
into what it takes
44
00:02:28,870 --> 00:02:32,079
to carry out this optimization.
45
00:02:32,079 --> 00:02:35,100
If we had a single
observation, then we
46
00:02:35,100 --> 00:02:38,710
have seen a closed form formula,
a fairly simple formula,
47
00:02:38,710 --> 00:02:41,650
that tells us what the
coefficients should be.
48
00:02:41,650 --> 00:02:43,920
For the more general
case, formulas
49
00:02:43,920 --> 00:02:47,090
would not be as
simple, but we can
50
00:02:47,090 --> 00:02:49,700
make the following observations.
51
00:02:49,700 --> 00:02:53,510
If you take this
expression and expand it,
52
00:02:53,510 --> 00:02:56,250
it's going to have
a bunch of terms.
53
00:02:56,250 --> 00:03:00,650
For example, it's going to have
a term of the form a1 squared
54
00:03:00,650 --> 00:03:04,730
times the expected
value of X1 squared.
55
00:03:04,730 --> 00:03:11,590
It's going to have a term
such as twice a1 a2 times
56
00:03:11,590 --> 00:03:16,150
the expected value of X1 X2.
57
00:03:16,150 --> 00:03:20,760
And then there are going to be
many more terms, some of which
58
00:03:20,760 --> 00:03:26,920
will also involve products
of Theta with the X's.
59
00:03:26,920 --> 00:03:32,829
So we might see that we have
a term of the form a1 times
60
00:03:32,829 --> 00:03:36,290
the expected value of X1 Theta.
61
00:03:36,290 --> 00:03:40,010
And then, there's going to
be many, many more terms.
62
00:03:40,010 --> 00:03:42,350
What's the important
thing to notice?
63
00:03:42,350 --> 00:03:46,980
That this expression, as a
function of the coefficients,
64
00:03:46,980 --> 00:03:49,526
involves terms
either of this kind
65
00:03:49,526 --> 00:03:51,570
or of this kind,
or of that kind,
66
00:03:51,570 --> 00:03:55,800
first-order or
second-order terms.
67
00:03:55,800 --> 00:03:57,430
To minimize this
expression, we're
68
00:03:57,430 --> 00:04:02,730
going to take the derivative
of this and set it equal to 0.
69
00:04:02,730 --> 00:04:06,210
When you take the derivative
of a function that
70
00:04:06,210 --> 00:04:09,660
involves only quadratic
and linear terms,
71
00:04:09,660 --> 00:04:14,410
you get something that's
linear in the coefficients.
72
00:04:14,410 --> 00:04:16,730
The conclusion out of
all this discussion
73
00:04:16,730 --> 00:04:21,480
is that when you actually go
and carry out this minimization
74
00:04:21,480 --> 00:04:23,930
by setting derivatives
to zero, what you
75
00:04:23,930 --> 00:04:29,130
will end up doing is solving
a system of linear equations
76
00:04:29,130 --> 00:04:32,085
in the coefficients that
you're trying to determine.
77
00:04:32,085 --> 00:04:34,310
And why is this interesting?
78
00:04:34,310 --> 00:04:36,650
Well, it is because
if you actually
79
00:04:36,650 --> 00:04:39,010
want to carry out
this minimization,
80
00:04:39,010 --> 00:04:43,050
all you need to do is to solve
a linear system, which is easily
81
00:04:43,050 --> 00:04:46,370
done on a computer.
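As a sketch of that computation: setting the derivatives of the quadratic objective to zero yields the normal equations, which say the optimal coefficient vector a solves Cov(X) a = Cov(X, Theta), with b = E[Theta] - a·E[X]. All the moment values below are made up for illustration.

```python
import numpy as np

# Illustrative second-order statistics (hypothetical numbers):
mu_X = np.array([1.0, 2.0])           # E[X1], E[X2]
mu_Theta = 0.5                        # E[Theta]
cov_X = np.array([[2.0, 0.5],
                  [0.5, 1.0]])        # covariance matrix of (X1, X2)
cov_X_Theta = np.array([1.0, 0.3])    # Cov(X1, Theta), Cov(X2, Theta)

# Solve the system of linear equations for the coefficients a1, ..., an,
# then back out the constant term b.
a = np.linalg.solve(cov_X, cov_X_Theta)
b = mu_Theta - a @ mu_X
print(a, b)
```

The point of the example is the shape of the computation: whatever the number of observations, finding the best linear estimator reduces to one call to a linear-system solver.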
82
00:04:46,370 --> 00:04:51,100
The next observation is
that this expression only
83
00:04:51,100 --> 00:04:55,860
involves expectations
of various terms
84
00:04:55,860 --> 00:04:59,750
that are second order in the
random variables involved.
85
00:04:59,750 --> 00:05:02,950
So it involves the expected
value of X1 squared,
86
00:05:02,950 --> 00:05:05,050
it involves this term,
which has something
87
00:05:05,050 --> 00:05:07,960
to do with the
covariance of X1 and X2.
88
00:05:07,960 --> 00:05:11,280
And this term has something
to do with the covariance of X1
89
00:05:11,280 --> 00:05:12,910
with Theta.
90
00:05:12,910 --> 00:05:17,480
But these are the only terms out
of the distribution of the X's
91
00:05:17,480 --> 00:05:20,310
and of Theta that will matter.
92
00:05:20,310 --> 00:05:25,420
So similar to the case where
we had a single observation,
93
00:05:25,420 --> 00:05:27,360
in order to solve
this problem, we
94
00:05:27,360 --> 00:05:31,590
do not need to know the
complete distribution of the X's
95
00:05:31,590 --> 00:05:32,705
and of Theta.
96
00:05:32,705 --> 00:05:35,570
It is enough to know
all of the means,
97
00:05:35,570 --> 00:05:39,040
variances, and covariances
of the random variables
98
00:05:39,040 --> 00:05:40,550
that are involved.
99
00:05:40,550 --> 00:05:43,390
And once more, this
makes this approach
100
00:05:43,390 --> 00:05:47,060
to estimation a practical
one, because we do not
101
00:05:47,060 --> 00:05:50,090
need to model in complete
detail the distribution
102
00:05:50,090 --> 00:05:53,470
of the different
random variables.
103
00:05:53,470 --> 00:05:58,130
Finally, if we do not have just
one unknown random variable,
104
00:05:58,130 --> 00:06:00,570
but we have multiple
random variables that we
105
00:06:00,570 --> 00:06:03,740
want to estimate,
what should we do?
106
00:06:03,740 --> 00:06:05,800
Well, this is pretty simple.
107
00:06:05,800 --> 00:06:08,250
You just apply this
estimation methodology
108
00:06:08,250 --> 00:06:13,390
to each one of the unknown
random variables separately.
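A sketch of that, continuing the hypothetical numbers from before: with several unknowns Theta1, ..., Thetam, the coefficient matrix on the left-hand side, Cov(X), is the same for every unknown, so the normal equations can be solved for all of them in one call by stacking the right-hand sides as columns.

```python
import numpy as np

# Illustrative second-order statistics (hypothetical numbers):
mu_X = np.array([1.0, 2.0])                # E[X1], E[X2]
mu_Theta = np.array([0.5, -1.0])           # E[Theta1], E[Theta2]
cov_X = np.array([[2.0, 0.5],
                  [0.5, 1.0]])             # covariance matrix of (X1, X2)
# Column j holds Cov(X_i, Theta_j):
cov_X_Theta = np.array([[1.0, 0.2],
                        [0.3, 0.4]])

# Column j of A gives the coefficients for estimating Theta_j;
# b holds one constant term per unknown.
A = np.linalg.solve(cov_X, cov_X_Theta)
b = mu_Theta - A.T @ mu_X
print(A, b)
```

Each column is exactly the single-unknown solution, which is what "apply the methodology to each unknown separately" amounts to in practice.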
109
00:06:13,390 --> 00:06:18,720
To conclude, this linear
estimation methodology
110
00:06:18,720 --> 00:06:23,900
applies also to the case where
you have multiple observations.
111
00:06:23,900 --> 00:06:27,120
You need to solve a certain
computational problem in order
112
00:06:27,120 --> 00:06:30,390
to find the structure of
the best linear estimator,
113
00:06:30,390 --> 00:06:33,640
but it is not a very difficult
computational problem,
114
00:06:33,640 --> 00:06:36,260
because all that it
involves is to minimize
115
00:06:36,260 --> 00:06:38,780
a quadratic function
of the coefficients
116
00:06:38,780 --> 00:06:40,720
that you are trying
to determine.
117
00:06:40,720 --> 00:06:43,130
And this leads us
to having to solve
118
00:06:43,130 --> 00:06:45,230
a system of linear equations.
119
00:06:45,230 --> 00:06:48,420
For all these reasons,
linear estimation,
120
00:06:48,420 --> 00:06:53,310
or estimation using linear
estimators, is quite practical.