1
00:00:00,290 --> 00:00:03,060
The purpose of this segment is
to give you a little bit of
2
00:00:03,060 --> 00:00:04,510
the bigger picture.
3
00:00:04,510 --> 00:00:07,750
We did discuss some
inequalities, we did discuss
4
00:00:07,750 --> 00:00:09,820
convergence of the
sample mean--
5
00:00:09,820 --> 00:00:13,190
that's the weak law of large
numbers-- and we did discuss a
6
00:00:13,190 --> 00:00:15,330
particular notion of convergence
of random
7
00:00:15,330 --> 00:00:17,860
variables, convergence
in probability.
8
00:00:17,860 --> 00:00:21,510
How far can we take
those topics?
9
00:00:21,510 --> 00:00:23,790
Let's start with the issue
of inequalities.
10
00:00:23,790 --> 00:00:27,010
Here, one would like to obtain
bounds and approximations on
11
00:00:27,010 --> 00:00:30,620
tail probabilities that are
better than the Markov and
12
00:00:30,620 --> 00:00:33,020
Chebyshev inequalities
that we have seen.
13
00:00:33,020 --> 00:00:34,740
This is indeed possible.
14
00:00:34,740 --> 00:00:38,300
For example, there is a
so-called Chernoff bound that
15
00:00:38,300 --> 00:00:40,740
takes the following form.
16
00:00:40,740 --> 00:00:44,630
The Chernoff bound tells us that
the probability that the
17
00:00:44,630 --> 00:00:51,030
sample mean is away from the
true mean by at least a, where
18
00:00:51,030 --> 00:00:55,710
a is a positive number, this
probability is bounded above
19
00:00:55,710 --> 00:01:00,290
by a function that falls
exponentially with n and where
20
00:01:00,290 --> 00:01:03,700
the exponent depends on the
particular number, a, that we
21
00:01:03,700 --> 00:01:04,860
are considering.
22
00:01:04,860 --> 00:01:07,110
But in any case, this
term in the exponent
23
00:01:07,110 --> 00:01:09,510
is a positive quantity.
24
00:01:09,510 --> 00:01:14,100
Notice that this is much better,
much stronger than
25
00:01:14,100 --> 00:01:17,900
what we obtained from the
Chebyshev inequality because
26
00:01:17,900 --> 00:01:21,560
in the Chebyshev inequality,
we only obtain a bound
27
00:01:21,560 --> 00:01:24,470
for this probability that
falls off at the
28
00:01:24,470 --> 00:01:26,950
rate of 1 over n.
29
00:01:26,950 --> 00:01:29,830
So this falls much faster, and
so it tells us that this
30
00:01:29,830 --> 00:01:33,550
probability is indeed much
smaller than what the
31
00:01:33,550 --> 00:01:36,100
Chebyshev inequality
might predict.
32
00:01:36,100 --> 00:01:38,860
However, this inequality
requires some additional
33
00:01:38,860 --> 00:01:42,259
assumptions on the random
variables involved.
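As an aside, the gap between the two bounds can be made concrete with a small numerical sketch. This is illustrative only: it assumes n independent fair coin flips and uses Hoeffding's inequality as the Chernoff-type bound, whose constants are specific to random variables bounded in [0, 1].

```python
import math

# Two upper bounds on P(|Mn - p| >= a) for the sample mean Mn of
# n independent fair coin flips (p = 0.5, variance sigma^2 = 0.25).
# Chebyshev bound: sigma^2 / (n a^2), which decays like 1/n.
# Hoeffding's inequality (a Chernoff-type bound): 2 exp(-2 n a^2),
# which decays exponentially in n. Constants assume values in [0, 1].

def chebyshev_bound(n, a, sigma2=0.25):
    return sigma2 / (n * a * a)

def hoeffding_bound(n, a):
    return 2.0 * math.exp(-2.0 * n * a * a)

for n in (10, 100, 1000):
    print(n, chebyshev_bound(n, 0.2), hoeffding_bound(n, 0.2))
```

For a = 0.2 and n = 1000, the Chebyshev bound is about 6.3e-3 while the exponential bound is astronomically smaller, which is the sense in which a Chernoff-type bound is much stronger.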
34
00:01:42,259 --> 00:01:45,880
Another type of approximation
on this tail probability can
35
00:01:45,880 --> 00:01:49,120
be obtained through the central
limit theorem, which
36
00:01:49,120 --> 00:01:51,060
will actually be the next topic
37
00:01:51,060 --> 00:01:52,960
that we will be studying.
38
00:01:52,960 --> 00:01:57,660
Very loosely speaking, the
central limit theorem tells us
39
00:01:57,660 --> 00:02:03,530
that the random variable M sub
n, which is the sample mean,
40
00:02:03,530 --> 00:02:09,910
behaves as if it were a normal
random variable with the mean
41
00:02:09,910 --> 00:02:13,380
and the variance that
it should have.
42
00:02:13,380 --> 00:02:16,650
We know that this is the mean
and the variance of the sample
43
00:02:16,650 --> 00:02:19,840
mean, but the central limit
theorem tells us that in
44
00:02:19,840 --> 00:02:23,800
addition to that, we can also
pretend that the sample mean
45
00:02:23,800 --> 00:02:27,870
is normal and carry out
approximations as if this were
46
00:02:27,870 --> 00:02:30,020
a normal random variable.
47
00:02:30,020 --> 00:02:32,640
Now, this statement that
I'm making here is
48
00:02:32,640 --> 00:02:34,660
only a loose statement.
49
00:02:34,660 --> 00:02:38,550
It is not mathematically
completely accurate.
50
00:02:38,550 --> 00:02:41,579
We will see later a more
accurate statement of the
51
00:02:41,579 --> 00:02:43,640
central limit theorem.
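To make this "pretend it is normal" idea concrete, here is an illustrative Python sketch; the choice of uniform(0,1) draws, n = 100, and the threshold 0.55 are all hypothetical.

```python
import math
import random

# Pretend the sample mean Mn of n = 100 uniform(0,1) draws is normal
# with mean 0.5 and variance (1/12)/n, and use the normal CDF to
# approximate the tail probability P(Mn >= 0.55); then check the
# approximation against a brute-force simulation.
# (n, the distribution, and the threshold are hypothetical choices.)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n = 100
mu = 0.5
var = (1.0 / 12.0) / n
threshold = 0.55
clt_estimate = 1.0 - normal_cdf((threshold - mu) / math.sqrt(var))

random.seed(0)
trials = 20000
hits = sum(
    sum(random.random() for _ in range(n)) / n >= threshold
    for _ in range(trials)
)
empirical = hits / trials
print(clt_estimate, empirical)
```

The normal approximation and the simulated frequency land close together, even though Mn is not exactly normal for any finite n.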
52
00:02:43,640 --> 00:02:47,090
In a different direction, we can
talk about different types
53
00:02:47,090 --> 00:02:48,750
of convergence.
54
00:02:48,750 --> 00:02:52,550
We did define convergence in
probability, but that's not
55
00:02:52,550 --> 00:02:54,490
the only notion of convergence
that's
56
00:02:54,490 --> 00:02:56,350
relevant to random variables.
57
00:02:56,350 --> 00:02:59,160
There's an alternative notion,
which is convergence with
58
00:02:59,160 --> 00:03:00,770
probability one.
59
00:03:00,770 --> 00:03:03,240
Here is what it means.
60
00:03:03,240 --> 00:03:06,240
We have a single probabilistic
experiment.
61
00:03:06,240 --> 00:03:09,290
And within that experiment,
we have a sequence
62
00:03:09,290 --> 00:03:15,220
of random variables and another
random variable, and
63
00:03:15,220 --> 00:03:18,160
we want to talk about this
random variable converging to
64
00:03:18,160 --> 00:03:20,360
that random variable.
65
00:03:20,360 --> 00:03:22,280
What do we mean by that?
66
00:03:22,280 --> 00:03:27,040
We consider a typical outcome
of the experiment, that is,
67
00:03:27,040 --> 00:03:29,110
some omega.
68
00:03:29,110 --> 00:03:33,730
Look at the values of the random
variable Yn under that
69
00:03:33,730 --> 00:03:38,530
particular omega, and look at
that sequence of values, the
70
00:03:38,530 --> 00:03:40,890
values of the different random
variables under that
71
00:03:40,890 --> 00:03:42,740
particular outcome.
72
00:03:42,740 --> 00:03:47,010
Under that particular outcome, Y
also has a certain numerical
73
00:03:47,010 --> 00:03:53,070
value, and we're interested in
whether this convergence takes
74
00:03:53,070 --> 00:03:56,760
place as n goes to infinity.
75
00:03:56,760 --> 00:03:59,520
Now for some outcomes, omega,
this will happen.
76
00:03:59,520 --> 00:04:02,080
For some, it will not happen.
77
00:04:02,080 --> 00:04:04,730
We will say that we have
convergence with probability
78
00:04:04,730 --> 00:04:13,870
one if this event has
probability equal to 1.
79
00:04:13,870 --> 00:04:18,060
That is, there is probability
one, that is, essential
80
00:04:18,060 --> 00:04:21,760
certainty, that when an outcome
of the experiment is
81
00:04:21,760 --> 00:04:25,380
obtained, the resulting sequence
of values of the
82
00:04:25,380 --> 00:04:28,730
random variables Yn will
converge to the value of the
83
00:04:28,730 --> 00:04:30,980
random variable Y.
84
00:04:30,980 --> 00:04:35,010
Now, this definition is easy to
write down, but to actually
85
00:04:35,010 --> 00:04:37,670
understand what it really
means and the ways it is
86
00:04:37,670 --> 00:04:41,770
different from convergence in
probability is not so easy.
87
00:04:41,770 --> 00:04:45,050
It does take some conceptual
effort, and we will not
88
00:04:45,050 --> 00:04:47,880
discuss it any further
at this point.
89
00:04:47,880 --> 00:04:50,670
Let me just say that this
is a stronger notion of
90
00:04:50,670 --> 00:04:51,659
convergence.
91
00:04:51,659 --> 00:04:54,710
If you have convergence with
probability one, you also get
92
00:04:54,710 --> 00:04:57,080
convergence in probability.
93
00:04:57,080 --> 00:05:02,880
And it turns out that the law
of large numbers also holds
94
00:05:02,880 --> 00:05:06,280
under this stronger notion
of convergence.
95
00:05:06,280 --> 00:05:10,420
That is, we have that the sample
mean converges to the
96
00:05:10,420 --> 00:05:14,170
true mean with probability
one.
97
00:05:14,170 --> 00:05:17,970
This is the so-called strong
law of large numbers, and
98
00:05:17,970 --> 00:05:21,050
because this is a stronger
notion of convergence, a more
99
00:05:21,050 --> 00:05:26,320
demanding one, that's why this
is called the strong law.
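A simulation can at least show what the strong law is claiming, outcome by outcome: fix one simulated outcome (one long sequence of coin flips) and watch that single outcome's sequence of sample means settle down. This is an illustrative sketch under arbitrary choices (fair coins, n truncated at 5000), not a proof.

```python
import random

# "Convergence with probability one" looks at one outcome omega at a
# time: each run below simulates one outcome (one long coin-flip
# sequence, truncated at n = 5000) and records that outcome's sequence
# of sample means. The strong law says the set of outcomes whose
# sequence converges to 0.5 has probability 1. (Illustrative only.)

random.seed(1)

def sample_mean_path(n):
    total = 0.0
    path = []
    for k in range(1, n + 1):
        total += 1.0 if random.random() < 0.5 else 0.0  # one flip
        path.append(total / k)
    return path

finals = []
for _ in range(3):           # three sampled outcomes omega
    path = sample_mean_path(5000)
    finals.append(path[-1])  # where this outcome's sequence ends up
print(finals)
```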
100
00:05:26,320 --> 00:05:30,490
Incidentally, at this point, you
might be quite uncertain
101
00:05:30,490 --> 00:05:34,470
and confused as to what is
really the difference between
102
00:05:34,470 --> 00:05:37,040
these two notions
of convergence.
103
00:05:37,040 --> 00:05:40,470
The definitions do look
different, but what is the
104
00:05:40,470 --> 00:05:41,900
real difference?
105
00:05:41,900 --> 00:05:44,770
This is quite subtle,
and it does take
106
00:05:44,770 --> 00:05:46,270
quite a bit of thinking.
107
00:05:46,270 --> 00:05:50,090
It's not supposed to be
something that is obvious.
108
00:05:50,090 --> 00:05:53,190
So the purpose of this
discussion is only to point
109
00:05:53,190 --> 00:05:57,480
out these further directions
but without, at this point,
110
00:05:57,480 --> 00:05:59,890
going into it in any depth.
111
00:05:59,890 --> 00:06:03,730
Finally, there is another notion
of convergence in which
112
00:06:03,730 --> 00:06:06,460
we're looking at the
distributions of the random
113
00:06:06,460 --> 00:06:08,070
variables involved.
114
00:06:08,070 --> 00:06:10,790
So we may have a sequence
of random variables.
115
00:06:10,790 --> 00:06:13,770
Each one of them has a certain
distribution described by a
116
00:06:13,770 --> 00:06:17,680
CDF, and we can ask the
question, does this sequence
117
00:06:17,680 --> 00:06:21,510
of CDFs converge to
a limiting CDF?
118
00:06:21,510 --> 00:06:24,800
If that happens, then we say
that we have convergence in
119
00:06:24,800 --> 00:06:28,450
distribution, and this is
more or less the type of
120
00:06:28,450 --> 00:06:32,220
convergence that shows up when
we deal with the central limit
121
00:06:32,220 --> 00:06:35,040
theorem because this is really
a statement about
122
00:06:35,040 --> 00:06:37,560
distributions, that the
distribution of the sample
123
00:06:37,560 --> 00:06:41,570
mean in some sense starts to
approach the distribution of a
124
00:06:41,570 --> 00:06:42,820
normal random variable.
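That statement about distributions can itself be sketched in code: compare the empirical CDF of the standardized sample mean with the standard normal CDF at a few points. The use of exponential(1) draws and n = 200 here is an arbitrary illustrative choice.

```python
import math
import random

# Convergence in distribution, sketched: the CDF of the standardized
# sample mean Zn = sqrt(n) * (Mn - mu) / sigma should approach the
# standard normal CDF as n grows. Here we use exponential(1) draws
# (mu = sigma = 1) and compare the two CDFs at a few points.
# (The distribution and n = 200 are arbitrary illustrative choices.)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(2)
n = 200
trials = 10000
zs = []
for _ in range(trials):
    mn = sum(random.expovariate(1.0) for _ in range(n)) / n
    zs.append(math.sqrt(n) * (mn - 1.0))

for x in (-1.0, 0.0, 1.0):
    empirical = sum(z <= x for z in zs) / trials
    print(x, empirical, normal_cdf(x))
```

Even though each individual draw is heavily skewed, the empirical CDF of Zn already tracks the normal CDF closely, which is exactly the convergence the central limit theorem describes.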