WEBVTT
00:00:00.060 --> 00:00:01.780
The following
content is provided
00:00:01.780 --> 00:00:04.019
under a Creative
Commons license.
00:00:04.019 --> 00:00:06.870
Your support will help MIT
OpenCourseWare continue
00:00:06.870 --> 00:00:10.730
to offer high quality
educational resources for free.
00:00:10.730 --> 00:00:13.330
To make a donation or
view additional materials
00:00:13.330 --> 00:00:15.780
from hundreds of
MIT courses, visit
00:00:15.780 --> 00:00:20.720
MIT OpenCourseWare
at ocw.mit.edu
00:00:20.720 --> 00:00:24.420
PROFESSOR: So we started by
talking about thermodynamics.
00:00:24.420 --> 00:00:28.500
And then switched off to
talking about probability.
00:00:28.500 --> 00:00:31.850
And you may well ask, what's
the connection between these?
00:00:31.850 --> 00:00:35.030
And we will eventually try
to build that connection
00:00:35.030 --> 00:00:36.920
through statistical physics.
00:00:36.920 --> 00:00:39.890
And maybe this lecture
today will sort of
00:00:39.890 --> 00:00:44.670
provide you with why these
elements of probability
00:00:44.670 --> 00:00:48.040
are important and essential
to making this bridge.
00:00:48.040 --> 00:00:51.920
So last time, I started with
talking about the Central Limit
00:00:51.920 --> 00:01:02.950
Theorem which pertains to
adding lots of variables
00:01:02.950 --> 00:01:06.810
together to form a sum.
00:01:06.810 --> 00:01:10.100
And the control parameter
that we will use
00:01:10.100 --> 00:01:12.090
is this number of
terms in the sum.
00:01:15.170 --> 00:01:21.190
So in principle,
there's a joint PDF
00:01:21.190 --> 00:01:25.740
that determines how these
variables are distributed.
00:01:29.380 --> 00:01:34.600
And using that, we can calculate
various characteristics
00:01:34.600 --> 00:01:36.410
of this sum.
00:01:36.410 --> 00:01:41.310
If I were to raise the
sum to some power m,
00:01:41.310 --> 00:01:46.840
I could do that by doing a sum
over i running from let's say
00:01:46.840 --> 00:01:55.780
i1 running from 1 to N, i2
running from-- im running
00:01:55.780 --> 00:01:59.940
from 1 to N, so basically
speaking this sum.
00:01:59.940 --> 00:02:05.880
And then I have x of
i1, x of i2, x of im.
00:02:05.880 --> 00:02:10.160
So basically I multiplied m
copies of the original sum
00:02:10.160 --> 00:02:11.690
together.
00:02:11.690 --> 00:02:17.560
And if I were to calculate
some moment of this,
00:02:17.560 --> 00:02:22.010
basically the moment of a sum
is the sum of the moments.
00:02:22.010 --> 00:02:24.320
I could do this.
00:02:24.320 --> 00:02:27.570
Now the last thing
that we did last time
00:02:27.570 --> 00:02:30.050
was to look at some
characteristic function
00:02:30.050 --> 00:02:32.740
for the sum related to the
characteristic function
00:02:32.740 --> 00:02:36.210
of this joint
probability distribution,
00:02:36.210 --> 00:02:40.030
and conclude that actually
exactly the same relation holds
00:02:40.030 --> 00:02:44.020
if I were to put index
c for a cumulant.
00:02:44.020 --> 00:02:48.240
And that is basically, say the
mean is the sum of the means,
00:02:48.240 --> 00:02:51.190
the variance is the sum of
all possible variances
00:02:51.190 --> 00:02:52.640
and covariances.
00:02:52.640 --> 00:02:56.390
And this holds to all orders.
00:02:56.390 --> 00:02:56.890
OK?
00:02:56.890 --> 00:02:57.390
Fine.
00:02:57.390 --> 00:03:00.390
So where do we go from here?
00:03:00.390 --> 00:03:04.370
We are going to gradually
simplify the problem in order
00:03:04.370 --> 00:03:08.150
to get some final
result that we want.
00:03:08.150 --> 00:03:11.780
But that result eventually
is a little bit more general
00:03:11.780 --> 00:03:13.920
than the simplification.
00:03:13.920 --> 00:03:15.810
The first simplification
that we do
00:03:15.810 --> 00:03:17.780
is to look at
independent variables.
00:03:22.870 --> 00:03:26.340
And what happened when we
had the independent variables
00:03:26.340 --> 00:03:28.700
was that the
probability distribution
00:03:28.700 --> 00:03:32.300
could be written as the product
of probability distributions
00:03:32.300 --> 00:03:33.990
pertaining to different ones.
00:03:33.990 --> 00:03:41.290
I would have a p1 acting
on x1, a p2 acting on x2,
00:03:41.290 --> 00:03:43.170
a pn acting on the xn.
00:03:49.430 --> 00:03:53.650
Now, when we did that,
we saw that actually one
00:03:53.650 --> 00:03:57.185
of the conditions that would
then follow from this if we
00:03:57.185 --> 00:04:00.150
were to Fourier transform and
then try to expand in powers
00:04:00.150 --> 00:04:04.160
of k, is we would never get in
the expansion of the log terms
00:04:04.160 --> 00:04:07.830
that were coupling
different k's.
00:04:07.830 --> 00:04:13.190
Essentially all of the joint
cumulants involving things
00:04:13.190 --> 00:04:16.560
other than one variable
by itself would vanish.
00:04:16.560 --> 00:04:21.209
So essentially in that
limit, the only terms
00:04:21.209 --> 00:04:23.920
in this that would
survive were the ones
00:04:23.920 --> 00:04:27.270
in which all of the
indices were the same.
00:04:27.270 --> 00:04:30.090
So basically in that
case, I would write this
00:04:30.090 --> 00:04:42.830
as a sum i running from one
to N, xi to the power of m.
00:04:42.830 --> 00:04:46.690
So basically for
independent variables,
00:04:46.690 --> 00:04:50.010
let's say, the variance is
the sum of the variances,
00:04:50.010 --> 00:04:52.980
the third cumulant is the sum of
the third cumulants, et cetera.
00:04:56.050 --> 00:04:59.110
One more simplification.
00:04:59.110 --> 00:05:02.140
Again not necessary
for the final thing
00:05:02.140 --> 00:05:04.710
that we want to have in mind.
00:05:04.710 --> 00:05:06.710
But let's just assume
that all of these
00:05:06.710 --> 00:05:12.959
are identically distributed.
00:05:16.312 --> 00:05:20.920
By that I mean that this is
basically the same probability
00:05:20.920 --> 00:05:23.200
that I would use for
each one of them.
00:05:23.200 --> 00:05:27.350
So this I could write
as a product over i one
00:05:27.350 --> 00:05:32.360
to N, the same p for each xi.
00:05:32.360 --> 00:05:41.190
Just to make sure you know some
notation that you may see every
00:05:41.190 --> 00:05:46.660
now and then, variables that
are independent and identically
00:05:46.660 --> 00:05:48.843
distributed are
sometimes called IID's.
00:05:53.850 --> 00:06:00.010
And if I focus my attention
to these IID's, then
00:06:00.010 --> 00:06:03.620
all of these things are
clearly the same thing.
00:06:03.620 --> 00:06:08.160
And the answer would be
simply N times the cumulant
00:06:08.160 --> 00:06:11.730
that I would have
for one of them.
00:06:11.730 --> 00:06:14.310
This-- actually some
version of this,
00:06:14.310 --> 00:06:17.320
we already saw for the
binomial distribution
00:06:17.320 --> 00:06:19.340
in which the same
coin, let's say,
00:06:19.340 --> 00:06:22.600
was thrown N independent times.
00:06:22.600 --> 00:06:27.000
And all of the cumulants for
the sum of the number of heads,
00:06:27.000 --> 00:06:30.560
let's say, were related to
the cumulants in one trial
00:06:30.560 --> 00:06:33.520
that you would get.
00:06:33.520 --> 00:06:34.860
OK?
00:06:34.860 --> 00:06:35.900
So fine.
00:06:35.900 --> 00:06:40.180
Nothing so far here.
00:06:40.180 --> 00:06:43.880
However let's imagine now that
I construct a variable that I
00:06:43.880 --> 00:06:48.170
will call y, which
is the variable
00:06:48.170 --> 00:06:50.720
capital X, this sum that I have.
00:06:50.720 --> 00:07:00.130
From it I subtract
N times the mean,
00:07:00.130 --> 00:07:07.925
and then I divide
by square root of N.
00:07:07.925 --> 00:07:11.170
I can certainly choose to do so.
00:07:11.170 --> 00:07:16.620
Then what we observe here
is that the average of y
00:07:16.620 --> 00:07:18.470
by this construction is 0.
00:07:18.470 --> 00:07:22.530
Because essentially, I make
sure that the average of x
00:07:22.530 --> 00:07:25.164
is subtracted.
00:07:25.164 --> 00:07:27.060
No problem.
00:07:27.060 --> 00:07:31.350
Average of y squared--
not average of y squared,
00:07:31.350 --> 00:07:33.380
but the variance.
00:07:33.380 --> 00:07:36.860
Surely it's easy to show
the variance doesn't really
00:07:36.860 --> 00:07:39.840
depend on the subtraction.
00:07:39.840 --> 00:07:43.750
It is the same thing
as the variance of x.
00:07:43.750 --> 00:07:46.740
So it is going to
be essentially x
00:07:46.740 --> 00:07:51.480
squared c divided
by square of this.
00:07:51.480 --> 00:07:56.720
So I will have N.
And x squared, big x
00:07:56.720 --> 00:07:59.500
squared cumulant,
according to this rule,
00:07:59.500 --> 00:08:01.850
is N times small x
squared cumulant.
00:08:01.850 --> 00:08:05.180
And I get something like this.
00:08:05.180 --> 00:08:07.930
Still nothing interesting.
00:08:07.930 --> 00:08:11.720
But now let's look
at the m-th cumulant.
00:08:11.720 --> 00:08:20.650
So let's look at y m c for
m that is greater than 2.
00:08:20.650 --> 00:08:21.970
And then what do I get?
00:08:21.970 --> 00:08:32.340
I will get N times small x m c
divided by N to the m over 2.
00:08:32.340 --> 00:08:35.570
The N to the power
of m over 2 just
00:08:35.570 --> 00:08:38.400
came from raising this
to the power of m,
00:08:38.400 --> 00:08:41.020
since I'm looking at y to the m.
00:08:41.020 --> 00:08:46.720
And big X to the m c, according
to this, is N times small x to the m c.
00:08:49.280 --> 00:08:50.920
Now we see that
this is something
00:08:50.920 --> 00:08:55.600
that is proportional to N to
the power of 1 minus m over 2.
00:08:55.600 --> 00:08:59.730
And since I chose m
to be greater than 2,
00:08:59.730 --> 00:09:04.260
in the limit that N becomes
much, much larger than 1,
00:09:04.260 --> 00:09:05.434
this goes to 0.
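A minimal numerical sketch of this scaling, assuming (purely for illustration) that each x is an exponential variable with unit mean, whose m-th cumulant is (m-1)!. The function names are hypothetical. It evaluates N times the single-variable cumulant divided by N to the m over 2 for growing N.

```python
import math


def single_cumulant(m):
    # m-th cumulant of one exponential variable with unit mean: (m - 1)!
    # (an illustrative choice of distribution, not one used in the lecture)
    return math.factorial(m - 1)


def y_cumulant(m, N):
    # <y^m>_c = N <x^m>_c / N^(m/2) for y = (X - N<x>) / sqrt(N), with m >= 2
    return N * single_cumulant(m) / N ** (m / 2)


for N in (10, 1000, 100000):
    # m = 2 stays fixed at the single-variable variance; m = 3, 4 shrink with N
    print(N, [y_cumulant(m, N) for m in (2, 3, 4)])
```

The second cumulant stays at the single-variable variance for every N, while the third and fourth fall off as N to the minus 1/2 and N to the minus 1.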
00:09:08.160 --> 00:09:13.120
So if I look at the limit where
the number of terms in the sum
00:09:13.120 --> 00:09:17.280
is much larger than
1, what I conclude
00:09:17.280 --> 00:09:19.920
is that the probability
distribution for this variable
00:09:19.920 --> 00:09:25.170
that I have constructed has
0 mean, a finite variance,
00:09:25.170 --> 00:09:27.110
and all the other
higher order cumulants
00:09:27.110 --> 00:09:29.640
are asymptotically vanishing.
00:09:29.640 --> 00:09:34.630
So I know that the
probability of y,
00:09:34.630 --> 00:09:38.280
which is this variable that
I have given you up there,
00:09:38.280 --> 00:09:41.570
is given by the one distribution
that we know is completely
00:09:41.570 --> 00:09:44.565
characterized by its first
and second cumulant, which
00:09:44.565 --> 00:09:45.940
is the Gaussian.
00:09:45.940 --> 00:09:51.220
So it is exponential of minus y
squared divided by two times its variance,
00:09:51.220 --> 00:09:53.182
appropriately
normalized.
00:10:03.520 --> 00:10:06.415
Essentially this sum is
Gaussian distributed.
00:10:10.030 --> 00:10:15.660
And this result
is true for things
00:10:15.660 --> 00:10:34.830
that are not IID's so long
as this sum i1 to im, one
00:10:34.830 --> 00:10:45.670
to N, of the cumulants of xi1 to xim, in the limit
that N goes to infinity, N much,
00:10:45.670 --> 00:10:48.980
much larger than 1,
as long as it is
00:10:48.980 --> 00:10:54.165
less than-- strictly less
than N to the m over 2.
00:10:57.780 --> 00:11:00.870
So basically, what
I want to do is
00:11:00.870 --> 00:11:05.470
to ensure that when I
construct the analog of this,
00:11:05.470 --> 00:11:09.380
I would have something that when
I divide by N to the m over 2,
00:11:09.380 --> 00:11:11.870
I will asymptotically go to 0.
00:11:11.870 --> 00:11:16.300
So in the case of IID's,
the numerator goes like N.
00:11:16.300 --> 00:11:18.330
It could be that I
have correlations
00:11:18.330 --> 00:11:21.520
among the variables et
cetera, so that there
00:11:21.520 --> 00:11:25.870
are other terms in the sum
because of the correlations.
00:11:25.870 --> 00:11:28.770
As long as the sum total
of them asymptotically
00:11:28.770 --> 00:11:31.810
grows less than N
to the m over 2,
00:11:31.810 --> 00:11:34.580
this statement that
the sum is Gaussian
00:11:34.580 --> 00:11:37.520
distributed is
going to be valid.
00:11:37.520 --> 00:11:38.020
Yes.
00:11:38.020 --> 00:11:40.536
AUDIENCE: Question--
how can you compare
00:11:40.536 --> 00:11:46.497
a value of [INAUDIBLE] with
number of variables that you
00:11:46.497 --> 00:11:46.996
[INAUDIBLE]?
00:11:46.996 --> 00:11:49.810
Because this is
a-- just, if, say,
00:11:49.810 --> 00:11:53.305
your random value is
set [? in advance-- ?]
00:11:53.305 --> 00:11:56.450
PROFESSOR: So basically,
you choose a probability
00:11:56.450 --> 00:12:01.320
distribution-- at least in
this case, it is obvious.
00:12:01.320 --> 00:12:03.970
In this case, basically
what we want to know
00:12:03.970 --> 00:12:06.200
is that there is a
probability distribution
00:12:06.200 --> 00:12:08.310
for individual variables.
00:12:08.310 --> 00:12:11.170
And I repeat it
many, many times.
00:12:11.170 --> 00:12:13.450
So it is like the coin.
00:12:13.450 --> 00:12:15.945
So for the coin I
will ensure that I
00:12:15.945 --> 00:12:18.070
will throw it hundreds of times.
00:12:18.070 --> 00:12:20.510
Now suppose that for
some reason, if I throw
00:12:20.510 --> 00:12:23.730
the coin once, the
next five times
00:12:23.730 --> 00:12:27.480
it is much more likely to be the
same thing that I had before.
00:12:27.480 --> 00:12:30.030
Kind of some strange
coin, or whatever.
00:12:30.030 --> 00:12:33.430
Then there is some
correlation up to five.
00:12:33.430 --> 00:12:36.010
So when I'm calculating
things up to five,
00:12:36.010 --> 00:12:38.670
there are all kinds of
results over here.
00:12:38.670 --> 00:12:42.860
But as long as that five
is independent of the length
00:12:42.860 --> 00:12:46.960
of the sequence, if I throw
things 1,000 times, still only
00:12:46.960 --> 00:12:49.230
groups of five that
are correlated,
00:12:49.230 --> 00:12:50.860
then this result still holds.
00:12:50.860 --> 00:12:54.450
Because I have the additional
parameter N to play with.
00:12:54.450 --> 00:12:57.810
So I want to have a
parameter N to play with
00:12:57.810 --> 00:13:01.840
to go to infinity which
is independent of what
00:13:01.840 --> 00:13:04.881
characterizes the
distribution of my variable.
00:13:04.881 --> 00:13:06.797
AUDIENCE: I was mainly
concerned with the fact
00:13:06.797 --> 00:13:10.222
that you compare
the cumulant which
00:13:10.222 --> 00:13:13.673
has the same dimension
as your random variable.
00:13:13.673 --> 00:13:17.620
So if my random variable is--
I measure length or something.
00:13:17.620 --> 00:13:23.365
I do it many, many times
length is measured in meters,
00:13:23.365 --> 00:13:26.420
and you try to compare it
to a number of measurements.
00:13:26.420 --> 00:13:29.662
So, shouldn't there be
some dimensionful constant
00:13:29.662 --> 00:13:31.000
on the right?
00:13:31.000 --> 00:13:32.970
PROFESSOR: So
here, this quantity
00:13:32.970 --> 00:13:36.950
has dimensions of
meter to m-th power,
00:13:36.950 --> 00:13:40.280
this quantity has dimensions
of meter to the m-th power.
00:13:40.280 --> 00:13:43.630
This quantity is dimensionless.
00:13:43.630 --> 00:13:44.180
Right?
00:13:44.180 --> 00:13:47.120
So what I want is
the N dependence
00:13:47.120 --> 00:13:51.670
to be such that when I go
to large N, it goes to 0.
00:13:51.670 --> 00:13:54.040
It is true that this
is still multiplying
00:13:54.040 --> 00:13:58.073
something that has-- so it is.
00:13:58.073 --> 00:14:02.240
AUDIENCE: It's like less than
something of order of N to m/2?
00:14:02.240 --> 00:14:03.170
OK.
00:14:03.170 --> 00:14:06.229
PROFESSOR: Oh this
is what you-- order.
00:14:06.229 --> 00:14:06.729
Thank you.
00:14:13.631 --> 00:14:19.547
AUDIENCE: The last time
[INAUDIBLE] cumulant
00:14:19.547 --> 00:14:20.997
[INAUDIBLE]?
00:14:20.997 --> 00:14:22.080
PROFESSOR: Yes, thank you.
00:14:27.800 --> 00:14:30.635
Any other correction,
clarification?
00:14:33.270 --> 00:14:34.630
OK.
00:14:34.630 --> 00:14:38.060
So again, we will
see that essentially
00:14:38.060 --> 00:14:40.530
in statistical
physics, we will have,
00:14:40.530 --> 00:14:43.620
always, to deal with
some analog of this N,
00:14:43.620 --> 00:14:47.090
like the number of
molecules of gas in this room,
00:14:47.090 --> 00:14:54.370
et cetera, that enables us
to use something like this.
00:14:54.370 --> 00:14:57.420
I mean, it is clear
that in this case,
00:14:57.420 --> 00:15:02.450
I chose to subtract the mean
and divide by N to the 1/2.
00:15:02.450 --> 00:15:08.020
But suppose I didn't have
the division by N to the 1/2.
00:15:08.020 --> 00:15:11.940
Then what happens is that I
could have divided for example
00:15:11.940 --> 00:15:16.490
by N. Then my
distribution for something
00:15:16.490 --> 00:15:21.300
that has a well-defined,
independent mean
00:15:21.300 --> 00:15:24.540
would have gone to something
like a delta function
00:15:24.540 --> 00:15:27.120
in the limit of N
going to infinity.
00:15:27.120 --> 00:15:31.210
But I kind of sort
of change my scale
00:15:31.210 --> 00:15:34.340
by dividing by N to
the 1/2 rather than
00:15:34.340 --> 00:15:37.940
N to sort of emphasize that
the scale of fluctuations
00:15:37.940 --> 00:15:40.190
is of the order of
square root of N.
00:15:40.190 --> 00:15:44.120
This is again something
that generically happens.
00:15:44.120 --> 00:15:47.580
So let's say, we know the
energy of the gas in this room
00:15:47.580 --> 00:15:50.810
to be proportional to
volume or whatever.
00:15:50.810 --> 00:15:52.960
The amount of
uncertainty that we have
00:15:52.960 --> 00:15:56.830
will be of the order of
square root of volume.
00:15:56.830 --> 00:15:59.080
So it's clear that we
are kind of building
00:15:59.080 --> 00:16:05.580
results that have to do with
dependencies on N. So let's
00:16:05.580 --> 00:16:16.010
sort of look at some other
things that happen when we are
00:16:16.010 --> 00:16:19.700
dealing with large number
of degrees of freedom.
00:16:19.700 --> 00:16:24.520
So already we've
spoken about things
00:16:24.520 --> 00:16:34.240
that are intensive variables, such
as temperature, pressure, et
00:16:34.240 --> 00:16:36.130
cetera.
00:16:36.130 --> 00:16:40.020
And their characteristic
is that if we express them
00:16:40.020 --> 00:16:44.470
in terms of, say, the
number of constituents,
00:16:44.470 --> 00:16:48.690
they are independent
of that number.
00:16:48.690 --> 00:16:54.490
As opposed to extensive
quantities, such as the energy
00:16:54.490 --> 00:17:02.060
or the volume, et cetera,
that are proportional to this.
00:17:02.060 --> 00:17:05.130
We can certainly
imagine things that
00:17:05.130 --> 00:17:13.079
would increase [INAUDIBLE]
the polynomial, order of N
00:17:13.079 --> 00:17:14.790
to some power.
00:17:14.790 --> 00:17:18.150
If I have N molecules
of gas, and I
00:17:18.150 --> 00:17:20.599
ask how many pairs of
interactions I have,
00:17:20.599 --> 00:17:24.560
you would say it's N, N
minus 1 over 2, for example.
00:17:24.560 --> 00:17:26.920
That would be
something like this.
00:17:26.920 --> 00:17:31.540
But most importantly, when we
deal with statistical physics,
00:17:31.540 --> 00:17:33.420
we will encounter
quantities that
00:17:33.420 --> 00:17:35.870
have exponential dependence.
00:17:35.870 --> 00:17:38.670
That is, they will
be something like e
00:17:38.670 --> 00:17:44.280
to the N with something
that will appear after.
00:17:44.280 --> 00:17:48.270
An example of that is when we
were, for example, calculating
00:17:48.270 --> 00:17:51.390
the phase space
of gas particles.
00:17:51.390 --> 00:17:56.230
A gas particle by itself can
be in a volume V. Two of them,
00:17:56.230 --> 00:17:58.880
jointly, can occupy
a volume V squared.
00:17:58.880 --> 00:18:01.430
Three of them, V
cubed, et cetera.
00:18:01.430 --> 00:18:04.370
Eventually you hit V to
the N for N particles.
00:18:04.370 --> 00:18:07.030
So that's a kind of
exponential dependence.
00:18:07.030 --> 00:18:11.740
So this is, e.g., V
to the N that you
00:18:11.740 --> 00:18:17.890
would have for the joint
volume of N particles.
00:18:17.890 --> 00:18:18.390
OK?
00:18:21.620 --> 00:18:23.900
So some curious
things happen when
00:18:23.900 --> 00:18:27.000
you have these
kinds of variables.
00:18:27.000 --> 00:18:34.120
And one thing that
you may not realize
00:18:34.120 --> 00:18:37.860
is what happens when you
are summing exponentials.
00:18:42.250 --> 00:18:46.240
So let's imagine that
I have a sum composed
00:18:46.240 --> 00:18:52.860
of a number of terms i running
from one to script N-- script N
00:18:52.860 --> 00:18:56.000
is the number of terms
in the sum-- that
00:18:56.000 --> 00:18:58.500
are of these exponential types.
00:18:58.500 --> 00:19:02.870
So let's actually sometimes I
will call this-- never mind.
00:19:02.870 --> 00:19:08.260
So let's call these
e to the N phi--
00:19:08.260 --> 00:19:10.510
Let me write it in this fashion.
00:19:10.510 --> 00:19:21.010
Epsilon i where epsilon i
satisfies two conditions.
00:19:21.010 --> 00:19:24.360
One of them, it is positive.
00:19:24.360 --> 00:19:26.520
And the other is
that it has this kind
00:19:26.520 --> 00:19:28.530
of exponential dependence.
00:19:28.530 --> 00:19:33.770
It is order of e to
the N phi i where
00:19:33.770 --> 00:19:37.360
there could be some prefactor or
something else in front to give
00:19:37.360 --> 00:19:39.995
you dimension and stuff like
that that you were discussing.
00:19:43.830 --> 00:19:46.800
I assume that the
number of terms
00:19:46.800 --> 00:19:50.970
is less than or of the
order of some polynomial in N, say N to the power P.
00:19:53.670 --> 00:19:54.170
OK?
00:19:59.300 --> 00:20:05.910
Then my claim is
that, in some sense,
00:20:05.910 --> 00:20:11.140
the sum S is the largest term.
00:20:14.470 --> 00:20:14.970
OK?
00:20:18.350 --> 00:20:21.736
So let's sort of put
this graphically.
00:20:21.736 --> 00:20:27.370
What I'm telling you is that
we have a whole bunch of terms
00:20:27.370 --> 00:20:29.470
that are these epsilon i's.
00:20:29.470 --> 00:20:32.830
They're all positive, so I
can sort of indicate them
00:20:32.830 --> 00:20:40.790
by bars of different lengths
that are positive and so forth.
00:20:40.790 --> 00:20:44.530
So let's say this is epsilon
1, epsilon 2 all the way
00:20:44.530 --> 00:20:47.730
to epsilon N. And let's say
that this guy is the largest.
00:20:52.310 --> 00:20:58.110
And my task is to add up the
length of all of these things.
00:20:58.110 --> 00:21:01.870
So how do I claim that the
length is just the largest one.
00:21:01.870 --> 00:21:04.590
It's in the following sense.
00:21:04.590 --> 00:21:08.230
You would agree that
this sum you say
00:21:08.230 --> 00:21:11.700
is certainly larger
than the largest term,
00:21:11.700 --> 00:21:13.980
because I have added
lots of other things
00:21:13.980 --> 00:21:18.030
to the largest term, and
they are all positive.
00:21:18.030 --> 00:21:21.260
I say, fine, what
I'm going to do
00:21:21.260 --> 00:21:24.600
is I'm going to raise the
length of everybody else
00:21:24.600 --> 00:21:28.082
to be the same thing
as epsilon max.
00:21:31.360 --> 00:21:35.540
And then I would say
that the sum is certainly
00:21:35.540 --> 00:21:42.420
less than this artificial sum
where I have raised everybody
00:21:42.420 --> 00:21:45.530
to epsilon max.
00:21:45.530 --> 00:21:47.510
OK?
00:21:47.510 --> 00:21:52.280
So then what I will do is I will
take log of this expression,
00:21:52.280 --> 00:21:58.040
and it will be bounded by log
of epsilon max and log of script N
00:21:58.040 --> 00:22:01.530
epsilon max, which is the same
thing as log of epsilon max
00:22:01.530 --> 00:22:17.240
plus log of script N. And
then I divide by Roman N.
00:22:17.240 --> 00:22:21.550
And then note that the
conditions that I have set up
00:22:21.550 --> 00:22:28.800
are such that in the limit
that N goes to infinity,
00:22:28.800 --> 00:22:32.310
log of script N over N would be
at most of order P log N over N.
00:22:32.310 --> 00:22:39.670
And the limit of this as N
becomes much larger than 1 is 0.
00:22:39.670 --> 00:22:43.320
Log N over N goes to 0
as N goes to infinity.
00:22:43.320 --> 00:22:48.150
So basically this sum
is bounded on both sides
00:22:48.150 --> 00:22:49.710
by the same thing.
00:22:49.710 --> 00:22:53.210
So what we've established
is that essentially log of S
00:22:53.210 --> 00:22:58.050
over N, its limit as
N goes to infinity,
00:22:58.050 --> 00:23:05.140
is the same thing as a
log of epsilon max over N,
00:23:05.140 --> 00:23:06.430
which is what?
00:23:06.430 --> 00:23:10.700
If I say my epsilon max's have
this exponential dependence,
00:23:10.700 --> 00:23:12.020
is phi max.
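A small numerical sketch of this bound, under assumptions that are not from the lecture: the exponents phi i are spread evenly between 0 and 1 (so phi max is 1), and the number of terms is N squared (P = 2). The helper names are hypothetical. The sum is evaluated by factoring out the largest term, so nothing overflows.

```python
import math


def log_sum_over_n(phis, N):
    # ln(S) / N for S = sum_i exp(N * phi_i), with the largest term factored out
    phi_max = max(phis)
    tail = sum(math.exp(N * (p - phi_max)) for p in phis)
    return phi_max + math.log(tail) / N


def make_phis(count):
    # hypothetical exponents, evenly spaced on [0, 1]; the largest is exactly 1
    return [i / (count - 1) for i in range(count)]


for N in (10, 100, 1000):
    phis = make_phis(N * N)  # script N = N^2 terms
    value = log_sum_over_n(phis, N)
    upper = max(phis) + math.log(len(phis)) / N  # phi_max + ln(script N) / N
    print(N, value, upper)
```

In this example the printed value always lies between phi max and the upper bound, and the gap between the two closes as N grows.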
00:23:15.360 --> 00:23:18.190
And actually this is again
the reason for something
00:23:18.190 --> 00:23:20.480
that you probably have seen.
00:23:20.480 --> 00:23:23.610
That in statistical
physics you can use, let's
00:23:23.610 --> 00:23:25.860
say a micro-canonical
ensemble when
00:23:25.860 --> 00:23:28.070
you say exactly
what the energy is.
00:23:28.070 --> 00:23:30.450
Or you look at the
canonical ensemble
00:23:30.450 --> 00:23:33.090
where the energy can
be all over the place,
00:23:33.090 --> 00:23:35.111
why do you get the same result?
00:23:35.111 --> 00:23:35.610
This is why.
00:23:40.150 --> 00:23:43.360
Any questions on this?
00:23:43.360 --> 00:23:46.102
Everybody's happy, obviously.
00:23:46.102 --> 00:23:47.325
Good.
00:23:47.325 --> 00:23:48.700
AUDIENCE: [INAUDIBLE]
a question?
00:23:48.700 --> 00:23:49.618
PROFESSOR: Yes.
00:23:49.618 --> 00:23:52.110
AUDIENCE: The N on
the end, [INAUDIBLE]?
00:23:52.110 --> 00:23:54.660
PROFESSOR: There's a script N,
which is the number of terms.
00:23:54.660 --> 00:23:58.710
And there's the Roman N,
which is the parameter that
00:23:58.710 --> 00:24:01.660
is the analog of the number
of degrees of freedom.
00:24:01.660 --> 00:24:04.920
The one that we usually
deal in statistical physics
00:24:04.920 --> 00:24:06.582
would be, say, the
number of particles.
00:24:06.582 --> 00:24:08.832
AUDIENCE: So number of
measurements [INAUDIBLE] number
00:24:08.832 --> 00:24:10.544
of particles.
00:24:10.544 --> 00:24:11.960
PROFESSOR: Number
of measurements?
00:24:11.960 --> 00:24:14.755
AUDIENCE: So the
script N is what?
00:24:14.755 --> 00:24:17.380
PROFESSOR: The script N
could be, for example, I'm
00:24:17.380 --> 00:24:20.380
summing over all
pairs of interactions.
00:24:20.380 --> 00:24:23.660
So the number of pairs
would go like N squared.
00:24:23.660 --> 00:24:26.480
Now in reality,
practically in all cases
00:24:26.480 --> 00:24:30.230
that you will deal with,
this P would be one.
00:24:30.230 --> 00:24:32.640
So the number of terms
that we would be dealing
00:24:32.640 --> 00:24:36.920
would be of the order of the
number of degrees of freedom.
00:24:36.920 --> 00:24:43.002
So, we will see some
examples of that later on.
00:24:43.002 --> 00:24:46.960
AUDIENCE: [INAUDIBLE]
script N might be N squared?
00:24:46.960 --> 00:24:49.520
PROFESSOR: If I'm forced
to come up with a situation
00:24:49.520 --> 00:24:52.450
where script N is
N squared, I would
00:24:52.450 --> 00:24:54.050
say count the number of pairs.
00:24:57.810 --> 00:25:00.150
Number of pairs if
I have N [? sides ?]
00:25:00.150 --> 00:25:02.880
is N, N minus 1 over 2.
00:25:02.880 --> 00:25:06.880
So this is something that
goes like N squared over 2.
00:25:06.880 --> 00:25:09.230
Can I come up with
a physical situation
00:25:09.230 --> 00:25:12.120
where I'm summing over
the number of terms?
00:25:12.120 --> 00:25:14.890
Not obviously, but it could
be something like that.
00:25:14.890 --> 00:25:18.520
The situations in statistical
physics that we come up with
00:25:18.520 --> 00:25:21.500
is typically, let's say, in
going from the micro-canonical
00:25:21.500 --> 00:25:23.910
to the canonical
ensemble, you would
00:25:23.910 --> 00:25:26.820
be summing over energy levels.
00:25:26.820 --> 00:25:29.470
And typically, let's
say, in a system
00:25:29.470 --> 00:25:31.820
that is bounded the
number of energy levels
00:25:31.820 --> 00:25:35.350
is proportional to the
number of particles.
00:25:44.540 --> 00:25:49.500
Now there are cases where actually,
in going from micro-canonical
00:25:49.500 --> 00:25:53.020
to canonical, like the energy
of the gas in this room,
00:25:53.020 --> 00:25:58.010
the energy axis goes all
the way from 0 to infinity.
00:25:58.010 --> 00:26:02.200
So there is a continuous version
of the summation procedure
00:26:02.200 --> 00:26:05.340
that we have that is
then usually applied
00:26:05.340 --> 00:26:10.846
which in mathematics
is called the saddle point
00:26:10.846 --> 00:26:11.345
integration.
00:26:18.680 --> 00:26:23.810
So basically there, rather
than having to deal with a sum,
00:26:23.810 --> 00:26:24.810
I deal with an integral.
00:26:27.380 --> 00:26:33.930
The integration is over
some variable, let's say x.
00:26:33.930 --> 00:26:37.160
Could be energy, whatever.
00:26:37.160 --> 00:26:42.120
And then I have a quantity that
has this exponential character.
00:26:47.780 --> 00:26:51.170
And then again, in
some specific sense,
00:26:51.170 --> 00:26:57.250
I can just look at the largest
value and replace this with e
00:26:57.250 --> 00:27:01.186
to the N phi evaluated at x max.
00:27:01.186 --> 00:27:03.221
I should really write
this as a proportionality,
00:27:03.221 --> 00:27:08.490
but we'll see what
that means shortly.
00:27:08.490 --> 00:27:13.740
So basically it's
the above picture,
00:27:13.740 --> 00:27:17.880
I have a continuous variable.
00:27:17.880 --> 00:27:22.770
And this continuous
variable, let's
00:27:22.770 --> 00:27:26.680
say I have to sum a quantity
that is e to the N phi.
00:27:26.680 --> 00:27:32.375
So maybe I will have to
not sum, but integrate over
00:27:32.375 --> 00:27:34.380
a function such as this.
00:27:34.380 --> 00:27:38.280
And let's say this is the
place where the maximums occur.
00:27:42.310 --> 00:27:46.620
So the procedure
of saddle point is
00:27:46.620 --> 00:27:56.285
to expand phi
around its maximum.
00:27:59.290 --> 00:28:07.690
And then I can write I as an
integral over x, exponential
00:28:07.690 --> 00:28:16.040
of N, phi evaluated
at the maximum.
00:28:16.040 --> 00:28:18.080
Now if I'm doing
a Taylor series,
00:28:18.080 --> 00:28:19.810
then next term in
the Taylor series
00:28:19.810 --> 00:28:23.000
typically would involve
the first derivative.
00:28:23.000 --> 00:28:26.500
But around the maximum,
the first derivative is 0.
00:28:26.500 --> 00:28:31.960
Again if it is a maximum,
the second derivative phi
00:28:31.960 --> 00:28:36.270
double prime evaluated at
this xm, would be negative.
00:28:36.270 --> 00:28:39.885
And that's why I indicate
it in this fashion.
00:28:39.885 --> 00:28:44.870
To sort of emphasize that it
is a negative thing, x minus xm
00:28:44.870 --> 00:28:46.630
squared.
00:28:46.630 --> 00:28:50.280
And then I would have higher
order terms, x minus xm cubed,
00:28:50.280 --> 00:28:51.850
et cetera.
00:28:51.850 --> 00:28:56.210
Actually what I will do is I
will expand all of those things
00:28:56.210 --> 00:28:57.010
separately.
00:28:57.010 --> 00:29:03.910
So I have e to the minus
N over 2 magnitude of phi double prime, x minus xm squared, times
00:29:03.910 --> 00:29:10.110
1 plus N over 6 phi triple
prime, evaluated at xm,
00:29:10.110 --> 00:29:16.330
x minus xm cubed, and then the
fourth order term and so forth.
00:29:16.330 --> 00:29:19.180
So basically there is
a series such as this
00:29:19.180 --> 00:29:21.700
that I would have to look at.
00:29:32.010 --> 00:29:34.850
So the first term you can
take outside the integral.
00:29:41.680 --> 00:29:46.650
And the integration
against the one of this
00:29:46.650 --> 00:29:49.340
is simply a Gaussian.
00:29:49.340 --> 00:29:53.220
So what I would
get is square root
00:29:53.220 --> 00:30:00.500
of 2 pi divided by the inverse variance,
which is N times the magnitude of phi double prime.
00:30:05.150 --> 00:30:09.420
So that's the first term
I have taken care of.
00:30:09.420 --> 00:30:13.090
Now the next term
actually the way
00:30:13.090 --> 00:30:16.920
that I have it, since I'm
expanding something that
00:30:16.920 --> 00:30:21.930
is third order around a
potential that is symmetric.
00:30:21.930 --> 00:30:23.900
That would give me 0.
00:30:23.900 --> 00:30:28.210
The next order term, which is
x minus xm to the fourth power,
00:30:28.210 --> 00:30:31.540
you already know how
to calculate averages
00:30:31.540 --> 00:30:37.060
of various powers with the
Gaussian using Wick's Theorem.
00:30:37.060 --> 00:30:40.620
And it would be
related to essentially
00:30:40.620 --> 00:30:42.420
to the square of the variance.
00:30:42.420 --> 00:30:46.090
The square of the variance
would be essentially the square
00:30:46.090 --> 00:30:48.070
of this quantity out here.
00:30:48.070 --> 00:30:56.830
So I will get a correction
that is order of 1 over N.
00:30:56.830 --> 00:30:59.500
So if you have
sufficient energy,
00:30:59.500 --> 00:31:02.510
you can actually numerically
calculate what this is
00:31:02.510 --> 00:31:06.320
and the higher order
terms, et cetera.
00:31:06.320 --> 00:31:07.286
Yes.
00:31:07.286 --> 00:31:09.150
AUDIENCE: Could
you, briefly remind
00:31:09.150 --> 00:31:12.890
what the second term
in the bracket means?
00:31:12.890 --> 00:31:14.264
PROFESSOR: This?
00:31:14.264 --> 00:31:15.258
This?
00:31:15.258 --> 00:31:17.743
AUDIENCE: The whole thing,
on the second bracket.
00:31:17.743 --> 00:31:23.980
PROFESSOR: In the numerator,
I would have N phi m, N phi
00:31:23.980 --> 00:31:25.020
prime.
00:31:25.020 --> 00:31:28.720
Let's call the deviation y.
00:31:28.720 --> 00:31:32.930
But phi prime is 0
around the maximum.
00:31:32.930 --> 00:31:34.850
So the next order
term will be phi
00:31:34.850 --> 00:31:37.760
double prime y squared over 2.
00:31:37.760 --> 00:31:43.700
The next order term will be phi
triple prime y cubed over 6.
00:31:43.700 --> 00:31:50.630
e to the minus N phi triple
prime y cubed over 6,
00:31:50.630 --> 00:31:59.150
I can expand as 1 minus N phi
triple prime y cubed over 6,
00:31:59.150 --> 00:32:00.770
which is what this is.
00:32:00.770 --> 00:32:03.530
And then you can go and do that
with all of the other terms.
00:32:09.520 --> 00:32:10.111
Yes.
00:32:10.111 --> 00:32:11.610
AUDIENCE: Isn't it
then you can also
00:32:11.610 --> 00:32:13.795
expand around the local maximum?
00:32:13.795 --> 00:32:14.670
PROFESSOR: Excellent.
00:32:14.670 --> 00:32:15.430
Good.
00:32:15.430 --> 00:32:19.150
So you are saying, why didn't
I expand around this maximum,
00:32:19.150 --> 00:32:20.450
around this maximum.
00:32:20.450 --> 00:32:22.610
So let's do that.
00:32:22.610 --> 00:32:26.040
xm prime xm double prime.
00:32:26.040 --> 00:32:30.220
So I would have a series
around the other maxima.
00:32:30.220 --> 00:32:37.030
So the next one would be e to
the N phi of xm prime, root 2
00:32:37.030 --> 00:32:42.690
pi N phi double
prime at xm prime.
00:32:42.690 --> 00:32:47.910
And then one plus order of 1
over N And then the next one,
00:32:47.910 --> 00:32:48.570
and so forth.
00:32:51.860 --> 00:32:57.920
Now we are interested in the
limit where N goes to infinity.
00:32:57.920 --> 00:33:02.000
Or N is much, much
larger than 1.
00:33:02.000 --> 00:33:05.190
In the limit where N
is much larger than 1,
00:33:05.190 --> 00:33:09.220
Let's imagine that
these two phi's
00:33:09.220 --> 00:33:13.560
if I were to plot not e
to the phi but phi itself.
00:33:13.560 --> 00:33:15.720
Let's imagine that
these two phi's are
00:33:15.720 --> 00:33:20.740
different by I don't know,
0.1, 10 to the minus 4.
00:33:20.740 --> 00:33:21.510
It doesn't matter.
00:33:21.510 --> 00:33:26.560
I'm multiplying
two things with N,
00:33:26.560 --> 00:33:29.540
and then I'm comparing
two exponentials.
00:33:29.540 --> 00:33:35.080
So if this maximum was at 1,
I would have here e to the N.
00:33:35.080 --> 00:33:39.131
If this one was at
1 minus epsilon,
00:33:39.131 --> 00:33:43.440
over here I would have e
to the N minus N epsilon.
00:33:43.440 --> 00:33:47.660
And so I can always ignore
this compared to that.
00:33:51.050 --> 00:33:54.480
And so basically, this
is the leading term.
00:33:54.480 --> 00:34:02.190
And if I were to take its log
and divide by N, what do I get?
00:34:02.190 --> 00:34:06.280
I will get phi of xm.
00:34:06.280 --> 00:34:09.199
And then I would get
from this something
00:34:09.199 --> 00:34:19.000
like minus 1/2 log of N phi
double prime xm over 2 pi.
00:34:19.000 --> 00:34:23.570
And I divided by N,
so this is 1 over N.
00:34:23.570 --> 00:34:26.600
And the next term would be
order of 1 over N squared.
00:34:32.380 --> 00:34:37.210
So systematically,
in the large N limit,
00:34:37.210 --> 00:34:39.880
there is a series
for the quantity log
00:34:39.880 --> 00:34:44.679
I divided by N that
starts with phi of xm.
00:34:44.679 --> 00:34:48.380
And then subsequent terms
to it, you can calculate.
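NOTE
For reference, the expansion just described can be written out as follows. This is a sketch assembled from the spoken description; the symbol $\mathcal{I}$ for the integral and the absolute value on $\phi''$ (which is negative at a maximum) are notational assumptions, not taken from the board.

```latex
\mathcal{I} = \int dx\, e^{N\phi(x)}
\approx e^{N\phi(x_m)} \sqrt{\frac{2\pi}{N\,|\phi''(x_m)|}}
\left[ 1 + \mathcal{O}\!\left(\frac{1}{N}\right) \right],
\qquad
\frac{\ln \mathcal{I}}{N}
= \phi(x_m) - \frac{1}{2N}\ln\!\left(\frac{N\,|\phi''(x_m)|}{2\pi}\right)
+ \mathcal{O}\!\left(\frac{1}{N^2}\right).
```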
00:34:48.380 --> 00:34:52.699
Actually I was kind
of hesitant in writing
00:34:52.699 --> 00:34:55.020
this as asymptotically
equal because you
00:34:55.020 --> 00:34:58.530
may have worried
about the dimensions.
00:34:58.530 --> 00:35:02.050
There should be something
that has dimensions of x here.
00:35:02.050 --> 00:35:04.910
Now when I take the log it
doesn't matter that much.
00:35:04.910 --> 00:35:07.470
But the dimension
appears over here.
00:35:07.470 --> 00:35:10.880
It's really the size of the
interval that contributes which
00:35:10.880 --> 00:35:13.250
is of the order of N to the 1/2.
00:35:13.250 --> 00:35:15.265
And that's where
the log N comes.
00:35:35.490 --> 00:35:35.990
Questions?
00:35:44.630 --> 00:35:50.710
Now let me do one example of
this because we will need it.
00:35:53.430 --> 00:35:56.920
We can easily show
that N factorial
00:35:56.920 --> 00:36:07.840
you can write as 0 to infinity
dx x to the N, e to the minus x.
00:36:07.840 --> 00:36:10.380
And if you don't
believe this, you
00:36:10.380 --> 00:36:16.705
can start with the integral
0 to infinity of dx
00:36:16.705 --> 00:36:21.790
e to the minus alpha
x being one over alpha
00:36:21.790 --> 00:36:25.200
and taking many derivatives.
00:36:25.200 --> 00:36:32.520
If you take N
derivatives on this side,
00:36:32.520 --> 00:36:38.360
you would have 0 to infinity dx x to
the N, e to the minus alpha x,
00:36:38.360 --> 00:36:41.670
because every time, you
bring down a factor of x.
00:36:41.670 --> 00:36:44.920
On the other side, if you
take derivatives, 1 over alpha
00:36:44.920 --> 00:36:46.760
becomes 1 over
alpha squared, then
00:36:46.760 --> 00:36:50.090
goes to 2 over alpha cubed, then
goes to 6 over alpha to the fourth.
00:36:50.090 --> 00:36:55.280
So basically we will get N
factorial over alpha to the N plus 1.
00:36:55.280 --> 00:36:59.380
So I just set alpha equals to 1.
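NOTE
The derivative argument above, written compactly. This is a sketch of the spoken steps; each derivative with respect to alpha produces a minus sign on both sides, and these cancel, which is why the operator is written here as $-d/d\alpha$ (a notational choice, not from the board).

```latex
\int_0^\infty dx\, e^{-\alpha x} = \frac{1}{\alpha}
\quad\xrightarrow{\;\left(-\frac{d}{d\alpha}\right)^{N}\;}\quad
\int_0^\infty dx\, x^{N} e^{-\alpha x} = \frac{N!}{\alpha^{N+1}},
\qquad
\alpha = 1:\quad N! = \int_0^\infty dx\, x^{N} e^{-x}.
```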
00:36:59.380 --> 00:37:05.760
Now if you look at the thing
that I have to integrate,
00:37:05.760 --> 00:37:11.630
it is something that
has a function of x,
00:37:11.630 --> 00:37:17.220
the quantity that I should
integrate starts as x to the N,
00:37:17.220 --> 00:37:19.890
and then decays exponentially.
00:37:19.890 --> 00:37:23.562
So over here, I have
x to the N. Out here I
00:37:23.562 --> 00:37:25.475
have e to the minus x.
00:37:25.475 --> 00:37:30.890
It is not quite of the
form that I had before.
00:37:30.890 --> 00:37:35.280
Part of it is proportional to
N in the exponent, part of it
00:37:35.280 --> 00:37:36.210
is not.
00:37:36.210 --> 00:37:38.910
But you can still use
exactly the saddle point
00:37:38.910 --> 00:37:41.590
approach for even this function.
00:37:41.590 --> 00:37:43.610
And so that's what we will do.
00:37:43.610 --> 00:37:48.700
I will write this as
integral 0 to infinity dx e
00:37:48.700 --> 00:37:57.220
to some function of x where
this function of x is N
00:37:57.220 --> 00:37:59.225
log x minus x.
00:38:01.840 --> 00:38:05.830
And then I will follow that
procedure despite this not
00:38:05.830 --> 00:38:09.360
being quite entirely
proportional to N.
00:38:09.360 --> 00:38:13.820
I will find its maximum
by setting phi prime to 0.
00:38:13.820 --> 00:38:18.300
phi prime is N over x minus 1.
00:38:18.300 --> 00:38:22.580
So clearly, setting phi prime
to 0 will give me
00:38:22.580 --> 00:38:29.880
that x max is N. So the
location of this maximum
00:38:29.880 --> 00:38:35.890
that I have is in fact N.
00:38:35.890 --> 00:38:40.390
And the second derivative,
phi double prime,
00:38:40.390 --> 00:38:50.020
is minus N over x squared, which
if I evaluate at the maximum,
00:38:50.020 --> 00:38:55.480
is going to be minus 1 over
N. Because the maximum occurs
00:38:55.480 --> 00:38:57.360
at N.
00:38:57.360 --> 00:39:02.590
So if I were to make a
saddle point expansion of this,
00:39:02.590 --> 00:39:10.256
I would say that N factorial
is integral 0 to infinity, dx
00:39:10.256 --> 00:39:18.940
e to the phi evaluated at x
max, which is N log N minus N.
00:39:18.940 --> 00:39:20.740
First derivative is 0.
00:39:20.740 --> 00:39:23.950
The second derivative
will give me minus 1
00:39:23.950 --> 00:39:28.260
over N with a factor
of 2 because I'm
00:39:28.260 --> 00:39:30.780
expanding second order.
00:39:30.780 --> 00:39:35.500
And then I have x
minus this location
00:39:35.500 --> 00:39:37.520
of the maximum squared.
00:39:37.520 --> 00:39:40.020
And there would be higher order
terms from the higher order
00:39:40.020 --> 00:39:40.520
derivatives.
00:39:43.790 --> 00:39:50.880
So I can clearly take e to
the N log N minus N out front.
00:39:50.880 --> 00:39:53.960
And then the
integration that I have
00:39:53.960 --> 00:39:57.930
is just a standard Gaussian
with a variance that
00:39:57.930 --> 00:40:03.790
is just proportional to N.
So I would get a root 2 pi N.
00:40:03.790 --> 00:40:08.220
And then I would have
higher order corrections
00:40:08.220 --> 00:40:11.320
that if you are energetic,
you can actually calculate.
00:40:11.320 --> 00:40:13.460
It's not that difficult.
00:40:13.460 --> 00:40:26.890
So you get this Stirling's
Formula that in the limit of large N,
00:40:26.890 --> 00:40:34.600
let's do log of N factorial is N
log N minus N. And if you want,
00:40:34.600 --> 00:40:41.320
you can go one step further,
and you have 1/2 log of 2 pi N.
00:40:41.320 --> 00:41:14.572
And the next order term
would be order of 1/N.
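NOTE
A quick numerical check of the Stirling formula quoted above. This is a minimal sketch, not part of the lecture; the function names are made up for illustration, and ln(N!) is taken from the standard library's log-gamma function.

```python
import math


def log_factorial(n: int) -> float:
    """ln(n!) computed via the log-gamma function, ln Gamma(n + 1)."""
    return math.lgamma(n + 1)


def stirling_leading(n: int) -> float:
    """Leading saddle-point estimate: n ln n - n."""
    return n * math.log(n) - n


def stirling_gaussian(n: int) -> float:
    """Leading estimate plus the Gaussian-integral term (1/2) ln(2 pi n)."""
    return stirling_leading(n) + 0.5 * math.log(2 * math.pi * n)


if __name__ == "__main__":
    # Columns: n, error of the leading estimate, error after the Gaussian term.
    for n in (10, 100, 1000):
        exact = log_factorial(n)
        print(n, exact - stirling_leading(n), exact - stirling_gaussian(n))
```

The second error column should shrink roughly like 1/N, consistent with the statement that the next term is of order 1/N.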
00:41:14.572 --> 00:41:15.155
Any questions?
00:41:20.240 --> 00:41:21.210
OK?
00:41:21.210 --> 00:41:24.530
Where do I need to use this?
00:41:24.530 --> 00:41:33.910
Next part, we are going to talk
about entropy, information,
00:41:33.910 --> 00:41:34.560
and estimation.
00:41:45.600 --> 00:41:50.100
So the first four
topics of the course
00:41:50.100 --> 00:41:55.910
thermodynamics, probability,
this kinetic theory of gases,
00:41:55.910 --> 00:41:58.560
and basics of
statistical physics.
00:41:58.560 --> 00:42:03.960
In each one of them, you will
define some version of entropy.
00:42:03.960 --> 00:42:06.220
We already saw the
thermodynamic one
00:42:06.220 --> 00:42:11.420
as dQ divided by T meaning dS.
00:42:11.420 --> 00:42:14.120
Now just thinking
about probability
00:42:14.120 --> 00:42:17.520
will also enable you to
define some form of entropy.
00:42:17.520 --> 00:42:21.000
So let's see how we go about it.
00:42:21.000 --> 00:42:26.230
So also information,
what does that mean?
00:42:26.230 --> 00:42:29.920
It goes back to
work of Shannon.
00:42:29.920 --> 00:42:33.650
And the idea is as
follows, suppose
00:42:33.650 --> 00:42:36.876
you want to send a
message of N characters.
00:42:48.200 --> 00:42:53.560
The characters
themselves are taken
00:42:53.560 --> 00:42:55.670
from some kind of
alphabet, if you
00:42:55.670 --> 00:43:08.365
like, x1 through xM
that has M characters.
00:43:12.890 --> 00:43:15.720
So, for example
if you're sending
00:43:15.720 --> 00:43:17.400
a message in English
language, you
00:43:17.400 --> 00:43:19.640
would be using the
letters A through Z.
00:43:19.640 --> 00:43:21.720
So you have M of 26.
00:43:24.370 --> 00:43:26.450
Maybe if you want
to include space,
00:43:26.450 --> 00:43:28.783
punctuation, it would
be larger than that.
00:43:31.850 --> 00:43:37.580
But let's say if you're
dealing with English language,
00:43:37.580 --> 00:43:40.730
the probabilities of
the different characters
00:43:40.730 --> 00:43:42.260
are not the same.
00:43:42.260 --> 00:43:46.510
So S and P, you are going to
encounter much more frequently
00:43:46.510 --> 00:43:53.820
than, say, Z or X. So let's say
that the frequencies with which
00:43:53.820 --> 00:43:57.980
we expect these characters
to occur are things like P1
00:43:57.980 --> 00:44:00.226
through PM.
00:44:00.226 --> 00:44:00.725
OK?
00:44:03.680 --> 00:44:09.680
Now how many possible
messages are there?
00:44:09.680 --> 00:44:18.560
So number of possible
messages that are composed
00:44:18.560 --> 00:44:27.650
of N occurrences of alphabet
of M letters you would say
00:44:27.650 --> 00:44:36.130
is M to the N. Now, Shannon was
sort of concerned with sending
00:44:36.130 --> 00:44:39.090
the information
about this message,
00:44:39.090 --> 00:44:44.780
let's say, over a line
where you have converted it
00:44:44.780 --> 00:44:46.830
to, say, a binary code.
00:44:46.830 --> 00:44:51.710
And then you would say
that the number of bits
00:44:51.710 --> 00:45:02.900
that would correspond to M to
the N is N log base 2 of M.
00:45:02.900 --> 00:45:09.850
That is, if you really
had the simpler case
00:45:09.850 --> 00:45:15.680
where your selection was just
head or tail, it was binary.
00:45:15.680 --> 00:45:18.090
And you wanted to
send to somebody
00:45:18.090 --> 00:45:23.230
else the outcome of
500 throws of a coin.
00:45:23.230 --> 00:45:27.990
It would be a sequence of
500 0's and 1's corresponding
00:45:27.990 --> 00:45:29.280
to head or tails.
00:45:29.280 --> 00:45:35.130
So you would have to send
for the binary case, one
00:45:35.130 --> 00:45:37.560
bit per outcome.
00:45:37.560 --> 00:45:39.830
If it is something
like a base of DNA
00:45:39.830 --> 00:45:43.910
and there are four things,
you would have two per base.
00:45:43.910 --> 00:45:48.030
So that would be log 4 base 2.
00:45:48.030 --> 00:45:53.470
And for English, it would
be log 26 or whatever
00:45:53.470 --> 00:45:56.020
the appropriate number is
with punctuation-- maybe
00:45:56.020 --> 00:46:00.138
comes to 32-- possible
characters, then five
00:46:00.138 --> 00:46:02.380
per [? element ?].
00:46:02.380 --> 00:46:04.150
OK.
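NOTE
The bit counts quoted above follow from N log base 2 of M. A minimal sketch, assuming every one of the M to the N messages needs its own label; the function name is hypothetical.

```python
import math


def bits_for_message(n_chars: int, alphabet_size: int) -> float:
    """Bits needed to label one of alphabet_size**n_chars messages: N log2 M."""
    return n_chars * math.log2(alphabet_size)


if __name__ == "__main__":
    examples = [("coin", 2), ("DNA base", 4), ("letters A-Z", 26), ("32 symbols", 32)]
    for name, m in examples:
        # Bits per single character for each alphabet.
        print(name, bits_for_message(1, m))
    # 500 coin throws, as in the lecture's example.
    print(bits_for_message(500, 2))
```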
00:46:04.150 --> 00:46:08.430
But you know that
if you sort of were
00:46:08.430 --> 00:46:13.290
to look at all possible
messages, most of them
00:46:13.290 --> 00:46:14.910
would be junk.
00:46:14.910 --> 00:46:18.710
And in particular, if you had
used this simple substitution
00:46:18.710 --> 00:46:23.000
code, for example, to
mix up your message,
00:46:23.000 --> 00:46:25.990
you replaced A by
something else, et cetera,
00:46:25.990 --> 00:46:30.310
the frequencies
would be preserved.
00:46:30.310 --> 00:46:34.770
So sort of clearly a nice way to
decode this substitution code,
00:46:34.770 --> 00:46:36.940
if you have a long
enough text, you sort of
00:46:36.940 --> 00:46:39.420
look at how many
repetitions there are
00:46:39.420 --> 00:46:41.920
and match them with
their frequencies
00:46:41.920 --> 00:46:45.010
that you expect for
a real language.
00:46:45.010 --> 00:46:49.600
So the number of
possible messages--
00:46:49.600 --> 00:46:58.400
So in a typical
message, what you
00:46:58.400 --> 00:47:15.955
expect Ni, which is Pi
N occurrences, of xi.
00:47:19.070 --> 00:47:24.840
So if you know for example, what
the frequencies of the letters
00:47:24.840 --> 00:47:28.070
in the alphabet are, in
a long enough message,
00:47:28.070 --> 00:47:32.080
you expect that typically
you would get that number.
00:47:32.080 --> 00:47:33.670
Of course, what
that really means
00:47:33.670 --> 00:47:35.870
is that you're going
to get correction
00:47:35.870 --> 00:47:38.890
because not all
messages are the same.
00:47:38.890 --> 00:47:41.860
But the deviation
that you would get
00:47:41.860 --> 00:47:46.810
from getting something that is
proportional to the probability
00:47:46.810 --> 00:47:50.260
through the frequency in the
limit of a very long message
00:47:50.260 --> 00:47:54.620
would be of the order
of N to the 1/2.
00:47:54.620 --> 00:47:57.780
So ignoring this
N to the 1/2, you
00:47:57.780 --> 00:48:00.420
would say that the typical
message that I expect
00:48:00.420 --> 00:48:06.540
to receive will have characters
according to these proportions.
00:48:06.540 --> 00:48:09.070
So if I asked the
following question,
00:48:09.070 --> 00:48:13.360
not what are the number
of all possible messages,
00:48:13.360 --> 00:48:15.625
but what is the number
of typical messages?
00:48:21.980 --> 00:48:24.400
I will call that g.
00:48:24.400 --> 00:48:27.040
The number of typical
messages would
00:48:27.040 --> 00:48:33.080
be the ways of distributing
these number of characters
00:48:33.080 --> 00:48:37.250
in a message of length N. Again
there are clearly correlations.
00:48:37.250 --> 00:48:39.149
But for the time
being, forgetting all
00:48:39.149 --> 00:48:41.440
of the correlations, if
[? we ?] [? do ?] correlations,
00:48:41.440 --> 00:48:42.870
we only reduce this number.
00:48:58.870 --> 00:49:30.120
So this number is much,
much less than M to the N.
00:49:30.120 --> 00:49:34.680
Now here I'm going to make an
excursion. So far everything
00:49:34.680 --> 00:49:35.400
was clear.
00:49:35.400 --> 00:49:39.480
Now I'm going to say something
that is kind of theoretically
00:49:39.480 --> 00:49:43.210
correct, but
practically not so much.
00:49:43.210 --> 00:49:46.010
You could, for
example, have some way
00:49:46.010 --> 00:49:50.640
of labeling all possible
typical messages.
00:49:50.640 --> 00:49:54.440
So you would have-- this would
be typical message number
00:49:54.440 --> 00:49:58.970
one, number two, all the way
to typical message number g.
00:49:58.970 --> 00:50:01.310
This is the number
of typical messages.
00:50:01.310 --> 00:50:04.990
Suppose I could point out to
one of these messages and say,
00:50:04.990 --> 00:50:08.420
this is the message
that was actually sent.
00:50:08.420 --> 00:50:11.200
How many bits of
information would I
00:50:11.200 --> 00:50:17.530
have to send to indicate
one number out of g?
00:50:17.530 --> 00:50:29.620
The number of bits
of information
00:50:29.620 --> 00:50:39.820
for a typical message, rather
than being this object,
00:50:39.820 --> 00:50:41.260
would simply be log g.
00:50:49.240 --> 00:50:50.980
So let's see what this log g is.
00:50:50.980 --> 00:50:53.960
And for the time being,
let's forget the base.
00:50:53.960 --> 00:50:56.420
I can always change
base by dividing
00:50:56.420 --> 00:51:00.790
by the log of whatever base
I'm looking at.
00:51:00.790 --> 00:51:12.024
This is the log of N factorial
divided by the product over i
00:51:12.024 --> 00:51:18.890
of Ni factorials which
are these Pi N's.
00:51:18.890 --> 00:51:23.990
And in the limit of
large N, what I can use
00:51:23.990 --> 00:51:27.390
is the Stirling's Formula
that we had over there.
00:51:27.390 --> 00:51:32.390
So what I have is N log N
minus N in the numerator.
00:51:36.320 --> 00:51:44.664
Minus sum over i Ni
log of Ni minus Ni.
00:51:48.840 --> 00:51:53.370
Of course the sum over
Ni's cancels this N,
00:51:53.370 --> 00:51:56.540
so I don't need to
worry about that.
00:51:56.540 --> 00:51:59.240
And I can rearrange this.
00:51:59.240 --> 00:52:05.860
I can write
this N as sum over i Ni.
00:52:05.860 --> 00:52:09.040
Put the terms that are
proportional to Ni together.
00:52:09.040 --> 00:52:12.340
You can see that
I get Ni log of Ni
00:52:12.340 --> 00:52:14.780
over N, which
would be log of Pi.
00:52:14.780 --> 00:52:20.180
And I can actually then
take out a factor of N,
00:52:20.180 --> 00:52:25.438
and write it as sum
over i Pi log of Pi.
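NOTE
A numerical check of the counting argument above: with g equal to N factorial over the product of the Ni factorials, log g should approach minus N times the sum of Pi log Pi, and stay below N log M. A minimal sketch with made-up counts for a three-letter alphabet; the function names are hypothetical.

```python
import math


def log_typical_exact(counts):
    """ln g with g = N! / prod_i N_i!, evaluated via log-gamma."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def log_typical_stirling(counts):
    """Leading Stirling estimate: -N sum_i p_i ln p_i with p_i = N_i / N."""
    n = sum(counts)
    return -n * sum((c / n) * math.log(c / n) for c in counts if c > 0)


if __name__ == "__main__":
    counts = [5000, 3000, 2000]  # illustrative N_i, i.e. p = (0.5, 0.3, 0.2)
    n, m = sum(counts), len(counts)
    # Exact ln g, Stirling estimate, and ln of all M**N messages.
    print(log_typical_exact(counts), log_typical_stirling(counts), n * math.log(m))
```

The two estimates differ only by terms of order log N, which are negligible next to the terms proportional to N.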
00:52:39.140 --> 00:52:41.930
And just as an excursion,
this is something
00:52:41.930 --> 00:52:45.770
that you've already
seen hopefully.
00:52:45.770 --> 00:52:48.555
This is also called
mixing entropy.
00:52:57.400 --> 00:53:00.710
And we will see
it later on, also.
00:53:00.710 --> 00:53:06.350
That is, if I had
initially a bunch of,
00:53:06.350 --> 00:53:10.130
let's say, things that
were of color red,
00:53:10.130 --> 00:53:12.820
and separately in a
box a bunch of things
00:53:12.820 --> 00:53:15.980
that are color green, and
then bunch of things that
00:53:15.980 --> 00:53:23.470
are a different color,
and I knew initially
00:53:23.470 --> 00:53:26.610
where they were in
each separate box,
00:53:26.610 --> 00:53:30.360
and I then mix them up
together so that they're
00:53:30.360 --> 00:53:33.510
put in all possible
random ways,
00:53:33.510 --> 00:53:36.740
and I don't know
which is where, I
00:53:36.740 --> 00:53:42.430
have done something
that is irreversible.
00:53:42.430 --> 00:53:45.390
It is very easy to take
these boxes of marbles
00:53:45.390 --> 00:53:48.470
of different colors
and mix them up.
00:53:48.470 --> 00:53:51.960
You have to do more work
to separate them out.
00:53:51.960 --> 00:53:56.580
And so this increase
in entropy is
00:53:56.580 --> 00:54:00.090
given by precisely
the same formula here.
00:54:00.090 --> 00:54:03.840
And it's called
the mixing entropy.
00:54:03.840 --> 00:54:07.850
So what we can see
now that we sort of
00:54:07.850 --> 00:54:10.820
rather than thinking
of these as particles,
00:54:10.820 --> 00:54:12.820
we were thinking of
these as letters.
00:54:12.820 --> 00:54:15.620
And then we mixed up the
letters in all possible ways
00:54:15.620 --> 00:54:17.560
to make our messages.
00:54:17.560 --> 00:54:28.850
But quite generally for
any discrete probability,
00:54:28.850 --> 00:54:35.920
so a probability that has a
set of possible outcomes Pi,
00:54:35.920 --> 00:54:49.180
we can define an
entropy S associated
00:54:49.180 --> 00:54:55.160
with this set of probabilities,
which is given by this formula.
00:54:55.160 --> 00:55:00.160
Minus sum over i Pi log of Pi.
00:55:00.160 --> 00:55:03.680
If you like, it is also this--
not quite, doesn't make
00:55:03.680 --> 00:55:14.190
sense-- but it's some kind
of an average of log P.
00:55:14.190 --> 00:55:18.170
So anytime we see a
discrete probability,
00:55:18.170 --> 00:55:20.950
we can certainly do that.
00:55:20.950 --> 00:55:25.750
It turns out that we will also
encounter cases later on,
00:55:25.750 --> 00:55:30.350
where rather than having
a discrete probability,
00:55:30.350 --> 00:55:34.140
we have a probability
density function.
00:55:34.140 --> 00:55:40.400
And we would be very tempted
to define an entropy associated
00:55:40.400 --> 00:55:47.646
with a PDF to be something
like minus an integral dx
00:55:47.646 --> 00:55:50.020
P of x log of P of x.
00:55:53.060 --> 00:55:56.420
But this is kind of undefined.
00:55:56.420 --> 00:56:01.790
Because probability density
depends on some quantity
00:56:01.790 --> 00:56:03.910
x that has units.
00:56:03.910 --> 00:56:06.660
If this was probability
along a line,
00:56:06.660 --> 00:56:11.190
and I changed my units
from meters to centimeters,
00:56:11.190 --> 00:56:14.870
then this log will
gain a factor that
00:56:14.870 --> 00:56:18.080
will be associated with
the change in scale.
00:56:18.080 --> 00:56:21.145
So this is kind of undefined.
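NOTE
The unit dependence can be seen in a concrete case. This sketch assumes a Gaussian density purely as an illustration (the lecture does not pick a specific PDF) and uses the standard closed form for minus the integral of p log p; the width value is made up.

```python
import math


def gaussian_differential_entropy(sigma: float) -> float:
    """-integral p ln p for a Gaussian of width sigma: (1/2) ln(2 pi e sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


if __name__ == "__main__":
    sigma_m = 0.01            # illustrative width, measured in meters
    sigma_cm = 100 * sigma_m  # the same width, measured in centimeters
    h_m = gaussian_differential_entropy(sigma_m)
    h_cm = gaussian_differential_entropy(sigma_cm)
    # Same distribution, different units: the result shifts by ln(100).
    print(h_m, h_cm, h_cm - h_m, math.log(100))
```

The shift is the same for any density, so differences between two such entropies are unaffected by the choice of units.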
00:56:24.200 --> 00:56:28.910
One of the miracles
of statistical physics
00:56:28.910 --> 00:56:33.060
is that we will find
the exact measure
00:56:33.060 --> 00:56:35.840
to make this probability
in the continuum
00:56:35.840 --> 00:56:41.840
unique and independent of
the choice of-- I mean,
00:56:41.840 --> 00:56:44.710
there is a very
precise choice of units
00:56:44.710 --> 00:56:47.310
for measuring things that
would make this well-defined.
00:56:47.310 --> 00:56:47.810
Yes.
00:56:47.810 --> 00:56:51.198
AUDIENCE: But that would be
undefined up to some sort of
00:56:51.198 --> 00:56:51.698
[INAUDIBLE].
00:56:51.698 --> 00:56:52.670
PROFESSOR: After
you [INAUDIBLE].
00:56:52.670 --> 00:56:54.620
AUDIENCE: So you can still
extract dependencies from it.
00:56:54.620 --> 00:56:56.660
PROFESSOR: You can
still calculate things
00:56:56.660 --> 00:56:58.440
like differences, et cetera.
00:56:58.440 --> 00:57:01.495
But there is a certain
lack of definition.
00:57:06.136 --> 00:57:06.635
Yes.
00:57:06.635 --> 00:57:09.012
AUDIENCE: [INAUDIBLE] the
relation between this entropy
00:57:09.012 --> 00:57:12.160
defined here with the
entropy defined earlier,
00:57:12.160 --> 00:57:15.210
you notice the parallel.
00:57:15.210 --> 00:57:17.360
PROFESSOR: We find
that all you have to do
00:57:17.360 --> 00:57:20.030
is to multiply by
a Boltzmann factor,
00:57:20.030 --> 00:57:23.330
and they would become identical.
00:57:23.330 --> 00:57:24.780
So we will see that.
00:57:24.780 --> 00:57:30.570
It turns out that the heat
definition of entropy,
00:57:30.570 --> 00:57:32.990
once you look at
the right variables
00:57:32.990 --> 00:57:37.770
to define probability with, then
the entropy of a probability
00:57:37.770 --> 00:57:39.770
distribution is
exactly the entropy
00:57:39.770 --> 00:57:42.450
that comes from the
heat calculation.
00:57:42.450 --> 00:57:46.612
So up to here, there is a
measured numerical constant
00:57:46.612 --> 00:57:47.570
that we have to define.
00:57:59.290 --> 00:58:00.420
All right.
00:58:00.420 --> 00:58:04.060
But what does this have to
do with this Shannon story?
00:58:20.910 --> 00:58:27.228
Going back to the story, if I
didn't know the probabilities,
00:58:27.228 --> 00:58:30.880
if I didn't know
this, I would say
00:58:30.880 --> 00:58:36.120
that I need to pass on
this amount of information.
00:58:36.120 --> 00:58:40.050
But if I somehow constructed
the right scheme,
00:58:40.050 --> 00:58:43.020
and the person to whom
I'm sending the message
00:58:43.020 --> 00:58:46.940
knows the probabilities,
then I need
00:58:46.940 --> 00:58:52.570
to send this amount of
information, which is actually
00:58:52.570 --> 00:58:56.560
less than N log M.
00:58:56.560 --> 00:59:01.170
So clearly having knowledge
of the probabilities
00:59:01.170 --> 00:59:05.840
gives you some ability,
some amount of information,
00:59:05.840 --> 00:59:09.980
so that you have
to send less bits.
00:59:09.980 --> 00:59:11.370
OK.
00:59:11.370 --> 00:59:32.440
So the reduction in number
of bits due to knowledge of P
00:59:32.440 --> 00:59:39.880
is the difference between N log
M, which I had to do before,
00:59:39.880 --> 00:59:43.430
and what I have to
do now, which is
00:59:43.430 --> 00:59:50.000
minus N sum over i Pi log of Pi.
00:59:55.500 --> 01:00:08.710
So which is N log M plus
sum over i Pi log of Pi.
01:00:08.710 --> 01:00:11.830
I can evaluate
this in any basis.
01:00:11.830 --> 01:00:16.220
If I wanted to really count in
terms of the number of bits,
01:00:16.220 --> 01:00:21.620
I would do both of these
things in log base 2.
01:00:21.620 --> 01:00:26.860
It is clearly something that
is proportional to the length
01:00:26.860 --> 01:00:27.810
of the message.
01:00:27.810 --> 01:00:32.700
That is, if I want to send a
book that is twice as big,
01:00:32.700 --> 01:00:35.970
the amount of bits will
be reduced proportionately
01:00:35.970 --> 01:00:37.930
by this amount.
01:00:37.930 --> 01:00:39.820
So you can define
a quantity that
01:00:39.820 --> 01:00:42.163
is basically the
information per bit.
01:00:45.550 --> 01:00:54.590
And this is given the
knowledge of the probabilities,
01:00:54.590 --> 01:01:00.670
you really have gained
an information per bit
01:01:00.670 --> 01:01:06.650
which is the difference of log
M and sum over i Pi log Pi.
01:01:12.880 --> 01:01:16.725
Up to a sign and this
additional factor of log M,
01:01:16.725 --> 01:01:21.705
the entropy-- because I can
actually get rid of this N--
01:01:21.705 --> 01:01:28.430
the entropy and the information
are really the same thing
01:01:28.430 --> 01:01:30.890
up to a sign.
01:01:30.890 --> 01:01:33.070
And just to sort of
make sure that we
01:01:33.070 --> 01:01:37.760
understand the
appropriate limits.
01:01:37.760 --> 01:01:41.720
If I have something
like the case
01:01:41.720 --> 01:01:46.330
where I have a
uniform distribution.
01:01:46.330 --> 01:01:51.430
Let's say that I say that
all characters in my message
01:01:51.430 --> 01:01:54.460
are equally likely to occur.
01:01:54.460 --> 01:01:58.180
If it's a coin, it's an unbiased
coin, it's as likely in a throw
01:01:58.180 --> 01:02:00.140
to be head or tail.
01:02:00.140 --> 01:02:02.400
You would say that if
it's an unbiased coin,
01:02:02.400 --> 01:02:06.740
I really should send one
bit per throw of the coin.
01:02:06.740 --> 01:02:09.720
And indeed, that will
follow from this.
01:02:09.720 --> 01:02:11.650
Because in this
case, you can see
01:02:11.650 --> 01:02:14.440
that the information
contained is
01:02:14.440 --> 01:02:24.200
going to be log M. And then
I have plus 1 over M log of 1
01:02:24.200 --> 01:02:28.520
over M. And there are M
such terms that are uniform.
01:02:28.520 --> 01:02:31.240
And this gives me 0.
01:02:31.240 --> 01:02:34.460
There is no information here.
01:02:34.460 --> 01:02:37.870
If I ask what's the
entropy in this case.
01:02:37.870 --> 01:02:41.210
The entropy is M terms.
01:02:41.210 --> 01:02:44.750
Each one of them has
a factor of 1 over M.
01:02:44.750 --> 01:02:49.060
And then I have a
log of 1 over M.
01:02:49.060 --> 01:02:52.400
And there is a minus
sign here overall.
01:02:56.050 --> 01:03:00.150
So this is log of M.
01:03:00.150 --> 01:03:06.310
So you've probably seen this
version of the entropy before.
01:03:06.310 --> 01:03:09.640
That if you have M
equal possibilities,
01:03:09.640 --> 01:03:13.610
the entropy is
related to log M. This
01:03:13.610 --> 01:03:19.380
is the case where all of
the outcomes are equally likely.
01:03:19.380 --> 01:03:23.380
So basically this is
a uniform probability.
01:03:23.380 --> 01:03:25.540
Everything is equally likely.
01:03:25.540 --> 01:03:27.480
You have no information.
01:03:27.480 --> 01:03:31.520
You have this maximal
possible entropy.
01:03:31.520 --> 01:03:35.520
The other extreme
of it would be where
01:03:35.520 --> 01:03:37.870
you have a definite result.
01:03:37.870 --> 01:03:42.320
You have a coin that
always gives you heads.
01:03:42.320 --> 01:03:44.300
And if the other
person knows that,
01:03:44.300 --> 01:03:46.360
you don't need to
send any information.
01:03:46.360 --> 01:03:49.560
No matter thousand times,
it will be thousand heads.
01:03:49.560 --> 01:03:53.620
So here, Pi is a delta function.
01:03:53.620 --> 01:03:57.720
Let's say i equals to five
or whatever the number is.
01:03:57.720 --> 01:04:01.370
So one of the
variables in the list
01:04:01.370 --> 01:04:02.840
carries all the probability.
01:04:02.840 --> 01:04:05.520
All the others
carry 0 probability.
01:04:05.520 --> 01:04:08.680
How much information
do I have here?
01:04:08.680 --> 01:04:14.230
I have log M. Now when I
go and look at the list,
01:04:14.230 --> 01:04:21.840
in the list, either P is 0, or
P is one, but the log of 1
01:04:21.840 --> 01:04:23.140
is 0.
01:04:23.140 --> 01:04:25.240
So this is basically
going to give me 0.
01:04:25.240 --> 01:04:28.150
Entropy in this case is 0.
01:04:28.150 --> 01:04:29.860
The information is maximum.
01:04:29.860 --> 01:04:32.310
You don't need to
pass any information.
01:04:32.310 --> 01:04:34.230
So anything else is in between.
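NOTE
The two limits and an in-between case can be checked directly. A minimal sketch using log base 2, with the convention that terms with zero probability contribute nothing; the function names and the 0.9/0.1 coin are illustrative assumptions.

```python
import math


def entropy_bits(probs):
    """S = -sum_i p_i log2 p_i, skipping terms with p_i = 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def information_bits(probs):
    """Bits saved per character: I = log2 M + sum_i p_i log2 p_i = log2 M - S."""
    return math.log2(len(probs)) - entropy_bits(probs)


if __name__ == "__main__":
    fair_coin = [0.5, 0.5]      # uniform: maximal entropy, no information
    always_heads = [1.0, 0.0]   # definite result: zero entropy, maximal information
    biased_coin = [0.9, 0.1]    # illustrative intermediate case
    for probs in (fair_coin, always_heads, biased_coin):
        print(probs, entropy_bits(probs), information_bits(probs))
```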
01:04:34.230 --> 01:04:37.720
So you sort of think
of a probability that
01:04:37.720 --> 01:04:43.050
is some big thing, some
small things, et cetera,
01:04:43.050 --> 01:04:45.160
you can figure out
what its entropy is
01:04:45.160 --> 01:04:49.460
and what its
information content is.
01:04:49.460 --> 01:04:52.560
So actually I don't
know the answer.
01:04:52.560 --> 01:04:54.720
But presumably it's very
easy to figure out
01:04:54.720 --> 01:04:57.810
what's the information
per character
01:04:57.810 --> 01:05:00.400
of the text in English language.
01:05:00.400 --> 01:05:02.415
Once you know the
frequencies of the characters
01:05:02.415 --> 01:05:04.993
you can go and calculate this.
01:05:10.550 --> 01:05:11.240
Questions.
01:05:11.240 --> 01:05:11.790
Yes.
01:05:11.790 --> 01:05:13.540
AUDIENCE: Just to
clarify the terminology,
01:05:13.540 --> 01:05:17.187
so the information
means the [INAUDIBLE]?
01:05:17.187 --> 01:05:18.770
PROFESSOR: The number
of bits that you
01:05:18.770 --> 01:05:22.860
have to transmit to
the other person.
01:05:22.860 --> 01:05:25.410
So the other person
knows the probability.
01:05:25.410 --> 01:05:28.095
Given that they know
the probabilities,
01:05:28.095 --> 01:05:30.280
how many fewer
bits of information
01:05:30.280 --> 01:05:33.160
should I send to them?
01:05:33.160 --> 01:05:36.520
So their knowledge
corresponds to a gain
01:05:36.520 --> 01:05:39.742
in number of bits, which
is given by this formula.
01:05:47.310 --> 01:05:50.980
If you know that the
coin that I'm throwing
01:05:50.980 --> 01:05:55.660
is biased so that it
always comes heads,
01:05:55.660 --> 01:05:59.110
then I don't have to
send you any information.
01:05:59.110 --> 01:06:01.480
So per every time
I throw the coin,
01:06:01.480 --> 01:06:02.820
you have one bit of information.
01:06:12.940 --> 01:06:15.746
Other questions?
01:06:15.746 --> 01:06:21.235
AUDIENCE: The equation, the
top equation, so natural
01:06:21.235 --> 01:06:25.726
log [INAUDIBLE] natural
log of 2, [INAUDIBLE]?
01:06:28.720 --> 01:06:34.510
PROFESSOR: I initially
calculated my Stirling formula
01:06:34.510 --> 01:06:40.970
as log of N factorial
is N log N minus N.
01:06:40.970 --> 01:06:44.920
So since I had done
everything in natural log,
01:06:44.920 --> 01:06:46.650
I maintained that.
01:06:46.650 --> 01:06:54.050
And then I used this
symbol that log, say, 5 2
01:06:54.050 --> 01:06:57.899
is the same thing that maybe
you are used to with this notation.
01:06:57.899 --> 01:06:58.440
I don't know.
01:07:01.920 --> 01:07:05.790
So if I don't indicate a number
here, it's the natural log.
01:07:05.790 --> 01:07:07.280
It's base e.
01:07:07.280 --> 01:07:13.240
If I put a number so log,
let's say, base 2 of 5
01:07:13.240 --> 01:07:17.004
is log 5 divided by log 2.
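[Editor's note, not part of the lecture: the change of base described here, written out. The general base b and argument x are notation added for illustration.]

```latex
\log_2 5 = \frac{\ln 5}{\ln 2} \approx 2.32,
\qquad
\log_b x = \frac{\ln x}{\ln b}.
```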
01:07:20.378 --> 01:07:22.790
AUDIENCE: So [INAUDIBLE]?
01:07:22.790 --> 01:07:24.191
PROFESSOR: Log 2, log 2.
01:07:24.191 --> 01:07:24.690
Information.
01:07:27.289 --> 01:07:27.830
AUDIENCE: Oh.
01:07:30.430 --> 01:07:34.290
PROFESSOR: Or if you like, I
could have divided by log 2
01:07:34.290 --> 01:07:35.748
here.
01:07:35.748 --> 01:07:37.692
AUDIENCE: But so
there [INAUDIBLE]
01:07:37.692 --> 01:07:40.122
all of the other
places, and you just
01:07:40.122 --> 01:07:42.552
[? write ?] all
this [INAUDIBLE].
01:07:42.552 --> 01:07:46.345
All right, thank
you, [? Michael. ?]
01:07:46.345 --> 01:07:48.761
PROFESSOR: Right.
01:07:48.761 --> 01:07:49.260
Yeah.
01:07:49.260 --> 01:07:54.970
So this is the general
way to transfer
01:07:54.970 --> 01:07:58.440
between log, natural
log, and any log.
01:07:58.440 --> 01:08:01.340
In the language of
electrical engineering,
01:08:01.340 --> 01:08:05.110
where Shannon worked, it is
common to express everything
01:08:05.110 --> 01:08:07.270
in terms of the number of bits.
01:08:07.270 --> 01:08:09.110
So whenever I'm
expressing things
01:08:09.110 --> 01:08:11.300
in terms of the number
of bits, I really
01:08:11.300 --> 01:08:13.040
should use the log base 2.
01:08:13.040 --> 01:08:16.319
So I really, if I want
to use information,
01:08:16.319 --> 01:08:18.439
I really should use log base 2.
01:08:18.439 --> 01:08:21.020
Whereas in statistical
physics, we usually
01:08:21.020 --> 01:08:24.215
use the natural log
in expressing entropy.
01:08:24.215 --> 01:08:26.657
AUDIENCE: Oh, so it doesn't
really matter [INAUDIBLE].
01:08:26.657 --> 01:08:28.490
PROFESSOR: It's just
an overall coefficient.
01:08:28.490 --> 01:08:30.710
As I said that
eventually, if I want
01:08:30.710 --> 01:08:34.130
to calculate the heat
version of the entropy,
01:08:34.130 --> 01:08:36.770
I have to multiply by
yet another number, which
01:08:36.770 --> 01:08:38.470
is the Boltzmann constant.
01:08:38.470 --> 01:08:42.410
So really the
conceptual part is more
01:08:42.410 --> 01:08:45.263
important than the
overall numerical factor.
01:08:50.979 --> 01:08:51.479
OK?
01:09:02.590 --> 01:09:08.167
I had the third item in my list
here, which we can finish with,
01:09:08.167 --> 01:09:09.000
which is estimation.
01:09:20.920 --> 01:09:26.450
So frequently you are
faced with the task
01:09:26.450 --> 01:09:29.800
of assigning probabilities.
01:09:29.800 --> 01:09:32.850
So there's a situation.
01:09:32.850 --> 01:09:35.490
You know that there's
a number of outcomes.
01:09:35.490 --> 01:09:37.399
And you want to
assign probabilities
01:09:37.399 --> 01:09:39.170
for these outcomes.
01:09:39.170 --> 01:09:43.939
And the procedure
that we will use
01:09:43.939 --> 01:09:46.900
is summarized by the
following sentence
01:09:46.900 --> 01:09:49.350
that I have to then define.
01:09:49.350 --> 01:09:59.420
The most unbiased--
let's actually
01:09:59.420 --> 01:10:01.550
just say it's the
definition if you like--
01:10:01.550 --> 01:10:14.940
the unbiased assignment
of probabilities
01:10:14.940 --> 01:10:27.560
maximizes the entropy
subject to constraints.
01:10:30.690 --> 01:10:31.575
Known constraints.
01:10:40.810 --> 01:10:42.990
What do I mean by that?
01:10:42.990 --> 01:10:48.080
So suppose I had told you
that we are throwing a dice.
01:10:48.080 --> 01:10:52.060
Or let's say a coin, but
let's go back to the dice.
01:10:52.060 --> 01:10:57.210
And the dice has possibilities
1, 2, 3, 4, 5, 6.
01:10:57.210 --> 01:11:00.100
And this is the only
thing that I know.
01:11:00.100 --> 01:11:03.360
So if somebody says
that I'm throwing a dice
01:11:03.360 --> 01:11:05.030
and you don't know
anything else,
01:11:05.030 --> 01:11:08.675
there's no reason for you to
privilege 6 with respect to 4,
01:11:08.675 --> 01:11:10.240
or 3 with respect to 5.
01:11:10.240 --> 01:11:14.310
So as far as I know, at this
moment in time, all of these
01:11:14.310 --> 01:11:16.030
are equally likely.
01:11:16.030 --> 01:11:22.160
So I will assign each one of
them a probability of 1/6.
01:11:22.160 --> 01:11:25.990
But we also saw over
here what was happening.
01:11:25.990 --> 01:11:28.580
The uniform
probability was the one
01:11:28.580 --> 01:11:30.770
that had the largest entropy.
01:11:30.770 --> 01:11:33.050
If I were to change
the probability
01:11:33.050 --> 01:11:36.200
so that something goes up
and something goes down,
01:11:36.200 --> 01:11:37.720
then I calculate that formula.
01:11:37.720 --> 01:11:41.050
And I find that the--
sorry-- the uniform
01:11:41.050 --> 01:11:42.490
one has the largest entropy.
01:11:42.490 --> 01:11:46.290
This has less entropy
compared to the uniform one.
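[Editor's sketch, not part of the lecture: a quick numerical check that moving probability from one face to another lowers the entropy below the uniform value. The helper names and the shift sizes are hypothetical.]

```python
import math

def entropy(probs):
    """Entropy -sum p*ln(p) with the natural log; p == 0 terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

uniform = [1 / 6] * 6

def shifted(eps):
    """Move probability eps from face 1 to face 6, keeping the sum equal to 1."""
    probs = list(uniform)
    probs[0] -= eps
    probs[5] += eps
    return probs

for eps in (0.0, 0.05, 0.10, 0.15):
    print(eps, entropy(shifted(eps)))
```

[The printed entropy is largest at eps = 0, where it equals ln 6, and decreases as the shift grows.]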
01:11:46.290 --> 01:11:52.420
So what we have done in
assigning uniform probability
01:11:52.420 --> 01:11:56.790
is really to maximize the
entropy subject to the fact
01:11:56.790 --> 01:11:59.260
that I don't know anything
except that the probabilities
01:11:59.260 --> 01:12:02.420
should add up to 1.
01:12:02.420 --> 01:12:06.230
But now suppose
that somebody threw
01:12:06.230 --> 01:12:08.450
the dice many, many times.
01:12:08.450 --> 01:12:11.230
And each time they
were throwing the dice,
01:12:11.230 --> 01:12:14.050
they were calculating
the number.
01:12:14.050 --> 01:12:16.700
But they didn't give us
the numbers and frequencies.
01:12:16.700 --> 01:12:20.770
What they told us was that
at the end of many, many runs,
01:12:20.770 --> 01:12:27.990
the average number that
was coming up was 3.2,
01:12:27.990 --> 01:12:30.190
4.7, whatever.
01:12:30.190 --> 01:12:33.650
So we know the average of M.
01:12:33.650 --> 01:12:35.820
So I know now some
other constraint.
01:12:35.820 --> 01:12:39.810
I've added to the
information that I had.
01:12:39.810 --> 01:12:43.650
So if I want to reassign
the probabilities given
01:12:43.650 --> 01:12:47.090
that somebody told me that
in a large number of runs,
01:12:47.090 --> 01:12:49.790
the average value of
the faces that showed up
01:12:49.790 --> 01:12:52.020
was some particular value.
01:12:52.020 --> 01:12:53.380
What do I do?
01:12:53.380 --> 01:13:01.290
I say, well, I maximize S which
depends on these Pi's, which
01:13:01.290 --> 01:13:07.190
is minus sum over i Pi
log of Pi, subject
01:13:07.190 --> 01:13:09.730
to constraints that I know.
01:13:09.730 --> 01:13:11.950
Now one constraint
you already used
01:13:11.950 --> 01:13:16.070
previously is that the
sum of the probabilities
01:13:16.070 --> 01:13:17.863
is equal to 1.
01:13:20.821 --> 01:13:27.440
This I introduce here through
a Lagrange multiplier,
01:13:27.440 --> 01:13:34.330
alpha, which I will adjust later
to make sure that this holds.
01:13:34.330 --> 01:13:39.710
And in general, what we do if
we have multiple constraints is
01:13:39.710 --> 01:13:44.970
we can add more and more
Lagrange multipliers.
01:13:44.970 --> 01:13:53.892
And the average of M is
sum over i of, let's say, i times Pi.
01:13:53.892 --> 01:13:57.150
So 1 times P of
1, 2 times P of 2,
01:13:57.150 --> 01:14:02.650
et cetera, will give you
whatever the average value is.
01:14:02.650 --> 01:14:06.130
So these are the two constraints
that I specified for you here.
01:14:06.130 --> 01:14:10.300
There could've been other
constraints, et cetera.
01:14:10.300 --> 01:14:15.050
So then, if you have a
function with constraints
01:14:15.050 --> 01:14:19.070
that you have to extremize, you
add these Lagrange multipliers.
01:14:19.070 --> 01:14:22.084
Then you do dS by dPi.
01:14:22.084 --> 01:14:34.100
Why did I do this? dS by
dPi, which is minus log of Pi
01:14:34.100 --> 01:14:35.435
from here.
01:14:35.435 --> 01:14:40.400
Derivative of log P is 1 over P,
which with this P will give me minus 1.
01:14:40.400 --> 01:14:43.740
There is a minus alpha here.
01:14:43.740 --> 01:14:52.770
And then there's a minus
beta times i from here.
01:14:52.770 --> 01:14:58.570
And extremizing means I
have to set this to 0.
01:14:58.570 --> 01:15:04.770
So you can see that the
solution to this is Pi--
01:15:04.770 --> 01:15:09.720
or actually log of Pi,
let's say, is minus the quantity 1
01:15:09.720 --> 01:15:15.260
plus alpha, minus beta i.
01:15:15.260 --> 01:15:23.660
So that Pi is e to minus
the quantity 1 plus alpha,
01:15:23.660 --> 01:15:26.220
times e to the minus beta times i.
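[Editor's note, not part of the lecture: the extremization step written out, using the sign convention spoken above (minus alpha and minus beta times i in the derivative). The symbol for the measured average, \langle m \rangle, is notation assumed here.]

```latex
\frac{\partial}{\partial p_i}\left[
  -\sum_j p_j \ln p_j
  - \alpha\Big(\sum_j p_j - 1\Big)
  - \beta\Big(\sum_j j\,p_j - \langle m \rangle\Big)
\right]
= -\ln p_i - 1 - \alpha - \beta i = 0
\;\Longrightarrow\;
\ln p_i = -(1+\alpha) - \beta i,
\qquad
p_i = e^{-(1+\alpha)}\, e^{-\beta i}.
```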
01:15:29.980 --> 01:15:32.480
I haven't completed the story.
01:15:32.480 --> 01:15:36.300
I really have to
solve the equations
01:15:36.300 --> 01:15:39.830
in terms of alpha and
beta that would give me
01:15:39.830 --> 01:15:44.980
the final results in terms
of the expectation value of i
01:15:44.980 --> 01:15:47.790
as well as some
other quantities.
01:15:47.790 --> 01:15:51.420
But this is the procedure
that you would normally
01:15:51.420 --> 01:15:57.520
use to give you the unbiased
assignment of probability.
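[Editor's sketch, not part of the lecture: one way to finish the story numerically for the die, assuming the reported average is 3.2. Normalization fixes the prefactor, and beta is found by bisection so that the average face value matches. The function names, the bracket for beta, and the tolerance are hypothetical choices.]

```python
import math

FACES = range(1, 7)

def maxent_die(beta):
    """Normalized probabilities p_i proportional to exp(-beta * i), i = 1..6."""
    weights = [math.exp(-beta * i) for i in FACES]
    z = sum(weights)  # normalization; plays the role of e^(1 + alpha)
    return [w / z for w in weights]

def mean_face(beta):
    """Average face value under the distribution for a given beta."""
    return sum(i * p for i, p in zip(FACES, maxent_die(beta)))

def solve_beta(target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection on beta; mean_face decreases monotonically as beta grows."""
    if not 1.0 < target_mean < 6.0:
        raise ValueError("target mean must lie strictly between 1 and 6")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_face(mid) > target_mean:
            lo = mid  # mean still too large: beta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta = solve_beta(3.2)
probs = maxent_die(beta)
print(beta, probs, mean_face(beta))
```

[An average below 3.5 gives a positive beta, so the low faces are favored; an average of exactly 3.5 gives beta equal to zero and recovers the uniform assignment.]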
01:15:57.520 --> 01:16:01.160
Now this actually goes back to
what I said at the beginning.
01:16:01.160 --> 01:16:04.940
That there's two ways of
assigning probabilities,
01:16:04.940 --> 01:16:08.710
either objectively by actually
doing lots of measurements,
01:16:08.710 --> 01:16:09.980
or subjectively.
01:16:09.980 --> 01:16:12.160
So this is really
formalizing what
01:16:12.160 --> 01:16:14.630
this objective procedure means.
01:16:14.630 --> 01:16:16.810
So you put in all
of the information
01:16:16.810 --> 01:16:20.680
that you have, the number
of states, any constraints.
01:16:20.680 --> 01:16:23.310
And then you maximize
the entropy, as we
01:16:23.310 --> 01:16:29.010
defined it,
to get the best
01:16:29.010 --> 01:16:33.880
maximal-entropy
assignment of probabilities
01:16:33.880 --> 01:16:36.122
consistent with
things that you know.
01:16:39.038 --> 01:16:43.350
You probably recognize this form
as kind of a Boltzmann weight
01:16:43.350 --> 01:16:46.760
that comes up again and
again in statistical physics.
01:16:46.760 --> 01:16:50.570
And that is again natural,
because there are constraints,
01:16:50.570 --> 01:16:52.310
such as the average
value of energy,
01:16:52.310 --> 01:16:54.240
average value of the
number of particles,
01:16:54.240 --> 01:16:59.470
et cetera, that, consistent
with maximizing the entropy,
01:16:59.470 --> 01:17:01.940
give you forms such as this.
01:17:01.940 --> 01:17:04.850
So you can see that
a lot of concepts
01:17:04.850 --> 01:17:09.960
that we will later on be
using in statistical physics
01:17:09.960 --> 01:17:14.130
are already embedded in these
discussions of probability.
01:17:14.130 --> 01:17:18.170
And we've also seen how the
large N aspect comes about,
01:17:18.170 --> 01:17:19.480
et cetera.
01:17:19.480 --> 01:17:22.270
So we now have the
probabilistic tools.
01:17:22.270 --> 01:17:26.185
And from next
time, we will go on
01:17:26.185 --> 01:17:28.550
to define the
degrees of freedom.
01:17:28.550 --> 01:17:33.110
What are the units that we
are going to be talking about?
01:17:33.110 --> 01:17:37.370
And how to assign them some
kind of a probabilistic picture.
01:17:37.370 --> 01:17:40.180
And then build on into
statistical mechanics.
01:17:40.180 --> 01:17:41.069
Yes.
01:17:41.069 --> 01:17:43.065
AUDIENCE: So here,
you write the letter i
01:17:43.065 --> 01:17:46.558
to represent, in this case, the
results of a random die roll,
01:17:46.558 --> 01:17:49.959
but you can replace it with any
function of a random variable.
01:17:49.959 --> 01:17:50.750
PROFESSOR: Exactly.
01:17:50.750 --> 01:17:54.775
So I could have, maybe
rather than giving me
01:17:54.775 --> 01:17:58.135
the average value of the number
that was appearing on the face,
01:17:58.135 --> 01:18:00.145
they would have given
me the average inverse.
01:18:03.870 --> 01:18:05.290
And then I would have had this.
01:18:08.060 --> 01:18:09.560
I could have had
multiple things.
01:18:09.560 --> 01:18:12.530
So maybe somebody else
measures something else.
01:18:12.530 --> 01:18:14.760
And then my general
form would be
01:18:14.760 --> 01:18:19.460
e to the minus beta 1 times the
measurement of type one,
01:18:19.460 --> 01:18:22.980
minus beta 2 times the measurement
of type two, et cetera.
01:18:22.980 --> 01:18:25.975
And the rest of the thing over
here is clearly just a constant
01:18:25.975 --> 01:18:28.170
of proportionality
that I would need
01:18:28.170 --> 01:18:29.698
to adjust for the normalization.
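[Editor's note, not part of the lecture: the general form described here, for several measured averages. The functions f_k, their measured averages, and the normalization Z are notation assumed for illustration.]

```latex
p_i = \frac{1}{Z}\exp\!\Big(-\sum_k \beta_k f_k(i)\Big),
\qquad
Z = \sum_i \exp\!\Big(-\sum_k \beta_k f_k(i)\Big),
\qquad
\sum_i p_i\, f_k(i) = \langle f_k \rangle .
```

[For example, if the average inverse of the face value were reported, one would take f(i) = 1/i in place of i.]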
01:18:33.970 --> 01:18:34.950
OK?
01:18:34.950 --> 01:18:38.211
So that's it for today.