WEBVTT

00:00:01.540 --> 00:00:03.910
The following content is
provided under a Creative

00:00:03.910 --> 00:00:05.300
Commons license.

00:00:05.300 --> 00:00:07.510
Your support will help
MIT OpenCourseWare

00:00:07.510 --> 00:00:11.600
continue to offer high quality
educational resources for free.

00:00:11.600 --> 00:00:14.140
To make a donation or to
view additional materials

00:00:14.140 --> 00:00:18.100
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:18.100 --> 00:00:19.310
at ocw.mit.edu.

00:00:22.672 --> 00:00:24.130
WILLIAM GREEN: So
today we're going

00:00:24.130 --> 00:00:31.920
to talk about Bayesian parameter
estimation and parameter estimation

00:00:31.920 --> 00:00:32.500
in general.

00:00:36.830 --> 00:00:44.830
So the last time we were
writing down the expressions

00:00:44.830 --> 00:00:52.540
for the probability of observing
a mean measurement if you

00:00:52.540 --> 00:00:54.470
know what the model is.

00:00:54.470 --> 00:00:55.730
So let's try to do that again.

00:00:55.730 --> 00:01:04.840
So suppose I have a model
that predicts some observable

00:01:04.840 --> 00:01:06.390
and it depends on
some knobs, and it

00:01:06.390 --> 00:01:09.850
depends on some parameters.

00:01:09.850 --> 00:01:15.480
And suppose because I
have great powers of faith

00:01:15.480 --> 00:01:18.670
that I believe this model is
100% correct with every core

00:01:18.670 --> 00:01:20.440
of my being.

00:01:20.440 --> 00:01:22.620
And also because I have
tremendous confidence in all

00:01:22.620 --> 00:01:25.810
the people who
built my apparatus,

00:01:25.810 --> 00:01:28.210
and the knobs that I
turn actually correspond

00:01:28.210 --> 00:01:31.030
to the real values, and I
have tremendous confidence

00:01:31.030 --> 00:01:34.320
in all the literature that
reports the parameter values.

00:01:34.320 --> 00:01:37.920
And so I'm absolutely certain
that this is the truth.

00:01:37.920 --> 00:01:41.410
So we'll start from a position
of absolute certainty,

00:01:41.410 --> 00:01:45.229
and then we'll degrade into
doubt as the lecture goes on.

00:01:45.229 --> 00:01:47.020
So let's start from
the position of someone

00:01:47.020 --> 00:01:51.665
who has absolute faith
that this is the truth.

00:01:51.665 --> 00:01:52.900
Is true.

00:01:55.470 --> 00:01:58.090
So I have a model, I
really believe this model.

00:01:58.090 --> 00:02:01.330
So for example, I
believe that the kilogram

00:02:01.330 --> 00:02:04.810
weight in the SI
Institute in Paris

00:02:04.810 --> 00:02:07.510
weighs exactly one kilogram.

00:02:07.510 --> 00:02:09.800
I believe that with
every core of my being.

00:02:09.800 --> 00:02:13.390
I'm completely confident
that model is correct.

00:02:13.390 --> 00:02:15.921
So there are some things
I'm really confident on.

00:02:15.921 --> 00:02:16.420
That's one.

00:02:19.180 --> 00:02:22.410
And maybe you guys have some
things you really believe, too.

00:02:22.410 --> 00:02:24.160
So let's go with things
we really believe.

00:02:28.780 --> 00:02:32.950
So I plan to conduct
some experiments that

00:02:32.950 --> 00:02:37.880
measure this observable and
are related to this model.

00:02:37.880 --> 00:02:45.865
And so I'm going to do 10
repeats of measuring y.

00:02:52.630 --> 00:02:57.101
So I'm going to get to the
kilogram blob that's in Paris,

00:02:57.101 --> 00:02:59.350
and I'm going to stick it
on my really expensive scale

00:02:59.350 --> 00:03:00.880
that I really believe
is great, and I'm

00:03:00.880 --> 00:03:02.060
going to measure its weight.

00:03:02.060 --> 00:03:03.520
And then I'm going to
put it back, and then

00:03:03.520 --> 00:03:05.370
put it back, and put it
back, and put it back.

00:03:05.370 --> 00:03:06.850
And I'm going to get another
really great scale that I

00:03:06.850 --> 00:03:09.516
really believe is great, and I'm
going to measure it there, too.

00:03:09.516 --> 00:03:11.350
So I've got a lot of
repeats of measuring

00:03:11.350 --> 00:03:13.270
the weight of this
kilogram, and I

00:03:13.270 --> 00:03:15.950
believe it's really a kilogram.

00:03:15.950 --> 00:03:18.340
But the stupid measurements
don't say a kilogram.

00:03:18.340 --> 00:03:25.070
They say, you know, 1.0003,
0.99995, all kinds of numbers

00:03:25.070 --> 00:03:27.610
not equal to one kilogram.

00:03:27.610 --> 00:03:33.280
So now I'm going to
try to figure out

00:03:33.280 --> 00:03:36.340
what the probability is
that it would have measured

00:03:36.340 --> 00:03:38.180
some particular value y.

00:03:38.180 --> 00:03:58.230
So what is the probability
that my experimental mean

00:03:58.230 --> 00:04:01.960
is between some value,
say, y and y plus dy.

00:04:04.990 --> 00:04:06.420
So that's a question for you.

00:04:06.420 --> 00:04:07.930
So what's the probability?

00:04:19.700 --> 00:04:20.915
Sorry, what?

00:04:20.915 --> 00:04:22.590
[INAUDIBLE]

00:04:22.590 --> 00:04:29.704
OK, so we think that
the probability that y

00:04:29.704 --> 00:04:37.815
is in this interval given
that the model is true.

00:04:43.050 --> 00:04:45.420
And I know the theta
values perfectly,

00:04:45.420 --> 00:04:49.220
and I know the x
values perfectly,

00:04:49.220 --> 00:04:51.590
is equal to some
integral of what?

00:04:55.100 --> 00:04:59.820
The bounds integral, probably
y to y plus dy, believe that?

00:05:02.520 --> 00:05:03.791
What's the integrand?

00:05:11.660 --> 00:05:13.220
Sorry, what?

00:05:13.220 --> 00:05:16.040
AUDIENCE: The probability
[INAUDIBLE] AC function of y.

00:05:16.040 --> 00:05:17.540
WILLIAM GREEN:
Right, so what is it?

00:05:20.071 --> 00:05:22.043
AUDIENCE: [INAUDIBLE]

00:05:22.043 --> 00:05:23.760
You wrote it down
last time, I think.

00:05:33.260 --> 00:05:36.340
So this [INAUDIBLE] is large?

00:05:36.340 --> 00:05:37.380
Standard normal, right?

00:05:37.380 --> 00:05:42.880
So it should be one
over sigma root of 2 pi.

00:06:00.220 --> 00:06:01.090
Does that sound OK?

00:06:06.510 --> 00:06:07.010
I mean here.

00:06:10.794 --> 00:06:12.674
It's probably the
same, it's fine.

00:06:20.578 --> 00:06:21.673
Yep.

00:06:21.673 --> 00:06:23.048
AUDIENCE: What
does that notation

00:06:23.048 --> 00:06:26.602
mean, if your model is true?

00:06:26.602 --> 00:06:29.060
WILLIAM GREEN: So this means,
given that the model is true,

00:06:29.060 --> 00:06:31.434
and I know these theta values
are exactly certain numbers,

00:06:31.434 --> 00:06:34.040
and the x values are exactly
certain numbers, what's

00:06:34.040 --> 00:06:35.870
the probability
that I would make

00:06:35.870 --> 00:06:40.220
a measurement whose average
would fall in this interval?

00:06:40.220 --> 00:06:42.139
So this line means
given that this is true,

00:06:42.139 --> 00:06:43.430
what's the probability of that?

00:06:48.070 --> 00:06:50.020
OK, is this right?

00:06:50.020 --> 00:06:52.572
Is this surprising?

00:06:52.572 --> 00:06:54.412
This is OK?

00:06:54.412 --> 00:06:55.370
So this is what I mean.

00:06:55.370 --> 00:06:58.160
So we say that our probability
distribution converges

00:06:58.160 --> 00:07:00.462
to a Gaussian distribution,
this is what we expect.

00:07:00.462 --> 00:07:02.170
So we expect n to
have been large enough

00:07:02.170 --> 00:07:03.003
for this to be true.

00:07:06.130 --> 00:07:07.692
Yeah?

00:07:07.692 --> 00:07:08.650
This is very important.

00:07:08.650 --> 00:07:10.511
This is like the whole
course, actually.

00:07:10.511 --> 00:07:12.510
This is the whole section,
is this one equation.

00:07:12.510 --> 00:07:15.340
So I just wanted to make sure
you really get what this says.

00:07:15.340 --> 00:07:16.930
And if you don't
like the integral,

00:07:16.930 --> 00:07:18.721
you can make dy really
small, and then it's

00:07:18.721 --> 00:07:19.650
just this times dy.
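
NOTE
A minimal sketch of the key equation, assuming the Gaussian form described above: the measured mean y is distributed around the model prediction f with the uncertainty of the mean as its width. The names mean_density, interval_probability and sigma_mean, and all the numbers, are illustrative assumptions. It compares the integral from y to y plus dy with the small-dy shortcut, density times dy.
```python
import math
def mean_density(y, f, sigma_mean):
    """Gaussian probability density for the measured mean y around the model prediction f."""
    z = (y - f) / sigma_mean
    return math.exp(-0.5 * z * z) / (sigma_mean * math.sqrt(2.0 * math.pi))
def interval_probability(y, dy, f, sigma_mean):
    """Probability that the mean falls in [y, y + dy], from the Gaussian cumulative distribution."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - f) / (sigma_mean * math.sqrt(2.0))))
    return cdf(y + dy) - cdf(y)
# Hypothetical numbers: the model says 1 kg and the uncertainty of the mean is 0.1 g.
f, sigma_mean = 1.0, 1.0e-4
y, dy = 1.00005, 1.0e-6
exact = interval_probability(y, dy, f, sigma_mean)
approx = mean_density(y, f, sigma_mean) * dy   # valid when dy is much smaller than sigma_mean
print(exact, approx)
```
With dy a hundred times smaller than sigma_mean, the two numbers should agree to well under one percent.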

00:07:23.506 --> 00:07:24.006
OK?

00:07:28.850 --> 00:07:31.050
Actually, this notation's
like [INAUDIBLE] I

00:07:31.050 --> 00:07:33.220
think I should do this way.

00:07:33.220 --> 00:07:35.492
I should do this.

00:07:35.492 --> 00:07:36.200
This is a number.

00:07:38.731 --> 00:07:39.980
Let's get rid of the integral.

00:07:39.980 --> 00:07:42.240
Let's make dy really small.

00:07:42.240 --> 00:07:43.774
I'll make it [INAUDIBLE].

00:07:48.596 --> 00:07:49.220
That all right?

00:07:52.470 --> 00:07:55.510
So this is the probability
density that we would observe,

00:07:55.510 --> 00:07:59.820
this is the experimental value
y that we observe from the mean,

00:07:59.820 --> 00:08:05.204
and this is the little width
of our tiny little interval.

00:08:05.204 --> 00:08:06.350
Is that all right?

00:08:10.570 --> 00:08:11.070
Yes?

00:08:11.070 --> 00:08:14.353
AUDIENCE: So is sigma
the [INAUDIBLE] on there?

00:08:14.353 --> 00:08:16.231
WILLIAM GREEN:
Ah, what is sigma?

00:08:16.231 --> 00:08:17.230
That's a great question.

00:08:17.230 --> 00:08:18.729
We didn't write
down what sigma was.

00:08:18.729 --> 00:08:19.423
What is sigma?

00:08:19.423 --> 00:08:21.235
AUDIENCE: Standard deviation?

00:08:21.235 --> 00:08:23.526
WILLIAM GREEN: It's not the
standard deviation exactly.

00:08:23.526 --> 00:08:25.650
Standard deviation
of the mean, right?

00:08:25.650 --> 00:08:28.170
So there's two sigmas.

00:08:28.170 --> 00:08:32.049
We have the sigma of
y, of the measurements,

00:08:32.049 --> 00:08:42.559
and that's equal to average
value of y squared minus.

00:08:49.430 --> 00:08:51.680
So we just figure that for
how many experiments we do,

00:08:51.680 --> 00:08:54.030
we just compute the average
of y squared, the average y,

00:08:54.030 --> 00:08:55.180
subtract them.

00:08:55.180 --> 00:08:56.340
That's the variance.

00:08:56.340 --> 00:08:58.550
And then the sigma squared that I
used in that equation

00:08:58.550 --> 00:09:04.580
there is 1 over n times sigma y squared.

00:09:04.580 --> 00:09:08.160
And we call this the
variance of the mean,

00:09:08.160 --> 00:09:11.987
it's the uncertainty
in the mean value of y.

00:09:11.987 --> 00:09:13.445
And the central
limit theorem says

00:09:13.445 --> 00:09:15.230
that as long as n
gets really large,

00:09:15.230 --> 00:09:18.500
we expect that this
should converge to this.
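
NOTE
A small sketch of the two sigmas, assuming the 1 over n factor applies to the variance, so that the standard deviation of the mean is sigma y divided by the square root of n (the standard result for independent repeats). The function names and the simulated scale readings are hypothetical.
```python
import math
import random
def variance_of_measurements(ys):
    """Average of y squared minus the square of the average y."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_y2 = sum(v * v for v in ys) / n
    return mean_y2 - mean_y * mean_y
def sigma_of_mean(ys):
    """Uncertainty in the mean: square root of (variance of the measurements / n)."""
    return math.sqrt(variance_of_measurements(ys) / len(ys))
rng = random.Random(0)                                   # fixed seed so the run is repeatable
ys = [rng.gauss(1.0, 3.0e-4) for _ in range(10)]         # ten simulated weighings, in kg
sigma_y = math.sqrt(variance_of_measurements(ys))
print(sigma_y, sigma_of_mean(ys))
```
For ten repeats the second number is smaller than the first by a factor of root ten, about 3.2.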

00:09:18.500 --> 00:09:21.690
And we talked last time
about how when n gets bigger,

00:09:21.690 --> 00:09:24.710
these averages don't really
change when it gets big.

00:09:24.710 --> 00:09:25.910
They're just the average.

00:09:28.580 --> 00:09:30.620
But this number declines
as n gets big because

00:09:30.620 --> 00:09:34.390
of this one over n formula.

00:09:34.390 --> 00:09:39.170
And to understand that,
suppose I measure the weight,

00:09:39.170 --> 00:09:43.080
and I measure, it should
be around one kilogram.

00:09:43.080 --> 00:09:46.430
But in fact my measurements
are all over here.

00:09:46.430 --> 00:09:50.610
Lots of measurements.

00:09:50.610 --> 00:09:54.020
So they have a variance
something like this.

00:09:54.020 --> 00:09:56.060
But if I make a plot of--

00:09:59.650 --> 00:10:03.370
as I run, I compute
the running average.

00:10:03.370 --> 00:10:08.470
So when I run the
first two points,

00:10:08.470 --> 00:10:10.960
I get some average value here.

00:10:10.960 --> 00:10:15.370
After I run 27 points more,
the average value is here.

00:10:15.370 --> 00:10:18.700
After I run 1,000 repeats,
the average value is here.

00:10:18.700 --> 00:10:20.117
It's getting pretty
close to this,

00:10:20.117 --> 00:10:21.616
and the uncertainty
in this number's

00:10:21.616 --> 00:10:23.350
getting smaller and
smaller as I'm doing

00:10:23.350 --> 00:10:25.120
better and better averages.

00:10:25.120 --> 00:10:26.900
The average of more
and more repeats.

00:10:26.900 --> 00:10:29.054
Does that make sense?
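
NOTE
The running average described above can be sketched with simulated data. This assumes independent Gaussian scatter on each weighing; true_mass, sigma_y and the seed are made-up values, and the expected spread uses sigma y over root n.
```python
import math
import random
def running_means(ys):
    """Return the averages of the first 1, 2, ..., n measurements."""
    total, out = 0.0, []
    for i, v in enumerate(ys, start=1):
        total += v
        out.append(total / i)
    return out
rng = random.Random(1)
true_mass, sigma_y = 1.0, 3.0e-4          # hypothetical true value and scatter of one weighing
ys = [rng.gauss(true_mass, sigma_y) for _ in range(1000)]
means = running_means(ys)
for n in (2, 29, 1000):
    expected_spread = sigma_y / math.sqrt(n)   # uncertainty of the mean after n repeats
    print(n, means[n - 1], expected_spread)
```
The scatter of the individual points stays the same, while the expected spread of the running average shrinks as more repeats are included.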

00:10:29.054 --> 00:10:31.590
OK.

00:10:31.590 --> 00:10:37.910
So from this key equation, I
can derive a lot of things.

00:10:37.910 --> 00:10:39.510
And it depends what
you want to do.

00:10:39.510 --> 00:10:44.220
So one thing people do a lot is
what's called model validation.

00:10:50.060 --> 00:10:51.180
And what does this mean?

00:10:51.180 --> 00:10:54.890
It means I have a model,
I believe it's true.

00:10:54.890 --> 00:10:57.420
I have some parameters,
I believe they're true.

00:10:57.420 --> 00:10:59.420
But there are some
foolish skeptics out there

00:10:59.420 --> 00:11:01.400
who don't have the
faith that I do.

00:11:01.400 --> 00:11:04.520
And they think that my model's
baloney, or my parameter

00:11:04.520 --> 00:11:06.440
values are wrong, or something.

00:11:06.440 --> 00:11:10.520
And so to prove I'm right, I'm
going to make some experiments.

00:11:10.520 --> 00:11:12.020
And I'm going to
show that I make

00:11:12.020 --> 00:11:16.340
a plot that looks like the
experiment and model agree.

00:11:16.340 --> 00:11:18.795
Some of you might have done
this in your life, yes?

00:11:18.795 --> 00:11:21.920
Everybody might make a
parity plot or something.

00:11:21.920 --> 00:11:24.830
You've seen these things before.

00:11:24.830 --> 00:11:28.919
Now, this is like a
confidence builder.

00:11:28.919 --> 00:11:30.710
You're trying to get
the skeptics out there

00:11:30.710 --> 00:11:33.560
to believe that
there's some evidence

00:11:33.560 --> 00:11:38.120
to back up your faith that
this model is perfect.

00:11:38.120 --> 00:11:40.490
And what you really
want to know is

00:11:40.490 --> 00:11:45.200
like, if the measurement
that I measure,

00:11:45.200 --> 00:11:49.050
the average for my 10,000
repeated measurements,

00:11:49.050 --> 00:11:54.720
I expect that this quantity
should be pretty big somehow

00:11:54.720 --> 00:11:56.294
in some way.

00:11:56.294 --> 00:11:57.710
But then quantitatively
saying what

00:11:57.710 --> 00:12:02.124
that means exactly, what's a
good fit, what's a bad fit,
good fit, what's a bad fit,

00:12:02.124 --> 00:12:04.040
this is actually kind
of a difficult question,

00:12:04.040 --> 00:12:05.510
and we'll come back to this one.

00:12:05.510 --> 00:12:09.470
But that's a very common
use of this equation

00:12:09.470 --> 00:12:13.200
is to try to do validation.

00:12:13.200 --> 00:12:15.400
Now because it's
kind of complicated,

00:12:15.400 --> 00:12:17.250
most people don't
actually do it.

00:12:17.250 --> 00:12:19.530
So instead what they
do is they just plot

00:12:19.530 --> 00:12:22.860
some data points, and they
plot your model curve.

00:12:22.860 --> 00:12:26.040
And as long as they look
good, then you're done.

00:12:26.040 --> 00:12:30.990
So that's the normal way that
it's done in the literature

00:12:30.990 --> 00:12:31.810
currently.

00:12:31.810 --> 00:12:33.810
But of course, that's
completely unquantitative.

00:12:33.810 --> 00:12:36.690
It doesn't really say whether
the model and the data

00:12:36.690 --> 00:12:39.750
really agree, it just means they
look sort of like each other.

00:12:39.750 --> 00:12:41.615
So that's like a human
qualitative thing.

00:12:41.615 --> 00:12:42.990
Now, if the purpose
of validation

00:12:42.990 --> 00:12:48.270
is just to convince humans,
then you've done the purpose.

00:12:48.270 --> 00:12:50.670
Now, if your purpose is to
try to quantitatively say

00:12:50.670 --> 00:12:51.630
something, then you
really have to get

00:12:51.630 --> 00:12:53.610
into this equation,
which usually is not done

00:12:53.610 --> 00:12:56.770
but would be the right
thing to do for validation.

00:12:56.770 --> 00:13:01.740
Now the alternative view
is disproving a model.

00:13:08.636 --> 00:13:14.200
But I just say that there's
several ways this can happen.

00:13:14.200 --> 00:13:16.500
You can try to disprove a
model, but you might also

00:13:16.500 --> 00:13:20.648
show that the theta
values are incorrect.

00:13:27.360 --> 00:13:31.780
Or you might show that
the experiment is wrong.

00:13:41.910 --> 00:13:43.410
These are all
possibilities, reasons

00:13:43.410 --> 00:13:48.330
why the model and the data
might not agree with each other.

00:13:48.330 --> 00:13:52.820
So this equation, it only holds
if the model is really true,

00:13:52.820 --> 00:13:55.470
if the parameter values
are all perfectly correct,

00:13:55.470 --> 00:14:00.790
if we know exactly what all
the knob values are perfectly.

00:14:00.790 --> 00:14:02.700
If any of those
things are not true,

00:14:02.700 --> 00:14:05.240
then you should have
some discrepancy,

00:14:05.240 --> 00:14:07.270
and there should be
a way to show it.

00:14:07.270 --> 00:14:08.820
And really what
you're showing is

00:14:08.820 --> 00:14:13.650
that you'd observed
some y that is

00:14:13.650 --> 00:14:15.270
very unlikely to be observed.

00:14:15.270 --> 00:14:18.720
So the probability of observing
that y is extremely low

00:14:18.720 --> 00:14:20.470
if all these other
things were true.

00:14:20.470 --> 00:14:24.210
So if all these things are true,
and you compute this value,

00:14:24.210 --> 00:14:26.490
and this value's
very tiny, then it

00:14:26.490 --> 00:14:29.910
makes you think that
it's unlikely that you

00:14:29.910 --> 00:14:31.140
would have observed that.

00:14:31.140 --> 00:14:33.180
And therefore, you might try to
use that as an argument to say

00:14:33.180 --> 00:14:34.290
that something must be wrong.

00:14:34.290 --> 00:14:36.840
The model's wrong, parameters
are wrong, the knobs are wrong,

00:14:36.840 --> 00:14:39.180
something's wrong.

00:14:39.180 --> 00:14:40.970
My y values are wrong.

00:14:40.970 --> 00:14:42.870
It could be any of those things.
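
NOTE
One hedged way to put a number on "very unlikely" is a two-sided Gaussian tail probability: the chance of a mean at least this far from the prediction if the model, parameters and knob values were all correct. This particular test is an assumption for illustration, not necessarily the criterion used later in the course; the function name and numbers are hypothetical.
```python
import math
def two_sided_tail_probability(y_mean, f, sigma_mean):
    """Chance of a mean at least this far from f, under the Gaussian for the mean."""
    z = abs(y_mean - f) / sigma_mean
    return math.erfc(z / math.sqrt(2.0))
f, sigma_mean = 1.0, 1.0e-4                                   # hypothetical prediction and uncertainty
p_close = two_sided_tail_probability(1.0001, f, sigma_mean)   # about one sigma away
p_far = two_sided_tail_probability(1.0006, f, sigma_mean)     # about six sigma away
print(p_close, p_far)
```
A tiny value says only that something in the chain is off; it cannot say whether the culprit is the model, the parameters, the knobs or the measurement.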

00:14:42.870 --> 00:14:46.599
So this is often the most
exciting papers to publish.

00:14:46.599 --> 00:14:49.140
You publish a paper, you take
some model that a lot of people

00:14:49.140 --> 00:14:50.030
believe.

00:14:50.030 --> 00:14:52.810
You tell them they're full of
baloney, it's completely wrong.

00:14:52.810 --> 00:14:56.940
My great experiment shows
you are completely wrong.

00:14:56.940 --> 00:14:59.495
And so you'll see a
lot of these in Nature.

00:14:59.495 --> 00:15:00.870
I should warn you,
a lot of those

00:15:00.870 --> 00:15:06.780
get retracted later, a very
high retraction rate in Nature.

00:15:06.780 --> 00:15:08.250
Because they want
to publish papers

00:15:08.250 --> 00:15:11.760
like that that show that the
common view is incorrect,

00:15:11.760 --> 00:15:12.900
and sometimes it's true.

00:15:12.900 --> 00:15:14.430
But oftentimes the common
view is actually correct,

00:15:14.430 --> 00:15:15.730
and there's something
wrong with the experiment,

00:15:15.730 --> 00:15:18.146
or the interpretation, or how
they computed this equation,

00:15:18.146 --> 00:15:19.040
or whatever.

00:15:19.040 --> 00:15:21.770
And so actually it turns out the
common view is perfectly fine,

00:15:21.770 --> 00:15:26.280
and it's just that the foolish
authors went off on a tangent.

00:15:26.280 --> 00:15:28.080
And then they have
to six months later

00:15:28.080 --> 00:15:29.790
publish a retraction,
by the way, sorry,

00:15:29.790 --> 00:15:31.322
paper was completely wrong.

00:15:31.322 --> 00:15:32.530
And so you see a lot of that.

00:15:36.570 --> 00:15:39.210
So that's a second
kind of thing.

00:15:39.210 --> 00:15:42.870
And we'll talk more about
that a little bit later, too.

00:15:42.870 --> 00:15:51.920
And then another thing is
I'll relax my assumptions.

00:15:51.920 --> 00:15:56.360
So I'll say, well, I'm sure
that the model is true,

00:15:56.360 --> 00:15:59.540
and I'm sure that my knob
settings are perfect,

00:15:59.540 --> 00:16:01.530
and I know what they are.

00:16:01.530 --> 00:16:04.460
But I'm not really sure
about all the parameters.

00:16:04.460 --> 00:16:06.710
And therefore I want
to use the experiment

00:16:06.710 --> 00:16:10.134
to try to refine
parameter values.

00:16:19.580 --> 00:16:22.910
So I'm trying to take my y's
that I measure and somehow

00:16:22.910 --> 00:16:26.630
infer something
about the thetas.

00:16:26.630 --> 00:16:29.680
And this is a very
common thing to do.

00:16:29.680 --> 00:16:33.570
So in my group we've tried to
measure the rate coefficient

00:16:33.570 --> 00:16:35.220
for a reaction.

00:16:35.220 --> 00:16:37.297
We believe there is a
value of that theta,

00:16:37.297 --> 00:16:39.630
and in fact, we probably have
an estimate of what it is.

00:16:39.630 --> 00:16:41.540
But we're not sure
of the exact number,

00:16:41.540 --> 00:16:43.790
and we'd like to do an
experiment to refine the number

00:16:43.790 --> 00:16:45.600
and get it more
accurately determined.

00:16:45.600 --> 00:16:49.150
So that's another
useful thing to do.

00:16:49.150 --> 00:16:54.100
And this leads into two somewhat
different points of view

00:16:54.100 --> 00:16:55.870
about this.

00:16:55.870 --> 00:16:59.760
One you've probably done already
called least squares fitting.

00:16:59.760 --> 00:17:00.820
That's one view.

00:17:00.820 --> 00:17:03.580
And the other is
this Bayesian view

00:17:03.580 --> 00:17:06.119
that I'll tell you about next.

00:17:06.119 --> 00:17:10.609
So there's sort of
A and B. There's

00:17:10.609 --> 00:17:14.920
one that I'll call Bayesian,
and one I'll call least squares.

00:17:18.764 --> 00:17:20.680
They're sort of related
to each other, but not

00:17:20.680 --> 00:17:22.544
exactly the same conceptually.

00:17:22.544 --> 00:17:23.710
So I'll try to explain that.

00:17:27.069 --> 00:17:30.190
So the Bayesian view
is probabilistic,

00:17:30.190 --> 00:17:38.250
so it's actually pretty
straightforward to write down.

00:17:38.250 --> 00:17:49.170
Remember that we wrote that
the probability of A and B

00:17:49.170 --> 00:17:52.880
is equal to the
probability of A times

00:17:52.880 --> 00:17:58.980
the probability of B
given A, and it's also

00:17:58.980 --> 00:18:02.646
equal to the probability
of B times the probability

00:18:02.646 --> 00:18:07.650
of A given B. And
what we have here is

00:18:07.650 --> 00:18:09.440
one of these conditional
probabilities,

00:18:09.440 --> 00:18:11.330
if the thetas have
a certain value,

00:18:11.330 --> 00:18:13.290
this is a certain probability.

00:18:13.290 --> 00:18:16.410
So I should be able to
use that formula somehow.

00:18:16.410 --> 00:18:21.770
So I can write down that
the probability of measuring

00:18:21.770 --> 00:18:35.920
y given theta is equal to
the probability of y times

00:18:35.920 --> 00:18:43.280
the probability of
theta given y divided

00:18:43.280 --> 00:18:49.960
by the probability of theta.

00:18:49.960 --> 00:18:52.460
So I just took this formula,
and I plugged in y's and thetas

00:18:52.460 --> 00:18:54.710
instead of A's and B's.

00:18:54.710 --> 00:18:57.770
So I said, these two
are equal to each other.

00:18:57.770 --> 00:18:59.510
Rearranged it so then
I can rewrite this.

00:18:59.510 --> 00:19:01.700
This is the way we have
it in here, probably

00:19:01.700 --> 00:19:03.800
of measuring y given theta.

00:19:03.800 --> 00:19:05.370
Let's flip it around.

00:19:05.370 --> 00:19:11.000
So probability of theta
given that we measured

00:19:11.000 --> 00:19:21.740
y is equal to the
probability of theta times

00:19:21.740 --> 00:19:28.430
the probability of observing
y if theta was true

00:19:28.430 --> 00:19:31.823
divided by the probability of y.

00:19:36.725 --> 00:19:37.850
Terrible handwriting there.

00:19:41.470 --> 00:19:43.147
That's just algebra.
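
NOTE
The algebra can be checked on a small discrete example. The joint table below is invented for illustration, with two parameter values and two outcomes; the helper names are hypothetical.
```python
# Hypothetical joint probabilities P(theta, y); the four entries sum to one.
joint = {("slow", "low"): 0.30, ("slow", "high"): 0.10,
         ("fast", "low"): 0.15, ("fast", "high"): 0.45}
def marginal_theta(theta):
    return sum(p for (t, _), p in joint.items() if t == theta)
def marginal_y(y):
    return sum(p for (_, v), p in joint.items() if v == y)
def y_given_theta(y, theta):
    return joint[(theta, y)] / marginal_theta(theta)
def theta_given_y(theta, y):
    """Bayes' rule: P(theta | y) = P(theta) * P(y | theta) / P(y)."""
    return marginal_theta(theta) * y_given_theta(y, theta) / marginal_y(y)
posterior_fast = theta_given_y("fast", "high")
print(posterior_fast)
```
The flipped conditional agrees with reading it straight off the table, joint divided by P(y), which is all the rearrangement claims.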

00:19:43.147 --> 00:19:44.480
So this is what we want to know.

00:19:44.480 --> 00:19:46.646
We want to know, what's the
probability distribution

00:19:46.646 --> 00:19:48.090
of the parameter values?

00:19:48.090 --> 00:19:50.490
Because some of
them are uncertain.

00:19:50.490 --> 00:19:52.680
Now, before we started
the experiment,

00:19:52.680 --> 00:19:56.220
we had some idea of
what the ranges were

00:19:56.220 --> 00:19:59.340
for all the parameter values.

00:19:59.340 --> 00:20:01.620
Like I'm trying to measure
a rate coefficient.

00:20:01.620 --> 00:20:06.000
I know from experience with
other similar reactions,

00:20:06.000 --> 00:20:08.550
from a quantum
chemistry calculation,

00:20:08.550 --> 00:20:12.420
from some indirect evidence,
from some other more

00:20:12.420 --> 00:20:14.910
complicated experiment,
I have some idea

00:20:14.910 --> 00:20:17.274
that this rate coefficient
has to be in a certain range.

00:20:17.274 --> 00:20:18.690
Now, it could be
pretty uncertain.

00:20:18.690 --> 00:20:21.150
It might be five orders
of magnitude uncertain.

00:20:21.150 --> 00:20:23.580
But I know it's not less than 0.

00:20:23.580 --> 00:20:26.730
I know it can't be faster
than the diffusion limit, how

00:20:26.730 --> 00:20:28.590
fast things can come together.

00:20:28.590 --> 00:20:30.905
So for sure I know some
range, and oftentimes I

00:20:30.905 --> 00:20:32.640
know a much narrower
range than that.

00:20:32.640 --> 00:20:35.700
So I have some information
about these parameter values

00:20:35.700 --> 00:20:36.709
before I even start.

00:20:36.709 --> 00:20:38.250
Some of the parameters
of the model I

00:20:38.250 --> 00:20:39.960
know perfectly, or pretty well.

00:20:39.960 --> 00:20:43.090
So you know maybe there's a
Planck's constant, or the heat

00:20:43.090 --> 00:20:45.090
of formation of one of
my chemicals or something

00:20:45.090 --> 00:20:46.590
like that shows
up in the numbers,

00:20:46.590 --> 00:20:48.860
and I might know that parameter
really pretty accurately.

00:20:48.860 --> 00:20:51.026
Whereas the particular rate
coefficient I care about

00:20:51.026 --> 00:20:53.940
is the thing I really
don't know very well.

00:20:53.940 --> 00:20:57.180
So some of these have tight
probability distributions

00:20:57.180 --> 00:21:00.720
ahead of time, and some
of them have loose ones.

00:21:00.720 --> 00:21:01.900
And this thing has a name.

00:21:01.900 --> 00:21:02.816
It's called the prior.

00:21:05.670 --> 00:21:09.864
And it's our prior information
before we did the experiment.

00:21:09.864 --> 00:21:11.780
And this one, after we've
done the experiment,

00:21:11.780 --> 00:21:12.770
we're going to change it.

00:21:12.770 --> 00:21:14.395
So we're going to
say previously people

00:21:14.395 --> 00:21:17.474
thought that the parameters all
lay in these certain ranges.

00:21:17.474 --> 00:21:18.890
And now I'm going
to get a tighter

00:21:18.890 --> 00:21:20.973
range, because I have some
additional experimental

00:21:20.973 --> 00:21:22.050
information.

00:21:22.050 --> 00:21:23.430
So this is called the posterior.

00:21:28.650 --> 00:21:31.440
This means before,
this means after.

00:21:31.440 --> 00:21:33.330
So this is what I know
about parameter values

00:21:33.330 --> 00:21:36.920
before and after the experiment.

00:21:36.920 --> 00:21:41.050
This is the formula
that I have over there.

00:21:41.050 --> 00:21:44.624
It's the probability that, if the
thetas had a certain value,

00:21:44.624 --> 00:21:46.290
I would have
observed what I saw.

00:21:46.290 --> 00:21:46.790
Yeah?

00:21:46.790 --> 00:21:50.234
AUDIENCE: Which one
refers to which?

00:21:50.234 --> 00:21:53.420
WILLIAM GREEN: Sorry,
this is the prior,

00:21:53.420 --> 00:21:54.616
this is the posterior.

00:21:59.450 --> 00:22:02.115
And those of you who are
paying attention to notation

00:22:02.115 --> 00:22:03.740
realize I'm not doing
this very nicely.

00:22:03.740 --> 00:22:06.639
Because these are
continuous variables,

00:22:06.639 --> 00:22:08.180
and I'm writing
capital Pr's, and they

00:22:08.180 --> 00:22:10.880
should not be capital Pr's;
they should be probability density

00:22:10.880 --> 00:22:12.516
functions instead.

00:22:12.516 --> 00:22:13.640
So let's rewrite it nicely.

00:22:18.140 --> 00:22:23.370
So the probability
of theta given y,

00:22:23.370 --> 00:22:26.780
probability density is equal
to the probability density

00:22:26.780 --> 00:22:37.422
of theta initially times the
probability of y given theta,

00:22:37.422 --> 00:22:41.470
[INAUDIBLE] density divided by
the probability density of--

00:22:44.900 --> 00:22:47.420
all right?
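
NOTE
Written out, the density form described here would read as follows. Lowercase p denotes a probability density; the integral for p(y) is the standard total-probability identity, added as context rather than taken from the board.
```latex
p(\theta \mid y) = \frac{p(\theta)\, p(y \mid \theta)}{p(y)},
\qquad
p(y) = \int p(\theta')\, p(y \mid \theta')\, d\theta'
```
In the capital Pr version each probability is a density times its interval width, for example Pr(theta) = p(theta) d theta and Pr(y) = p(y) dy, and the d theta and dy factors cancel between the two sides.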

00:22:47.420 --> 00:22:48.920
And what I just
basically did was

00:22:48.920 --> 00:22:51.378
this is the correct equation.
The previous one was all

00:22:51.378 --> 00:22:54.807
multiplied by d theta dy; it
shouldn't be done that way.

00:22:54.807 --> 00:22:55.390
So this is OK?

00:22:59.640 --> 00:23:02.100
Now this is the
prior information

00:23:02.100 --> 00:23:04.670
I have about the
parameter values.

00:23:04.670 --> 00:23:08.010
I know that they have to
fall into some ranges.

00:23:08.010 --> 00:23:11.084
And really all I'm doing is I'm
correcting that information.

00:23:11.084 --> 00:23:13.500
I'm improving the information
to tighten the distribution.

00:23:13.500 --> 00:23:18.910
So initially I know that
my rate constant, here's

00:23:18.910 --> 00:23:20.700
my rate constant.

00:23:20.700 --> 00:23:24.727
I know that it's got to
be greater than zero,

00:23:24.727 --> 00:23:27.060
and I don't think it's really
down there at zero anyway.

00:23:27.060 --> 00:23:28.351
I think it's somewhere in here.

00:23:28.351 --> 00:23:29.670
I really don't know much.

00:23:29.670 --> 00:23:31.740
And I really don't think it's
all up at the diffusion limit,

00:23:31.740 --> 00:23:33.740
and no way it's higher
than the diffusion limit.

00:23:33.740 --> 00:23:35.910
So that's my initial
information that I

00:23:35.910 --> 00:23:41.340
have about the probability
distribution of K.

00:23:41.340 --> 00:23:43.800
So it's the rate
coefficient I want to know,

00:23:43.800 --> 00:23:45.480
and I know it's
bigger than zero,

00:23:45.480 --> 00:23:47.480
and I know it's
less than infinity.

00:23:47.480 --> 00:23:49.500
And actually I know there's
some physical limit,

00:23:49.500 --> 00:23:51.291
it can't be higher than
something or other.

00:23:53.509 --> 00:23:55.300
And you can do this
for any problem, right?

00:23:55.300 --> 00:23:56.990
I give you any
parameter, you should

00:23:56.990 --> 00:23:58.909
be able to tell me
something about it.

00:23:58.909 --> 00:24:00.950
You might be uncertain by
20 orders of magnitude,

00:24:00.950 --> 00:24:02.866
but at least you have
an error bar of some width.

00:24:02.866 --> 00:24:04.742
It can't be anything, right?

00:24:04.742 --> 00:24:06.950
A lot of parameters have to
be positive, for example.

00:24:06.950 --> 00:24:08.390
You know that.

00:24:08.390 --> 00:24:10.017
And you usually know something.

00:24:10.017 --> 00:24:12.100
You might not think you
know anything, but you do,

00:24:12.100 --> 00:24:13.683
you actually do know
before you start.

00:24:13.683 --> 00:24:17.110
So you actually know, this is
the P of theta to start with.

00:24:19.880 --> 00:24:23.605
And after I've done
the experiment,

00:24:23.605 --> 00:24:25.980
hopefully you're going to know
more information about it.

00:24:25.980 --> 00:24:29.610
I might know that
this quantity here

00:24:29.610 --> 00:24:33.120
is going to be like a
Gaussian distribution.

00:24:33.120 --> 00:24:35.980
It might have a kind of
goofball dependence on theta.

00:24:35.980 --> 00:24:38.110
I should comment that.

00:24:38.110 --> 00:24:42.740
Notice how theta
appears inside F.

00:24:42.740 --> 00:24:44.680
So theta's up in the exponent.

00:24:44.680 --> 00:24:49.520
It's sort of inside a Gaussian,
but it's like processed by F

00:24:49.520 --> 00:24:53.240
and so the observable might have
a pretty goofball dependence

00:24:53.240 --> 00:24:55.240
on this rate coefficient.

00:24:55.240 --> 00:24:58.640
So this thing could
be some weird thing.

00:24:58.640 --> 00:25:03.549
But for sure, when I change
theta so this changes a lot,

00:25:03.549 --> 00:25:05.340
it's going to make a
pretty big difference.

00:25:05.340 --> 00:25:08.700
Because up inside the
exponent of a Gaussian,

00:25:08.700 --> 00:25:11.940
so it's going to drop
off a lot somewhere.

00:25:11.940 --> 00:25:15.370
So I should get something
that looks something like this

00:25:15.370 --> 00:25:18.210
maybe for my experiment.

00:25:18.210 --> 00:25:24.060
So this one is P of K
initially, the prior.

00:25:24.060 --> 00:25:28.010
This one is P of y given k.

00:25:31.910 --> 00:25:34.010
And what this equation
says is I want

00:25:34.010 --> 00:25:37.330
to multiply those two together.

00:25:37.330 --> 00:25:39.240
And so I'm going to
multiply this times this,

00:25:39.240 --> 00:25:50.180
and I'm going to get
some new thing that's

00:25:50.180 --> 00:25:55.432
something like that when
I multiply this times that.

00:25:55.432 --> 00:25:58.340
Is that OK?

00:25:58.340 --> 00:26:03.090
And so that's my new
numerator of this equation.

00:26:03.090 --> 00:26:05.226
Now this denominator
doesn't make too much sense.

00:26:05.226 --> 00:26:06.600
This says, what's
the probability

00:26:06.600 --> 00:26:13.035
that I measured the mean
I measured, given nothing?

00:26:13.035 --> 00:26:14.910
So this is sort of like
the prior probability

00:26:14.910 --> 00:26:16.750
that I would have
measured it or something.

00:26:16.750 --> 00:26:17.833
I don't know what this is.

00:26:17.833 --> 00:26:22.100
So instead what people do,
is they say, forget this.

00:26:22.100 --> 00:26:27.234
But instead, let's multiply
this by a constant that's

00:26:27.234 --> 00:26:29.400
going to normalize it to
make it probability density

00:26:29.400 --> 00:26:31.370
so that it integrates to one.
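
A minimal sketch of the multiply-and-normalize step just described, on a one-parameter grid. The model `f`, the prior shape, and the measured numbers are all made up for illustration; only the recipe (likelihood times prior, then rescale so the result integrates to one) comes from the lecture.

```python
import numpy as np

def gaussian_likelihood(y_mean, sigma_mean, f_of_theta):
    """p(y_mean | theta), up to a constant, for a Gaussian-distributed mean."""
    return np.exp(-0.5 * ((y_mean - f_of_theta) / sigma_mean) ** 2)

# Grid of candidate values for one parameter k (hypothetical range).
k = np.linspace(0.01, 10.0, 2001)

# Broad prior: k is positive and probably of order 1 to 10 (assumed shape).
prior = np.exp(-0.5 * ((k - 4.0) / 3.0) ** 2)

def f(k):
    """Hypothetical model: the observable depends nonlinearly on k."""
    return 1.0 - np.exp(-0.5 * k)

y_mean, sigma_mean = 0.80, 0.02   # made-up measured mean and its uncertainty

unnormalized = gaussian_likelihood(y_mean, sigma_mean, f(k)) * prior

# Instead of evaluating the denominator of Bayes' theorem, rescale by the
# constant that makes the result integrate to one (rectangle rule).
dk = k[1] - k[0]
posterior = unnormalized / (unnormalized.sum() * dk)

print("most probable k:", k[np.argmax(posterior)])
```

The peak lands close to the k that reproduces the measurement exactly, pulled very slightly toward the centre of the prior.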

00:26:34.370 --> 00:26:36.950
So that's the way
Bayes' theorem is used.

00:26:36.950 --> 00:26:38.540
This is called
Bayesian analysis.

00:26:42.140 --> 00:26:43.850
And so what it's
telling you is how

00:26:43.850 --> 00:26:49.220
to take your experimental
information as expressed

00:26:49.220 --> 00:26:59.010
in this formula and use all
your previous information

00:26:59.010 --> 00:27:03.030
about the parameters,
put them all together,

00:27:03.030 --> 00:27:06.490
now we have a cumulative
information about everything.

00:27:06.490 --> 00:27:09.760
So we have some parameters
that came into our problem

00:27:09.760 --> 00:27:12.370
into my experiment,
but from previous work,

00:27:12.370 --> 00:27:14.630
I also knew something
about those parameters.

00:27:14.630 --> 00:27:16.330
Now I put it all
together and I get

00:27:16.330 --> 00:27:19.180
a new value of
probability distribution

00:27:19.180 --> 00:27:21.020
of those parameters.

00:27:21.020 --> 00:27:23.530
And if my experiment
was really good,

00:27:23.530 --> 00:27:27.237
it would make this really
tight [WHOOSHING SOUND].

00:27:27.237 --> 00:27:29.070
And then when I multiply
these two together,

00:27:29.070 --> 00:27:32.470
it's going to make
this really sharp,

00:27:32.470 --> 00:27:35.380
and we have a really
good value of k.

00:27:35.380 --> 00:27:37.250
So that's like the
ideal case if I

00:27:37.250 --> 00:27:39.440
have a really great,
well-designed experiment

00:27:39.440 --> 00:27:44.090
executed perfectly with great
precision, then I can do this.

00:27:44.090 --> 00:27:46.214
More generally, when I
don't think about it,

00:27:46.214 --> 00:27:47.630
I get some
distribution like this.

00:27:50.210 --> 00:27:52.830
I still learn something
compared to what I had before,

00:27:52.830 --> 00:27:54.440
but it might not be much.

00:27:54.440 --> 00:27:57.410
So now I can end up with
some distribution that's

00:27:57.410 --> 00:27:59.630
a little tighter than before.

00:28:04.130 --> 00:28:07.190
So is this OK so far?

00:28:07.190 --> 00:28:11.040
All right, now this
is super simple.

00:28:11.040 --> 00:28:13.200
I didn't have to
solve anything, all

00:28:13.200 --> 00:28:17.120
I had to do was multiply
two distributions together.

00:28:17.120 --> 00:28:20.810
So in some respects, this is
what you should always do.

00:28:20.810 --> 00:28:22.962
All you do is you
take your experiment,

00:28:22.962 --> 00:28:25.170
you multiply the probability
distribution that corresponds

00:28:25.170 --> 00:28:27.334
to your experiment
times the prior,

00:28:27.334 --> 00:28:29.750
and you get some posterior,
and that's your new information

00:28:29.750 --> 00:28:31.340
about the distribution.

00:28:31.340 --> 00:28:33.870
And if I have a
distribution like this,

00:28:33.870 --> 00:28:37.370
suppose this is my
new distribution here,

00:28:37.370 --> 00:28:41.240
I can still get its central
value, that's my mean value, k.

00:28:41.240 --> 00:28:44.010
I can get an estimate
of the range of k.

00:28:44.010 --> 00:28:45.780
So I end up with
a k plus or minus

00:28:45.780 --> 00:28:50.870
dk maybe, from just
looking at the plot.

00:28:50.870 --> 00:28:53.670
In fact, I never even have to
evaluate what this constant is

00:28:53.670 --> 00:28:54.540
in order to do this.

00:28:54.540 --> 00:28:57.830
I can just go look at the
plot, see where the peak is,

00:28:57.830 --> 00:29:01.030
figure out the width,
and I can report now

00:29:01.030 --> 00:29:03.460
because of my experiment,
k plus or minus

00:29:03.460 --> 00:29:05.590
dk is more precisely
determined than it was before.
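
A short sketch of why the normalization constant never has to be evaluated to report k plus or minus dk: the peak, the mean, and the width come out the same whatever constant multiplies the curve. The Gaussian-shaped curve and the helper name `summarize` are assumptions for illustration, and a uniform grid is assumed.

```python
import numpy as np

def summarize(theta_grid, unnormalized_density):
    """Return (peak, mean, std) of a 1-D density known only up to a constant."""
    w = unnormalized_density / unnormalized_density.sum()  # the constant cancels here
    peak = theta_grid[np.argmax(unnormalized_density)]
    mean = np.sum(w * theta_grid)
    std = np.sqrt(np.sum(w * (theta_grid - mean) ** 2))
    return peak, mean, std

k = np.linspace(0.0, 10.0, 5001)
shape = np.exp(-0.5 * ((k - 3.0) / 0.4) ** 2)   # made-up posterior shape

a = summarize(k, shape)
b = summarize(k, 1e6 * shape)    # same curve, different overall constant

print("k = %.3f +/- %.3f" % (a[1], a[2]))
```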

00:29:09.150 --> 00:29:12.840
Now, a practical
challenge with this

00:29:12.840 --> 00:29:15.980
is that theta is usually
a lot of parameters.

00:29:15.980 --> 00:29:19.700
And I only drew the plot
here in one dimension,

00:29:19.700 --> 00:29:23.280
but really it's a
multi-dimensional plot.

00:29:23.280 --> 00:29:27.205
So really what it looked like,
suppose I had two parameters.

00:29:27.205 --> 00:29:29.580
I had my k I care about, and
I have some other parameter,

00:29:29.580 --> 00:29:34.870
theta 2, that also it
shows up in my model.

00:29:34.870 --> 00:29:39.170
And say, before
I started, I knew

00:29:39.170 --> 00:29:43.570
theta 2 fell in
this kind of range,

00:29:43.570 --> 00:29:47.950
and I knew k fell in
this kind of range.

00:29:47.950 --> 00:29:51.880
So really before I started,
if I think what it looks like,

00:29:51.880 --> 00:29:57.440
I really had sort of a
blobby rectangular contour

00:29:57.440 --> 00:30:02.460
plot, where I think it's
more likely that the k

00:30:02.460 --> 00:30:05.506
value and the theta 2 value
are somewhere in this range.

00:30:05.506 --> 00:30:08.130
And the most likely one is maybe
somewhere in the middle there.

00:30:08.130 --> 00:30:09.780
But I really didn't know much.

00:30:09.780 --> 00:30:12.720
So it could be anywhere
in this whole blob.

00:30:12.720 --> 00:30:15.960
Now, when I do the experiment,
the experimental value

00:30:15.960 --> 00:30:19.240
depends on both k and theta 2.

00:30:19.240 --> 00:30:22.510
And commonly what'll happen
is that the distribution

00:30:22.510 --> 00:30:25.190
from the experiment--

00:30:25.190 --> 00:30:28.718
need color chalk here.

00:30:28.718 --> 00:30:31.100
Let's get rid of these guys.

00:30:31.100 --> 00:30:33.932
So this is my probability
distribution, there's my prior.

00:30:33.932 --> 00:30:37.850
If I do the experiment, maybe
I'll have something like this.

00:30:37.850 --> 00:30:40.510
That the experiment
says that the guys

00:30:40.510 --> 00:30:47.094
have to be somewhere in
the contour plot like this.

00:30:47.094 --> 00:30:48.510
Because I can get
pretty good fits

00:30:48.510 --> 00:30:50.080
of the data with
different values of k

00:30:50.080 --> 00:30:52.038
as long as I compensate
with the value of theta 2.

00:30:54.780 --> 00:30:58.370
Now I multiply these
two-dimensional functions.

00:30:58.370 --> 00:31:02.410
The original is a blob
function, and this

00:31:02.410 --> 00:31:05.020
is a stretched out blob.

00:31:05.020 --> 00:31:08.140
And I multiply a stretched
out blob times a fat blob,

00:31:08.140 --> 00:31:09.970
I get some stretched
out blob that

00:31:09.970 --> 00:31:12.170
looks something like the
intersection of these guys.

00:31:12.170 --> 00:31:14.587
And so I end up with some
kind of blob like that.

00:31:14.587 --> 00:31:15.670
I'll draw it really thick.

00:31:15.670 --> 00:31:21.940
So this is my posterior,
some kind of blob like this.

00:31:21.940 --> 00:31:25.060
So now I know a little bit
more about these two parameters

00:31:25.060 --> 00:31:29.370
than I did before I started
because of my experiment.
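
A sketch of the two-parameter picture drawn on the board, under the assumption (made here only for illustration) that the observable depends on the product k times theta 2. That makes the experimental distribution a stretched-out ridge, and multiplying it by the fat prior blob gives a narrower posterior. All numbers are invented.

```python
import numpy as np

k = np.linspace(0.1, 5.0, 400)
theta2 = np.linspace(0.1, 5.0, 400)
K, T2 = np.meshgrid(k, theta2, indexing="ij")   # axis 0 is k, axis 1 is theta2

# Fat blob: broad, independent prior knowledge of each parameter.
prior = np.exp(-0.5 * ((K - 2.5) / 1.0) ** 2 - 0.5 * ((T2 - 2.5) / 1.0) ** 2)

# Hypothetical model: the data only see k * theta2, so a larger k can be
# compensated by a smaller theta2 (a long, thin ridge in the contour plot).
y_mean, sigma_mean = 4.0, 0.2
likelihood = np.exp(-0.5 * ((y_mean - K * T2) / sigma_mean) ** 2)

posterior = prior * likelihood
posterior /= posterior.sum()
prior_n = prior / prior.sum()

def marginal_std(p, grid, axis):
    """Standard deviation of one parameter after summing out the other."""
    m = p.sum(axis=axis)
    mean = np.sum(m * grid)
    return np.sqrt(np.sum(m * (grid - mean) ** 2))

std_k_prior = marginal_std(prior_n, k, axis=1)
std_k_post = marginal_std(posterior, k, axis=1)
mean_product = np.sum(posterior * K * T2)

print("spread of k before and after:", std_k_prior, std_k_post)
```

In this toy case k alone is only modestly better known afterward, while the combination k times theta 2 is pinned down tightly.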

00:31:29.370 --> 00:31:30.417
Is this OK?

00:31:30.417 --> 00:31:32.500
I really can't say I know
what the real value of k

00:31:32.500 --> 00:31:34.005
is, or the value of theta 2.

00:31:34.005 --> 00:31:36.439
But I know that
combinations of k

00:31:36.439 --> 00:31:38.730
and theta 2 that are sort of
in this range, all of them

00:31:38.730 --> 00:31:40.438
will give me pretty
good fits to my data,

00:31:40.438 --> 00:31:43.500
and also be consistent with all
the previous information I have

00:31:43.500 --> 00:31:46.500
about those parameter values.

00:31:46.500 --> 00:31:48.880
Is that all right?

00:31:48.880 --> 00:31:51.394
Now, I drew it with
two parameters.

00:31:51.394 --> 00:31:53.560
In a lot of models we have,
we have five parameters,

00:31:53.560 --> 00:31:55.643
six parameters, seven
parameters, nine parameters,

00:31:55.643 --> 00:31:56.540
14 parameters.

00:31:56.540 --> 00:31:58.360
We have a lot of parameters.

00:31:58.360 --> 00:32:01.660
And so when we try to
make this plot, even how

00:32:01.660 --> 00:32:05.784
to display the plot is going
to be a little problematic.

00:32:05.784 --> 00:32:06.700
But it's there, right?

00:32:06.700 --> 00:32:10.630
And somehow, we still
narrowed down the hypervolume

00:32:10.630 --> 00:32:12.970
in the parameter
space from whatever

00:32:12.970 --> 00:32:15.580
it was to begin
with to now we know

00:32:15.580 --> 00:32:16.850
something a little bit better.

00:32:16.850 --> 00:32:19.389
We have a narrower range
of the parameters that

00:32:19.389 --> 00:32:21.680
would be consistent with all
the information available,

00:32:21.680 --> 00:32:24.400
including my new experiment.

00:32:24.400 --> 00:32:26.560
And then the next guy
does his experiment,

00:32:26.560 --> 00:32:29.350
and he does an experiment that
shows that these guys have

00:32:29.350 --> 00:32:32.320
to be somewhere in this
range in order to be

00:32:32.320 --> 00:32:34.570
consistent with his experiment.

00:32:34.570 --> 00:32:37.660
And so now I can
narrow down the range

00:32:37.660 --> 00:32:39.490
to be something like that.

00:32:39.490 --> 00:32:41.240
And the next person
does their experiment,

00:32:41.240 --> 00:32:42.540
and they get something
else, and something else,

00:32:42.540 --> 00:32:43.360
and something else.

00:32:43.360 --> 00:32:46.030
And eventually by 2050, we have
a pretty nice determination

00:32:46.030 --> 00:32:48.970
of the parameter values.
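
A sketch of the "next guy does his experiment" idea: each posterior becomes the prior for the following experiment, and the allowed range keeps shrinking. The experiments below are invented (value favoured, uncertainty) pairs acting directly on a single parameter k, which is a simplification of the multi-parameter picture.

```python
import numpy as np

k = np.linspace(0.0, 10.0, 4001)

def normalize(p):
    return p / p.sum()

def width(p):
    """Standard deviation of a normalized distribution on the k grid."""
    mean = np.sum(p * k)
    return np.sqrt(np.sum(p * (k - mean) ** 2))

# Start from a broad prior, then fold in one made-up experiment at a time.
belief = normalize(np.exp(-0.5 * ((k - 5.0) / 3.0) ** 2))
experiments = [(4.2, 1.0), (3.7, 0.8), (4.0, 0.5), (3.9, 0.3)]

widths = [width(belief)]
for centre, sigma in experiments:
    likelihood = np.exp(-0.5 * ((k - centre) / sigma) ** 2)
    belief = normalize(belief * likelihood)   # old posterior is the new prior
    widths.append(width(belief))

print("width after each experiment:", np.round(widths, 3))
```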

00:32:48.970 --> 00:32:51.760
So that's the
advance of science,

00:32:51.760 --> 00:32:55.212
as drawn in chalk by
Professor Green at the board.

00:32:59.890 --> 00:33:02.020
So this is a very
important way to think

00:33:02.020 --> 00:33:04.740
about it, is what you're
doing when you do experiments,

00:33:04.740 --> 00:33:09.730
is you're generally restricting
the range of parameter space

00:33:09.730 --> 00:33:12.030
that's still consistent
with everything.

00:33:12.030 --> 00:33:13.870
And when we mean
consistent, we mean

00:33:13.870 --> 00:33:16.286
that the probability that you
would have observed what you

00:33:16.286 --> 00:33:18.174
did observe is reasonably high.

00:33:18.174 --> 00:33:20.590
We'll still have to come back
to quantitatively figure out

00:33:20.590 --> 00:33:21.796
what reasonably high means.

00:33:25.600 --> 00:33:29.440
Now, when you did this
before when you were kids,

00:33:29.440 --> 00:33:31.450
nobody mentioned the
word Bayes, or Bayesian,

00:33:31.450 --> 00:33:34.630
or conditional
probabilities, right?

00:33:34.630 --> 00:33:38.077
So they just said, oh, just
do a least squares fit.

00:33:38.077 --> 00:33:39.410
How many of you did that before?

00:33:42.930 --> 00:33:45.660
So somebody told me
before, forget this stuff,

00:33:45.660 --> 00:33:47.790
we're going to never
even mention this stuff.

00:33:47.790 --> 00:33:49.581
We're just going to do
a least squares fit.

00:33:56.300 --> 00:33:59.060
Now, where did the least
squares fit idea come from?

00:33:59.060 --> 00:34:01.670
It came from looking at
this formula and saying,

00:34:01.670 --> 00:34:06.020
you know, this is the deviations
between the experiment

00:34:06.020 --> 00:34:11.719
and the model prediction,
and I weigh them somehow,

00:34:11.719 --> 00:34:13.061
and I have the square.

00:34:13.061 --> 00:34:14.810
And that's the thing
I want to make small.

00:34:14.810 --> 00:34:20.239
If I have a high probability
that what I observed really

00:34:20.239 --> 00:34:23.659
happened, or the probability
I'm going to observe this,

00:34:23.659 --> 00:34:25.550
it's got to be that
these guys have to be

00:34:25.550 --> 00:34:26.580
reasonably close to each other.

00:34:26.580 --> 00:34:27.610
If they're really different,

00:34:27.610 --> 00:34:29.485
it's going to be
very small, because it's

00:34:29.485 --> 00:34:30.770
inside an exponential.

00:34:30.770 --> 00:34:32.659
And if those guys
are really different,

00:34:32.659 --> 00:34:35.170
and the squared thing
is really large,

00:34:35.170 --> 00:34:37.210
then the probability
is incredibly small

00:34:37.210 --> 00:34:39.290
that I would have observed that.

00:34:39.290 --> 00:34:42.679
So we think that this
thing should be small.

00:34:42.679 --> 00:34:47.600
And in fact, if I want to get
the very best fit I can get,

00:34:47.600 --> 00:34:50.840
which means like the probability
was the highest of what

00:34:50.840 --> 00:34:54.460
I observed in the real
observation or something,

00:34:54.460 --> 00:34:56.492
then if I'm free to adjust
one of these thetas,

00:34:56.492 --> 00:34:57.950
I can adjust the
theta, try to make

00:34:57.950 --> 00:35:01.057
this thing like equal to
zero, or small as I can.

00:35:01.057 --> 00:35:03.390
So that's where the concept
of least squares comes from.
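
The step from the Gaussian to least squares can be written out as a short worked equation. The symbols are assumed notation for what is on the board: \bar{y} for the measured mean, \sigma for its uncertainty (held fixed), and f(x,\theta) for the model prediction.

```latex
p(\bar{y} \mid \theta) \propto
  \exp\!\left(-\frac{\bigl(\bar{y} - f(x,\theta)\bigr)^{2}}{2\sigma^{2}}\right)
\quad\Longrightarrow\quad
-\ln p(\bar{y} \mid \theta) =
  \frac{\bigl(\bar{y} - f(x,\theta)\bigr)^{2}}{2\sigma^{2}} + \text{const},
\qquad
\hat{\theta} = \arg\max_{\theta}\, p(\bar{y} \mid \theta)
             = \arg\min_{\theta}\, \bigl(\bar{y} - f(x,\theta)\bigr)^{2}.
```

Because the logarithm is monotonic, the theta that maximizes the probability is the same theta that minimizes the squared deviation.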

00:35:07.290 --> 00:35:11.360
Now, when you're
doing least squares,

00:35:11.360 --> 00:35:14.236
you almost always have
multiple parameters,

00:35:14.236 --> 00:35:16.610
and therefore you're going to
have to have multiple data.

00:35:16.610 --> 00:35:20.234
And they can't just be
a repeat of one number.

00:35:20.234 --> 00:35:22.400
That can't be all your data; it's
not sufficient to determine

00:35:22.400 --> 00:35:23.870
the parameters.

00:35:23.870 --> 00:35:25.710
So normally when you
do an experiment,

00:35:25.710 --> 00:35:27.700
you have to change the knobs.

00:35:27.700 --> 00:35:30.660
We have to make measurements in
a couple different conditions.

00:35:30.660 --> 00:35:31.970
Like for example, kinetics.

00:35:31.970 --> 00:35:35.120
You often want the Arrhenius
A factor and the Ea.

00:35:35.120 --> 00:35:37.092
And so I got to
run the experiment

00:35:37.092 --> 00:35:39.050
in more than one temperature
or I'm never going

00:35:39.050 --> 00:35:39.960
to be able to figure that out.

00:35:39.960 --> 00:35:42.080
So I have to change the
temperature in my reactor.

00:35:42.080 --> 00:35:43.829
Make some measurements
at one temperature,

00:35:43.829 --> 00:35:46.310
and make some measurements
at a different temperature.

00:35:46.310 --> 00:35:49.040
And for almost everything in
life that you want to measure,

00:35:49.040 --> 00:35:50.540
you're going to have to do this.

00:35:50.540 --> 00:35:52.940
You vary the concentration
of your enzyme

00:35:52.940 --> 00:35:55.796
if you want to see how the
enzyme kinetics depends

00:35:55.796 --> 00:35:57.170
on something. You
can't just keep

00:35:57.170 --> 00:35:59.630
running exactly the same
condition over and over.

00:35:59.630 --> 00:36:01.657
You'll get that
number really precise,

00:36:01.657 --> 00:36:03.740
but it's not enough
information to really fill out

00:36:03.740 --> 00:36:05.280
the parameters in your model.

00:36:05.280 --> 00:36:08.870
So you're going to have to run
several different experiments

00:36:08.870 --> 00:36:10.640
with different knob settings.

00:36:10.640 --> 00:36:13.920
Also, normally we don't just
measure one quantity, one

00:36:13.920 --> 00:36:15.170
observable in each experiment.

00:36:15.170 --> 00:36:17.790
We usually try to measure
as many things as we can.

00:36:17.790 --> 00:36:19.527
So we actually have
several observables

00:36:19.527 --> 00:36:21.860
at each knob setting, and we
have several knob settings,

00:36:21.860 --> 00:36:23.152
so we have quite a lot of data.

00:36:23.152 --> 00:36:25.485
And each one of those is
repeated a whole bunch of times

00:36:25.485 --> 00:36:28.400
so that we're confident that we
can use this Gaussian formula.

00:36:28.400 --> 00:36:39.120
And so what we really have is
the i-th observable measured

00:36:39.120 --> 00:36:47.150
at the l-th knob position.

00:36:47.150 --> 00:36:48.740
Well, I'm sorry,
l's not good either,

00:36:48.740 --> 00:36:51.684
it's used in your notes
for something else.

00:36:51.684 --> 00:36:54.460
M, there you go.

00:36:54.460 --> 00:36:56.000
The m-th knob position.

00:36:56.000 --> 00:36:59.380
Now, normally you have several
knobs, so that's a vector.

00:36:59.380 --> 00:37:03.720
And we have a lot of observables
we can make at each position.

00:37:03.720 --> 00:37:08.340
So this thing is a measurement.

00:37:08.340 --> 00:37:12.650
And we repeated this multiple
times so I can get the average.

00:37:12.650 --> 00:37:20.960
And we're also going to have a
corresponding sigma i m, which

00:37:20.960 --> 00:37:25.230
is the variance of the mean.

00:37:25.230 --> 00:37:27.350
So it's variance, that's
going to be divided

00:37:27.350 --> 00:37:30.020
by the square root of
the number of repeats

00:37:30.020 --> 00:37:33.560
for that particular experiment
and that particular observable.

00:37:33.560 --> 00:37:36.260
So this is your
incoming data set,

00:37:36.260 --> 00:37:44.750
and you also have your model
which predicts y model,

00:37:44.750 --> 00:37:54.878
it predicts the observable i,
which is equal to f i of xk, theta.

00:37:59.559 --> 00:38:01.100
So if you have
certain knob settings,

00:38:01.100 --> 00:38:04.500
like certain temperature, and
you have your parameter values,

00:38:04.500 --> 00:38:07.610
then you can calculate
what the model thinks

00:38:07.610 --> 00:38:09.895
should be the observable
value, and then you

00:38:09.895 --> 00:38:14.570
can actually measure it
and measure its variance.

00:38:14.570 --> 00:38:15.960
So that's the normal situation.

00:38:15.960 --> 00:38:19.250
And now you want to figure
out, are there some values

00:38:19.250 --> 00:38:24.789
of the theta that make the
model and the data agree?

00:38:24.789 --> 00:38:26.580
And that's the least
squares fitting thing.

00:38:26.580 --> 00:38:32.093
So what we can define
is a new quantity,

00:38:32.093 --> 00:38:43.170
the residual vector epsilon,
with elements epsilon k,

00:38:43.170 --> 00:38:45.220
to be consistent with
Joe Scott's notes.

00:38:45.220 --> 00:38:47.822
AUDIENCE: Is k the same as m?

00:38:47.822 --> 00:38:49.280
WILLIAM GREEN: M
is the knob positions,

00:38:49.280 --> 00:38:50.823
I'll tell you what
k is in a second.

00:39:00.000 --> 00:39:02.890
m, sorry.

00:39:02.890 --> 00:39:03.780
Too many indices.

00:39:17.390 --> 00:39:21.240
OK, so this is the residual
between the model and the data.

00:39:21.240 --> 00:39:25.500
And now-- oh man, I'm
sorry, [INAUDIBLE].

00:39:25.500 --> 00:39:31.730
K is an index over i and m.

00:39:31.730 --> 00:39:34.127
So k is just going to
list all the data you got.

00:39:34.127 --> 00:39:36.210
Some of the data came from
the same knob settings,

00:39:36.210 --> 00:39:37.835
some came from
different knob settings.

00:39:37.835 --> 00:39:38.372
Yeah?

00:39:38.372 --> 00:39:42.660
AUDIENCE: So is x the m
the y model i [INAUDIBLE]?

00:39:42.660 --> 00:39:43.910
WILLIAM GREEN: Thank you, yes.

00:39:49.110 --> 00:39:57.480
y model i, I guess
this is now k.

00:40:06.280 --> 00:40:08.005
And so k is one of
these indices that

00:40:08.005 --> 00:40:10.630
carry-- you can bind two indices
together and put them together

00:40:10.630 --> 00:40:12.940
just like you did
in your PDE problems.

00:40:12.940 --> 00:40:13.440
All right.
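
A sketch of the bookkeeping just described: repeats are averaged for each observable i at each knob setting m, each mean gets its own uncertainty, and the pair (m, i) is folded into one index k. The array sizes and the synthetic data are invented, and the uncertainty of the mean is taken here as the sample standard deviation divided by the square root of the number of repeats, which is one reading of the spoken description.

```python
import numpy as np

rng = np.random.default_rng(0)

M, I, N = 3, 2, 5          # knob settings, observables, repeats (made-up sizes)

# raw[m, i, n]: n-th repeat of observable i at knob setting m (synthetic data)
raw = 10.0 + rng.normal(scale=0.5, size=(M, I, N))

y_mean = raw.mean(axis=2)                          # shape (M, I)
sigma_mean = raw.std(axis=2, ddof=1) / np.sqrt(N)  # uncertainty of each mean

# Fold (m, i) into a single index k = m * I + i; there are K = M * I values.
y_flat = y_mean.reshape(-1)
sigma_flat = sigma_mean.reshape(-1)

def k_index(m, i):
    """Position of observable i at knob setting m in the flattened data."""
    return m * I + i

print("number of distinct data values K:", y_flat.size)
```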

00:40:17.020 --> 00:40:20.541
Now, I wrote down this sigma.

00:40:20.541 --> 00:40:22.540
But actually if you're
measuring multiple things

00:40:22.540 --> 00:40:24.070
at the same
experiment, you should

00:40:24.070 --> 00:40:26.500
expect them to be correlated.

00:40:26.500 --> 00:40:29.010
So really what we
should worry about

00:40:29.010 --> 00:40:34.930
is the c, the covariance matrix,
that we defined last time.

00:40:34.930 --> 00:40:38.750
So you should also
compute that thing.

00:40:38.750 --> 00:40:45.170
And so what you should expect
is the probability density

00:40:45.170 --> 00:40:51.800
that we would measure
any particular residuals

00:40:51.800 --> 00:40:53.900
if the model is true.

00:40:53.900 --> 00:40:56.842
And if we have these
certain parameters, theta,

00:40:56.842 --> 00:41:02.715
this should be equal to
2 pi to the negative K over 2,

00:41:02.715 --> 00:41:10.200
times the determinant of c to the
negative 1/2, times the exponential

00:41:10.200 --> 00:41:22.180
of negative 1/2, epsilon
transpose c inverse epsilon.

00:41:22.180 --> 00:41:26.671
So this is the multi measurement
version of the same equation

00:41:26.671 --> 00:41:27.170
here.
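
A sketch of the multi-measurement formula as code, working with the logarithm to avoid underflow. The function name `log_prob_residuals` is made up; the check at the end confirms that with a diagonal covariance matrix it reduces to a sum of the one-measurement Gaussians.

```python
import numpy as np

def log_prob_residuals(eps, C):
    """Log of (2 pi)^(-K/2) det(C)^(-1/2) exp(-1/2 eps^T C^-1 eps)."""
    eps = np.asarray(eps, dtype=float)
    K = eps.size
    sign, logdet = np.linalg.slogdet(C)
    if sign <= 0:
        raise ValueError("C must be positive definite")
    chi2 = eps @ np.linalg.solve(C, eps)   # eps^T C^-1 eps without forming C^-1
    return -0.5 * (K * np.log(2.0 * np.pi) + logdet + chi2)

# Check against the one-measurement formula when the data are uncorrelated.
eps = np.array([0.3, -0.1, 0.2])           # made-up residuals
sigma = np.array([0.2, 0.1, 0.4])          # made-up uncertainties
C_diag = np.diag(sigma ** 2)
independent = np.sum(-0.5 * np.log(2.0 * np.pi * sigma ** 2)
                     - 0.5 * (eps / sigma) ** 2)

print(log_prob_residuals(eps, C_diag), independent)
```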

00:41:31.420 --> 00:41:35.410
So this is the
quantity that we think

00:41:35.410 --> 00:41:40.150
should be small if we
have good parameter values

00:41:40.150 --> 00:41:42.019
and we did a good experiment.

00:41:42.019 --> 00:41:43.810
Actually, even when we
did bad experiments,

00:41:43.810 --> 00:41:46.210
it still should be small if we
have good parameter values.

00:41:48.990 --> 00:41:51.360
And that's because the c's,
if we did a bad experiment,

00:41:51.360 --> 00:41:53.330
we'll have a high
variance or something,

00:41:53.330 --> 00:41:55.780
then we should see the c's
will give us weightings

00:41:55.780 --> 00:41:57.420
that will reflect that.

00:41:57.420 --> 00:41:58.580
Yeah?

00:41:58.580 --> 00:42:00.980
AUDIENCE: [INAUDIBLE]

00:42:00.980 --> 00:42:02.900
WILLIAM GREEN: Is that--

00:42:02.900 --> 00:42:07.700
AUDIENCE: So you have the next
[INAUDIBLE] K [INAUDIBLE]..

00:42:07.700 --> 00:42:13.410
WILLIAM GREEN: Oh I'm sorry,
this is the capital K,

00:42:13.410 --> 00:42:17.810
this is the number
of data points.

00:42:17.810 --> 00:42:24.748
So little k is equal
to 1 to capital K.

00:42:24.748 --> 00:42:28.140
AUDIENCE: So does capital K
count for both experiments?

00:42:28.140 --> 00:42:32.330
WILLIAM GREEN: It's the
number of distinct data values

00:42:32.330 --> 00:42:34.740
after you've already
averaged over repeats.

00:42:34.740 --> 00:42:39.960
So you do m experiments, at each
experiment you measure capital

00:42:39.960 --> 00:42:42.270
I observables.

00:42:42.270 --> 00:42:45.860
So it's like m times I, so
K. If you measured everything

00:42:45.860 --> 00:42:50.010
in every experiment,
it's equal to I times m.

00:42:59.060 --> 00:43:04.130
Now there's two ways that people
approach this in literature.

00:43:04.130 --> 00:43:05.720
The fancy way is
you say, you know,

00:43:05.720 --> 00:43:09.490
this covariance matrix comes
in in a pretty important way

00:43:09.490 --> 00:43:11.870
into this probability
distribution function.

00:43:11.870 --> 00:43:14.270
And so maybe I need to worry
a lot about whether I really

00:43:14.270 --> 00:43:16.500
know the covariance matrix.

00:43:16.500 --> 00:43:22.440
And my uncertainty in the
mean drops pretty fast

00:43:22.440 --> 00:43:26.020
as I do averaging, but
I'm not so confident

00:43:26.020 --> 00:43:29.350
that my error in the
covariance matrix was small.

00:43:29.350 --> 00:43:31.980
So what people do
sometimes is they'll

00:43:31.980 --> 00:43:40.940
try to vary both c and theta,
and try to get a best fit where

00:43:40.940 --> 00:43:41.750
they're varying c.

00:43:41.750 --> 00:43:43.458
But then they have
additional constraints

00:43:43.458 --> 00:43:46.190
on c that c has to satisfy the
equations I gave last time

00:43:46.190 --> 00:43:49.384
about how you calculate the
covariance matrix from data.

00:43:49.384 --> 00:43:51.050
And so I was saying,
well, I want this c

00:43:51.050 --> 00:43:54.050
to satisfy these
equations pretty well,

00:43:54.050 --> 00:44:02.930
but the true covariance of
the world of the system

00:44:02.930 --> 00:44:04.370
is not the same
as what I actually

00:44:04.370 --> 00:44:08.460
measure by just measuring, say,
five repeats of an experiment.

00:44:08.460 --> 00:44:10.722
And so I might
want to vary the c.

00:44:10.722 --> 00:44:14.120
You try to vary the c, turns out
to be kind of complicated math,

00:44:14.120 --> 00:44:15.770
so not many people do it.

00:44:15.770 --> 00:44:17.646
Even though conceptually
it makes some sense,

00:44:17.646 --> 00:44:19.686
you should worry about
the fact you're not really

00:44:19.686 --> 00:44:20.810
sure about the covariance.

00:44:20.810 --> 00:44:22.950
So what a lot of
people do is they say,

00:44:22.950 --> 00:44:25.460
let's just use the c that's
computed from the formulas

00:44:25.460 --> 00:44:27.180
I gave you last
time experimentally.

00:44:27.180 --> 00:44:34.190
So just say, let's just take c
experimental, put them in here.

00:44:34.190 --> 00:44:36.600
And now this is a constant.

00:44:36.600 --> 00:44:40.440
And now the only thing
that varies in this problem

00:44:40.440 --> 00:44:43.290
is thetas which come
into the epsilons.

00:44:43.290 --> 00:44:47.290
Because the epsilons
depend on theta.

00:44:47.290 --> 00:44:49.980
And so in that
case, I can just try

00:44:49.980 --> 00:44:53.390
to maximize this probability.

00:44:53.390 --> 00:44:55.990
And what that happens
to do is to minimize

00:44:55.990 --> 00:44:59.330
this quantity in the exponent.

00:44:59.330 --> 00:45:02.650
And so all I need to
do is say, for example,

00:45:02.650 --> 00:45:11.500
theta best is equal to arg min
over theta of epsilon of theta

00:45:11.500 --> 00:45:16.737
transpose, c inverse, epsilon of theta.

00:45:22.550 --> 00:45:24.550
And so this is the least
squares fitting problem

00:45:24.550 --> 00:45:26.320
that you guys have
probably done before.
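
A sketch of the fit with the covariance matrix held fixed at its experimental value. To keep it closed-form, the model is assumed to be linear in the parameters (a straight line), which the lecture does not require; for a nonlinear model the same objective would be handed to a numerical optimizer. The data and covariance matrix are synthetic.

```python
import numpy as np

def gls_fit(X, y, C):
    """Minimize (y - X theta)^T C^-1 (y - X theta) for a model linear in theta."""
    Cinv_X = np.linalg.solve(C, X)
    Cinv_y = np.linalg.solve(C, y)
    return np.linalg.solve(X.T @ Cinv_X, X.T @ Cinv_y)

def objective(theta, X, y, C):
    """The quantity in the exponent: eps^T C^-1 eps with eps = y - X theta."""
    eps = y - X @ theta
    return eps @ np.linalg.solve(C, eps)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 8)
X = np.column_stack([np.ones_like(x), x])      # hypothetical model: y = a + b x
theta_true = np.array([1.0, 2.0])

# Correlated noise with unequal variances (made-up covariance matrix).
A = rng.normal(size=(8, 8))
C = 0.01 * (A @ A.T + 8.0 * np.eye(8))
y = X @ theta_true + rng.multivariate_normal(np.zeros(8), C)

theta_best = gls_fit(X, y, C)
print("theta best:", theta_best)
```

Setting C to the identity in `gls_fit` recovers the ordinary unweighted least squares fit.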

00:45:26.320 --> 00:45:27.695
And probably what
you did was you

00:45:27.695 --> 00:45:30.580
assumed I had perfectly
uncorrelated data,

00:45:30.580 --> 00:45:32.170
and all my errors were the same.

00:45:32.170 --> 00:45:35.380
And so c was just the identity
matrix, and I took it out.

00:45:35.380 --> 00:45:36.521
Probably did that before?

00:45:36.521 --> 00:45:39.290
Yeah, OK.

00:45:39.290 --> 00:45:42.710
That's pretty dangerous
to do, I'd say.

00:45:42.710 --> 00:45:45.400
What people do a lot, which is
a little bit less dangerous,

00:45:45.400 --> 00:45:47.480
is at least say,
well, you know, when

00:45:47.480 --> 00:45:53.540
I measure the concentration
of species x by GC,

00:45:53.540 --> 00:45:57.020
I have an error bar
of plus or minus 5%.

00:45:57.020 --> 00:46:00.080
And when I measure
the temperature

00:46:00.080 --> 00:46:02.525
with my thermocouple,
I have an error bar

00:46:02.525 --> 00:46:04.790
of plus or minus 2 degrees.

00:46:04.790 --> 00:46:08.360
And so the variances of these
guys should be a lot different,

00:46:08.360 --> 00:46:11.180
temperature and GC signal.

00:46:11.180 --> 00:46:14.895
And therefore I definitely need
to weight my deviation somehow.

00:46:14.895 --> 00:46:16.520
And it's really what
you do is you keep

00:46:16.520 --> 00:46:19.551
the diagonal entries of this.

00:46:19.551 --> 00:46:20.300
That's often done.

00:46:20.300 --> 00:46:23.570
And we just forget the fact
that they might be covariant.

00:46:23.570 --> 00:46:25.040
But if you've done
the experiments,

00:46:25.040 --> 00:46:26.310
you actually do have
enough information

00:46:26.310 --> 00:46:27.840
to compute this thing anyway,
so you might as well just

00:46:27.840 --> 00:46:29.068
use the experimental value.

00:46:33.860 --> 00:46:35.340
So this is the
least squares thing.

00:46:35.340 --> 00:46:38.502
And let's think, what
the heck is this doing?

00:46:38.502 --> 00:46:40.960
We're saying, all of a sudden
we grabbed all the parameters

00:46:40.960 --> 00:46:42.626
in the model, which
might include things

00:46:42.626 --> 00:46:45.940
like the molecular weight
of hydrogen or something.

00:46:45.940 --> 00:46:49.120
And we can find the
very best values

00:46:49.120 --> 00:46:52.445
that would make our model match
the experiment as best as

00:46:52.445 --> 00:46:52.945
possible.

00:46:55.630 --> 00:46:57.820
And in some sense,
that's great, we

00:46:57.820 --> 00:47:00.310
know the best values and
parameters for our experiment.

00:47:00.310 --> 00:47:02.170
But of course, if we vary the
molecular weight of hydrogen,

00:47:02.170 --> 00:47:04.330
it's going to screw up
somebody else's experiment.

00:47:04.330 --> 00:47:05.830
Because somebody else
did some other experiment

00:47:05.830 --> 00:47:07.913
that depended on the
molecular weight of hydrogen,

00:47:07.913 --> 00:47:09.980
and they had to get
some other value

00:47:09.980 --> 00:47:12.400
to match their experiment.

00:47:12.400 --> 00:47:15.820
So in this parameter
set, anything

00:47:15.820 --> 00:47:17.380
I do to vary those
parameters, I got

00:47:17.380 --> 00:47:20.209
to watch out that maybe
some of those parameters

00:47:20.209 --> 00:47:22.500
are involved with somebody
else's model and [INAUDIBLE]

00:47:22.500 --> 00:47:24.010
some other experiments.

00:47:24.010 --> 00:47:27.230
And I'm not really free
to vary them all freely.

00:47:27.230 --> 00:47:28.850
So this is the idea
from the Bayesian

00:47:28.850 --> 00:47:32.020
of having the
prior information is

00:47:32.020 --> 00:47:35.500
so you know some of the ranges
on these thetas already,

00:47:35.500 --> 00:47:38.454
and some of them you might know
really sharp distributions,

00:47:38.454 --> 00:47:40.870
like the molecular weight of
hydrogen. You might know that

00:47:40.870 --> 00:47:43.070
to a lot of decimal places.

00:47:43.070 --> 00:47:46.810
And so when people
do this, normally you

00:47:46.810 --> 00:47:49.120
don't vary all of the thetas.

00:47:49.120 --> 00:47:51.450
Usually what you do is
you select a set of thetas

00:47:51.450 --> 00:47:55.020
that you feel free to vary
because they're so uncertain,

00:47:55.020 --> 00:47:58.380
and other thetas that you think,
oh, I better not touch them.

00:47:58.380 --> 00:48:01.716
Because if I adjust
them, I may go

00:48:01.716 --> 00:48:03.840
to crazy values that are
inconsistent with somebody

00:48:03.840 --> 00:48:05.275
else's experiment.

00:48:05.275 --> 00:48:07.150
So a lot of times like
the molecular weights,

00:48:07.150 --> 00:48:08.080
you would not touch them.

00:48:08.080 --> 00:48:09.704
You would just say,
I got to just stick

00:48:09.704 --> 00:48:12.640
to the recommended
values in the tables.

00:48:12.640 --> 00:48:14.890
I'm not free to vary the
molecular weight of hydrogen,

00:48:14.890 --> 00:48:16.640
even though if I did
it would make my data

00:48:16.640 --> 00:48:18.460
match my experiment better.

00:48:18.460 --> 00:48:20.390
Makes my model
and the experiment

00:48:20.390 --> 00:48:22.520
match more precisely.

00:48:22.520 --> 00:48:25.450
So deciding which
parameters to vary in this

00:48:25.450 --> 00:48:27.490
is a really crucial thing.

00:48:30.610 --> 00:48:35.800
And that's a lot
of the art of doing

00:48:35.800 --> 00:48:39.430
this has to do with that issue.

00:48:39.430 --> 00:48:42.130
Also, you don't have
to keep the thetas

00:48:42.130 --> 00:48:43.310
in the form you have them.

00:48:43.310 --> 00:48:44.560
You could do a transformation.

00:48:44.560 --> 00:48:49.420
So you could change to, say,
W's that's equal to, say,

00:48:49.420 --> 00:48:52.540
some matrix times the
thetas, and I could express

00:48:52.540 --> 00:48:54.760
the equation in
terms of the W's.

00:48:54.760 --> 00:48:57.640
So I could transform my original
representation of the parameters

00:48:57.640 --> 00:48:59.670
into some other parameters.

00:48:59.670 --> 00:49:03.530
And oftentimes,
your experiment might

00:49:03.530 --> 00:49:06.230
be really good at determining
some of these W's, even

00:49:06.230 --> 00:49:09.800
if it might be incapable of
determining any of the thetas.

00:49:09.800 --> 00:49:13.830
So you often might know
some linear combination

00:49:13.830 --> 00:49:15.830
of parameters, or maybe
not linear combinations,

00:49:15.830 --> 00:49:18.590
some non-linear
combination of parameters

00:49:18.590 --> 00:49:22.790
might actually be determinable
very well from your experiment,

00:49:22.790 --> 00:49:24.950
even though you can't
determine things separately.

00:49:24.950 --> 00:49:27.290
And this gets into the idea
of dimensionless numbers.

00:49:27.290 --> 00:49:30.680
So your experiment might depend
on some dimensionless number

00:49:30.680 --> 00:49:32.241
very sensitively.

00:49:32.241 --> 00:49:34.490
And you can be quite confident
from your experimental data

00:49:34.490 --> 00:49:36.657
what the value of that
dimensionless number must be.

00:49:36.657 --> 00:49:38.656
But if you look inside
the dimensionless number,

00:49:38.656 --> 00:49:40.350
it depends on a lot
of different things.

00:49:40.350 --> 00:49:41.990
And you might not have
any information about them

00:49:41.990 --> 00:49:42.770
separately.

00:49:42.770 --> 00:49:44.600
All you know is that
your experiment just

00:49:44.600 --> 00:49:48.410
tells you the value of that
one parameter very accurately.

00:49:48.410 --> 00:49:51.080
So this is another
big part of the art

00:49:51.080 --> 00:49:54.240
of doing the model
versus data is setting up

00:49:54.240 --> 00:49:58.050
your model in terms of
parameters that you really

00:49:58.050 --> 00:50:00.900
can determine, and getting
out all the ones you can't

00:50:00.900 --> 00:50:02.370
determine and fixing them.

00:50:02.370 --> 00:50:04.469
So we're really going
to generally do

00:50:04.469 --> 00:50:05.260
this kind of thing.

00:50:05.260 --> 00:50:13.050
But we're going to say
that some thetas are fixed,

00:50:13.050 --> 00:50:21.780
and also we might change to
a different representation,

00:50:21.780 --> 00:50:26.780
change to W's instead.

00:50:26.780 --> 00:50:27.950
Yeah?

00:50:27.950 --> 00:50:30.800
AUDIENCE: Can you explain
where this transform--

00:50:30.800 --> 00:50:32.470
I don't really know
what's up with--

00:50:32.470 --> 00:50:34.220
WILLIAM GREEN: Yeah,
let's get an example.

00:50:34.220 --> 00:50:39.640
Suppose I was doing a reactor
that had A in equilibrium with B.

00:50:39.640 --> 00:50:41.520
And I was really
interested in kf,

00:50:41.520 --> 00:50:44.720
the forward rate for A going
to B. I'm a kineticist,

00:50:44.720 --> 00:50:48.100
I love to know A
goes to B. However,

00:50:48.100 --> 00:50:50.210
if I set up the
experiment wrong, it

00:50:50.210 --> 00:50:53.140
might be that this reaction
ran all the way to equilibrium.

00:50:53.140 --> 00:50:54.890
And what I see in the
products is actually

00:50:54.890 --> 00:50:57.620
just the equilibrium
ratio of A to B.

00:50:57.620 --> 00:51:03.630
So what I'm measuring might
be something that's dependent

00:51:03.630 --> 00:51:06.110
really on kf over kr, and
that might be the quantity

00:51:06.110 --> 00:51:07.136
I can really determine.

00:51:07.136 --> 00:51:08.635
Because that's
the equilibrium constant.

00:51:11.160 --> 00:51:13.030
If I didn't think
about it, I could just

00:51:13.030 --> 00:51:16.950
try to have the model fitting
procedure, just optimize

00:51:16.950 --> 00:51:18.760
to find the very
best value of kf.

00:51:18.760 --> 00:51:21.700
And in that situation, it
might have a lot of trouble,

00:51:21.700 --> 00:51:24.130
because it might be
quite indeterminate what

00:51:24.130 --> 00:51:27.074
the kf is, because really all
that matters is the ratio.

00:51:27.074 --> 00:51:28.490
Also, if I think about
this some more,

00:51:28.490 --> 00:51:33.080
suppose I run at
short times, and I

00:51:33.080 --> 00:51:34.580
measure the time dependence.

00:51:34.580 --> 00:51:37.230
What I'm really
measuring is kf plus kr.

00:51:37.230 --> 00:51:39.490
Do you remember we
did the analysis of A

00:51:39.490 --> 00:51:43.930
goes to B, one of the
early homework problems?

00:51:43.930 --> 00:51:48.680
The time constant was actually kf
plus kr, not kf separately.

00:51:48.680 --> 00:51:51.210
And so if I measure
the exponential decay

00:51:51.210 --> 00:51:53.507
time constant, I'm really
determining kf plus kr,

00:51:53.507 --> 00:51:55.340
I might be able to
determine that very well.

00:51:55.340 --> 00:51:57.170
Actually, in my lab, I can
do a great job for this.

00:51:57.170 --> 00:51:58.670
I have an instrument
that can measure

00:51:58.670 --> 00:52:00.378
the time constant of
the exponential decay

00:52:00.378 --> 00:52:02.540
really precisely, but
it's determining the sum.

00:52:02.540 --> 00:52:05.170
It's not determining either
one of them separately.

00:52:05.170 --> 00:52:07.045
And I might have to do
a separate experiment,

00:52:07.045 --> 00:52:09.200
say a thermo-experiment
to get the ratio.

00:52:09.200 --> 00:52:12.000
And then from the two I can put
them together and get the two

00:52:12.000 --> 00:52:13.914
values distinctly.
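
NOTE
A minimal sketch, not from the lecture, of putting the two experiments together. It assumes first-order A to B and B to A steps, so the kinetics experiment gives the relaxation rate kf + kr (the time constant is its inverse) and the thermo experiment gives K_eq = kf / kr. The function name and the numbers are hypothetical.
```python
def rates_from_sum_and_ratio(rate_sum, k_eq):
    """Recover (kf, kr) from rate_sum = kf + kr and k_eq = kf / kr."""
    if rate_sum <= 0 or k_eq <= 0:
        raise ValueError("rate_sum and k_eq must be positive")
    # Substitute kf = k_eq * kr into kf + kr = rate_sum and solve for kr.
    kr = rate_sum / (1.0 + k_eq)
    kf = k_eq * kr
    return kf, kr
# Hypothetical inputs: relaxation rate 5.0 per second, equilibrium constant 4.0.
kf, kr = rates_from_sum_and_ratio(5.0, 4.0)
print(kf, kr)
```
Neither measurement alone fixes kf; only the pair of them does.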

00:52:13.914 --> 00:52:15.330
So an example
of this

00:52:15.330 --> 00:52:24.760
would be a W. My W is kf plus
kr, the matrix would be 1 0 0 1,

00:52:24.760 --> 00:52:26.410
something like that.

00:52:26.410 --> 00:52:29.220
1-1, something like that.

00:52:29.220 --> 00:52:32.860
Where I add these
two guys, kf, kr.

00:52:32.860 --> 00:52:36.280
These are my two
parameters, 1 plus 1.

00:52:36.280 --> 00:52:39.690
And I can determine W1
now very accurately.

00:52:39.690 --> 00:52:44.820
Sorry, this is M, this is W.

00:52:44.820 --> 00:52:50.570
So now in terms of W, this has
two parameters now, W1 and W2.

00:52:50.570 --> 00:52:52.790
I can't determine W2
from my experiment,

00:52:52.790 --> 00:52:54.497
but I can determine
W1 really well.

00:52:54.497 --> 00:52:56.330
So then when I do the
least squares fitting,

00:52:56.330 --> 00:52:58.010
I should vary W1.

00:52:58.010 --> 00:53:00.350
I can fit it to
my experimental data,

00:53:00.350 --> 00:53:02.400
and just leave W2
fixed at some value.

00:53:02.400 --> 00:53:05.470
I can't do anything about it.
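
NOTE
A rough sketch, under stated assumptions, of fitting W1 while holding W2 fixed. The matrix M below (W1 = kf + kr, W2 = kf - kr) is only one illustrative choice, and the data are synthetic and noise-free. The measured signal is assumed to be the normalized relaxation (A(t) - A_eq) / (A0 - A_eq) = exp(-(kf + kr) t), which depends on W1 alone.
```python
import math
# Assumed transformation W = M * theta with theta = (kf, kr); this M is illustrative.
M = [[1.0, 1.0], [1.0, -1.0]]
def to_w(kf, kr):
    return (M[0][0] * kf + M[0][1] * kr, M[1][0] * kf + M[1][1] * kr)
def to_theta(w1, w2):
    # Inverse of the M above: kf = (W1 + W2) / 2, kr = (W1 - W2) / 2.
    return (0.5 * (w1 + w2), 0.5 * (w1 - w2))
def signal(t, w1):
    # Normalized relaxation toward equilibrium; W2 does not appear at all.
    return math.exp(-w1 * t)
def fit_w1(times, ys):
    # Linear least squares on ln(y) = -W1 * t, a line through the origin.
    num = sum(t * math.log(y) for t, y in zip(times, ys))
    den = sum(t * t for t in times)
    return -num / den
# Synthetic data generated from hypothetical "true" rates kf = 4, kr = 1.
true_w1, true_w2 = to_w(4.0, 1.0)
times = [0.05 * i for i in range(1, 11)]
ys = [signal(t, true_w1) for t in times]
w1_fit = fit_w1(times, ys)
w2_fixed = 0.0  # not determined by these data; held at an arbitrary guess
print(w1_fit, to_theta(w1_fit, w2_fixed))
```
The fit recovers W1, but the kf and kr it implies are only as good as the value W2 was fixed at.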

00:53:05.470 --> 00:53:06.291
That all right?

00:53:09.060 --> 00:53:11.630
Now, do you get the difference
in these two points of view?

00:53:11.630 --> 00:53:16.650
This is like, two
completely different ways

00:53:16.650 --> 00:53:18.702
to look at the problem.

00:53:18.702 --> 00:53:20.660
You can think about it
as, these parameters are

00:53:20.660 --> 00:53:23.150
free for me to vary,
and I just have

00:53:23.150 --> 00:53:25.975
to be careful to select the
ones I'm really free to vary.

00:53:25.975 --> 00:53:28.100
And that's the least squares
fitting point of view.

00:53:28.100 --> 00:53:32.421
Or I could say, I'm not
really determining anything

00:53:32.421 --> 00:53:34.670
in particular, all I'm doing
is taking the whole range

00:53:34.670 --> 00:53:36.545
of uncertainty that we
have about parameters,

00:53:36.545 --> 00:53:41.080
and by my experiment, I narrow
it down in the Bayesian view.

00:53:41.080 --> 00:53:42.790
So it's the two
different points of view.

00:53:42.790 --> 00:53:46.040
To do this one, I need to
make sure I have enough data

00:53:46.040 --> 00:53:48.204
to determine something.

00:53:48.204 --> 00:53:49.620
So I have to have
enough to determine

00:53:49.620 --> 00:53:52.490
some parameter, at
least one, otherwise

00:53:52.490 --> 00:53:54.212
there's no point in doing this.

00:53:54.212 --> 00:53:56.420
This one I can do even if
I can't determine anything,

00:53:56.420 --> 00:54:00.230
because I could still narrow
down the range of parameters.

00:54:00.230 --> 00:54:04.490
But this might be harder
to report in a table.

00:54:04.490 --> 00:54:08.120
Because all I have at the end is
a new probability density function

00:54:08.120 --> 00:54:11.130
of multiple parameters.
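
NOTE
An illustrative sketch of the Bayesian point of view, not taken from the lecture. It assumes a uniform prior on a grid of (kf, kr) values and a single Gaussian measurement of kf + kr; the grid range, measured value, and sigma are all hypothetical. The experiment narrows the joint density to a ridge along kf + kr = 5 without pinning down kf itself.
```python
import math
# Grid over hypothetical prior ranges for kf and kr (uniform prior assumed).
grid = [0.1 * i for i in range(1, 101)]  # 0.1 up to 10.0
measured_sum, sigma = 5.0, 0.1  # hypothetical measurement of kf + kr
def likelihood(kf, kr):
    # Gaussian measurement error on the sum kf + kr.
    return math.exp(-0.5 * ((kf + kr - measured_sum) / sigma) ** 2)
# Posterior is proportional to prior times likelihood; the uniform prior cancels.
post = {(kf, kr): likelihood(kf, kr) for kf in grid for kr in grid}
total = sum(post.values())
post = {k: v / total for k, v in post.items()}
def moments(f):
    # Posterior mean and standard deviation of f(kf, kr).
    mean = sum(p * f(*k) for k, p in post.items())
    var = sum(p * (f(*k) - mean) ** 2 for k, p in post.items())
    return mean, math.sqrt(var)
sum_mean, sum_std = moments(lambda kf, kr: kf + kr)
kf_mean, kf_std = moments(lambda kf, kr: kf)
print(sum_mean, sum_std)
print(kf_mean, kf_std)
```
The sum comes out tightly constrained while kf alone stays broad, which is why the result is a joint density rather than a single table entry.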

00:54:11.130 --> 00:54:11.820
All right?

00:54:11.820 --> 00:54:13.470
OK, we're done.

00:54:13.470 --> 00:54:16.220
See you guys on Friday.