1
00:00:05,580 --> 00:00:20,180
[MUSIC PLAYING]

2
00:00:20,180 --> 00:00:23,920
PROFESSOR: Last time, we took a
look at an explicit control

3
00:00:23,920 --> 00:00:27,280
evaluator for Lisp, and that
bridged the gap between all

4
00:00:27,280 --> 00:00:30,460
these high-level languages
like Lisp and the query

5
00:00:30,460 --> 00:00:33,330
language and all of that stuff,
bridged the gap between

6
00:00:33,330 --> 00:00:36,640
that and a conventional
register machine.

7
00:00:36,640 --> 00:00:40,140
And in fact, you can think of
the explicit control evaluator

8
00:00:40,140 --> 00:00:44,650
either as, say, the code for a
Lisp interpreter if you wanted

9
00:00:44,650 --> 00:00:47,680
to implement it in the assembly
language of some

10
00:00:47,680 --> 00:00:50,120
conventional register transfer
machine, or, if you like, you

11
00:00:50,120 --> 00:00:52,770
can think of it as the microcode
of some machine

12
00:00:52,770 --> 00:00:55,340
that's going to be specially
designed to run Lisp.

13
00:00:55,340 --> 00:00:58,160
In either case, what we're
doing is we're taking a

14
00:00:58,160 --> 00:01:01,790
machine that speaks some
low-level language, and we're

15
00:01:01,790 --> 00:01:05,250
raising the machine to a
high-level language like Lisp

16
00:01:05,250 --> 00:01:08,230
by writing an interpreter.

17
00:01:08,230 --> 00:01:21,160
So for instance, here,
conceptually, is a special

18
00:01:21,160 --> 00:01:23,910
purpose machine for computing
factorials.

19
00:01:23,910 --> 00:01:29,000
It takes in five and
puts out 120.

20
00:01:29,000 --> 00:01:32,060
And what this special purpose
machine is is actually a Lisp

21
00:01:32,060 --> 00:01:38,410
interpreter that's configured
itself to run factorials,

22
00:01:38,410 --> 00:01:39,880
because you fed into
it a description of

23
00:01:39,880 --> 00:01:42,410
the factorial machine.

24
00:01:42,410 --> 00:01:43,610
So that's what an
interpreter is.

25
00:01:43,610 --> 00:01:47,320
It configures itself to emulate
a machine whose

26
00:01:47,320 --> 00:01:50,120
description you read in.

27
00:01:50,120 --> 00:01:52,110
Now, inside the Lisp
interpreter, what's that?

28
00:01:52,110 --> 00:01:54,860
Well, that might be your general
register language

29
00:01:54,860 --> 00:01:59,500
interpreter that configures
itself to behave like a Lisp

30
00:01:59,500 --> 00:02:01,360
interpreter, because you
put in a whole bunch of

31
00:02:01,360 --> 00:02:03,410
instructions in register
language.

32
00:02:03,410 --> 00:02:07,070
This is the explicit
control evaluator.

33
00:02:07,070 --> 00:02:09,300
And then it also has some sort
of library, a library of

34
00:02:09,300 --> 00:02:11,620
primitive operators and Lisp
operations and all sorts of

35
00:02:11,620 --> 00:02:12,780
things like that.

36
00:02:12,780 --> 00:02:17,350
That's the general strategy
of interpretation.

37
00:02:17,350 --> 00:02:19,420
And the point is, what we're
doing is we're writing an

38
00:02:19,420 --> 00:02:24,060
interpreter to raise the machine
to the level of the

39
00:02:24,060 --> 00:02:25,430
programs that we
want to write.

40
00:02:25,430 --> 00:02:27,850
Well, there's another strategy,
a different one,

41
00:02:27,850 --> 00:02:29,030
which is compilation.

42
00:02:29,030 --> 00:02:31,090
Compilation's a little
bit different.

43
00:02:31,090 --> 00:02:37,720
Here--here we might have
produced a special purpose

44
00:02:37,720 --> 00:02:44,430
machine for, for computing
factorials, starting with some

45
00:02:44,430 --> 00:02:46,450
sort of machine that speaks
register language, except

46
00:02:46,450 --> 00:02:47,870
we're going to do a different
strategy.

47
00:02:47,870 --> 00:02:51,680
We take our factorial program.

48
00:02:51,680 --> 00:02:53,780
We use that as the source
code into a compiler.

49
00:02:53,780 --> 00:02:57,090
What the compiler will do is
translate that factorial

50
00:02:57,090 --> 00:02:59,926
program into some register
machine language.

51
00:02:59,926 --> 00:03:03,110
And this will now be not the
explicit control evaluator for

52
00:03:03,110 --> 00:03:04,990
Lisp, this will be some
register language for

53
00:03:04,990 --> 00:03:06,760
computing factorials.

54
00:03:06,760 --> 00:03:10,460
So this is the translation
of that.

55
00:03:10,460 --> 00:03:14,690
That will go into some sort of
loader which will combine this

56
00:03:14,690 --> 00:03:17,520
code with code selected from the
library to do things like

57
00:03:17,520 --> 00:03:19,970
primitive multiplication.

58
00:03:19,970 --> 00:03:23,190
And then we'll produce a load
module which configures the

59
00:03:23,190 --> 00:03:25,760
register language machine
to be a special

60
00:03:25,760 --> 00:03:28,320
purpose factorial machine.

61
00:03:28,320 --> 00:03:29,905
So that's a, that's a
different strategy.

62
00:03:29,905 --> 00:03:33,740
In interpretation, we're raising
the machine to the

63
00:03:33,740 --> 00:03:35,360
level of our language,
like Lisp.

64
00:03:35,360 --> 00:03:38,580
In compilation, we're taking our
program and lowering it to

65
00:03:38,580 --> 00:03:42,040
the language that's spoken
by the machine.

66
00:03:42,040 --> 00:03:44,280
Well, how do these two
strategies compare?

67
00:03:44,280 --> 00:03:48,890
The compiler can produce code
that will execute more

68
00:03:48,890 --> 00:03:50,140
efficiently.

69
00:03:52,490 --> 00:03:56,820
The essential reason for that is
that if you think about the

70
00:03:56,820 --> 00:04:02,870
register operations that are
running, the interpreter has

71
00:04:02,870 --> 00:04:05,880
to produce register operations
which, in principle, are going

72
00:04:05,880 --> 00:04:10,260
to be general enough to execute
any Lisp procedure.

73
00:04:10,260 --> 00:04:12,680
Whereas the compiler only has
to worry about producing a

74
00:04:12,680 --> 00:04:16,029
special bunch of register
operations for, for doing the

75
00:04:16,029 --> 00:04:20,209
particular Lisp procedure
that you've compiled.

76
00:04:20,209 --> 00:04:23,340
Or another way to say that is
that the interpreter is a

77
00:04:23,340 --> 00:04:26,940
general purpose simulator, that
when you read in a Lisp

78
00:04:26,940 --> 00:04:29,820
procedure, then it can
simulate the program described

79
00:04:29,820 --> 00:04:31,160
by that, by that procedure.

80
00:04:31,160 --> 00:04:33,290
So the interpreter is worrying
about making a general purpose

81
00:04:33,290 --> 00:04:36,170
simulator, whereas the compiler,
in effect, is

82
00:04:36,170 --> 00:04:37,930
configuring the thing to
be the machine that the

83
00:04:37,930 --> 00:04:40,000
interpreter would have
been simulating.

84
00:04:40,000 --> 00:04:41,340
So the compiler can be faster.

85
00:04:52,830 --> 00:04:57,100
On the other hand, the
interpreter is a nicer

86
00:04:57,100 --> 00:04:59,340
environment for debugging.

87
00:04:59,340 --> 00:05:02,200
And the reason for that is that
we've got the source code

88
00:05:02,200 --> 00:05:02,960
actually there.

89
00:05:02,960 --> 00:05:03,740
We're interpreting it.

90
00:05:03,740 --> 00:05:06,010
That's what we're
working with.

91
00:05:06,010 --> 00:05:07,880
And we also have the
library around.

92
00:05:07,880 --> 00:05:10,150
See, the interpreter--the
library sitting there is part

93
00:05:10,150 --> 00:05:11,140
of the interpreter.

94
00:05:11,140 --> 00:05:13,660
The compiler only pulls out from
the library what it needs

95
00:05:13,660 --> 00:05:14,830
to run the program.

96
00:05:14,830 --> 00:05:18,710
So if you're in the middle of
debugging, and you might like

97
00:05:18,710 --> 00:05:21,730
to write a little extra program
to examine some run

98
00:05:21,730 --> 00:05:24,450
time data structure or to
produce some computation that

99
00:05:24,450 --> 00:05:25,990
you didn't think of when you
wrote the program, the

100
00:05:25,990 --> 00:05:28,390
interpreter can do that
perfectly well, whereas the

101
00:05:28,390 --> 00:05:29,670
compiler can't.

102
00:05:29,670 --> 00:05:31,850
So there are sort of dual,
dual advantages.

103
00:05:31,850 --> 00:05:34,720
The compiler will produce code
that executes faster.

104
00:05:34,720 --> 00:05:39,030
The interpreter is a better
environment for debugging.

105
00:05:39,030 --> 00:05:43,520
And most Lisp systems end up
having both, end up being

106
00:05:43,520 --> 00:05:45,860
configured so you have an
interpreter that you use when

107
00:05:45,860 --> 00:05:46,930
you're developing your code.

108
00:05:46,930 --> 00:05:49,060
Then you can speed it
up by compiling.

109
00:05:49,060 --> 00:05:51,720
And very often, you can arrange
that compiled code and

110
00:05:51,720 --> 00:05:54,810
interpreted code can
call each other.

111
00:05:54,810 --> 00:05:55,700
We'll see how to do that.

112
00:05:55,700 --> 00:05:56,950
That's not hard.

113
00:06:01,040 --> 00:06:02,290
In fact, the way we'll--

114
00:06:04,390 --> 00:06:06,580
in the compiler we're going to
make, the way we'll arrange

115
00:06:06,580 --> 00:06:08,952
for compiled code and
interpreted code to call, to

116
00:06:08,952 --> 00:06:12,220
call each other, is that we'll
have the compiler use exactly

117
00:06:12,220 --> 00:06:14,320
the same register conventions
as the interpreter.

118
00:06:18,680 --> 00:06:23,900
Well, the idea of a compiler is
very much like the idea of

119
00:06:23,900 --> 00:06:25,490
an interpreter or evaluator.

120
00:06:25,490 --> 00:06:27,070
It's the same thing.

121
00:06:27,070 --> 00:06:31,460
See, the evaluator walks over
the code and performs some

122
00:06:31,460 --> 00:06:33,840
register operations.

123
00:06:33,840 --> 00:06:37,040
That's what we did yesterday.

124
00:06:37,040 --> 00:06:39,700
Well, the compiler essentially
would like to walk over the

125
00:06:39,700 --> 00:06:44,000
code and produce the register
operations that the evaluator

126
00:06:44,000 --> 00:06:48,890
would have done were it
evaluating the thing.

127
00:06:48,890 --> 00:06:52,000
And that gives us a model
for how to implement a

128
00:06:52,000 --> 00:06:57,150
zeroth-order compiler, a
very bad compiler but

129
00:06:57,150 --> 00:06:58,330
essentially a compiler.

130
00:06:58,330 --> 00:07:00,900
A model for doing that is you
just take the evaluator, you

131
00:07:00,900 --> 00:07:04,970
run it over the code, but
instead of executing the

132
00:07:04,970 --> 00:07:07,550
actual operations, you
just save them away.

133
00:07:07,550 --> 00:07:08,820
And that's your compiled code.
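
A minimal sketch of that zeroth-order idea, written in Python rather than the register language used in the lecture. The operation names, the tiny machine, and the helpers `walk`, `interpret`, `compile_expr`, and `run` are all hypothetical, and the sketch handles only variables, constants, and applications. One walker serves both modes: the interpreter executes each register operation as it is produced, and the compiler just saves the operations away.

```python
def walk(expr, emit):
    """Walk expr, calling emit(op) for every register operation."""
    if isinstance(expr, str):                 # variable: look it up in env
        emit(("lookup", expr))
    elif isinstance(expr, (int, float)):      # self-evaluating constant
        emit(("const", expr))
    else:                                     # application: [operator, *operands]
        operator, *operands = expr
        walk(operator, emit)
        emit(("assign", "fun", "val"))
        emit(("init-argl",))
        for operand in operands:
            # Maximally pessimistic, like the evaluator: always save.
            emit(("save", "fun"))
            emit(("save", "argl"))
            walk(operand, emit)
            emit(("restore", "argl"))
            emit(("restore", "fun"))
            emit(("add-arg",))
        emit(("apply",))


def new_machine(env):
    return {"regs": {"val": None, "fun": None, "argl": []},
            "stack": [], "env": env}


def step(machine, op):
    """Execute one register operation on the machine."""
    regs, stack = machine["regs"], machine["stack"]
    kind = op[0]
    if kind == "lookup":
        regs["val"] = machine["env"][op[1]]
    elif kind == "const":
        regs["val"] = op[1]
    elif kind == "assign":
        regs[op[1]] = regs[op[2]]
    elif kind == "init-argl":
        regs["argl"] = []
    elif kind == "add-arg":
        regs["argl"] = regs["argl"] + [regs["val"]]
    elif kind == "save":
        stack.append(regs[op[1]])
    elif kind == "restore":
        regs[op[1]] = stack.pop()
    elif kind == "apply":
        regs["val"] = regs["fun"](*regs["argl"])
    else:
        raise ValueError(f"unknown operation: {op!r}")


def interpret(expr, env):
    """Interpreter: execute each operation as the walker produces it."""
    machine = new_machine(env)
    walk(expr, lambda op: step(machine, op))
    return machine["regs"]["val"]


def compile_expr(expr):
    """Zeroth-order compiler: same walk, but just save the operations."""
    ops = []
    walk(expr, ops.append)
    return ops


def run(ops, env):
    """Run previously saved operations without walking the expression."""
    machine = new_machine(env)
    for op in ops:
        step(machine, op)
    return machine["regs"]["val"]
```

Here `compile_expr(["f", "x"])` returns a plain list of operations, and `run` can replay that list against any environment without looking at the expression again.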

134
00:07:08,820 --> 00:07:10,140
So let me give you an
example of that.

135
00:07:15,130 --> 00:07:15,770
Suppose we're going to
compile--suppose we want to

136
00:07:15,770 --> 00:07:18,010
compile the expression f of x.

137
00:07:25,100 --> 00:07:28,175
So let's assume that we've got
f of x in the exp register and

138
00:07:28,175 --> 00:07:30,170
something in the environment
register.

139
00:07:30,170 --> 00:07:31,745
And now imagine starting
up the evaluator.

140
00:07:34,560 --> 00:07:36,370
Well, it looks at the expression
and it sees that

141
00:07:36,370 --> 00:07:38,000
it's an application.

142
00:07:38,000 --> 00:07:43,730
And it branches to a place in
the evaluator code we saw

143
00:07:43,730 --> 00:07:44,980
called ev-application.

144
00:07:47,230 --> 00:07:48,190
And then it begins.

145
00:07:48,190 --> 00:07:50,560
It stores away the operands in
unev, and then it's going

146
00:07:50,560 --> 00:07:53,030
to put the operator in exp,
and it's going to go

147
00:07:53,030 --> 00:07:54,410
recursively evaluate it.

148
00:07:54,410 --> 00:07:56,385
That's the process that
we walk through.

149
00:07:56,385 --> 00:07:58,360
And if you start looking at
the code, you start seeing

150
00:07:58,360 --> 00:08:00,200
some register operations.

151
00:08:00,200 --> 00:08:03,370
You see assign to unev the
operands, assign to exp the

152
00:08:03,370 --> 00:08:05,520
operator, save the environment,
generate

153
00:08:05,520 --> 00:08:06,770
that, and so on.

154
00:08:10,310 --> 00:08:16,220
Well, if we look on the overhead
here, we can see, we

155
00:08:16,220 --> 00:08:20,860
can see those operations
starting to be produced.

156
00:08:20,860 --> 00:08:24,130
Here's sort of the first real
operation that the evaluator

157
00:08:24,130 --> 00:08:24,910
would have done.

158
00:08:24,910 --> 00:08:27,980
It pulls the operands out of the
exp register and assigns

159
00:08:27,980 --> 00:08:31,340
it to unev. And then it assigns
something to the

160
00:08:31,340 --> 00:08:34,240
expression register, and it
saves continue, and it saves

161
00:08:34,240 --> 00:08:34,740
env.

162
00:08:34,740 --> 00:08:38,049
And all I'm doing here is
writing down the register

163
00:08:38,049 --> 00:08:41,130
assignments that the evaluator
would have done in

164
00:08:41,130 --> 00:08:42,010
executing that code.

165
00:08:42,010 --> 00:08:44,280
And we can zoom out a little bit.

166
00:08:44,280 --> 00:08:49,430
Altogether, there are about
19 operations there.

167
00:08:49,430 --> 00:08:52,650
And this is the--this will be
the piece of code up until the

168
00:08:52,650 --> 00:08:56,230
point where the evaluator
branches off to

169
00:08:56,230 --> 00:08:57,940
apply-dispatch.

170
00:08:57,940 --> 00:09:00,110
And in fact, in this compiler,
we're not going to worry about

171
00:09:00,110 --> 00:09:01,450
apply-dispatch at all.

172
00:09:01,450 --> 00:09:02,672
We're going to have
everything--we're going to

173
00:09:02,672 --> 00:09:06,160
have both interpreted code
and compiled code

174
00:09:06,160 --> 00:09:08,670
always evaluate procedures,
always apply procedures by

175
00:09:08,670 --> 00:09:10,240
going to apply-dispatch.

176
00:09:10,240 --> 00:09:12,720
That will easily allow
interpreted code and compiled

177
00:09:12,720 --> 00:09:13,970
code to call each other.

178
00:09:18,330 --> 00:09:21,220
Well, in principle, that's
all we need to do.

179
00:09:21,220 --> 00:09:22,620
You just run the evaluator.

180
00:09:22,620 --> 00:09:24,320
So the compiler's a lot
like the evaluator.

181
00:09:24,320 --> 00:09:26,890
You run it, except it stashes
away these operations instead

182
00:09:26,890 --> 00:09:29,480
of actually executing them.

183
00:09:29,480 --> 00:09:32,680
Well, that's not, that's
not quite true.

184
00:09:32,680 --> 00:09:36,370
There's only one little
lie in that.

185
00:09:36,370 --> 00:09:40,480
What you have to worry about is
if you have a, a predicate.

186
00:09:40,480 --> 00:09:44,200
If you have some kind of test
you want to do, obviously, at

187
00:09:44,200 --> 00:09:47,000
the point when you're compiling
it, you don't know

188
00:09:47,000 --> 00:09:49,490
which branch of these--of a
conditional like this you're

189
00:09:49,490 --> 00:09:51,400
going to do.

190
00:09:51,400 --> 00:09:55,010
So you can't say which one the
evaluator would have done.

191
00:09:55,010 --> 00:09:57,190
So all you do there
is very simple.

192
00:09:57,190 --> 00:09:58,985
You compile both branches.

193
00:09:58,985 --> 00:10:02,050
So you compile a structure
that looks like this.

194
00:10:02,050 --> 00:10:08,430
That'll compile into something
that says, the code, the code

195
00:10:08,430 --> 00:10:18,140
for P. And it puts its results
in, say, the val register.

196
00:10:18,140 --> 00:10:21,680
So you walk the interpreter over
the predicate and make

197
00:10:21,680 --> 00:10:24,770
sure that the result would
go into the val register.

198
00:10:24,770 --> 00:10:30,790
And then you compile an
instruction that says, branch

199
00:10:30,790 --> 00:10:38,670
if, if val is true, to a place
we'll call label one.

200
00:10:44,950 --> 00:10:49,792
Then we, we will put the
code for B to walk the

201
00:10:49,792 --> 00:10:54,040
interpreter--walk the
interpreter over B. And then

202
00:10:54,040 --> 00:10:58,070
you put in an instruction
that says, go to the next

203
00:10:58,070 --> 00:11:03,820
thing, whatever, whatever was
supposed to happen after this

204
00:11:03,820 --> 00:11:04,920
thing was done.

205
00:11:04,920 --> 00:11:06,900
You put in that instruction.

206
00:11:06,900 --> 00:11:08,280
And here you put label one.

207
00:11:11,521 --> 00:11:19,860
And here you put the
code for A. And you

208
00:11:19,860 --> 00:11:25,870
put go to next thing.

209
00:11:31,420 --> 00:11:33,090
So that's how you treat
a conditional.

210
00:11:33,090 --> 00:11:35,890
You generate a little
block like that.
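
The block layout just described for a conditional with predicate P, consequent A, and alternative B can be sketched as follows. This is an illustrative Python sketch, not the lecture's code: the tuple encoding of instructions, the name `compile_if`, and the label generator are assumptions made for the example.

```python
import itertools

_label_numbers = itertools.count(1)   # hypothetical source of fresh labels


def compile_if(predicate_code, consequent_code, alternative_code, next_label):
    """Lay out both branches of a conditional, as in the block above.

    Each *_code argument is a list of already compiled instructions.
    next_label names whatever was supposed to happen after the conditional.
    """
    true_label = f"label{next(_label_numbers)}"
    return [
        *predicate_code,                        # code for P, result in val
        ("branch-if-true", "val", true_label),  # jump to A if val is true
        *alternative_code,                      # code for B
        ("goto", next_label),
        ("label", true_label),
        *consequent_code,                       # code for A
        ("goto", next_label),
    ]
```

Both branches end up in the compiled code. Which one runs is decided only when the branch instruction executes.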

211
00:11:35,890 --> 00:11:40,550
And other than that, this
zeroth-order compiler is the

212
00:11:40,550 --> 00:11:42,310
same as the evaluator.

213
00:11:42,310 --> 00:11:44,380
It's just stashing away the
instructions instead of

214
00:11:44,380 --> 00:11:46,380
executing them.

215
00:11:46,380 --> 00:11:48,140
That seems pretty simple,
but we've gained

216
00:11:48,140 --> 00:11:50,120
something by that.

217
00:11:50,120 --> 00:11:51,360
See, already that's
going to be more

218
00:11:51,360 --> 00:11:53,630
efficient than the evaluator.

219
00:11:53,630 --> 00:11:58,030
Because, if you watch the
evaluator run, it's not only

220
00:11:58,030 --> 00:12:01,410
generating the register
operations we wrote down, it's

221
00:12:01,410 --> 00:12:04,740
also doing things to decide
which ones to generate.

222
00:12:04,740 --> 00:12:08,480
So the very first thing it does,
say, here for instance,

223
00:12:08,480 --> 00:12:13,470
is go do some tests and decide
that this is an application,

224
00:12:13,470 --> 00:12:15,930
and then branch off to the
place that, that handles

225
00:12:15,930 --> 00:12:16,780
applications.

226
00:12:16,780 --> 00:12:18,870
In other words, what the
evaluator's doing is

227
00:12:18,870 --> 00:12:23,720
simultaneously analyzing the
code to see what to do, and

228
00:12:23,720 --> 00:12:25,580
running these operations.

229
00:12:25,580 --> 00:12:25,960
And when you--

230
00:12:25,960 --> 00:12:28,960
if you run the evaluator a
million times, that analysis

231
00:12:28,960 --> 00:12:31,870
phase happens a million times,
whereas in the compiler, it's

232
00:12:31,870 --> 00:12:33,650
happened once, and then you
just have the register

233
00:12:33,650 --> 00:12:34,900
operations themselves.
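
A small Python sketch of that difference, under loud assumptions: the compiled form here is a Python closure standing in for the saved register operations, and `classify`, `interpret`, `compile_expr`, and the `counts` dictionary are hypothetical names used only for this illustration. It counts how often the "what kind of expression is this?" analysis runs.

```python
counts = {"analysis": 0}   # how many times an expression was classified


def classify(expr):
    """The analysis step: decide what kind of expression this is."""
    counts["analysis"] += 1
    if isinstance(expr, str):
        return "variable"
    if isinstance(expr, (int, float)):
        return "constant"
    return "application"


def interpret(expr, env):
    """Analyze and execute at the same time, on every run."""
    kind = classify(expr)
    if kind == "variable":
        return env[expr]
    if kind == "constant":
        return expr
    fun, *args = [interpret(part, env) for part in expr]
    return fun(*args)


def compile_expr(expr):
    """Analyze once; return something that only executes."""
    kind = classify(expr)
    if kind == "variable":
        return lambda env: env[expr]
    if kind == "constant":
        return lambda env: expr
    parts = [compile_expr(part) for part in expr]

    def run(env):
        fun, *args = [part(env) for part in parts]
        return fun(*args)

    return run
```

Running `interpret(["f", "x"], env)` a thousand times classifies three expressions on every run. Compiling once classifies three expressions in total, however many times the result is run.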

234
00:12:39,730 --> 00:12:42,310
OK, that's a zeroth-order
compiler, but it is a

235
00:12:42,310 --> 00:12:44,550
wretched, wretched compiler.

236
00:12:44,550 --> 00:12:47,200
It's really dumb.

237
00:12:47,200 --> 00:12:52,040
Let's--let's go back and, and
look at this overhead.

238
00:12:52,040 --> 00:12:54,170
So look at some
of the operations

239
00:12:54,170 --> 00:12:56,020
this thing is doing.

240
00:12:56,020 --> 00:13:01,030
We're supposedly looking
at the operations and

241
00:13:01,030 --> 00:13:03,710
interpreting f of x.

242
00:13:03,710 --> 00:13:05,220
Now, look here what
it's doing.

243
00:13:05,220 --> 00:13:10,360
For example, here it
assigns to exp the

244
00:13:10,360 --> 00:13:13,850
operator in fetch of exp.

245
00:13:13,850 --> 00:13:16,290
But see, there's no reason to
do that, because this is--

246
00:13:16,290 --> 00:13:21,290
the compiler knows that the
operator of fetch of exp is f

247
00:13:21,290 --> 00:13:23,310
right here.

248
00:13:23,310 --> 00:13:25,850
So there's no reason why this
instruction should say that.

249
00:13:25,850 --> 00:13:29,580
It should say, we'll
assign to exp, f.

250
00:13:29,580 --> 00:13:32,000
Or in fact, you don't
need exp at all.

251
00:13:32,000 --> 00:13:33,670
There's no reason it should
have exp at all.

252
00:13:33,670 --> 00:13:35,170
What, what did exp
get used for?

253
00:13:35,170 --> 00:13:43,190
Well, if we come down here,
we're going to assign to val,

254
00:13:43,190 --> 00:13:48,620
look up the stuff in exp
in the environment.

255
00:13:48,620 --> 00:13:50,800
So what we really should do is
get rid of the exp register

256
00:13:50,800 --> 00:13:53,290
altogether, and just change
this instruction to say,

257
00:13:53,290 --> 00:13:57,600
assign to val, look up the
variable value of the symbol f

258
00:13:57,600 --> 00:13:58,850
in the environment.

259
00:14:01,100 --> 00:14:04,800
Similarly, back up here, we
don't need unev at all,

260
00:14:04,800 --> 00:14:08,260
because we know what the
operands of fetch of exp are

261
00:14:08,260 --> 00:14:09,150
for this piece of code.

262
00:14:09,150 --> 00:14:10,630
It's the, it's the list x.

263
00:14:13,270 --> 00:14:19,660
So in some sense, you don't
want unev and exp at all.

264
00:14:19,660 --> 00:14:22,690
See, what they really are in
some sense, those aren't

265
00:14:22,690 --> 00:14:24,330
registers of the actual machine

266
00:14:24,330 --> 00:14:25,230
that's supposed to run.

267
00:14:25,230 --> 00:14:28,180
Those are registers that have to
do with arranging the thing

268
00:14:28,180 --> 00:14:30,760
that can simulate
that machine.

269
00:14:30,760 --> 00:14:34,890
So they're always going to hold
expressions which, from

270
00:14:34,890 --> 00:14:37,330
the compiler's point of view,
are just constants, so can be

271
00:14:37,330 --> 00:14:39,510
put right into the code.

272
00:14:39,510 --> 00:14:41,850
So you can forget about all the
operations worrying about

273
00:14:41,850 --> 00:14:44,000
exp and unev and just
use those constants.

274
00:14:44,000 --> 00:14:48,200
Similarly, again, if we go, go
back and look here, there are

275
00:14:48,200 --> 00:14:50,510
things like assign to
continue eval-args.

276
00:14:53,890 --> 00:14:55,440
Now, that has nothing
to do with anything.

277
00:14:55,440 --> 00:14:59,280
That was just the evaluator
keeping track of where it

278
00:14:59,280 --> 00:15:05,150
should go next, to evaluate the
arguments in some, in some

279
00:15:05,150 --> 00:15:06,920
application.

280
00:15:06,920 --> 00:15:08,690
But of course, that's irrelevant
to the compiler,

281
00:15:08,690 --> 00:15:09,940
because you--

282
00:15:11,470 --> 00:15:15,220
the analysis phase will have
already done that.

283
00:15:15,220 --> 00:15:17,680
So this is completely
irrelevant.

284
00:15:17,680 --> 00:15:20,170
So a lot of these, these
assignments to continue have

285
00:15:20,170 --> 00:15:24,070
not to do where the running
machine is supposed to

286
00:15:24,070 --> 00:15:26,120
continue in keeping track
of its state.

287
00:15:26,120 --> 00:15:28,380
They have to do with where the
evaluator's analysis should

288
00:15:28,380 --> 00:15:30,080
continue, and those are
completely irrelevant.

289
00:15:30,080 --> 00:15:31,330
So we can get rid of them.

290
00:15:44,330 --> 00:15:46,990
OK, well, if we simply
do that, make those kinds of

291
00:15:46,990 --> 00:15:51,380
optimizations, get rid, get rid
of worrying about exp and

292
00:15:51,380 --> 00:15:55,030
unev, and get rid of these
irrelevant register

293
00:15:55,030 --> 00:16:01,400
assignments to continue, then we
can take this literal code,

294
00:16:01,400 --> 00:16:05,370
these sort of 19 instructions
that the, that the evaluator

295
00:16:05,370 --> 00:16:08,540
would have done, and
then replace them.

296
00:16:08,540 --> 00:16:09,865
Let's look at the,
at the slide.

297
00:16:13,490 --> 00:16:15,180
Replace them by--we get rid
of about half of them.

298
00:16:18,370 --> 00:16:21,470
And again, this is just sort
of filtering what the

299
00:16:21,470 --> 00:16:23,410
evaluator would have done
by getting rid of

300
00:16:23,410 --> 00:16:25,200
the irrelevant stuff.

301
00:16:25,200 --> 00:16:29,450
And you see, for instance, here
the--where the evaluator

302
00:16:29,450 --> 00:16:32,570
said, assign val, look up
variable value, fetch of exp,

303
00:16:32,570 --> 00:16:35,470
here we have put in
the constant f.

304
00:16:35,470 --> 00:16:37,020
Here we've put in
the constant x.

305
00:16:39,770 --> 00:16:43,860
So there's a, there's a little
better compiler.

306
00:16:43,860 --> 00:16:47,930
It's still pretty dumb.

307
00:16:47,930 --> 00:16:50,560
It's still doing a lot
of dumb things.

308
00:16:50,560 --> 00:16:53,290
Again, if we go look at the
slide again, look at the very

309
00:16:53,290 --> 00:17:00,150
beginning here, we see a save
the environment, assign

310
00:17:00,150 --> 00:17:03,430
something to the val register,
and restore the environment.

311
00:17:03,430 --> 00:17:05,030
Where'd that come from?

312
00:17:05,030 --> 00:17:08,200
That came from the evaluator
back here saying, oh, I'm in

313
00:17:08,200 --> 00:17:11,160
the middle of evaluating
an application.

314
00:17:11,160 --> 00:17:15,940
So I'm going to recursively
call eval-dispatch.

315
00:17:15,940 --> 00:17:18,170
So I'd better save the thing I'm
going to need later, which

316
00:17:18,170 --> 00:17:19,849
is the environment.

317
00:17:19,849 --> 00:17:21,609
This was the result
of recursively

318
00:17:21,609 --> 00:17:23,520
calling eval-dispatch.

319
00:17:23,520 --> 00:17:26,540
It was evaluating the symbol
f in that case.

320
00:17:26,540 --> 00:17:28,900
Then it came back from
eval-dispatch, restored the

321
00:17:28,900 --> 00:17:31,380
environment.

322
00:17:31,380 --> 00:17:35,290
But in fact, the actual thing
it ended up doing in the

323
00:17:35,290 --> 00:17:38,740
evaluation is not going to hurt
the environment at all.

324
00:17:38,740 --> 00:17:40,890
So there's no reason to be
saving the environment and

325
00:17:40,890 --> 00:17:42,170
restoring the environment
here.

326
00:17:46,020 --> 00:17:53,690
Similarly, here I'm saving the
argument list. That's a piece

327
00:17:53,690 --> 00:17:56,560
of the argument evaluation
loop, saving the argument

328
00:17:56,560 --> 00:17:58,090
list, and here you restore it.

329
00:17:58,090 --> 00:18:01,510
But the actual thing that you
ended up doing didn't trash

330
00:18:01,510 --> 00:18:04,090
the argument list. So there
was no reason to save it.

331
00:18:08,690 --> 00:18:14,415
So another way to say, another
way to say that is that the,

332
00:18:14,415 --> 00:18:19,923
the evaluator has to be
maximally pessimistic, because

333
00:18:19,923 --> 00:18:22,050
from its point of view
it's just going off to

334
00:18:22,050 --> 00:18:23,180
evaluate something.

335
00:18:23,180 --> 00:18:26,200
So it better save what it's
going to need later.

336
00:18:26,200 --> 00:18:28,700
But once you've done the
analysis, the compiler is in a

337
00:18:28,700 --> 00:18:32,140
position to say, well, what
actually did I need to save?

338
00:18:32,140 --> 00:18:35,410
And it doesn't need to do any--
it doesn't need to be as

339
00:18:35,410 --> 00:18:38,060
careful as the evaluator,
because it knows what it

340
00:18:38,060 --> 00:18:39,950
actually needs.
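
One way to picture that analysis is as a pass over the saved operations that drops a save/restore pair whenever nothing in between can change the saved register. The Python below is a sketch under stated assumptions, not the compiler from the lecture: the tuple encoding, the register set, the `"call"` operation that may clobber everything, and the name `drop_redundant_saves` are all hypothetical.

```python
ALL_REGISTERS = {"env", "fun", "val", "argl", "continue"}


def modified_registers(op):
    """Registers an operation may change (conservative)."""
    kind = op[0]
    if kind in ("assign", "restore"):
        return {op[1]}
    if kind == "call":                 # an unknown procedure may clobber anything
        return set(ALL_REGISTERS)
    return set()


def drop_redundant_saves(ops):
    """Remove save/restore pairs that protect a register nobody touches.

    Assumes saves and restores are properly nested and balanced.
    """
    removed = set()
    open_saves = []                    # indices of saves not yet restored
    for j, op in enumerate(ops):
        if op[0] == "save":
            open_saves.append(j)
        elif op[0] == "restore":
            i = open_saves.pop()
            reg = op[1]
            assert ops[i][1] == reg, "save/restore pairs must match"
            between = (ops[k] for k in range(i + 1, j) if k not in removed)
            if not any(reg in modified_registers(b) for b in between):
                removed.update((i, j))
    return [op for k, op in enumerate(ops) if k not in removed]
```

For a sequence like the one for f of x, where env is saved around a variable lookup and fun is saved around another lookup, both pairs are dropped and only the three assignments remain. A save around a `"call"` is kept, because the pass cannot know what the called code changes.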

341
00:18:39,950 --> 00:18:44,240
Well, in any case, if we do that
and eliminate all those

342
00:18:44,240 --> 00:18:48,110
redundant saves and restores,
then we can

343
00:18:48,110 --> 00:18:49,400
get it down to this.

344
00:18:49,400 --> 00:18:52,810
And you see there are actually
only three instructions that

345
00:18:52,810 --> 00:18:56,230
we actually need, down from the
initial 11 or so, or the

346
00:18:56,230 --> 00:19:00,070
initial 20 or so in
the original one.

347
00:19:00,070 --> 00:19:03,260
And that's just saying, of those
register operations,

348
00:19:03,260 --> 00:19:04,870
which ones did we
actually need?

349
00:19:09,490 --> 00:19:11,950
Let me just sort of summarize
that in another way, just to

350
00:19:11,950 --> 00:19:13,450
show you in a little
better picture.

351
00:19:16,010 --> 00:19:18,690
Here's a picture of starting--

352
00:19:18,690 --> 00:19:20,530
This is looking at all the
saves and restores.

353
00:19:23,770 --> 00:19:26,300
So here's the expression, f
of x, and then this traces

354
00:19:26,300 --> 00:19:30,940
through, on the bottom here,
the various places in the

355
00:19:30,940 --> 00:19:38,160
evaluator that were passed when
the evaluation happened.

356
00:19:38,160 --> 00:19:40,250
And then here, here
you see arrows.

357
00:19:40,250 --> 00:19:42,320
Arrow down means
register saved.

358
00:19:42,320 --> 00:19:43,690
So the first thing that
happened is the

359
00:19:43,690 --> 00:19:46,860
environment got saved.

360
00:19:46,860 --> 00:19:48,305
And over here, the environment
got restored.

361
00:19:52,380 --> 00:19:56,220
And these-- so there are all the
pairs of stack operations.

362
00:19:56,220 --> 00:19:59,462
Now, if you go ahead and say,
well, let's remember that we

363
00:19:59,462 --> 00:20:02,070
don't--that unev, for instance,
is a completely

364
00:20:02,070 --> 00:20:03,320
useless register.

365
00:20:07,550 --> 00:20:09,820
And if we use the constant
structure of the code, well,

366
00:20:09,820 --> 00:20:11,770
we don't need, we don't need
to save unev. We don't need

367
00:20:11,770 --> 00:20:13,020
unev at all.

368
00:20:16,220 --> 00:20:18,790
And then, depending on how we
set up the discipline of

369
00:20:18,790 --> 00:20:22,610
the--of calling other things
that apply, we may or may not

370
00:20:22,610 --> 00:20:23,860
need to save continue.

371
00:20:27,360 --> 00:20:28,800
That's the first step I did.

372
00:20:28,800 --> 00:20:30,116
And then we can look and see
what's actually, what's

373
00:20:30,116 --> 00:20:32,960
actually needed.

374
00:20:32,960 --> 00:20:36,300
See, we don't-- didn't really
need to save env or

375
00:20:36,300 --> 00:20:38,536
cross-evaluating f, because
it wouldn't, it

376
00:20:38,536 --> 00:20:40,040
wouldn't trash it.

377
00:20:40,040 --> 00:20:46,720
So if we take advantage of that,
and see the evaluation

378
00:20:46,720 --> 00:20:52,280
of f here, doesn't really need
to worry about, about hurting

379
00:20:52,280 --> 00:20:57,560
env. And similarly, the
evaluation of x here, when the

380
00:20:57,560 --> 00:21:00,140
evaluator did that it said, oh,
I'd better preserve the

381
00:21:00,140 --> 00:21:03,320
function register around that,
because I might need it later.

382
00:21:03,320 --> 00:21:07,140
And I better preserve
the argument list.

383
00:21:07,140 --> 00:21:09,280
Whereas the compiler is now in
a position to know, well, we

384
00:21:09,280 --> 00:21:10,690
didn't really need
to save-- to do

385
00:21:10,690 --> 00:21:12,730
those saves and restores.

386
00:21:12,730 --> 00:21:15,520
So in fact, all of the stack
operations done by the

387
00:21:15,520 --> 00:21:18,900
evaluator turned out to be
unnecessary or overly

388
00:21:18,900 --> 00:21:19,670
pessimistic.

389
00:21:19,670 --> 00:21:21,390
And the compiler is in a
position to know that.

390
00:21:27,470 --> 00:21:29,980
Well that's the basic idea.

391
00:21:29,980 --> 00:21:32,600
We take the evaluator, we
eliminate the things that you

392
00:21:32,600 --> 00:21:34,450
don't need, that in some sense
have nothing to do with the

393
00:21:34,450 --> 00:21:38,480
compiler at all, just the
evaluator, and then you see

394
00:21:38,480 --> 00:21:40,460
which stack operations
are unnecessary.

395
00:21:40,460 --> 00:21:44,490
That's the basic structure of
the compiler that's described

396
00:21:44,490 --> 00:21:45,130
in the book.

397
00:21:45,130 --> 00:21:48,620
Let me just show you--
that example's a

398
00:21:48,620 --> 00:21:51,280
little bit too simple.

399
00:21:51,280 --> 00:21:53,500
To see how you, how you actually
save a lot, let's

400
00:21:53,500 --> 00:21:55,765
look at a little bit more
complicated expression.

401
00:21:58,330 --> 00:22:03,542
F of G of X and 1.

402
00:22:03,542 --> 00:22:06,410
And I'm not going to go
through all the code.

403
00:22:06,410 --> 00:22:09,830
There's a, there's a
fair pile of it.

404
00:22:09,830 --> 00:22:13,410
I think there are, there are
something like 16 pairs of

405
00:22:13,410 --> 00:22:15,440
register saves and restores
as the evaluator

406
00:22:15,440 --> 00:22:17,270
walks through that.

407
00:22:17,270 --> 00:22:20,680
Here's a diagram of them.

408
00:22:20,680 --> 00:22:21,060
Let's see.

409
00:22:21,060 --> 00:22:24,210
You see what's going on.

410
00:22:24,210 --> 00:22:25,530
You start out by--the evaluator
says, oh, I'm about

411
00:22:25,530 --> 00:22:26,480
to do an application.

412
00:22:26,480 --> 00:22:28,010
I'll preserve the environment.

413
00:22:28,010 --> 00:22:30,261
I'll restore it here.

414
00:22:30,261 --> 00:22:33,900
Then I'm about to do
the first operand.

415
00:22:36,790 --> 00:22:38,970
Here it recursively goes
to the evaluator.

416
00:22:38,970 --> 00:22:41,370
The evaluator says, oh, this is
an application, I'll save

417
00:22:41,370 --> 00:22:44,090
the environment, do the operator
of that combination,

418
00:22:44,090 --> 00:22:46,740
restore it here.

419
00:22:46,740 --> 00:22:51,720
This save--this restore matches
that save. And so on.

420
00:22:51,720 --> 00:22:53,740
There's unev here, which turns
out to be completely

421
00:22:53,740 --> 00:22:57,240
unnecessary, continues getting
bumped around here.

422
00:22:57,240 --> 00:23:01,040
The function register is
getting, getting saved across

423
00:23:01,040 --> 00:23:05,330
the first operands, across
the operands.

424
00:23:05,330 --> 00:23:06,680
All sorts of things
are going on.

425
00:23:06,680 --> 00:23:09,090
But if you say, well, which of
those really were the business

426
00:23:09,090 --> 00:23:12,770
of the compiler as opposed to
the evaluator, you get rid of

427
00:23:12,770 --> 00:23:14,320
a whole bunch.

428
00:23:14,320 --> 00:23:19,500
And then on top of that, if
you say things like, the

429
00:23:19,500 --> 00:23:24,520
evaluation of F doesn't hurt the
environment register, or

430
00:23:24,520 --> 00:23:30,500
simply looking up the symbol X,
you don't have to protect

431
00:23:30,500 --> 00:23:34,570
the function register
against that.

432
00:23:34,570 --> 00:23:36,044
So you come down to just
a couple of, a

433
00:23:36,044 --> 00:23:37,530
couple of pairs here.

434
00:23:40,280 --> 00:23:42,160
And still, you can do
a little better.

435
00:23:42,160 --> 00:23:44,962
Look what's going on here with
the environment register.

436
00:23:44,962 --> 00:23:51,350
The environment register comes
along and says, oh, here's a

437
00:23:51,350 --> 00:23:52,600
combination.

438
00:23:54,280 --> 00:23:58,580
This evaluator, by the way,
doesn't know anything about G.

439
00:23:58,580 --> 00:24:02,330
So here it says, so it says,
I'd better save the

440
00:24:02,330 --> 00:24:05,610
environment register, because
evaluating G might be some

441
00:24:05,610 --> 00:24:07,960
arbitrary piece of code that
would trash it, and I'm going

442
00:24:07,960 --> 00:24:12,360
to need it later, after
this argument, for

443
00:24:12,360 --> 00:24:15,540
doing the second argument.

444
00:24:15,540 --> 00:24:20,580
So that's why this one didn't go
away, because the compiler

445
00:24:20,580 --> 00:24:22,550
made no assumptions about
what G would do.

446
00:24:22,550 --> 00:24:26,170
On the other hand, if you look
at what the second argument

447
00:24:26,170 --> 00:24:27,710
is, that's just looking
up one.

448
00:24:27,710 --> 00:24:30,810
That doesn't need this
environment register.

449
00:24:30,810 --> 00:24:32,070
So there's no reason
to save it.

450
00:24:32,070 --> 00:24:35,020
So in fact, you can get
rid of that one, too.

451
00:24:35,020 --> 00:24:38,290
And from this whole pile of, of
register operations, if you

452
00:24:38,290 --> 00:24:40,840
simply do a little bit of
reasoning like that, you get

453
00:24:40,840 --> 00:24:45,170
down to, I think, just two pairs
of saves and restores.

454
00:24:45,170 --> 00:24:47,870
And those, in fact, could go
away further if you, if you

455
00:24:47,870 --> 00:24:56,650
knew something about G.

456
00:24:56,650 --> 00:24:59,250
So again, the general idea is
that the reason the compiler

457
00:24:59,250 --> 00:25:01,430
can be better is that the
interpreter doesn't know what

458
00:25:01,430 --> 00:25:03,310
it's about to encounter.

459
00:25:03,310 --> 00:25:05,740
It has to be maximally
pessimistic in saving things

460
00:25:05,740 --> 00:25:07,750
to protect itself.

461
00:25:07,750 --> 00:25:10,820
The compiler only has
to deal with what

462
00:25:10,820 --> 00:25:13,410
actually had to be saved.

463
00:25:13,410 --> 00:25:15,620
And there are two reasons
that something might

464
00:25:15,620 --> 00:25:17,920
not have to be saved.

465
00:25:17,920 --> 00:25:20,100
One is that what you're
protecting it against, in

466
00:25:20,100 --> 00:25:22,700
fact, didn't trash the register,
like it was just a

467
00:25:22,700 --> 00:25:24,210
variable look-up.

468
00:25:24,210 --> 00:25:26,730
And the other one is, that the
thing that you were saving it

469
00:25:26,730 --> 00:25:30,800
for might turn out not
to actually need it.

470
00:25:30,800 --> 00:25:34,370
So those are the two basic
pieces of knowledge that the

471
00:25:34,370 --> 00:25:37,010
compiler can take advantage
of in making

472
00:25:37,010 --> 00:25:38,260
the code more efficient.

473
00:25:44,570 --> 00:25:45,820
Let's break for questions.

474
00:25:51,280 --> 00:25:54,410
AUDIENCE: You kept saying that
the uneval register, unev

475
00:25:54,410 --> 00:25:56,350
register didn't need
to be used at all.

476
00:25:56,350 --> 00:25:57,660
Does that mean that you
could just map a

477
00:25:57,660 --> 00:25:58,590
six-register machine?

478
00:25:58,590 --> 00:26:00,220
Or is that, in this particular
example, it

479
00:26:00,220 --> 00:26:01,860
didn't need to be used?

480
00:26:01,860 --> 00:26:05,480
PROFESSOR: For the compiler,
you could generate code for

481
00:26:05,480 --> 00:26:07,580
the six-register, five, right?

482
00:26:07,580 --> 00:26:08,930
Because that exp
goes away also.

483
00:26:11,750 --> 00:26:14,700
Assuming--yeah, you can get
rid of both exp and unev,

484
00:26:14,700 --> 00:26:17,380
because, see, those are data
structures of the evaluator.

485
00:26:17,380 --> 00:26:19,600
Those are all things that would
be constants from the

486
00:26:19,600 --> 00:26:21,410
point of view of the compiler.

487
00:26:21,410 --> 00:26:24,730
The only thing is this
particular compiler is set up

488
00:26:24,730 --> 00:26:29,330
so that interpreted code and
compiled code can coexist.

489
00:26:29,330 --> 00:26:34,330
So the way to think about it is,
is maybe you build a chip

490
00:26:34,330 --> 00:26:37,420
which is the evaluator, and what
the compiler might do is

491
00:26:37,420 --> 00:26:39,920
generate code for that chip.

492
00:26:39,920 --> 00:26:41,550
It just wouldn't use two
of the registers.

493
00:26:51,158 --> 00:26:53,326
All right, let's take a break.

494
00:26:53,326 --> 00:27:28,576
[MUSIC PLAYING]

495
00:27:28,576 --> 00:27:32,900
We just looked at what the
compiler is supposed to do.

496
00:27:32,900 --> 00:27:36,700
Now let's very briefly look
at how, how this gets

497
00:27:36,700 --> 00:27:38,120
accomplished.

498
00:27:38,120 --> 00:27:39,600
And I'm going to give
no details.

499
00:27:39,600 --> 00:27:42,580
There's, there's a giant pile of
code in the book that gives

500
00:27:42,580 --> 00:27:43,440
all the details.

501
00:27:43,440 --> 00:27:46,150
But what I want to do is
just show you the, the

502
00:27:46,150 --> 00:27:49,590
essential idea here.

503
00:27:49,590 --> 00:27:51,450
Worry about the details
some other time.

504
00:27:51,450 --> 00:27:55,420
Let's imagine that we're
compiling an expression that

505
00:27:55,420 --> 00:27:57,650
looks like there's some
operator, and

506
00:27:57,650 --> 00:27:58,900
there are two arguments.

507
00:28:03,660 --> 00:28:06,310
Now, the--

508
00:28:06,310 --> 00:28:08,940
what's the code that the
compiler should generate?

509
00:28:08,940 --> 00:28:12,630
Well, first of all, it should
recursively go off and compile

510
00:28:12,630 --> 00:28:14,192
the operator.

511
00:28:14,192 --> 00:28:18,650
So it says, I'll compile
the operator.

512
00:28:21,250 --> 00:28:26,600
And where I'm going to need that
is to be in the function

513
00:28:26,600 --> 00:28:28,400
register, eventually.

514
00:28:28,400 --> 00:28:30,830
So I'll generate some
instructions that will evaluate

515
00:28:30,830 --> 00:28:37,640
the operator and end up
with the result in

516
00:28:37,640 --> 00:28:38,890
the function register.

517
00:28:45,420 --> 00:28:49,770
The next thing it's going to do,
another piece is to say,

518
00:28:49,770 --> 00:28:55,140
well, I have to compile
the first argument.

519
00:28:55,140 --> 00:28:58,100
So it calls itself
recursively.

520
00:28:58,100 --> 00:29:03,010
And let's say the result
will go into val.

521
00:29:09,150 --> 00:29:11,460
And then what it's going to need
to do is start setting up

522
00:29:11,460 --> 00:29:25,060
the argument list. So it'll say,
assign to argl cons of

523
00:29:25,060 --> 00:29:27,160
fetch-- so it generates this
literal instruction--

524
00:29:27,160 --> 00:29:35,430
fetch of val onto empty list.

525
00:29:35,430 --> 00:29:36,680
However, it might
have to work--

526
00:29:39,590 --> 00:29:43,950
when it gets here, it's going
to need the environment.

527
00:29:43,950 --> 00:29:45,650
It's going to need whatever
environment was here in order

528
00:29:45,650 --> 00:29:49,030
to do this evaluation of
the first argument.

529
00:29:49,030 --> 00:29:54,990
So it has to ensure that the
compilation of this operator,

530
00:29:54,990 --> 00:29:58,610
or it has to protect the
environment register against

531
00:29:58,610 --> 00:30:01,220
whatever might happen in the
compilation of this operator.

532
00:30:01,220 --> 00:30:04,820
So it puts a note here and says,
oh, this piece should be

533
00:30:04,820 --> 00:30:12,650
done preserving the environment
register.

534
00:30:17,350 --> 00:30:22,630
Similarly, here, after it gets
done compiling the first

535
00:30:22,630 --> 00:30:25,110
operand, it's going to say,
I better compile--

536
00:30:25,110 --> 00:30:26,740
I'm going to need to know
the environment

537
00:30:26,740 --> 00:30:27,930
for the second operand.

538
00:30:27,930 --> 00:30:30,870
So it puts a little note here,
saying, yeah, this is also

539
00:30:30,870 --> 00:30:41,510
done preserving env. Now it goes
on and says, well, the

540
00:30:41,510 --> 00:30:48,880
next chunk of code is the one
that's going to compile the

541
00:30:48,880 --> 00:30:50,760
second argument.

542
00:30:50,760 --> 00:30:57,840
And let's say it'll compile
it targeted to

543
00:30:57,840 --> 00:30:59,360
val, as they say.

544
00:31:03,940 --> 00:31:08,360
And then it'll generate the
literal instruction, building

545
00:31:08,360 --> 00:31:20,860
up the argument list. So it'll
say, assign to argl cons of

546
00:31:20,860 --> 00:31:34,060
the new value it just got onto
the old argument list.

547
00:31:34,060 --> 00:31:37,610
However, in order to have the
old argument list, it better

548
00:31:37,610 --> 00:31:40,440
have arranged that the argument
list didn't get

549
00:31:40,440 --> 00:31:43,510
trashed by whatever
happened in here.

550
00:31:43,510 --> 00:31:46,200
So it puts a little note here
and says, oh, this has to be

551
00:31:46,200 --> 00:31:51,400
done preserving argl.

552
00:31:54,380 --> 00:31:58,090
Now it's got the argument
list set up.

553
00:31:58,090 --> 00:32:02,520
And it's all ready to go
to apply-dispatch.

554
00:32:06,450 --> 00:32:10,440
It generates this literal
instruction.

555
00:32:14,990 --> 00:32:19,310
Because now it's got the
arguments in argl and the

556
00:32:19,310 --> 00:32:22,360
operator in fun, but wait, it's
only got the operator in

557
00:32:22,360 --> 00:32:27,520
fun if it had ensured that
this block of code didn't

558
00:32:27,520 --> 00:32:29,600
trash what was in the
function register.

559
00:32:29,600 --> 00:32:32,090
So it puts a little note here
and says, oh, yes, all this

560
00:32:32,090 --> 00:32:39,460
stuff here had better
be done preserving

561
00:32:39,460 --> 00:32:40,710
the function register.
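
The layout of notes just described can be sketched in Python as a nested plan. This is only an illustration, not the book's code: the tuple shape and the names `preserving_note` and `compile_application_plan` are assumptions made here, and the instruction strings simply follow the way the lecture reads them aloud.

```python
# Hypothetical sketch: record the "preserving" notes as nested tuples,
# without yet deciding whether any save/restore will be emitted.

def preserving_note(register, first, rest):
    """Note that `first` must leave `register` intact for `rest`."""
    return ("preserving", register, first, rest)


def compile_application_plan(operator, arg1, arg2):
    """Lay out the pieces for a combination with two arguments."""
    get_operator = f"<compile {operator}, result in fun>"
    first_arg = (
        f"<compile {arg1}, result in val>",
        "(assign argl (cons (fetch val) '()))",
    )
    second_arg = f"<compile {arg2}, result in val>"
    extend_argl = "(assign argl (cons (fetch val) (fetch argl)))"
    call = "(goto apply-dispatch)"

    # First operand preserves env for the second operand; the second
    # operand preserves argl for the instruction that extends it.
    operands = preserving_note(
        "env", first_arg,
        preserving_note("argl", second_arg, extend_argl),
    )
    # The operator preserves env for the operands; all of the operand
    # code preserves fun for the jump to apply-dispatch.
    return preserving_note(
        "env", get_operator,
        preserving_note("fun", operands, call),
    )


plan = compile_application_plan("f", "(g x)", "1")
```

Each note only says what has to survive; whether a save and restore is really generated is decided later, when the pieces are appended.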

562
00:32:46,110 --> 00:32:46,210
So that's the little--so when
it starts ticking--so

563
00:32:46,210 --> 00:32:51,510
basically, what the compiler
does is append a whole bunch

564
00:32:51,510 --> 00:32:53,432
of code sequences.

565
00:32:53,432 --> 00:32:58,580
See, what it's got in it is
little primitive pieces of

566
00:32:58,580 --> 00:33:01,940
things, like how to look up
a symbol, how to do a

567
00:33:01,940 --> 00:33:02,560
conditional.

568
00:33:02,560 --> 00:33:05,530
Those are all little
pieces of things.

569
00:33:05,530 --> 00:33:07,340
And then it appends them
together in this sort of

570
00:33:07,340 --> 00:33:08,810
discipline.

571
00:33:08,810 --> 00:33:11,890
So the basic means of combining
things is to append

572
00:33:11,890 --> 00:33:13,140
two code sequences.

573
00:33:21,610 --> 00:33:22,860
That's what's going on here.

574
00:33:25,690 --> 00:33:27,590
And it's a little bit tricky.

575
00:33:27,590 --> 00:33:32,020
The idea is that it appends
two code sequences, taking

576
00:33:32,020 --> 00:33:35,670
care to preserve a register.

577
00:33:35,670 --> 00:33:39,250
So the actual append operation
looks like this.

578
00:33:39,250 --> 00:33:41,230
What it wants to
do is say, if--

579
00:33:41,230 --> 00:33:44,450
here's what it means to append
two code sequences.

580
00:33:44,450 --> 00:33:53,685
So if sequence one
needs register--

581
00:33:53,685 --> 00:33:54,720
I should change this.

582
00:33:54,720 --> 00:33:57,200
Append sequence one
to sequence two,

583
00:33:57,200 --> 00:34:03,815
preserving some register.

584
00:34:08,370 --> 00:34:11,080
Let me say, and.

585
00:34:11,080 --> 00:34:13,719
So it's clear that sequence
one comes first.

586
00:34:13,719 --> 00:34:26,449
So if sequence two needs the
register and sequence one

587
00:34:26,449 --> 00:34:35,230
modifies the register, then
the instructions that the

588
00:34:35,230 --> 00:34:43,380
compiler spits out are,
save the register.

589
00:34:43,380 --> 00:34:44,440
Here's the code.

590
00:34:44,440 --> 00:34:45,280
You generate this code.

591
00:34:45,280 --> 00:34:50,860
Save the register, and then you
put out the recursively

592
00:34:50,860 --> 00:34:53,389
compiled stuff for
sequence one.

593
00:34:53,389 --> 00:34:54,639
And then you restore
the register.

594
00:35:00,440 --> 00:35:04,610
And then you put out the
recursively compiled stuff for

595
00:35:04,610 --> 00:35:07,330
sequence two.

596
00:35:07,330 --> 00:35:09,610
That's in the case where
you need to do it.

597
00:35:09,610 --> 00:35:12,700
Sequence two actually needs the
register, and sequence one

598
00:35:12,700 --> 00:35:15,430
actually clobbers it.

599
00:35:15,430 --> 00:35:16,320
So that's sort of if.

600
00:35:16,320 --> 00:35:25,820
Otherwise, all you spit out is
sequence one followed by

601
00:35:25,820 --> 00:35:28,240
sequence two.
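
The append rule just stated can be sketched in Python. This is a minimal sketch under assumptions: the `CodeSeq` record, its field names, and the placeholder instruction strings are invented here for illustration, and the bookkeeping for the combined sequence's own register sets is left out.

```python
from typing import NamedTuple


class CodeSeq(NamedTuple):
    """Hypothetical record: instructions plus register information."""
    needs: frozenset      # registers the sequence reads
    modifies: frozenset   # registers the sequence may overwrite
    statements: tuple     # the instructions themselves


def preserving(register, seq1, seq2):
    """Append seq1 and seq2, saving `register` around seq1 only if needed."""
    if register in seq2.needs and register in seq1.modifies:
        return [
            f"(save {register})",
            *seq1.statements,
            f"(restore {register})",
            *seq2.statements,
        ]
    # Otherwise: just sequence one followed by sequence two.
    return [*seq1.statements, *seq2.statements]


# A variable lookup does not touch env, so nothing is saved.
lookup_f = CodeSeq(frozenset({"env"}), frozenset({"fun"}),
                   ("<look up f into fun>",))
operand = CodeSeq(frozenset({"env"}), frozenset({"val"}),
                  ("<evaluate operand into val>",))
no_save = preserving("env", lookup_f, operand)

# A call to an unknown procedure might clobber env, so it is saved.
unknown_call = CodeSeq(frozenset({"env"}),
                       frozenset({"env", "fun", "val", "argl"}),
                       ("<call g>",))
with_save = preserving("env", unknown_call, operand)
```

The interpreter corresponds to always taking the first branch; the compiler takes it only when both conditions hold.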

602
00:35:28,240 --> 00:35:31,720
So that's the basic operation
for sticking together these

603
00:35:31,720 --> 00:35:34,490
bits of code fragments,
these bits of

604
00:35:34,490 --> 00:35:36,960
instructions into a sequence.

605
00:35:36,960 --> 00:35:42,840
And you see, from this point
of view, the difference

606
00:35:42,840 --> 00:35:46,840
between the interpreter and the
compiler, in some sense,

607
00:35:46,840 --> 00:35:50,220
is that where the compiler has
these preserving notes, and

608
00:35:50,220 --> 00:35:52,910
says, maybe I'll actually
generate the saves and

609
00:35:52,910 --> 00:35:56,220
restores and maybe I won't,
the interpreter being

610
00:35:56,220 --> 00:35:59,550
maximally pessimistic always has
a save and restore here.

611
00:35:59,550 --> 00:36:04,140
That's the essential
difference.

612
00:36:04,140 --> 00:36:07,620
Well, in order to do this, of
course, the compiler needs

613
00:36:07,620 --> 00:36:10,775
some theory of what code
sequences need

614
00:36:10,775 --> 00:36:12,025
and modify registers.

615
00:36:14,330 --> 00:36:17,670
So the tiny little fragments
that you put in, like the

616
00:36:17,670 --> 00:36:23,340
basic primitive code fragments,
say, what are the

617
00:36:23,340 --> 00:36:27,120
operations that you do when
you look up a variable?

618
00:36:27,120 --> 00:36:29,630
What are the sequence of things
that you do when you

619
00:36:29,630 --> 00:36:32,900
compile a constant or
apply a function?

620
00:36:32,900 --> 00:36:35,600
Those have little notations in
there about what they need and

621
00:36:35,600 --> 00:36:36,850
what they modify.

622
00:36:38,760 --> 00:36:42,750
So the bottom-level
data structures--

623
00:36:42,750 --> 00:36:44,330
Well, I'll say this.

624
00:36:44,330 --> 00:36:48,070
A code sequence to the compiler
looks like this.

625
00:36:48,070 --> 00:36:50,945
It has the actual sequence
of instructions.

626
00:36:55,780 --> 00:37:00,370
And then, along with
it, there's the set

627
00:37:00,370 --> 00:37:02,195
of registers modified.

628
00:37:10,630 --> 00:37:12,335
And then there's the set
of registers needed.

629
00:37:19,910 --> 00:37:24,310
So that's the information the
compiler has that it draws on

630
00:37:24,310 --> 00:37:25,965
in order to be able to
do this operation.

631
00:37:29,420 --> 00:37:30,650
And where do those come from?

632
00:37:30,650 --> 00:37:34,920
Well, those come from, you might
expect, for the very

633
00:37:34,920 --> 00:37:37,230
primitive ones, we're going
to put them in by hand.

634
00:37:37,230 --> 00:37:39,890
And then, when we combine two
sequences, we'll figure out

635
00:37:39,890 --> 00:37:42,080
what these things should be.

636
00:37:42,080 --> 00:37:48,460
So for example, a very primitive
one, let's see.

637
00:37:48,460 --> 00:37:51,790
How about doing a register
assignment.

638
00:37:51,790 --> 00:37:56,040
So a primitive sequence might
say, oh, it's code fragment.

639
00:37:56,040 --> 00:38:03,050
Its code instruction is: assign
to R1, fetch of R2.

640
00:38:03,050 --> 00:38:05,000
So this is an example.

641
00:38:05,000 --> 00:38:08,510
That might be an example of a
sequence of instructions.

642
00:38:08,510 --> 00:38:13,110
And along with that, it'll
say, oh, what I need to

643
00:38:13,110 --> 00:38:20,670
remember is that that modifies
R1, and then it needs R2.
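
As a rough Python sketch of such a hand-written primitive fragment: the class name `CodeSequence`, its fields, and the helper `register_assignment` are assumptions chosen for illustration, and the instruction text mirrors the spoken form "assign to R1, fetch of R2".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeSequence:
    """Hypothetical code sequence: instructions plus two register sets."""
    statements: tuple
    modifies: frozenset
    needs: frozenset


def register_assignment(target, source):
    """Primitive fragment that copies `source` into `target`."""
    return CodeSequence(
        statements=(f"(assign {target} (fetch {source}))",),
        modifies=frozenset({target}),   # the target is overwritten
        needs=frozenset({source}),      # the source must hold a valid value
    )


fragment = register_assignment("r1", "r2")
```

For a fragment this small the two sets are obvious by inspection, which is why they can be written in by hand.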

644
00:38:24,630 --> 00:38:27,640
So when you're first building
this compiler, you put in

645
00:38:27,640 --> 00:38:31,030
little fragments of
stuff like that.

646
00:38:31,030 --> 00:38:37,320
And now, when it combines two
sequences, if I'm going to

647
00:38:37,320 --> 00:38:45,990
combine, let's say, sequence
one, that modifies a bunch of

648
00:38:45,990 --> 00:38:50,950
registers M1, and needs a
bunch of registers N1.

649
00:38:54,940 --> 00:39:00,800
And I'm going to combine
that with sequence two.

650
00:39:00,800 --> 00:39:07,780
That modifies a bunch of
registers M2, and needs a

651
00:39:07,780 --> 00:39:09,570
bunch of registers N2.

652
00:39:12,590 --> 00:39:15,035
Then, well, we can
reason it out.

653
00:39:15,035 --> 00:39:20,230
The new code fragment,
sequence one, and--

654
00:39:20,230 --> 00:39:25,270
followed by sequence two, well,

655
00:39:25,270 --> 00:39:27,760
what's it going to modify?

656
00:39:27,760 --> 00:39:29,380
The things that it will modify
are the things that are

657
00:39:29,380 --> 00:39:33,990
modified either by sequence
one or sequence two.

658
00:39:33,990 --> 00:39:38,380
So the union of these
two sets are what

659
00:39:38,380 --> 00:39:40,530
the new thing modifies.

660
00:39:40,530 --> 00:39:45,620
And then you say, well, what is
this--what registers is it

661
00:39:45,620 --> 00:39:47,870
going to need?

662
00:39:47,870 --> 00:39:50,770
It's going to need the things
that are, first of all, needed

663
00:39:50,770 --> 00:39:52,790
by sequence one.

664
00:39:52,790 --> 00:39:55,250
So what it needs is
sequence one.

665
00:39:55,250 --> 00:39:58,820
And then, well, not quite all of
the ones that are needed by

666
00:39:58,820 --> 00:39:59,760
sequence two.

667
00:39:59,760 --> 00:40:02,910
What it needs are the ones that
are needed by sequence

668
00:40:02,910 --> 00:40:08,070
two that have not been set
up by sequence one.

669
00:40:08,070 --> 00:40:12,880
So it's sort of the union of the
things that sequence two

670
00:40:12,880 --> 00:40:19,370
needs minus the ones that
sequence one modifies.

671
00:40:19,370 --> 00:40:20,910
Because it worries about
setting them up.
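
The rule for the combined sequence can be written out as a small Python function. This is a sketch under assumptions: the name `combine` and the use of plain Python sets for M1, N1, M2, N2 are choices made here, not the book's code.

```python
def combine(needs1, modifies1, needs2, modifies2):
    """Return (needs, modifies) for sequence one followed by sequence two."""
    # Modified: anything either sequence modifies.
    modifies = modifies1 | modifies2
    # Needed: what sequence one needs, plus whatever sequence two needs
    # that sequence one has not already set up.
    needs = needs1 | (needs2 - modifies1)
    return needs, modifies


# Sequence one needs env and writes val; sequence two reads val and argl
# and writes argl. The val that sequence two reads is supplied by
# sequence one, so the pair as a whole does not need val from outside.
needs, modifies = combine({"env"}, {"val"}, {"val", "argl"}, {"argl"})
```

Here the combined sequence needs `env` and `argl` and modifies `val` and `argl`.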

672
00:40:24,230 --> 00:40:26,740
So there's the basic structure
of the compiler.

673
00:40:26,740 --> 00:40:30,520
The way you do register
optimizations is you have some

674
00:40:30,520 --> 00:40:34,010
strategies for what needs
to be preserved.

675
00:40:34,010 --> 00:40:35,450
That depends on a
data structure.

676
00:40:35,450 --> 00:40:37,600
Well, it depends on the
operation of what it means to

677
00:40:37,600 --> 00:40:39,080
put things together.

678
00:40:39,080 --> 00:40:44,710
Preserving something, that
depends on knowing what

679
00:40:44,710 --> 00:40:46,200
registers are needed
and modified

680
00:40:46,200 --> 00:40:48,900
by these code fragments.

681
00:40:48,900 --> 00:40:52,820
That depends on having little
data structures, which say, a

682
00:40:52,820 --> 00:40:56,450
code sequence is the actual
instructions, what they modify

683
00:40:56,450 --> 00:40:57,350
and what they need.

684
00:40:57,350 --> 00:40:58,750
That comes from, at
the primitive

685
00:40:58,750 --> 00:41:00,240
level, building it in.

686
00:41:00,240 --> 00:41:02,800
At the primitive level, it's
going to be completely obvious

687
00:41:02,800 --> 00:41:04,850
what something needs
and modifies.

688
00:41:04,850 --> 00:41:08,160
Plus, this particular way that
says, when I build up bigger

689
00:41:08,160 --> 00:41:11,130
ones, here's how I generate
the new set of registers

690
00:41:11,130 --> 00:41:15,010
modified and the new set
of registers needed.

691
00:41:15,010 --> 00:41:16,120
And that's the whole--

692
00:41:16,120 --> 00:41:17,810
well, I shouldn't say that's
the whole thing.

693
00:41:17,810 --> 00:41:21,320
That's the whole thing except
for about 30 pages of details

694
00:41:21,320 --> 00:41:21,860
in the book.

695
00:41:21,860 --> 00:41:28,880
But it is a perfectly usable
rudimentary compiler.

696
00:41:28,880 --> 00:41:31,390
Let me kind of show
you what it does.

697
00:41:31,390 --> 00:41:36,330
Suppose we start out with
recursive factorial.

698
00:41:36,330 --> 00:41:38,590
And these slides are going to
be much too small to read.

699
00:41:38,590 --> 00:41:40,370
I just want to flash through
the code and show you about

700
00:41:40,370 --> 00:41:41,620
how much it is.

701
00:41:44,460 --> 00:41:46,220
That starts out with--here's a
first block of it, where it

702
00:41:46,220 --> 00:41:48,740
compiles a procedure entry and
does a bunch of assignments.

703
00:41:48,740 --> 00:41:53,000
And this thing is basically up
through the part where it sets

704
00:41:53,000 --> 00:41:55,500
up to do the predicate
and test whether

705
00:41:55,500 --> 00:41:56,830
the predicate's true.

706
00:41:56,830 --> 00:41:59,530
The second part is what
results from--

707
00:41:59,530 --> 00:42:04,210
in the recursive call to
fact of n minus one.

708
00:42:04,210 --> 00:42:08,750
And this last part is coming
back from that and then taking

709
00:42:08,750 --> 00:42:09,890
care of the constant case.

710
00:42:09,890 --> 00:42:12,010
So that's about how
much code it

711
00:42:12,010 --> 00:42:13,760
would produce for factorial.

712
00:42:13,760 --> 00:42:18,380
We could make this compiler
much, much better, of course.

713
00:42:18,380 --> 00:42:21,870
The main way we could make
it better is to allow the

714
00:42:21,870 --> 00:42:24,720
compiler to make any assumptions
at all about what

715
00:42:24,720 --> 00:42:26,990
happens when you call
a procedure.

716
00:42:26,990 --> 00:42:30,810
So this compiler, for instance,
doesn't even know,

717
00:42:30,810 --> 00:42:35,030
say, that multiplication
is something that

718
00:42:35,030 --> 00:42:36,030
could be coded in line.

719
00:42:36,030 --> 00:42:37,670
Instead, it sets up this
whole mechanism.

720
00:42:37,670 --> 00:42:38,920
It goes to apply-dispatch.

721
00:42:41,430 --> 00:42:43,900
That's a tremendous waste,
because what you do every time

722
00:42:43,900 --> 00:42:46,060
you go to apply-dispatch is
you have to cons up this

723
00:42:46,060 --> 00:42:48,640
argument list, because it's
a very general thing

724
00:42:48,640 --> 00:42:49,170
you're going to.

725
00:42:49,170 --> 00:42:51,510
In any real compiler, of course,
you're going to have

726
00:42:51,510 --> 00:42:53,830
registers for holding
arguments.

727
00:42:53,830 --> 00:42:57,060
And you're going to start
preserving and saving the way

728
00:42:57,060 --> 00:43:00,510
you use those registers
similar to the

729
00:43:00,510 --> 00:43:02,442
same strategy here.

730
00:43:02,442 --> 00:43:06,700
So that's probably the very main
way that this particular

731
00:43:06,700 --> 00:43:08,940
compiler in the book
could be fixed.

732
00:43:08,940 --> 00:43:12,010
There are other things like
looking up variable values and

733
00:43:12,010 --> 00:43:14,010
making more efficient primitive
operations and all

734
00:43:14,010 --> 00:43:14,490
sorts of things.

735
00:43:14,490 --> 00:43:17,260
Essentially, a good Lisp
compiler can absorb an

736
00:43:17,260 --> 00:43:19,780
arbitrary amount of effort.

737
00:43:19,780 --> 00:43:23,820
And probably one of the reasons
that Lisp is slow with

738
00:43:23,820 --> 00:43:27,470
compared to languages like
FORTRAN is that, if you look

739
00:43:27,470 --> 00:43:29,860
over history at the amount of
effort that's gone into

740
00:43:29,860 --> 00:43:32,110
building Lisp compilers, it's
nowhere near the amount of

741
00:43:32,110 --> 00:43:34,520
effort that's gone into
FORTRAN compilers.

742
00:43:34,520 --> 00:43:36,910
And maybe that's something that
will change over the next

743
00:43:36,910 --> 00:43:38,250
couple of years.

744
00:43:38,250 --> 00:43:39,500
OK, let's break.

745
00:43:43,950 --> 00:43:45,200
Questions?

746
00:43:48,370 --> 00:43:49,590
AUDIENCE: One of the very
first classes--

747
00:43:49,590 --> 00:43:52,180
I don't know if it was during
class or after class-- you

748
00:43:52,180 --> 00:43:57,040
showed me the, say, addition has
a primitive that we don't

749
00:43:57,040 --> 00:44:00,720
see, and-percent add or
something like that.

750
00:44:00,720 --> 00:44:03,070
Is that because, if you're doing
inline code you'd want

751
00:44:03,070 --> 00:44:08,540
to just do it for two
operators, operands?

752
00:44:08,540 --> 00:44:10,552
But if you had more operands,
you'd want to

753
00:44:10,552 --> 00:44:12,800
do something special?

754
00:44:12,800 --> 00:44:15,290
PROFESSOR: Yeah, you're looking
in the actual Scheme

755
00:44:15,290 --> 00:44:15,980
implementation.

756
00:44:15,980 --> 00:44:17,880
There's a plus, and a plus
is some operator.

757
00:44:17,880 --> 00:44:20,630
And then if you go look inside
the code for plus, you see

758
00:44:20,630 --> 00:44:21,440
something called--

759
00:44:21,440 --> 00:44:24,640
I forget-- and-percent plus
or something like that.

760
00:44:24,640 --> 00:44:27,190
And what's going on there is
that particular kind of

761
00:44:27,190 --> 00:44:28,540
optimization.

762
00:44:28,540 --> 00:44:30,520
Because, see, general
plus takes an

763
00:44:30,520 --> 00:44:31,770
arbitrary number of arguments.

764
00:44:34,750 --> 00:44:38,020
So the most general plus says,
oh, if I have an argument

765
00:44:38,020 --> 00:44:42,400
list, I'd better cons it up in
some list and then figure out

766
00:44:42,400 --> 00:44:44,880
how many there were or
something like that.

767
00:44:44,880 --> 00:44:47,820
That's terribly inefficient,
especially since most of the

768
00:44:47,820 --> 00:44:49,200
time you're probably
adding two numbers.

769
00:44:49,200 --> 00:44:52,200
You don't want to really have to
cons this argument list. So

770
00:44:52,200 --> 00:44:57,050
what you'd like to do is build
the code for plus with a bunch

771
00:44:57,050 --> 00:44:58,170
of entries.

772
00:44:58,170 --> 00:45:00,170
So most of what it's
doing is the same.

773
00:45:00,170 --> 00:45:02,630
However, there might be a
special entry that you'd go to

774
00:45:02,630 --> 00:45:04,640
if you knew there were
only two arguments.

775
00:45:04,640 --> 00:45:05,910
And those you'll put
in registers.

776
00:45:05,910 --> 00:45:07,590
They won't be in an argument
list and you won't have to

777
00:45:07,590 --> 00:45:09,080
[UNINTELLIGIBLE].

778
00:45:09,080 --> 00:45:12,570
That's how a lot of
these things work.

779
00:45:12,570 --> 00:45:13,948
OK, let's take a break.

780
00:45:13,948 --> 00:45:15,696
[MUSIC PLAYING]