Today we're going to talk about how to translate high-level languages into code that computers can execute.

So far we've seen the Beta ISA, which includes instructions that control the datapath operations performed on 32-bit data stored in the registers. There are also instructions for accessing main memory and changing the program counter. The instructions are formatted as opcode, source, and destination fields that form 32-bit values in main memory.

To make our lives easier, we developed assembly language as a way of specifying sequences of instructions. Each assembly language statement corresponds to a single instruction. As assembly language programmers, we're responsible for managing which values are in registers and which are in main memory, and we need to figure out how to break down complicated operations, e.g., accessing an element of an array, into the right sequence of Beta operations.

We can go one step further and use high-level languages to describe the computations we want to perform. These languages use variables and other data structures to abstract away the details of storage allocation and the movement of data to and from main memory. We can just refer to a data object by name and let the language processor handle the details. Similarly, we'll use expressions and operators such as assignment (=) to describe concisely what would require many statements in assembly language.

Today we're going to dive into how to translate high-level language programs into code that will run on the Beta. Here we see Euclid's algorithm for determining the greatest common divisor of two numbers, written in this case in the C programming language; a sketch of the algorithm appears below. We'll be using a simple subset of C as our example high-level language. Please see the brief overview of C in the Handouts section if you'd like an introduction to C syntax and semantics.

C was developed by Dennis Ritchie at AT&T Bell Labs in the late 1960s and early 1970s for use in implementing the Unix operating system. Since that time many new high-level languages have been introduced, providing modern abstractions like object-oriented programming along with useful new data and control structures. Using C allows us to describe a computation without referring to any of the details of the Beta ISA, like registers, specific Beta instructions, and so on.
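Here's a minimal sketch of the kind of C code we have in mind: the iterative, remainder-based form of Euclid's algorithm. The function name gcd and this particular formulation are illustrative choices; the exact code shown in the lecture slide may differ.

    /* Euclid's algorithm: greatest common divisor of two numbers.
       A minimal sketch, assuming the iterative remainder-based form;
       the lecture's exact code may differ. */
    int gcd(int a, int b) {
        while (b != 0) {
            int t = b;     /* remember b */
            b = a % b;     /* remainder of a divided by b */
            a = t;
        }
        return a;          /* when b reaches 0, a holds the GCD */
    }

Notice that nothing here mentions registers or memory addresses; the language processor decides where a, b, and t live.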
The absence of such details means there is less work required to create the program, and it makes it easier for others to read and understand the algorithm implemented by the program.

There are many advantages to using a high-level language. High-level languages enable programmers to be very productive, since the programs are concise and readable. These attributes also make the code easy to maintain. It's often harder to make certain types of mistakes, since the language can check for silly errors like storing a string value into a numeric variable. And more complicated tasks, like dynamically allocating and deallocating storage, can be completely automated. The result is that it can take much less time to create a correct program in a high-level language than it would when writing in assembly language. And since the high-level language has abstracted away the details of a particular ISA, the programs are portable in the sense that we can expect to run the same code on different ISAs without having to rewrite it.

What do we lose by using a high-level language? Should we worry that we'll pay a price in terms of the efficiency and performance we might get by crafting each instruction by hand? The answer depends on how we choose to run high-level language programs. The two basic execution strategies are "interpretation" and "compilation".

To interpret a high-level language program, we write a special program called an "interpreter" that runs on the actual computer, M1. The interpreter mimics the behavior of some abstract, easy-to-program machine M2 and for each M2 operation executes sequences of M1 instructions to achieve the desired result. We can think of the interpreter along with M1 as an implementation of M2, i.e., given a program written for M2, the interpreter will, step by step, emulate the effect of M2 instructions.

We often use several layers of interpretation when tackling computational tasks. For example, an engineer may use her laptop with an Intel CPU to run the Python interpreter. In Python, she loads the SciPy toolkit, which provides a calculator-like interface for numerical analysis on matrices and datasets. For each SciPy command, e.g., "find the maximum value of a dataset", the SciPy toolkit executes many Python statements, e.g., to loop over each element of the array, remembering the largest value.
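To make that concrete, the work behind a single "find the maximum" command ultimately boils down to a loop like the following C sketch. This is purely illustrative; SciPy's actual implementation is different and far more general.

    #include <stddef.h>

    /* Hypothetical sketch of the loop behind "find the maximum value":
       visit each element, remembering the largest seen so far. Each
       iteration becomes many machine instructions: load the element,
       compare, maybe update, increment the index, test for termination. */
    double max_value(const double *data, size_t n) {
        double best = data[0];          /* assumes n >= 1 */
        for (size_t i = 1; i < n; i++) {
            if (data[i] > best)
                best = data[i];
        }
        return best;
    }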
For each Python statement, the Python interpreter executes many x86 instructions, e.g., to increment the loop index and check for loop termination. Executing a single SciPy command may require executing tens of Python statements, which in turn may each require executing hundreds of x86 instructions. The engineer is very happy she didn't have to write each of those instructions herself!

Interpretation is an effective implementation strategy when performing a computation once, or when exploring which computational approach is most effective before making a more substantial investment in creating a more efficient implementation.

We'll use a compilation strategy when we have computational tasks that we need to execute repeatedly, and hence we're willing to invest more time up front for more efficiency in the long term. In compilation, we also start with our actual computer M1. Then we take our high-level language program P2 and translate it statement by statement into a program for M1. Note that we're not actually running the P2 program. Instead we're using it as a template to create an equivalent P1 program that can execute directly on M1. The translation process is called "compilation" and the program that does the translation is called a "compiler".

We compile the P2 program once to get the translation P1, and then we run P1 on M1 whenever we want to execute P2. Running P1 avoids the overhead of having to process the P2 source and the costs of executing any intervening layers of interpretation. Instead of dynamically figuring out the necessary machine instructions for each P2 statement as it's encountered, in effect we've arranged to capture that stream of machine instructions and save them as a P1 program for later execution. If we're willing to pay the up-front cost of compilation, we'll get more efficient execution. And, with different compilers, we can arrange to run P2 on many different machines (M2, M3, etc.) without having to rewrite P2.

So we now have two ways to execute a high-level language program: interpretation and compilation. Both allow us to change the original source program. Both allow us to abstract away the details of the actual computer we'll use to run the program. And both strategies are widely used in modern computer systems!
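To see where the interpreter's run-time overhead comes from, here's a toy interpreter loop in C. It's entirely hypothetical (real interpreters are far more elaborate), but it shows the pattern: each source-level operation pays a decode-and-dispatch cost before any useful work happens, while a compiler emits just the useful work, once.

    /* Toy interpreter sketch (hypothetical; not any real interpreter's
       code). The program is a list of operations ending with OP_HALT. */
    enum op { OP_ADD, OP_SUB, OP_HALT };

    struct insn { enum op op; int a, b; };

    int run(const struct insn *prog) {
        int result = 0;
        for (const struct insn *p = prog; ; p++) {
            switch (p->op) {                     /* decode: pure overhead */
            case OP_ADD: result = p->a + p->b; break;
            case OP_SUB: result = p->a - p->b; break;
            case OP_HALT: return result;         /* end of program */
            }
        }
    }

    /* Usage: struct insn prog[] = { {OP_ADD, 1, 2}, {OP_HALT, 0, 0} };
       run(prog) returns 3, but paid a dispatch per operation to get it. */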
Let's summarize the differences between interpretation and compilation. Suppose the statement "x+2" appears in the high-level program. When the interpreter processes this statement, it immediately fetches the value of the variable x and adds 2 to it. The compiler, on the other hand, would generate Beta instructions that LD the variable x into a register and then ADD 2 to that value. The interpreter executes each statement as it's processed and, in fact, may process and execute the same statement many times if, e.g., it's in a loop. The compiler is just generating instructions to be executed at some later time.

Interpreters have the overhead of processing the high-level source code during execution, and that overhead may be incurred many times in loops. Compilers incur the processing overhead once, making the eventual execution more efficient. But during development, the programmer may have to compile and run the program many times, often incurring the cost of compilation for only a single execution of the program. So the compile-run-debug loop can take more time.

The interpreter makes decisions about the data type of x and the type of operations necessary at run time, i.e., while the program is running. The compiler makes those decisions during the compilation process.

Which is the better approach? In general, executing compiled code is much faster than running the code interpretively. But since the interpreter is making decisions at run time, it can change its behavior depending, say, on the type of the data in the variable x, offering considerable flexibility in handling different types of data with the same algorithm. Compilers take away that flexibility in exchange for fast execution.
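To picture the compiler's half of the "x+2" example, here is a sketch in Beta assembly, written in uasm-style notation. The choice of register R0, and the assumption that x names a location in main memory, are mine for illustration.

    | Beta instructions a compiler might emit for "x + 2" (a sketch;
    | register choice and addressing are illustrative assumptions).
    LD(x, R0)          | fetch the value of variable x from main memory into R0
    ADDC(R0, 2, R0)    | add the constant 2; the result is left in R0

The interpreter would compute the same answer, but only after examining the source text of "x+2" each time it runs; here that examination happened once, at compile time.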