# Python Cost Model

Python is a high-level programming language, with many powerful primitives. Analyzing the running time of a Python program requires an understanding of the cost of the various Python primitives.

For example, in Python, you can write:

L = L1 + L2

where L, L1, and L2 are lists; the given statement computes `L` as the concatenation of the two input lists `L1` and `L2`. The running time of this statement will depend on the lengths of lists `L1` and `L2`. (The running time is more-or-less proportional to the sum of those two lengths.)

Our goal in this section is to review various Python primitive operations, and to determine bounds and/or estimates on their running times. Our approach will involve both a review of the relevant Python implementation code, and also some experimentation (analysis of actual running times and interpolating a nice curve through the resulting data points).

## Python Running Time Experiments and Discussion

The running times for various-sized inputs were measured, and then a least-squares fit was used to find the coefficient for the high-order term in the running time. (Which term is high-order was determined by some experimentation; it could have been automated...)

The least-squares fit was designed to minimize the sum of squares of relative error, using `scipy.optimize.leastsq`.

(Earlier version of this program worked with more than the high-order term; they also found coefficients for lower-order terms. But the interpolation routines tended to be poor at distinguishing *n* and *n* lg *n*. Also, it was judged to be more interesting to work with relative error than with absolute error. Finally, it seemed that looking at only the high-order term, and studying only the
relative error, seemed simplest.)

The machine used was an IBM Thinkpad T43p with a 1.86GHz Pentium M processor and 1.5GB RAM.

[This output may have results somewhat different than in the charts below, due to random run-time variations...]

Convert string to integer |
int(s) |
84 * (n/1000)^2 microseconds |
n <= 8000 |
6% rms error |

Convert integer to string |
str(x) |
75 * (n/1000)^2 microseconds |
n <= 8000 |
3% rms error |

Convert integer to hex |
"%x"%x |
2.7 * (n/1000) microseconds |
n <= 64000 |
19% rms error |

Addition (or Subtraction) |
x+y |
0.75 * (n/1000) microseconds |
n <= 512000 |
8% rms error |

Multiplication |
x*y |
13.73 * (n/1000)^1.585 microseconds |
n <= 64000 |
10% rms error |

Division (or Remainder) |
w/x |
47 * (n/1000)^2 microseconds |
n <= 32000 |
6% rms error |

Modular Exponentiation |
pow(x,y,z) |
60000 * (n/1000)^3 microseconds |
n <= 4000 |
8% rms error |

n-th power of two |
2**n |
0.06 microseconds |
n <= 512000 |
10% rms error |

It is perhaps curious that multiplication is implemented using Karatsuba's algorithm, giving an Θ(*n*^{lg 3}) running time, while division uses an Θ(*n*^{2}) algorithm.

Extract a byte from a string |
s[i] |
0.13 microseconds |
n <= 512000 |
29% rms error |

Concatenate |
s+t |
1 * (n/1000) microseconds |
n <= 256000 |
19% rms error |

Extract string of length n/2 |
s[0:n/2] |
0.3 * (n/1000) microseconds |
n <= 256000 |
28% rms error |

Translate a string |
s.translate(s,T) |
3.2 * (n/1000) microseconds |
n <= 256000 |
11% rms error |

Create an empty list |
list() |
0.40 microseconds |
(n=1) |
.5% rms error |

Access |
L[i] |
0.10 microseconds |
n <= 640000 |
3% rms error |

Append |
L.append(0) |
0.24 microseconds |
n <= 640000 |
3% rms error |

Pop |
L.pop() |
0.25 microseconds |
n <= 64000 |
0.5% rms error |

Concatenate |
L+M |
22 * (n/1000) microseconds |
n <= 64000 |
3% rms error |

Slice extraction |
L[0:n/2] |
5.4 * (n/1000) microseconds |
n <= 64000 |
4% rms error |

Copy |
L[:] |
11.5 * (n/1000) microseconds |
n <= 64000 |
10% rms error |

Slice assignment |
L[0:n/2] = P |
11 * (n/1000) microseconds |
n <= 64000 |
4% rms error |

Delete first |
del L[0] |
1.7 * (n/1000) microseconds |
n <= 64000 |
4% rms error |

Reverse |
L.reverse() |
1.3 * (n/1000) microseconds |
n <= 64000 |
10% rms error |

Sort |
L.sort() |
0.0039 * n lg(n) microseconds |
n <= 64000 |
12% rms error |

The **first** time one appends to a list, there is additional cost as the list is copied over and extra space, about 1/8 of the list size, is added to the end. Whenever the extra space is used up, the list is re-allocated into a new array with about 1.125 the
length of the previous version.

Create an empty dictionary |
dict() |
0.36 microseconds |
(n=1) |
0% rms error |

Access |
D[i] |
0.12 microseconds |
n <= 64000 |
3% rms error |

Copy |
D.copy() |
57 * (n/1000) microseconds |
n <= 64000 |
27% rms error |

List items |
D.items() |
0.0096 * n lg(n) microseconds |
n <= 64000 |
14% rms error |

What should the right high-order term be for copy and list items? It seems these should be linear, but the data for both looks somewhat super-linear. We've modelled copy here as linear and list items as `n lg(n)`, but these formulae need further work and exploration.