\documentclass[12pt,twoside]{article}
\usepackage{amsmath}
\newcommand{\profs}{Professors Srini Devadas and Erik Demaine}
\newcommand{\subj}{6.006}
\newlength{\toppush}
\setlength{\toppush}{2\headheight}
\addtolength{\toppush}{\headsep}
\newcommand{\htitle}[3]{\noindent\vspace*{-\toppush}\newline\parbox{6.5in}
{\textit{Introduction to Algorithms: 6.006}\hfill\newline
Massachusetts Institute of Technology \hfill #3\newline
\profs\hfill Handout #1\vspace*{-.5ex}\newline
\mbox{}\hrulefill\mbox{}}\vspace*{1ex}\mbox{}\newline
\begin{center}{\Large\bf #2}\end{center}}
\newcommand{\handout}[3]{\thispagestyle{empty}
\markboth{Handout #1: #2}{Handout #1: #2}
\pagestyle{myheadings}\htitle{#1}{#2}{#3}}
\setlength{\oddsidemargin}{0pt}
\setlength{\evensidemargin}{0pt}
\setlength{\textwidth}{6.5in}
\setlength{\topmargin}{0in}
\setlength{\textheight}{8.5in}
\begin{document}
\handout{3}{Problem Set 1}{Feb 7, 2007}
\setlength{\parindent}{0pt}
\newcommand{\solution}{
\medskip
{\bf Solution:}
}
This problem set is due {\bf Thursday, February 21} at {\bf 11:59PM}.
Submit your solutions through the course website as a PDF, either
typeset in \LaTeX\ or scanned from handwritten work.
A template for writing up solutions in \LaTeX\ is available on the
course website.
Remember, your goal is to communicate. Full credit will be given only
to correct solutions that are described clearly. Convoluted and
obtuse descriptions might receive low marks, even when they are
correct. Also, aim for concise solutions: they will save you time
on write-ups and help you conceptualize the key idea of
the problem.
\medskip
\hrulefill
\medskip
Exercises are for extra practice and should not be turned in.
{\bf Exercises:}
\begin{itemize}
\item Exercise 2.3-6 (page 37) from CLRS.
\item Exercise 3.1-3 (page 50) from CLRS.
\item Exercise 3.1-4 (page 50) from CLRS.
\end{itemize}
\hrulefill
\begin{enumerate}
\item {\bf (11 points)} Asymptotic Growth
Rank the following functions by increasing order of growth; that is,
find an arrangement $g_1, g_2, \ldots, g_{11}$ of the functions
satisfying $g_1=O(g_2)$, $g_2=O(g_3)$, \ldots, $g_{10}=O(g_{11})$.
Partition your list into equivalence classes such that $f(n)$ and
$g(n)$ are in the same class if and only if $f(n)=\Theta(g(n))$. All
the logs are in base 2.
\[
\begin{array}{llll}
{n \choose 100},
& 3^n,
& n^{100},\\
~\\
1/n,
& 2^{2n},
& 10^{100}n,\\
~\\
3^{\sqrt{n}},
& 1/5, & 4^n,\\
~\\
n\log n,
& \log(n!). \\
\end{array}
\]
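
As a small illustration of what membership in the same class means,
using functions that are not on the list above (they are chosen here
purely as an example): for all $n \ge 1$,
\[
n^2 \;\le\; 5n^2 + 3n \;\le\; 8n^2,
\]
so $5n^2 + 3n = \Theta(n^2)$ and the two functions share a class. By
contrast, $n = O(n^2)$ but $n^2 \ne O(n)$, so $n$ would sit in a
strictly earlier class than $n^2$.
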
\item {\bf (19 points)} Binary Search
In \emph{Problem Solving With Algorithms And Data Structures
Using Python} by Miller and Ranum, two examples are given of a
binary search algorithm. Both functions take a sorted list of
numbers, \texttt{alist}, and a query, \texttt{item}, and return true
if and only if $\texttt{item} \in \texttt{alist}$. The first
version is iterative (using a loop within a single function call)
and the second is recursive (calling itself with different
arguments). Both versions can be found on the last page of this
problem set.
Let $n = \texttt{len(alist)}$.
\begin{enumerate}
\item {\bf (6 points)} What is the runtime of the iterative version
in terms of $n$, and why? Be sure to state a recurrence relation
and solve it.
\item {\bf (8 points)} What is the runtime of the recursive version
in terms of $n$, and why? Be sure to state a recurrence relation
and solve it.
\item {\bf (5 points)} Explain how you might fix the recursive
version so that it has the same asymptotic running time as the
iterative version (but is still recursive).
\end{enumerate}
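
As an illustration of the expected form of an answer, consider a
different algorithm (an example chosen here, not one of the two
versions from the book): a recursive linear scan that examines one
element and then recurses on the remaining $n-1$ elements. Assuming
each call does a constant amount of work $c$ outside the recursive
call, its running time satisfies
\[
T(n) = T(n-1) + c, \qquad T(0) = c.
\]
Unrolling the recurrence gives
\[
T(n) = T(n-2) + 2c = \cdots = T(0) + nc = (n+1)c = \Theta(n).
\]
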
\item {\bf (30 points)} Set Intersection
Python has a built-in \texttt{set} data structure. A \texttt{set} is
a collection of elements without repetition.
In an interactive Python session, type the following to create an
empty set:
\texttt{s = set()}
To find out what operations are available on sets, type:
\texttt{dir(s)}
Some fundamental operations include \texttt{add}, \texttt{remove},
\texttt{\_\_contains\_\_}, and \texttt{\_\_len\_\_}. Note that
\texttt{\_\_contains\_\_} and \texttt{\_\_len\_\_} are more commonly
called with the syntax \\ \mbox{\texttt{element in set}} and
\texttt{len(set)}. All four of these operations run in constant
time, i.e., $O(1)$ time.
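
The four operations above can be tried in a short session such as the
following sketch. The particular values are arbitrary and only serve
as an illustration.
\begin{verbatim}
s = set()
s.add(3)
s.add(5)
s.add(3)          # 3 is already present, so the set is unchanged
print(len(s))     # prints 2
print(3 in s)     # prints True
s.remove(3)
print(3 in s)     # prints False
\end{verbatim}
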
For this problem, we will be analyzing the runtime of
\texttt{s.intersection(t)} that takes two sets, $s$ and $t$, and
returns a new set with all the elements that occur in both $s$ and
$t$. We will then use \texttt{intersection} in a new version of the
Document Distance code from the first two lectures.
\begin{enumerate}
\item {\bf (5 points)} Using $\Theta$ notation, make a conjecture
for the asymptotic running time of \texttt{s.intersection(t)} in
terms of the sizes of the sets: $|s|$ and $|t|$. Justify your
conjecture.
HINT: Think about the fundamental operations above.
\item {\bf (10 points)} Determine experimentally the running time of
\texttt{s.intersection(t)}, by running it with different sized
sets. Fill in the following chart. Include in your PDF submission a
snippet of code that determines one of the entries in the chart.
Note: there are a number of ways to time code. You can use the
\texttt{timeit} module (see
\texttt{http://www.diveintopython.org/performance\_tuning/timeit.html}
for a good description of how to use it). Alternatively, if you have
\texttt{ipython} installed (see \texttt{http://ipython.scipy.org}),
you can use its built-in \texttt{timeit} command, which is more
user-friendly.
\begin{tabular}{|c|c|c|c|c|}\hline
time in $\mu$s & $|s|=10^3$ & $|s|=10^4$ & $|s|=10^5$ & $|s|=10^6$ \\ \hline
$|t|=10^3$ & & & & \\ \hline
$|t|=10^4$ & & & & \\ \hline
$|t|=10^5$ & & & & \\ \hline
$|t|=10^6$ & & & & \\ \hline
\end{tabular}
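
As a rough sketch of how the \texttt{timeit} module mentioned in the
note can be used, the following times a different operation, a single
membership test, rather than \texttt{intersection}. The helper
\texttt{time\_membership} and its parameters are hypothetical names
introduced only for this illustration; they are not part of the
course code.
\begin{verbatim}
import timeit

# Hypothetical helper (not part of the course code): average time,
# in microseconds, of one membership test on a set of n integers.
def time_membership(n, repeats=1000):
    s = set(range(n))
    timer = timeit.Timer(lambda: (n - 1) in s)
    total_seconds = timer.timeit(number=repeats)
    return total_seconds / repeats * 1e6

for n in (10**3, 10**4):
    print(n, time_membership(n))
\end{verbatim}
Building the set outside the timed callable keeps the setup cost out
of the measurement. Averaging over many repetitions reduces noise,
but the exact numbers will still vary from machine to machine.
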
\item {\bf (5 points)} Give an approximate formula for the asymptotic
running time of \\ \texttt{s.intersection(t)} based on your
experiments. How does this compare with your conjecture in part
(a)? If the results differ from your conjecture, make a new
conjecture about the algorithm used.
\item {\bf (10 points)} In the Document Distance problem from the
first two lectures, we compared two documents by counting the
words in each, treating these counts as vectors, and computing
the angle between the two vectors. For this problem, we will
change the Document Distance code to use a new metric. Now, we
will only care about words that show up in both documents, and we
will ignore the contributions of words that show up in only one
document.
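
For reference, the angle computed by the original metric can be
written as follows, where $L_1$ and $L_2$ denote the word-count
vectors of the two documents (the notation is an assumption chosen to
match the \texttt{L1} and \texttt{L2} arguments of the code) and
$\cdot$ is the inner product:
\[
\theta(L_1, L_2) = \arccos\left(
  \frac{L_1 \cdot L_2}{\sqrt{(L_1 \cdot L_1)\,(L_2 \cdot L_2)}}
\right).
\]
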
Download \texttt{ps1.py}, \texttt{docdist7.py}, and
\texttt{test-ps1.py} from the class website. \\ \texttt{docdist7.py}
is mostly the same as \texttt{docdist6.py} seen in class; however,
it does not implement \texttt{vector\_angle} or
\texttt{inner\_product}. Instead, it imports those functions from
\texttt{ps1.py}. Currently, \texttt{ps1.py} contains code copied
straight from \texttt{docdist6.py}, but you will need to modify
this code to implement the new metric.
\begin{itemize}
\item Modify \texttt{inner\_product} to take a third argument,
\texttt{domain}, which will be a \texttt{set} containing the
words in both texts. Modify the code so that it only increases
\texttt{sum} if the word is in \texttt{domain}.
Don't forget to change the documentation string at the top.
\item Modify \texttt{vector\_angle} so that it creates sets of the
words in both \texttt{L1} and \texttt{L2}, takes their
intersection, and uses that intersection when calling
\texttt{inner\_product}.
Again, don't forget to change the docstring at the top.
\end{itemize}
Run \texttt{test-ps1.py} to make sure your modified code
works. The same test suite will be run when you submit
\texttt{ps1.py} to the class website.
Does your code take significantly longer with the new metric? Why
or why not?
Submit \texttt{ps1.py} on the class website. All code submitted
for this class will be checked for accuracy, asymptotic
efficiency, and clarity.
\end{enumerate}
\end{enumerate}
\newpage
Iterative Version:
\begin{verbatim}
def binarySearch(alist, item):
    first = 0
    last = len(alist)-1
    found = False
    while first<=last and not found:
        # floor division keeps the index an integer
        midpoint = (first + last)//2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint-1
            else:
                first = midpoint+1
    return found
\end{verbatim}
Recursive Version:
\begin{verbatim}
def binarySearch(alist, item):
    if len(alist) == 0:
        return False
    else:
        midpoint = len(alist)//2
        if alist[midpoint]==item:
            return True
        else:
            if item