Recitation 21: Dynamic Programming: Knapsack Problem

Flash and JavaScript are required for this feature.

Download the video from iTunes U or the Internet Archive.

Description: This recitation discusses the knapsack problem and polynomial time vs. pseudo-polynomial time.

Instructor: Victor Costan

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at

PROFESSOR: How about I propose this? We go through knapsack because it's on the Pset, not knapsack but a variation of it. If we have time, we do a couple variations. If not, maybe not. And if we have time, we do at a distance. Let's see what happens after knapsack.


PROFESSOR: How many people got the difference between polynomial and pseudo polynomial? OK, so we should definitely do knapsack.

AUDIENCE: Constant factors, they don't matter. Oh wait. [INAUDIBLE]. Yeah, they do.

PROFESSOR: So the thing is it's not constant factors. That s there is not a constant.


PROFESSOR: So the knapsack problem. So the way I look at it. You're a thief. You somehow made your way into a vault. That vault has n items.

AUDIENCE: Way better than camping.

PROFESSOR: Each item has a weight of si pounds, and after you get out of the vault, assuming you make it alive and everything, you can sell it for vi dollars on the market, so this is how much money you get out of it, vi. Well, now the problem is the only thing you have with you as you enter that vault is one knapsack that can carry at most s pounds.

If you try to put more stuff in it, so if you try to load it up with more than s pounds, it's going to break as you try to escape from the vault and stuff is going to fall on the floor, and then many laser beams will shred you to pieces, so that's undesirable. So we can only load the knapsack with s pounds. Now, given this restriction, we want to make as much money as possible out of the whole thing, so we want to load up the knapsack optimally.

Does this make sense? Everyone remembers? Good. So two ways to solve it, graphs and dynamic programming. We're going to do both. Who wants to go for graphs first? Who wants-- OK, never mind. Majority has been achieved.

How do we represent this? First off, let's represent the solution. Our solution is which items we chose, right? So we can phrase this as n decisions. Each decision is a true/false decision, and it indicates if you take that item with you or not. So di says, do I take item i?

So now this looks more like a game. You are on a TV, there's a game show, and you get asked n questions. Do you want to take this item with you, yes or no? If you somehow manage to get items that are more than s pounds heavy, you get shot, you don't make it to the end of the game. Otherwise, when you leave the game, you make some money and the money that you make depends on what items you took with you. So now this looks like the game problems that we used to solve with graphs in that you have decisions, those decisions are moves, and they get you through a graph of states.

Now, let's see what's in the state. What do we need to keep track of? First, we need to keep track of what item we're thinking about, right? And then there's something else that we need to keep track of.

AUDIENCE: How much capacity we have.

PROFESSOR: Yep, very good. So the reason I need that is when I'm deciding whether I'm taking an item or not, I want to know if I'm going to get shot or not. If I take too much stuff, I'm not going to make it, so I need to know if this item fits in my backpack or not. That's equivalent to knowing how much weight I have left.

So s pounds. Let's say this is capacity, how much weight left in my backpack's capacity. This is equivalent to how much weight I have accumulated, the total weight of my items. Weight of the items I have taken so far. So the sum of the weights of the taken items.

AUDIENCE: [INAUDIBLE] left. Isn't it the total minus?

PROFESSOR: Well, I'm saying they're equivalent, so if you know one, you can compute the other. They're not equal. Knowing one lets you know the other one, but this is more useful in terms of putting the graph together, I claim. We'll see if that is true or not as we try to put the graph together. So what's a node? What's an edge?

AUDIENCE: An edge is putting in one more item, and a node is a state.

PROFESSOR: So what the weight going to be on an edge? If the edge means I'm taking an item, then the weight is going to be what?

AUDIENCE: The weight of the item you're adding.

AUDIENCE: The earning associated with that.

PROFESSOR: Yeah. I like this better because in the end, my goal is to maximize earnings, right? And I'm going to feed my graph to a shortest path algorithm. Maximize earnings, minimize path weight. They're the same issue, flip sign. So I'm going to say this is the value of the item almost. Does this work or not?

Negative value of the item. I have to flip the sign to turn it from a maximization problem to a minimization problem. So shortest path will give me the shortest path. That's going to correspond to making the least amount of money if I don't add this minus sign. So what's a node?

AUDIENCE: Is it a unique set of items?


AUDIENCE: A unique set of items.

PROFESSOR: OK. So the state in a node is going to have i, which is the item that I'm looking at, because this way, I'm going to have one node for each item. And what else did I say I need to keep track of?

AUDIENCE: The weight of the taken items.

PROFESSOR: Yep. Weight of taken. So this is sort of like the gas problem where you have to keep track of how much gas you have as well as where you are on the map. Sorry if it brings bad memories. Let's talk about an example so that we can draw a graph for it and see what things look like. So say we have three items, and say our backpack has five pounds.

And my three items are a golden statue, value $10, weight, four pounds. These are 1800 dollars when money actually used to be worth something. Crystal ball, you can sell this for $4, and it weighs two pounds, and someone in the previous section wanted a fountain pen, so we're going to use a fountain pen that's worth $7. Culture is worth a lot of money, right? And weighs three pounds. Really ancient fountain pen. I don't know how people wrote with it. So how do we draw the graph for this?

AUDIENCE: We'd make a tree where each level the tree is take item i, or don't take item i. Make a big binary tree. Does that make sense?

PROFESSOR: So if each level of the tree says whether I'm taking an item or not, and then I have two descendants, then that's going to be 2 to the number of items. So that's going to be exponential in the number of items, whereas we're going to end up in a solution where the running time is proportional to the number of items.


PROFESSOR: I mean, it makes sense in some cases, if you have fractional costs or something. By the way, all these weights are integers. Sorry I didn't mention that. My bad. But I like the idea of having some sort of levels based on items, so let's say I'm going to have a starting node, and then I'm going to have some sort of layer for item one, some sort of layer for item two, and some sort of layer for item three, some sort of vertical layer.

How many nodes do I have in a layer? One node for each possible weight because I promised that in a node, I keep track of the item that I'm considering and the weight of the items I've taken so far. How many possible weights do I have here?

AUDIENCE: Three weights.

PROFESSOR: So my backpack holds five pounds, so what are the possible sums I can get? How many?

AUDIENCE: 4, 2, 3, 5.

PROFESSOR: So far, I heard 2 3, 4, 5, 6. Who wants to bid more?

AUDIENCE: Not more than 6.

PROFESSOR: So the answer is 6, right? The possible weights are from 0, which is an empty knapsack, until weight 5, which is a full knapsack. So weight 0, 1, 2, 3, 4, 5, 6, and I'm going to start drawing the nodes.

AUDIENCE: You could have weight 10.

PROFESSOR: We're never going to fill up a knapsack with more than 5. Otherwise, we're going to die.

AUDIENCE: Why do we have 6, then?

PROFESSOR: Because I can't thank. Thank you. So this node, the first node. Weight in the backpack, 0. We're looking at item one. It's connected to the starting node. What outgoing edges do I have? What do I do with item one? I can take it or not take it.

AUDIENCE: You connect the edges 4, 2, 3. Am I answering a different question?

AUDIENCE: If it's item one, shouldn't it have a weight?

PROFESSOR: So those have two numbers in them, i and j. Sorry. They're in the wrong order here. 1, 0. So I'm looking at item one. So far, I have zero pounds in my backpack. Looking at item two with zero pounds, looking at item three with zero pounds. Looking at item one with one pound in my backpack, item two with one pound, item three with one pound, so on and so forth.

So if I'm looking at item one, and I have zero pounds in my backpack so far, what happens if I don't take item one? Where do I end up?


PROFESSOR: Yeah. So I'm looking at item two after I'm done deciding, what do I do with item one, and if I don't take item one, then I'm not going to have anything in my backpack. So this edge corresponds to we don't just take one. What if I want to take item one? Where do I land?


PROFESSOR: All right. 2, 3. 2, 4. So if I did take item one, then I had zero pounds in my backpack before. Now I have four pounds, and I'm still considering item two. What about edge weights? What's the weight of the edge that says I don't take item one? What's the weight of the edge that says I do take item one?


AUDIENCE: Minus 10.

PROFESSOR: All right. 0 and minus 10. That minus lets us get the shortest path. Now, suppose I'm in 2, 0. What are the outgoing edges?

AUDIENCE: It's the same as making a giant binary tree, right?

PROFESSOR: It's not going to be a tree because I can have multiple paths from a root to some node.

AUDIENCE: But this looks like it's taking the same complexity as building a tree if you were to carefully build the tree so that you don't have unfeasible solutions on the tree.

PROFESSOR: Then you're actually doing dynamic programming. You're not building the tree if you're doing that. You're still converging to this. You're probably thinking this and you said binary decision tree, or at least that's what I understood. You're probably thinking of the right thing. We can see if we're thinking the same thing when we end up looking at the running time. Yes?

AUDIENCE: The goal here is to have the most valuable items possible in the bag in terms of money?


PROFESSOR: So the negative 10 weight that you have there was because that item was $10.

PROFESSOR: Yep. And in the end, we're going to give the graph to the shortest path algorithm. So yes, speaking of the goal, what's the answer here, by the way?


PROFESSOR: 11. How do I get 11?

AUDIENCE: By getting the last two items.

PROFESSOR: So I take the ball and the pen and I get 11, right? Everyone on the same page? So I'm at item two. I have zero pounds in my backpack. What are my outgoing edges?

AUDIENCE: To 3, 0 and to 3, 2. Wait, sorry. That's the third item.

PROFESSOR: No, second item. You're good.

AUDIENCE: Shouldn't it be 3, 3 because the third item has three pounds?

PROFESSOR: I said I'm looking at the second item and then I'm deciding where I'm going. So this graph has a problem in that when I get to the third item, what do I do? So I need one more layer here, which means that I'm done.

AUDIENCE: I was saying that it's the outgoing edge from 2, 0, it's the weight of the taken item, so if you decide to take the third item, shouldn't the outgoing edge go to 3, 3, not 3, 2?

PROFESSOR: But when I'm at layer two, I'm only looking at item two. So I'm looking at the items in sequence. First I have to decide, am I taking item one? Then I decide, am I taking item two, and then I'm deciding, am I taking item three? While I'm here, I don't see item three. I only see item two.

AUDIENCE: Aren't those weights on the left side, though?

PROFESSOR: Yeah. These are backpack weights. So down here, these are pounds and these are items. What are the weights on the edges?

0 and negative 1.

PROFESSOR: OK. How about this other node, 2, 4? What are the edges coming out of it.



PROFESSOR: So if I decide I'm not going to take item two, but I already have four pounds in my backpack, I'm still going to end up with four pounds in my backpack, and I don't get anything out of it. And?

AUDIENCE: That's it.

PROFESSOR: And that's it, because if I try to take the second item, that would put me overweight, so the edge would go out of the graph. Therefore, it doesn't exist.

AUDIENCE: You can remove it, and then you go back to 3, 0, right? Is that not allowed?

PROFESSOR: No, because when I'm here, I don't know how I got here. I don't know if I had item one or not. The benefits of doing it this way is here, I'm just looking at item two. I do not know how I got there. I do not care what I'm going to do later. I'm just looking at one item and making one decision based on that. Does this make sense? Where do I get my answer from?

AUDIENCE: Can you put a final node on the right side?

PROFESSOR: OK, so one way of doing it is that I'm going to have a final note on the right side and connecting everything to it.

AUDIENCE: Do you need the done nodes right there, or could you have just skipped done and connected 3 to that?

PROFESSOR: So I can do that, but then it's going to be hard to reason about edges. This makes it easier to reason about edges because when I'm looking at this layer, I'm still going to have edges deciding whether I take item three or not. So it would be confusing to have all the edges pointing to the same place. I can, it's just that the graph would look more confusing. By the way, if I'm at 3, 2, what are the outgoing edges from 3, 2?

AUDIENCE: It's just all horizontal. You can't take anymore.

PROFESSOR: So I can be done with two pounds, and that means I get 0 or I can take item three, and then where do I land?

AUDIENCE: Negative 7.

PROFESSOR: Does this make sense now somewhat?

AUDIENCE: So what about this dynamic programming style?

PROFESSOR: We'll get there in a bit. Before, let's see what's the running time for this. So one way of finishing this up is we connect everything to one destination node. Another way of doing it is that we connect the source nodes to everything here with edges of cost 0 and say, well, this is how much weight we're going to waste. So if my solution path goes here and then goes somewhere through the graph, it means that I'm going to waste one pound of capacity, so my backpack is going to have four pounds when I'm done.

AUDIENCE: Would it be [INAUDIBLE] or infinity?

PROFESSOR: What should the weight be? Good question.

AUDIENCE: This is not possible, right? To get from s to 1, 1.

PROFESSOR: So I'm saying that if I get from s to 1, 1, this means that I'm wasting a pound. My backpack will have four pounds of stuff and then one pound of capacity will go to waste.

AUDIENCE: Isn't there another way to waste it on the right side if you end at the top? Or not at the bottom, basically.


AUDIENCE: So you're not double counting in terms of losing stuff? You're representing losing weight on the left side and the right side.

PROFESSOR: Well, if I represent losing weight on the left side, then the advantage is that I have a single destination node. What is it?

AUDIENCE: That dot.

PROFESSOR: So this is the source. The destination is this guy. I can waste capacity here, so that means I can say that on this side, I don't need to waste stuff. On this side, I need to arrive at done 5. That's the advantage of doing it this way, aside from the fact that it's going to map to dynamic programming.

There are many ways of doing it. I can choose to connect the source to 1, 0, and then look at all the paths here and choose the best one. I can connect the source to everything here with paths of some weight, which you guys still need to tell me what it is, and then I can look at the answer here.


PROFESSOR: Thank you. So this is another way. And yet the third way would be to only connect the source to the first node, and then connect all these done nodes to another node with cost 0, and this is equivalent to this. The reason we're doing it this way is because this maps to dynamic programming.


PROFESSOR: Yeah. It maps closely to how the DP runs.

AUDIENCE: I mean is it possible to do the DP such that you do it the other way as well?

PROFESSOR: Yeah. You can flip the DP around, too. How are we doing with this? So then I hope the running time analysis will go really fast. How many vertices?

AUDIENCE: v or three.

PROFESSOR: OK, so v is?


PROFESSOR: n layers, right? n plus 1 layers, actually.

AUDIENCE: n times weight mass.

PROFESSOR: n times capacity, right? So it's actually n plus 1 times capacity plus 1, but I don't want to deal with that, so I'm going to use order of so that I don't have to deal with that. How many edges going out of a vertex?


PROFESSOR: Yep. So at most, two edges per vertex. So how many edges total?


PROFESSOR: What shortest path algorithm am I going to use?

AUDIENCE: Bellman-Ford.

AUDIENCE: Topological sort BFS.

PROFESSOR: I'm going to choose the second one because it runs faster. So Bellman-Ford's running time is V times E. If I use the DAG algorithm, that means topological sort and then a DFS. That's V plus E. So please, whenever you have a dynamic programming graph, this is the answer. It's never anything else. It's never Dijkstra. It's never Bellman-Ford. It's always this. And this means?


PROFESSOR: So this is the running time of the graph solution.

AUDIENCE: If you use BFS, do you need to put in the dummy nodes for the edge weights?

PROFESSOR: If you use BFS. Yes. So then the running time would be bigger than V plus E.

AUDIENCE: Right. So then the running time depends on [INAUDIBLE] multiply by s again?

PROFESSOR: This isn't doing BFS. This is doing the shortest path algorithm for direct acyclic graphs. It's the DAG shortest path.


PROFESSOR: Were you here before Thanksgiving?


PROFESSOR: Right before? You missed the algorithm. It's in the Dijkstra lecture notes.

AUDIENCE: But basically, you don't have to put in all the dummy nodes?

PROFESSOR: No. So you do a topological sort, and then you only consider a node once, and you look at all the edges coming through it and make a decision based on that. So this is the running time, this is the algorithm, this is the solution.

What if we do dynamic programming instead? Instead of nodes, we have sub-problems. A sub-problem maps to a node. The same things that we decided about this data that we store in a node, the same things are going to apply to the state that we have in a sub-problem. So what are the sub-problems? First up, how many sub-problems am I going to have?



PROFESSOR: I just said that one sub-problem will map to a vertex in the graph. ns vertices, ns sub-problems, right? Come on, guys. Bear with me. Get more cookies. Are the cookies still there? Please pass them around and eat them. Yes?

AUDIENCE: One sub-problem would be getting the most amount of money for that little weight. At one pound [INAUDIBLE], how much [INAUDIBLE].

PROFESSOR: I like that, so let's start writing. What is the maximum value that I can get using a certain weight? If I label the nodes indices i, j, I'm also going to label the sub-problems using i, j. So I'm going to say that sub-problem i, j is, what's the maximum amount of money I can get by using a weight of at most j, and you said something about items, so we have to figure out which items. Which items? Anyone else can help her. You already did the hard work, so you have most of the stuff filled in. Feel free to jump in.

AUDIENCE: Greater than or equal to i.

PROFESSOR: OK. Sounds good. We're going to number the items starting from 0 this time because we're going to have a dynamic programming table, and that means we like zero base indexing. So the items are going to be 0, 1, and 2. 0 is the statue, 1 is the ball, and 2 is the pen. This means we're going to look at suffixes of items. First, the empty set, no item, then only the pen, then the ball and the pen, and then the statue, the ball, and the pen. And we're also going to consider how much weight we have in our backpack.

So let me write this again. 0, 1, 2 here, and weights 0, 1, 2, 3, 4, 5, and this is going to be our DP table. Now, how are we going to compute this? DP of i, j is?

AUDIENCE: Maximum.

PROFESSOR: All right, a maximum of something. Good. Perfect way to start. How many decisions do we have?


PROFESSOR: Two. It's the same as in the graph problem, right? Once we solve that, this should be pretty easy because it's really very similar. So the two decisions are, do I take item i, do I not take item i? If I don't take item i, what situation do I land in? So suppose I don't take item i.

AUDIENCE: i plus 1.

PROFESSOR: OK. i plus 1. So I have to make up my answer using items i plus 1 all the way through n minus 1, and how much weight do they have to have?


PROFESSOR: Yep. Now suppose I do take item i. What happens?

AUDIENCE: i plus-- no, no, no.

PROFESSOR: You're thinking of the right thing. We're making some money, right?

AUDIENCE: Value i.

PROFESSOR: Value i. So this is how much money we're making. Good.

AUDIENCE: Plus dp i plus 1 j minus weight i.

PROFESSOR: Make sense? OK Now I only have to do one more thing to make sure this code doesn't crash.

AUDIENCE: I'm sorry. What's the third term for?

PROFESSOR: This is I'm making some money, right? This means I'm going to have to look at the items starting from i plus 1 and i minus 1. And now I took item i, so that means I have si pounds in my backpack. So the remaining items must weigh j minus si pounds because I'm going to add the si pounds for this item, and in total, I'm going to have at most j pounds. Make some money, lose some capacity.


PROFESSOR: Yep. This is what happens when you move from the graph to dynamic programming, or you can draw the graph the other way around.

AUDIENCE: So could you have also done plus si, and then done something else as well?

PROFESSOR: Sorry. Plus si--

AUDIENCE: If you did plus si and then you did some other check case that was different.

PROFESSOR: Well, I do need to make a check.

AUDIENCE: Yes. So that's greater than or equal to 0.

PROFESSOR: This thing, right? Because otherwise, the code is going to crash. So j minus si is greater or equal to 0. I think this means j is greater than or equal to si, right?


PROFESSOR: OK. At least for me, the natural order when I have the graph is to consider the moves that by making in the game. I'm answering the questions one by one. Do I take this item? Do I not take this item?

When I'm in the DP, when I make a decision, I want to have the full information on that decision. Whether I take the item or not depends on what would happen with the items following it. Representing it this way allows me to make the decision right here.

AUDIENCE: If we'd done prefixes rather than suffixes, we could have done it the same.

PROFESSOR: It would have been exactly the same as there. So we taught dynamic programming over many years, and it turns out that people understand suffixes better than prefixes, and the graph format makes more sense that way, so that's why we're doing it this way. It's for your own sake. Trust me. Or at least we think so.

AUDIENCE: Where does suffix come in? I know what it means.

PROFESSOR: These guys. These are the sub-problems. Nothing, item two, items one, two, zero, one, two, so it's a suffix of the set of items.

AUDIENCE: How come you're going backwards rather than forwards? It doesn't actually matter if you go backwards or forwards?

PROFESSOR: Yep. In the end, we're going to look at everything. So we need two things. We need to know where is the answer going to be, and we need to know what are the initial conditions. Actually, I lied. We also need a topological sort. So where do we want to start?

AUDIENCE: First one.

PROFESSOR: Where is the answer? Yes?

AUDIENCE: The lower right hand corner. I mean, if you have all the [INAUDIBLE] nodes. Oh, we don't have a Done column, do we?

PROFESSOR: No. Well, so the answer will have considered all the items, right?

AUDIENCE: Top left.

PROFESSOR: So which i? Almost. So the top left corner means I looked at all the items and I have an empty knapsack.

AUDIENCE: Oh, so the max of all of the first column.

PROFESSOR: So the max of all columns would be the answer if the weight here would be equal, but I said the weight has to be smaller or equal, so the answer is easier. I don't have to do a max.

AUDIENCE: Bottom left corner.

PROFESSOR: I can only look here. So this means that I will use a weight of at most s, because this is s, and I will have considered items 0 and larger, so all the items. This is where the answer is. And in general terms, that means DP of what and what? So what's the bottom left corner?



AUDIENCE: Can you explain again why it needs to be DP 0, s.

PROFESSOR: I think it's easier to look at 0, s and see how it maps to the sub-problem, and then convince yourself that this is the answer, so let's do that. DP of 0, s means i is 0, j is s. So this is going to tell me what is the maximum amount of dollars I can get by using a weight of at most s-- agrees with the initial problem-- and using items i or greater than i, or 0 or more, so items 0 through n minus 1. These are all the items. This is the maximum capacity, so this maps to the initial problem. And that's why I did this. That's why I don't have an equal sign here.

AUDIENCE: And if you did have an equal sign, then you would be changing your sub-problems?

PROFESSOR: The recursion stays the same, but I have to look at all these to get the maximum. Someone suggested that.

AUDIENCE: If you put an equals there, you're going to have to do something to the DP such that the equals actually is captivated, because you can't just arbitrarily--

PROFESSOR: What do you think of this?

AUDIENCE: Something there is going to change, isn't it?

PROFESSOR: No. I think this works. This stays the same no matter what the sign is. The only thing that changes is where is the answer and what are the initial conditions.

AUDIENCE: Oh. So there's something else that changes.

PROFESSOR: Yep. Initial conditions. Good. I like that you're thinking about that because the next problem changes pretty much that. The sign changes and then these change. What's a good topological sort? This should be pretty easy. If I want to generate a topological sort, what variables do I iterate over, and in what order?


PROFESSOR: OK. And you're pointing at n, right?

AUDIENCE: n down to 0.

PROFESSOR: Yep. And then the only one left is j, so i in, and then j goes from where to where?

AUDIENCE: j goes to s.

AUDIENCE: It can go either direction.

PROFESSOR: Can it? What do the dependencies look like? When I'm computing i, j, I want i plus 1 and j minus-- so I'm looking at lower j's, higher i's and lower j's.

AUDIENCE: But it's always higher i's. Doesn't that mean in this for loop that you're already going to have calculated the next right column, so it doesn't matter where the column-- I may be wrong.

PROFESSOR: You're right. Fine. OK, both orders work. I just looked at this because I wanted to have a nice, simple-- but fine, both work.

AUDIENCE: Wait. You mean reversing j works?

PROFESSOR: Yes, because we're iterating over i before iterating over j. Of course, this makes more sense. Just because they're both good answers doesn't mean you should choose the one that requires more thinking. I always like the answer that requires the least amount of thinking to prove that it's correct.

AUDIENCE: Does that mean you're starting with a full--

PROFESSOR: I start here.

AUDIENCE: But if you're going from s down to 0, then you're starting down at the corner, which means you have a full knapsack, so that doesn't make sense.

PROFESSOR: So the argument was when I compute this, I'm looking one right, i plus 1j, and then I'm looking somewhere here. And I've already computed this column, so it doesn't matter if I go up or down. And that argument is correct because first I'm going to compute this column, then this column, then this column, then this column.

By the way, what's this column? This column is the initial condition. Now that we figured out the topological sort, let's try to compute values here and see what the initial condition should be. So if I want to compute DP of 2, 0, that is the maximum of either DP 3, 0 or something else. This is going to evaluate to false, so it's going to be this. What should the answer be here?


PROFESSOR: 0. So we want DP of 3, 0 to be 0 for this to work. So what's a reasonable initial condition?

AUDIENCE: Column 3 is of 0 height.

PROFESSOR: DP of nj is 0 for all j, pretty much what you said in math mode. So I'm going to have an extra column. This is like the Done problem over there, and here I'm going to say that whenever I'm looking at the empty subset-- these are all suffixes. All three items, two items, one item, empty suffix. Whenever I look at the empty suffix, I can fill up a backpack with any weight by using no items, and I'm going to make 0 money.

So what are the values here? You guys nodded, so you understood, so then you should be able to dictate the values in the table.

AUDIENCE: So is that one also 0 because you can't go up to anything, and then the next one is 4 because you can go--

PROFESSOR: Sorry. Next one is-- so it's for the fountain pen, right? Last item.

AUDIENCE: Crystal ball.

PROFESSOR: So we're going statute, ball, pen.

AUDIENCE: And this is weight on the left side, right?

PROFESSOR: So we're filling them like this in the topological sort order that we chose here. We said this is the order, so this is what we're going to use to calculate the table.

AUDIENCE: Oh. So it's still 0, then.



PROFESSOR: So here, this condition is going to evaluate to True finally, so this is going to be the maximum of either DP i plus 1j, so DP of 3, 3, which is 0, or 7 plus DP of 3, 0. So 7 plus 0, which is 7. That's why it's 7.


PROFESSOR: 7, 7. How about this?

AUDIENCE: 0. 0. 4.

PROFESSOR: The maximum of 0 and 4 plus 0, right?



PROFESSOR: So it's the maximum of 7 or 4 plus 0, so this is 7.

AUDIENCE: 7. 11.

PROFESSOR: 7 or 4 plus 7. Does this make sense for everyone? Last column. Let's do it.

AUDIENCE: 0. Feels like 0. 0 again. Why not? 4. How much does this thing weigh?


AUDIENCE: 7. 10.

PROFESSOR: Sorry for my handwriting.

AUDIENCE: And it needs to be 11 because you said that'd be the answer.

PROFESSOR: All right. So it's 11 because it's either this guy or 10 plus DP of 1, 1. The first item weighs 4, so I need to look one right and four up when adding. Does this make sense? Please say yes.

AUDIENCE: What's the [INAUDIBLE] again? I get how if you're looking horizontally, the top thing is the max. It's saying DP of i plus 1 j.

PROFESSOR: So this is looking horizontally, and this means I did not take item i.

AUDIENCE: You didn't take item i, but if you are going to take item i, you add the value of i, and then you look back for DP of i plus 1, which is the previous column, and the row number is whatever capacity you have left if you did take it?


AUDIENCE: And so in the case of cell 1, 3, if you took item one, then capacity you'd have left is 3.

PROFESSOR: Right. So you said total capacity three, right? So if I take item one, how many pounds does it have?

AUDIENCE: It has four.

PROFESSOR: Statue is 0, ball is 1, pen is 2.

AUDIENCE: Oh. You're not using--

PROFESSOR: So I'm using those items, but I'm starting using zero based indexing.

AUDIENCE: Yeah, but we're going from here to here. So I'm saying if you pick the middle cell that's 1, 3, that one--

PROFESSOR: So this has item one, so the ball, weight three. If I take the ball, how much weight do I have left?

AUDIENCE: The ball weighs two pounds, so you'd have three pounds left, and that's why you're in the three column.

PROFESSOR: I have a backpack of three pounds, and then I put a ball of two pounds in it, and I have three pounds left?

AUDIENCE: No, no, no. You have three pounds left if you put a ball and two pounds in, right?

PROFESSOR: I have a backpack of three pounds. I put a ball of two pounds. How many pounds do I have left of capacity? So backpack, three pounds total. I put a ball in that has two pounds. How many pounds?

AUDIENCE: We have none left. Oh, it can hold three total pounds?


AUDIENCE: Then you have one pound left. But we have a five pound backpack.

PROFESSOR: Well, you said we were looking at 3, 1. You said we were looking here.

AUDIENCE: Yeah, we're looking there.

PROFESSOR: So this means three pound backpack, because the sub-problem says your weight is at most j.

AUDIENCE: Got it, OK. Oh, I see. So j minus si is saying--

PROFESSOR: It means you used up, so you put up something there, so you ate some capacity.

AUDIENCE: Well, you ate only two pounds, though, so why is it in row three?

PROFESSOR: You told me to start here.

AUDIENCE: I know. I'm trying to remember how we got there.

PROFESSOR: You said, let's start here. That's why this arrow points here to one. You have two arrows, 2, 3, and 2, 1.

AUDIENCE: Because it's j minus si.

AUDIENCE: Oh, I see. Minus 3.

AUDIENCE: So j is the amount of weight you have left, right?

PROFESSOR: So j is the amount of weight I can have for this sub-problem.

AUDIENCE: Yeah, the amount of capacity I have still in my backpack.

PROFESSOR: Let's talk about pseudo polynomial time very quickly and what does it mean, OK? And you guys will promise to look at the recitation notes and look at the other problems, right? So incentive for that. Problem two is easier than knapsack, so if you get that, that should be a good confirmation that you got knapsack.

Problem three is a bit harder than problem two, but it shows up on interviews, so you want to understand problem three. I got problem two twice in four years, so there's a decent chance that you'll get it. So you want to get to problem three, so you should go through problem two.

AUDIENCE: How many times have you got problem three?

PROFESSOR: Twice in four years, so that's the problem that you want to get to. Problem two is a stepping stone. So running time for dynamic programming. How many sub-problems?

AUDIENCE: Same, ns.

PROFESSOR: Yep. Sub-problems. How many sub-problems do I look at when I compute the answer of a sub-problem? Two, right? So order one. So this is how much time it takes to compute the answer to a sub-problem, because I have the max of two elements, so it's constant time. So the total running time is?

AUDIENCE: Order ns.

PROFESSOR: All right. Order of ns. This polynomial, is this the same kind of algorithm as Dijkstra?

AUDIENCE: Pseudo polynomial.

PROFESSOR: Pseudo polynomial.

AUDIENCE: Don't know why.

PROFESSOR: So intuitively, the problem is s shows up in your running here, and s is the property of your input numbers. It's not how many elements you have in the input. It's what's inside one of those elements. So let's see why that matters. Let's look at the practical example.

AUDIENCE: What about measuring in cubic inches or cubic centimeters? Is that a problem? It would take a lot longer if we do it in cubic centimeters.

PROFESSOR: Yeah, but then you could argue that if it's an integer amount of cubic inches, you should divide everything by that. Where this really matters is suppose you have a 100 item input, so 100 items. What's the worst case input on a 32-bit machine?

An input looks like this, by the way. How many elements you have, so 3, and then for each item, what's the weight and what's the value? So then we're going to have three weights, which I believe are 4, 2, 3. You guys have to take my word for it. And then I have three values, which are 10, 4, 7.

Let's not worry about the values. I claim that they're irrelevant. You can convince yourselves afterwards that that's the case. Let's only look at this. The weights have to be between 0 and what for the problem to make sense?


PROFESSOR: If I have a weight bigger than this, I know I'm not going to take that item. How many bits do I need to represent these weights?


PROFESSOR: It's log s plus 1 technically, but if I write order of log s, I'm good.

AUDIENCE: Times the number of items.

PROFESSOR: Yes. So if I want to represent all of them, I have order of n times log s, plus the number of bits required to represent this. This is log n. So log n is smaller than n. I'm not going to add it here.

So this is my input size. This is how many bits I need for the input. So the worst case input on a 32-bit machine is going to have roughly 3,200 bits. What's the worst case s that I can represent on a 32-bit machine? These are all 32-bit numbers.

AUDIENCE: You mean each of the weights are 32-bit numbers?

PROFESSOR: Yeah. So what's the worst case s I can represent?

AUDIENCE: 2 to 31 minus 1.

PROFESSOR: Which is roughly 4 times 10 to the ninth. So the worst case running time is? Roughly ns, right? So n times s, which means roughly 100 times 10 to the ninth, which means 4 times 10 to the 11 operations.

Now, suppose we're looking at the worst case input on a 64-bit machine. Still 100 items. What's the input size?

AUDIENCE: To the 21 total at the end.

PROFESSOR: That's how many operations. Let's go through it step by step. How many bits of input?

AUDIENCE: 6,400.

AUDIENCE: Where does that come from?

AUDIENCE: 100 items times 32 bits.

PROFESSOR: 100 items. Each item weight has 64 bits. What's the worst case s?

AUDIENCE: 2 to the 64.

PROFESSOR: 2 to the 64 minus 1. Doesn't matter too much. It's 1 times 10 to the 19th, 1.6 times 10 to the 19th. So the worst case running time is?

AUDIENCE: 10 to the 21.

PROFESSOR: 10 to the 21. So what happened? I increased my word size. The running time increased quadratically. This is not a polynomial increase. What's the problem here? Why is this the case? If I write log s as b, so log s becomes b, what's the input size?


PROFESSOR: Yep. n times log s equals n times b, so this is the input size. Now, what is the running time?

AUDIENCE: The number of operations?


AUDIENCE: n times 2 to the b.

PROFESSOR: Yep. n times s, and s is 2 to the b. So this is the problem. This is a polynomial in n and b, but here, if I write the input this way, I have b as an exponent. So if I double the number of bits by doubling the field size, my running time increases quadratically, so this is not polynomial. However, if I write it as order of n times s, this looks like a polynomial. So it looks like a polynomial, but it's not truly a polynomial, so that's why it's pseudo, as in fake.

AUDIENCE: So it's actually exponential.

PROFESSOR: It's not exponential. So an exponential algorithm for this means try all the possible subsets, and that's order of 2 to the n times n to compute the sum. This is exponential, right? You have n here. It's clear.

AUDIENCE: I know, but if you have 2 to the b, that seems pretty clear as well.

PROFESSOR: Well, you have to look inside this s and see what it means. That's the difference. So by increasing the number of items, if you double the number of items but you don't do anything to the word size, then your running time is going to increase polynomially, but if you change the field size, it's going to increase exponentially.

So pseudo polynomial means watch out, there's a trap. It's not really polynomial. If you increase the input size by increasing the number width, bad stuff is going to happen. So think of Dijkstra. What's the running time of Dijkstra? You count the number of vertices, the number of edges, and you have a polynomial in that, right? You don't have to look at the number size. You don't have to look at any weird stuff like that. Here, you do. That's the difference.

Does it make more sense? So whenever you have an input number that shows up in your running time, instead of how many numbers you have, that means there's a trap. That's pseudo polynomial.

AUDIENCE: So it's just like having some constant that depends on something?

PROFESSOR: It's not a constant.

AUDIENCE: I mean coefficient.

AUDIENCE: Once you set it, it's a constant, right?

PROFESSOR: I mean, that's true for everything, right? You can say that you have 10 to the 80 atoms in the universe, so all the numbers that you work with are at most 10 to the 80. Therefore, your running time is order one no matter what you do, and then running times are no longer useful. You have to draw a line somewhere.

AUDIENCE: I think it's just like s, if it depends on your field size and you can scale it, it's kind of like--

PROFESSOR: So asymptotic running time. What's the point of that? How do our algorithms scale? As our data becomes bigger and bigger, what happens to the running time? This pseudo polynomial thing tells you that if you're shifting to a larger number size, to a larger word size, then your running time is going to explode. It's not going to scale linearly. Still don't buy it?

AUDIENCE: I trust you, I just--

AUDIENCE: So it's only dependent on the field in this case.

PROFESSOR: Do you guys want to look over Dijkstra and see what the input to Dijkstra looks like and why that's different? Do you think that's worth your time, given that this is what's standing between you and the weekend?

AUDIENCE: What time is it?

PROFESSOR: If you guys want to, I'm willing to do it.

AUDIENCE: It's 4:07. I'll stay.

PROFESSOR: Well, if you guys want to go, you can go. I will draw this for the people who still want to know what it looks like. Dijkstra. What's the input to Dijkstra? It's a graph, right?

AUDIENCE: Yeah, nodes and edges.

PROFESSOR: What does the graph look like? It's some number of nodes, so it's a single number-- that's the number of nodes-- that has log v bits in it. And then for each edge, we have three numbers-- first vertex, second vertex, and the weight. What are the sizes of these numbers? Log v, log v, and the last one?


PROFESSOR: OK. And what's w?

AUDIENCE: Maximum weight.

PROFESSOR: Yeah. So this is a property of the field size, right? Let's look at v, actually. So we have E edges here, right? So the input size is going to be log v bits plus E times log v plus log w.

AUDIENCE: How did you get that?

PROFESSOR: Because I have the edges, so I have E of these. I only have one vertex count, but then I have E edges, and each edge has three numbers, and these are the widths of the numbers.

AUDIENCE: Is the adjacent edges to v? Is that right? You say you have E edges.

PROFESSOR: I'm assuming it's a list, so the most compact representation of edges, I think-- or it might be a reasonably compact representation is that you have the list of edges. So you have a graph that has five nodes, and then you have an edge that goes from 1 to 2, and then weight 3, an edge that goes from 1 to 5, weight 4.

AUDIENCE: If you use a number, it's not a--

PROFESSOR: Well, do I need anything else for vertices? Not really, right?

AUDIENCE: So v1 and v2 are neighbors of capital V?

PROFESSOR: So v1, v2, w, these are the first edge. Each edge has these fields.

AUDIENCE: This is the entire graph in one thing.

PROFESSOR: This is the entire graph as a list of numbers, and this is how many bits it takes to represent the graph in a reasonably compact representation. Now let's say little v is log v, little w is log w. So then this is order of how many bits do I have here? E times v plus w plus v. I just replaced the logs with these variables. This is how many bits. Now, how many operations does Dijkstra take? What's the running time?

AUDIENCE: Well, it depends on--

AUDIENCE: E log v.

AUDIENCE: Wouldn't it be E plus v? That's the fastest one, right? But I think practically, it's only going to be--

PROFESSOR: So this is E plus v log v, the fastest theoretical limit. This is still smaller than E log v. This is going to make my life easier. So this is smaller than this. If this thing is polynomial, this is polynomial for sure. E log v is E times little v. So how many bits in the input? E times v plus w. How many operations? E times v. Any exponential anywhere here?

AUDIENCE: Wouldn't you change the size of v? That looks fine.

PROFESSOR: Well, so there is a trick that v is 2 to the v, right? So you can say that this is order of-- so this is definitely bigger than 2 to the little v times v, but then you have the same thing in the input. So the input also has at least 2 to the little v times v bits.

But don't worry about that. That's too much. The point is if you're worrying about this, don't worry. The math still works out. So whatever you have here as an input, the running time is going to be a polynomial in the size of the input. What happens if you double the word size? What happens if you have bigger weights?

AUDIENCE: Everything like v is multiplied by 2, and w is multiplied by 2 and everything in this problem, right?

PROFESSOR: So if you're doubling the word size, then this is going to double, this is going to double. Everything's fine. What if you double the size of the weights?

AUDIENCE: That only adds an extra bit.

PROFESSOR: Sorry. So if you double the size of the weight numbers? So if you move from 32-bit weights to 64-bit weights?

AUDIENCE: That's still a constant factor, right?

PROFESSOR: What happens to the running time?

AUDIENCE: Nothing.

PROFESSOR: Nothing. w does not show up in Dijkstra. If it would, then we'd be trouble. It wouldn't be polynomial anymore.

AUDIENCE: But practically, you will be accessing that number, right?

PROFESSOR: Yeah, but that shows up in the cost of one operation. That's why I'm saying this is how many operations you do, how many arithmetic operations. Then the model of computation that we use is RAM, and that says that you can do any math operation in order one.


PROFESSOR: So here, my number in the input, s, which is the size of a weight, showed up in the running time.

AUDIENCE: Oh, the size of the input showed up in the running time.

PROFESSOR: So the size of the input is OK, but one number in the input showed up, whereas here, that's not the case. The weights do not show up in the running time.

AUDIENCE: So why don't we just always run Dijkstra, then?

PROFESSOR: Actually for this problem, for the knapsack problem, you can't find an algorithm that is polynomial. If you do, there's a $1 million prize for it because you just proved that p equals mp. This is the best we can do for knapsack.

AUDIENCE: Is counting sort pseudo linear, because you need to have a maximum, a range? Does that make sense?

PROFESSOR: Counting sort.

AUDIENCE: Depends on your range.

PROFESSOR: Yeah, but the range shows up under a log. You're allowed to have logs. You're allowed to have log w. You're not allowed to have w. It's dependent on the size of the input. If you double the number size, then you're going to have twice as many rounds, but you don't have an exponential number of rounds.

Sorry. You're thinking of counting sort. I thought radix sort. Radix sort doesn't matter because you assume we can do everything in-- never mind. You're right for counting sort. Sorry. You're right for counting sort. Sorry. I was confusing counting sort with radix sort. So for counting sort, yeah, it's not linear. It's linear in your range size, which is not linear in the input size, which is why we don't do counting sort. We do radix sort.

AUDIENCE: We do counting sort in radix sort?

PROFESSOR: Yeah. But radix sort limits the size of the range, right? That's the point of radix sort.

AUDIENCE: Oh. I see what you're thinking. So doing counting sort with each number?

AUDIENCE: Right, with really big numbers, taking a really long time.

AUDIENCE: Yeah, that would take a long time.

PROFESSOR: Yep. That is very true. If you try to do pure counting sort on 64-bit numbers, you're going to run out of RAM. Does this make some sense? OK. Promise to look over the other problems, and in return, given that I didn't have time to cover them here, I promise to answer any emails you guys might ask me over the weekend.

AUDIENCE: Oh. Awesome.

Free Downloads



  • English-US (SRT)