Description: In this lecture, Professor Devadas introduces linear programming.
Instructors: Srinivas Devadas
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.
SRINIVAS DEVADAS: Good morning, everyone. So we have a singleton lecture today on linear programming, which is general purpose optimization technique that you can use to solve a whole bunch of problems, including ones that we've seen in 6.046 and previously in 006.
And most recently, we looked at max flow. We wouldn't have had to go through all of that pain we went through to derive a max flow algorithm if you had a linear programming package handy and all you wanted to do was find the optimum solution. You could have just run the linear program with an appropriate input, of course, that is derived from the flow network. And you'd get your optimal solution. And we'll spend a couple of minutes on that as we look at the power of linear programming in today's lecture.
But it's not just max flow. You could do shortest paths. You could do multi commodity max flow, which is more complicated than max flow and a variety of other problems. So that's that good news.
The bad news is that the algorithms for linear programming are a heck of a lot more complicated than max flow. And you can imagine that that would be the case, because it's a more general purpose and more powerful technique. The history really is that it was an open problem. Up until 1979, people did not know if linear programming was polynomial-time solvable until Khachiyan came up with this ellipsoid method, and then there's been progress sense.
But the algorithm we're going to describe today and execute on a example input is a simplex algorithm-- the simplex algorithm-- that runs in worst case exponential time. But it's a very efficient in practice. And it's held its ground, even with the advent of more efficient, from a theoretical sense, polynomial-time algorithms, namely the ellipsoid method, which actually is not that efficient in practice and new interior point methods.
So a little bit of context, let's just dive into an example of optimization in the context of politics and see how you could formulate this particular problem as a linear program. So how does politics work? You buy elections, right? So you don't want to spend a lot of money, so you want to minimize the amount of money that is required to buy an election.
And the way you buy an election is, well, campaign, but you advertise. And it doesn't matter facts that are relevant. As long as you get to the right demographic with the right message, let's assume that you're going to win the election. So that's our mathematical abstraction pf campaigning and politics all in 30 seconds.
So how to campaign to win an election. And as I said, we're going to advertise. But you do have a little bit of work to do here. That's why you need your campaign manager. And this manager is going to estimate votes obtained per dollar spent.
But that dollar is spent advertising in support of a particular case, or a particular issue. And contradictions are allowed. As long as you're sending different messages to different demographics, you're all good, right? You're assuming that people don't watch more than one channel. So you're a Fox News guy or you're a MSNB guy. You don't do both.
So now, we get at this estimate. And it turns into a table. And so you have your policy, and you've got your demographic. So you got urban, now think Detroit, suburban, I guess, you could Lexington where you live, and rural-- I really don't have any idea what that means. But presumably, there's places.
And here's our policy. You want to build roads-- kind of boring, but some people are interested in roads-- gun control, of very sensitive, farm subsidies-- you know who's interested in that-- and gasoline tax-- kind of more, this hits your pocket, so more or less everybody's interested in that.
So you tell the urban guys you want to build roads, and they don't like you. So you get a minus 2 there. So this can go, you advertise, and it hurts you. You lose votes.
Tell the suburban people-- well, typically, it's a situation where you have these nice cars, and you don't like potholes, so you like roads if you live in suburbia. Rural people have 4-by-4's. They don't particularly care. They don't care as much.
Gun control-- well, you can imagine that urban people like that. Suburban people are, hm, OK, meh. And the rural people hate it, right? You do not want to advertise on gun control in the rural areas.
Farm subsidies-- so like, don't want to deal with that. What is a farm? And the rural people love it, right? And then gasoline tax, well, urban people are commuting. And, well, they typically don't have a lot of money. So there you go.
And those are the numbers. I'm not going to justify every number here. But you could put whatever numbers you what, I mean. So let's move on. This is just a table.
It could have positive numbers. It could have negative numbers. And you still want to win this election. Regardless of how crazy the demographics are, how crazy your electorate is, you want to win the election. So as long as you have a great campaign manager, who can get you this table, it's all mathematical here on out.
You've just got to figure out how you're going to win a majority. And you could argue that all you want is to win the election. We're going to do something slightly different, which is something that's obviously going to guarantee victory. But you want to win a majority in every demographic. As the tables may be off by little, you want to be careful.
So the last thing, of course, in order to estimate how much money you need is the population here. So that's votes obtained per dollar spent. So you're getting-- it's $10 a vote, it's $5 a vote, et cetera. And so we need to translate that to votes, because that's the dollars. And you got your population here corresponding to each of these areas. And that's what you got here.
Majority, we'll assume you win in the case of ties, just to keep these numbers easy, so that's just divided by 2. So that's what you got so far. And you want to win by spending the minimum amount of money. So that's our optimization problem.
So we can take this, and we can convert it to a set of linear equations. And that's going to create our linear program-- our first linear program. And this is our algebraic setup. And so let-- we need some variables here, so let x1, x2, x3, x4 denote the dollars spent per issue.
So you got those four issues up there. So let me write that out. It's important to make sure you know what I'm talking about with respect to a particular issue. So those are our four issues. And those are our four variables. So this linear program has four variables. You're trying to discover the values of these variables to optimize, minimize your cost function.
The second thing that a linear program has-- and pretty much the only other thing a linear program has-- are constraints. And these constraints are also linear. It gets much more complicated if you have quadratic constraints, and we won't go there. These constraints that I'm going to write correspond to this statement here that says you want to majority in each demographic.
So you can imagine that because you have three demographics, you're going to have three constraints. You could have written this differently. There's just any number of variants here. And you'll get the sense of that as we go to other examples. But we'll just stick to one variant here.
So now, I want to translate everything that I've written in English over there into algebra. And so I got my minimization criterion-- minimize x1 plus x2 plus x3 plus x4-- subject to minus 2x1 plus 8x2 plus 0x3 plus 10x4 greater than or equal to 50,000. And this represents the requirement that I want a majority in the first demographic, namely the urban demographic. And so I want at least 50,000 votes there.
And I need to spend the money corresponding to the values of x1 through x4 in such a way that I get those 50,000 votes. And that represents that, and it's just reading off the minus 2, 8, 0, and 10 from the urban column. So those numbers that you see here correspond to the column, because I'm talking about the urban demographic. And you can imagine that the next constraint is going to correspond to the middle column, and the third to the third.
So I'll just write that out. I will call this constraint the constraint number 1-- I might refer to it later-- 5x1, plus 2x2 plus 0x3, plus 0x4 greater than or equal to 100,000. Call this number two. And then finally, 3x1 minus 5x2 plus 10x3.
And that's our set of constraints, but there's one more little issue that we have to be careful about, if you're being precise, and that is that there's no notion of un-advertising. And so you're going to spend positive dollars. And so x1 through x4 is greater than or equal to 0.
So that's our first linear program. And it came from this particular problem. It'd be wonderful-- and that's exactly what we're going to do for the rest of the lecture-- if we could solve this linear program and any possible linear program in an efficient way.
And so the number of variables is small n. And you can imagine that the number of constraints here, just talking about these constraints, are m constraints. And you certainly want at run time that is polynomial in n. That's our goal here.
And as I mentioned early on, it was unclear for the longest time-- well, at least not until 1979, but people had been thinking about it for a long time before that-- as to whether there was a general algorithm that would solve any linear program in time-polynomial in n. And that was resolved in '79 by Khachiyan. We'll look at a algorithm as simplex that in the worst case runs exponentially in n but is simpler to describe and is very efficient in practice.
So in our particular problem, this one, it turns out you-- and I'm actually going to come back to this in a second-- but I will just tell you that the optimum for this particular linear program with these particular numbers correspond to these numbers here. So you want to spend something of the order of $20,000 for-- so there's 100 here, so take away the two 0's-- so spend something of the order of $20,000 for the first issue, building roads, spend a bit of money for the second issue, ignore the third corresponding to farm subsidies, and spend a bit of money for the gasoline tax issue.
And these numbers aren't important, other than the fact that they happen to be optimum. So if you add up these numbers, then x1 plus x2 plus x3 plus x4 is something of the order of $21,000-- $27,000, excuse me, though I'm writing it out as this fraction. So important consideration here is that these values xi are real numbers.
That's it. It's not that they have to be integral. Clearly, there were fractions here for the optimum, some of them anyway. But in general, linear programming says the variable values are real.
There's also integer linear programming, which is NP complete, which adds the additional constraint that the xi values are integral. So it turns into a harder problem. You got polynomial-time solvable if the xi are real. You got NP complete, which Eric is going to talk about on Thursday, if the values are forced to be integral. So this extra constraint makes things worse from a complexity standpoint. We won't talk about ILP anymore for the rest of this lecture.
So I will come back to this. And I'll talk about how we can show that this is optimum without actually going into a deep algorithmic dive. But what I want to do just before that is to give you the general formulation of a linear program. It's called the standard form in CLRS, also called the general form.
In some cases, we'll look at the standard form for LP. And I want to pop up a level about this example and give you the general setting. And we'll focus in on the general setting for the most part.
But what I have here is I can either minimize or maximize-- we had a minimization problem-- for the political problem, minimize the linear objective function subject to linear inequalities or equations. And the variables, think of x as a vector, it's a column vector, or x1, x2, all the way to xn. And the objective function is c times x, so that's c1x1 dot, dot, dot, cnxn. And we just had all the coefficients being a 1 over there.
And the inequalities, they're the fun part, you can represent them as a matrix A, so A times x less than or equal to b. And notice that this is the standard form that I'm talking about. And now, I have diverged from what I had here, because I had greater than or equal to over here.
So it turns out, you'll see linear programs in different settings. Sometimes, you'll have minimization. Sometimes, you'll have maximization. Sometimes, you'll have less than constraints, less than or equal to constraints. Sometimes, you'll have greater than or equal to constraints. Sometimes, you'll have equality constraints.
We'll spend a little bit of time talking about how you can transform any given linear program into a standard form. So our standard form is going to be something that maximizes the objective function. So these are our inequalities, and they're represented as less than or equal to. That's the standard form.
And you want to maximize c times x-- again, max for standard-- such that these set of inequalities told Ax less than or equal to b and x greater than or equal to 0. So for each of the values that correspond to the variables, you want these variables to be non-negative in the standard form. And you want less than or equal to corresponding to each of the inequalities-- not equal to, not greater than or equal to, but less than or equal to. And you have this linear cost function, where you could have arbitrary coefficients, but you're maximizing it.
So that's it. So it's all about polarities, not much more than that. It's just about polarities. And if you get a linear program, a specific linear program that doesn't conform to this-- we'll spend a few minutes talking about conversions-- and it's going to be fairly straightforward. May not be immediately obvious, but we'll get to that. Any questions so far?
So I want to go back to this claim here, where I said this is optimum. Now without actually describing an algorithm to you, I'm going to be able to show you, convince you, that this value here corresponding to whatever it is, 3 million, 3.1 million, is in fact optimum. And this is something I could do, because linear programming has this powerful notion of duality.
So what is that? Well, let's just first look at our specific example here. And I'll give you a very specific observation. I'm going to give you what you can think of as a certificate of optimality. I'm going to give you a certificate of optimality for that set of numbers.
And here's how I'm going to do it. So is there a short certificate? I can imagine giving you a long proof that a particular linear programming algorithm always gives you the minimum answer, the optimum answer, walk through that, executive that algorithm on this particular example, and then you're convinced, of course, that the solution is going to be optimum.
But for this specific example, I want to give you a certificate. This certificate isn't going to work for other examples. It's going to short, because it only works with this example. But it won't work for other ones.
And so how do I do that? So the answer is, in fact, there is a certificate that shows that the LP solution is optimal. And consider that I compute this particular algebraic quantity, where all I've done is I've taken these three equations and I've multiplied them by these magical constants.
And so I'm not going to tell you how we get this certificate of optimality. But I'm going to give you the certificate. And it's going to be clear that it's a certificate of optimality. And if I take these three equations here, 1, 2, and 3-- actually, I refer to 1, 2, and 3, they refer to the equations. These are equations or constraints.
And so I take that. And obviously, if I have a bunch of equations and I multiply them out, I can certainly add them up, and I get another equation at the end of it. And it's all going to be linear.
And if I do that, I get x1 plus x2 plus 140 divided by 222 times x3 plus x4. So that's what happens to the left hand side. And the right hand side is-- you'll recognize this quantity-- five 0's divided by 111. That's what you get.
So now, can someone tell me why in the last step, why this is a certificate of optimality-- the fact that obviously, this is all algebra, once I've discovered the coefficients-- so now that I've done this, why have I just shown you that 3.1 million divided by 111 is, in fact, optimum? Can someone tell me this? Look at what you have on the left hand side. No? Yeah.
AUDIENCE: Any other solution would cost more than the amount put in spent.
SRINIVAS DEVADAS: Any other solution, but I want you to relate that to-- what am I spending?
AUDIENCE: You're spending 3,100,000-- like, it's the same thing.
SRINIVAS DEVADAS: Yeah, but I mean, this was a claim. This was a claim-- and at this point, an unproven claim. It was an unproven claim. Yeah, go ahead.
AUDIENCE: You know that the left hand side of that inequality is less than or equal to x1 plus x2 plus x3 plus x4.
SRINIVAS DEVADAS: Correct. And what is x1 plus x2 plus x3 plus x4, to be clear?
AUDIENCE: The thing you're trying to minimize?
SRINIVAS DEVADAS: Yeah, exactly, the thing you're trying to minimize. Exactly. You're almost there. but the key observation here is that x1 plus x2 plus x3 plus x4 is greater than or equal to x1 plus x2-- because all of these are positive quantities, remember-- 140 divided by 222, that's less than 1, x3 plus x4, correct? So given that, I can say that this is greater than our equal to 3,100,000 111. It's because of this observation that it's a certificate of optimality.
She has her head down, OK. Great. So that's pretty cool. Just cooked up these coefficients from somewhere, pulled them out of a hat, you're all convinced now that the value we got was optimum. Did not run an algorithm. Maybe I ran an algorithm-- of course, you ran an algorithm to get those coefficients, right? Well, how did those coefficients appear?
So we're not going to spend a whole lot of time on this. You'll see this likely in a problem set or perhaps in section. But in general, I won't worry too much about duality, other than knowing the concept. And this notion of LP duality essentially says that what just happened wasn't a coincidence. You can do this all the time.
There will always be, for a linear program, a short certificate of optimality that corresponds to some set of coefficients that you can do this particular math with by taking these linear constraints, multiplying them out, adding them up together, and showing that you have a lower bound on-- in the case of this problem-- you can't get lower than this. And therefore, for a minimization problem, when you reach that, you clearly found the optimum.
And that's the notion of LP duality. And the basic theorem-- and this is really more as an FYI, we won't prove this theorem-- is that if you had the standard form for the LP, which I'm just writing down again here, where you had Ax less than b, x vector greater than or equal to 0. So that's identically what I had up here, or done here, corresponding to the standard LP form.
Well, there's a dual-- this is what's called the primal form. Usually, if you don't say it, you think of it as the primal form. And if it's dual, you call it a dual.
And this is primal form of LP. This is a dual form of LP, or dual LP. And the dual LP flips everything. And it's not just negation, but transposed, and the actual variables also swap functionalities. So it's really pretty cool.
So your max turns into a min. The c gets replaced by the b, which is on the right hand side of the inequalities. And your constraints are A transpose, y greater than or equal to c. So there's a flip there as well. And y is greater than or equal to 0.
So there's a bunch of things that's going on here. And these two problems end up being equivalent-- the primal and the dual, you can always do this. And essentially, what is happening here is that you're solving these two problems simultaneously. And there's lots of algorithms that keep flipping between these two forms for efficiency.
But ultimately, what ends up happening is you see that the actual constraints that you had here corresponding to the b constraints turn into-- the b ends up in the cost function here. And that's essentially what's happening out here with respect to multiplying these equations out with particular coefficients. So as I said, this is really more as an FYI.
This is an obviously interesting and an interesting proof of optimality, which is a different kind of proof from proving an algorithm correct and applying that proof to a particular instance. That's the kind of thing that happens in LP, especially when you flip from primal and dual forms. So I'll leave it at that.
What I'd like to do is give you a sense of how we can convert to standard form, so you can apply an algorithm that-- for example, you have a program and it only requires standard form. It runs on standard form. Let's go over it really quickly. This is not going to take very long-- a translation from different kinds of LPs-- and we had a slightly different here for our political problem that had a minimization-- and how would we convert that to standard form.
So it's probably just one conversion here that's tricky. So suppose I want to minimize minus 2x1 plus 3x2, and I want to convert it to standard formal. All I have is a standard LP solver. What do I do?
It should be easy. What do I do if I had a solver that was maximizing, but I want to minimize a quantity? Just switch the signs, right? So negate to 2x1 minus 3x2 and maximize. So that was easy.
This is a tricky one. Suppose xj does not have a non-negativity constraint. So it just happens to be the case that it's not dollars but it's some other quantity that can go negative. It might be profit or loss. So that quantity represents profit and loss, so it could go negative if it's loss.
So I don't have this constraint in my original problem specification. But my standard form and my LP solver requires this entire vector to be non-negative. So I got a problem here. I can't use my standard solver, because of this non-negativity constraint. So how do I fix that? How do I turn it into a problem that allows the standard solver to be used? Yeah, go ahead.
AUDIENCE: You can break it up into two variables, like xj1 and xj2, so xj1 minus xj2 equals xj, and both could have [INAUDIBLE].
SRINIVAS DEVADAS: Perfect, great, that's good. Here you go.
So what you do here is take xj and replace it with, let's say, xj prime minus xj double prime. And you have xj prime greater than or equal to 0, xj double prime greater than or equal to 0. But depending on the particular values in whatever solution you're exploring are the final solution, you may have an actual xj value that's negative or positive. So you added an extra variable here to your linear program.
And a couple more real quick, suppose that I have an equality constraint corresponding to x1 plus x2 equals 7. What do I do with an equality constraint where I have x1 plus x2 equals 7? Yeah, go ahead.
AUDIENCE: You can say x1 plus x2 is greater than or equal to 7, and x1 plus x2 is less than or equal to 7.
SRINIVAS DEVADAS: No, you can't do less than or equal to.
AUDIENCE: But then you can flip the signs to--
SRINIVAS DEVADAS: Ah, then you could flip the signs. So you have two steps there, good. So your less than or equal to needs another multiplier. So what you end up doing is something like x1 plus x2 greater than or equal to 7. And then you need-- if you do what the gentleman just said-- and flip the sign, you get minus x1 minus x2 greater than or equal to minus 7.
Is that right? No? I messed up? Oh, I want less than or equal to. You're right, you're right. So I need less than or equal to-- that's right, of course, thank you. So I need less than or equal to in both places.
So that's the standard form. I needed less than or equal to. Good. What you've done is increased the number of constraints by one. Did I get this right the second time? All right.
So that's pretty much it. The last thing, which I won't really write out, is, which we've done here already, greater than or equal to constraint translated-- I won't give you an example of this; we have this already-- translates to less than or equal to by minus 1 multiplied. So we have to invoke that in order to do the equality anyway.
So you're in business. If you have a standard LP solver, you can take pretty much any optimization problem that is linear in terms of its objective function and has linear constraints, and you can transform it into LP. If you had non-linear constraints, there's lots of work that goes on in linearizing those constraints and using LP solvers.
It's a very practical thing to do. It may be something you'll end up doing, invoking these powerful LP solvers-- typically, they're commercially available; the best ones are commercial-- and use it to solve your particular problem. It turns your algorithm design problem into a reduction. And so you'll spend really the next couple of weeks thinking about reductions.
We'll start that up right now, where we'll take existing combinatorial problems, for which we already know algorithms for, and you're going to reduce them to LP. Just to give you a sense of what the power of LP is, but this notion of reduction is very powerful, you can use it to do complexity proofs. Here, we're just using it as a convenience in today's lecture to use our LP hammer.
So let's say that I have our favorite problem of the week, namely max flow. And I want to convert that to LP. So go back a week ago, and right about this time a week ago, and we'd set up the max flow problem.
And let's assume that we went back there. And we didn't talk about augmenting paths, and we didn't talk about residual capacities or min-cut or anything like that, but we knew LP already. And we just want to solve max flow using LP. So let's do that.
So this is maximum flow. I'm not going to bother with converting to standard form. We know how to do that, given what we just did here, over there. So I'll just do whatever I want to keep things simple.
Max flow is obviously a maximization problem. And using the same notation we've used, it's not going to look like Ax and b, just because I want you to recall what max flow is. And we're going to translate that. And the values of these variables-- or the names of these variables, whether they're x or f, it should really matter. We know how to do LP at this point-- we know how to formulate LP, I should say, at this point. And we're assuming that we have an LP solver.
So what I want to do here with the maximum flow problem is maximizing the flow value. And it's simply, you grab the source, you have a variable associated with the flow from the source to every other vertex of v. And you have to maximize that. So that was the setup for max flow. I'm not changing that.
What do you think the three constraints, or whatever set of constraints that we have here, are going to correspond to the LP? You spent a week on max flow, looked at the problem set, what constraints am I going to have to put up there? I'm going to have to put up capacity constraints. That's an obvious one.
What is another one? Conservation constraints. All flow entering a node that is not the source or the sink has to leave it. In the original network, is there a concept of negative flow? No, you will define it going in the other direction. So we did talk about negative numbers, et cetera, but you're going to have positive quantities, especially if you look at net flow, the version of the flow that we zoomed in on in the Tuesday lecture from last week.
And you also have-- in the general setting, you're going to have these skew symmetry constraints as well. So the three things that you need here are skew symmetry, conservation, and capacity. So you have such that f u, v equals minus f v, u for all u, v belonging to V.
And depending on the kind of network that you have, if you constrain it to a sudden type that you don't have these two-way edges, you could certainly remove some, if not all, of these skew symmetry constraints. Important ones are conservation and capacity. And this should seem pretty familiar to you.
But the key-- the reason I'm writing these all out is primarily to ensure that you understand that these are all linear constraints. So that's pretty much the only thing that you need to observe here. Obviously, these constraints you've seen many a time from the two lectures last week. But notice that they're all linear. And finally, this one is f u, v less than or equal to c of u, v for all u, v belonging to cap V.
So this is f. That's a variable that's less than another constant, clearly linear. Doing a bunch of sums here. I could obviously have multipliers, scalar multipliers. In this case, for conservation, I don't have scalar multipliers, but clearly linear. Skew symmetry, got two variables in here. One of them is a negation of the other, clearly linear, so that's why this is an LP.
And so you might say, well, I know better, max flow is much more efficient than any LP solver that's out there. And you would be right. If you have a max flow problem of this variety, it's difficult to imagine that you would get performance, empirical performance from running an LP solver.
But this generalization of max flow that's a multi-commodity max flow, where you just don't have one commodity flowing through. You may be counting cars and trucks on a road, or there's two different kinds of liquid flowing through the same pipe, whatever, gas or liquid. And so when you have multiple commodities, you may have a linear but more complicated cost function that's a function of the flow of each of the commodities. And they may have a certain weight associated with them. So there's a lot of things that could be more general-- there could be more general settings corresponding to max flow.
And I'll just leave you with the thought that you could simply have two commodities. And we'll just call them 1 and 2. And so now, you have the f1's and the c1's and the f2's and the c2's. Each commodity has to be conserved.
But what about the capacity? What do you think happens with the capacity? Let's just assume these are two different kinds of cars. So what would the capacity constraint look like? Yeah.
AUDIENCE: You can say either c1 or f1 plus f2 is-- for each edge, you can add them together or you might take the linear [INAUDIBLE].
SRINIVAS DEVADAS: Exactly, that's right. So good point. It may be the case that I have distinct capacities. And in fact, if you have completely disjoint problems, you're right in that you can solve them separately.
But actually, the more interesting case would be that you have a single capacity c, so you'll have-- let me just write this out here. If in fact you had two distinct things, so if you had f1, c1, f1, c2, the question is, do you have two distinct, disjoint optimizations, in which case you just use max flow twice.
On other hand, what's more interesting really-- and I should've used this example for starters-- but here's a better one. You have two commodities and a single capacity. So the road is a good example. Both the cars and the trucks share the same road. It has a certain capacity.
And now, your capacity constraint is looking like f1 plus f2 over here. And that's the flow through the particular edge uv. So you have something like f1 u, v plus f2 u, v is less than or equal to c u, v for this total capacity. And that's pretty much it.
So that is linear. The nice thing is that it's linear. You could put weights on it. If you wanted to claim that a particular commodity 1 uses up-- because it's a truck, it uses up more space on the road. And you can accommodate fewer of them. You could put a multiplier in there. Still say it's linear.
So that's the power of by having an LP engine. You could translate problems that are not exactly max flow, that are multi-commodity flow. You may have additional linear constraints that you could add, and you could still use your LP package. So that's the reason why this is interesting and powerful.
So that is kind of an obvious, corresponding to max flow. Let's look at something that's a little less obvious. And it's going to be a little tricky to convert the shortest path problem to LP, not a lot of work but one little observation that's going to be important to make in order for the whole thing to flow through or actually work out.
So we all know the shortest path problem. We want to find-- let's just call it the single source shortest path problem. You have a specific source. That's going to turn into the point from which you're going to start computing the distance. That's what Dijkstra does, and that's Bellman-Ford does.
And so this is from vertex x-- s, excuse me, s. And what I want to do here is obviously set it up as a set of linear constraints. If I have dv corresponding to the distance from the source-- so dv represents the distance from the source-- and eventually I want dv to be the shortest distance from the source.
That's our notation for shortest paths. dv represents an existing path-- it may not be the shortest path-- from s to v, the value of that. But dv monotonically decreases as you run through. It's initially infinity in Dijkstra going back to Dijkstra. And then we shrink it through a process of relaxation.
Now I want to try and model that-- I want to try and model all of this as an LP. So it's not immediately obvious-- the thing that the flow networks had, where we had these constraints. We have capacity constraints and conservation constraints. And we could turn that constraint into an inequality. And it was pretty smooth. It's pretty easy.
So what I need to do here with shortest paths is something that's a little more subtle. So what basic constraint do I have in a shortest path algorithm? What's an inequality-- you remember an inequality from shortest paths that we kept talking about? The triangle inequality.
So we're going to have to go with the triangle inequality and take the triangle inequality and use that to create an LP formulation of shortest paths. In particular, what we have here is that I could write dv minus du is less than or equal to w u, v for all u, v belonging to E. And that's the triangle inequality. And I'm going to have d of s equals 0. That's the only thing that I start with.
And so what's happening out here is simply that there's different ways of getting to v. And my shortest path is going to be the best way of getting to v. So in particular, the way you want to think about this is that if I have a v and I can get to it from, let's just say, u1 and u2. And maybe the source is over here. And these are the only two edges that can get to v.
So I'm just looking at a fairly limited setting. u1 and u2 are going to have to be the two vertices. One of these two is going to get me to v. And I got w u1, v here. And I got w u2, v over here.
And so what this is saying is, I'm going to have to write this out for each of these edges. For each of these edges, I'm going to have this constraint. And that's says that the dv value, if I want the shortest path, should obey both of these constraints. And if I want to obey both of these constraints, one of them is going to be my limiting constraint. And I'm going to get the min of those two. Correct?
So in effect what this translates to is that it's an AND, right? So dv have minus du1 is less than or equal to w u1, v. dv minus du2 is less than or equal to w u2, v. That's an AND, because I'm putting both of those constraints in here.
And that essentially means that dv is going to be the min of the two quantities-- the du1 quantity plus the w u1, v, and the du2 quantity plus the w u2, v. That make sense? Ask me questions if this is unclear.
So that simply corresponds to the fact that I'm doing an AND over here. I'm adding all of these constraints in there. So I'm applying the triangle inequality to every edge, to every relationship between a vertex that has a path ending at it. And you're pushing it forward to this vertex v, all the different ways that you can get to v. In this case, there is two sets of ways-- one from u1 and one from u2. And the last step is a minimization step.
So you think you're done or we're done, but we're not quite done, because what's missing here in terms of my formulation of LP? What else do I have to do here? Well, sure, non-negative-- let's do that. Sorry?
SRINIVAS DEVADAS: Objective, who said objective? You again? So we are missing an objective function. Now shortest path is what kind of problem again? Short means minimum? Minimum, height whatever.
So do I put a-- what happens if I put a minimum in here? And let's say that I do something like sigma over V, cap V, dv, because I want to minimize-- or I could pick a particular one. I could pick a particular single source, single destination, and I put a minimum there.
What happens? Does this work? What's the solution to this problem, if I minimize the distance? 0, because the zero value is going to work. So there's something-- I haven't put in the constraint that I do want a path. I do want a path from s to v for any v that matches one of the quantities in the min, because the min says I have equality.
The big issue here is, this is a less than or equal to, and that's why the min doesn't work in the objective function. It's a less than or equal to. But this min over here, which is the definition of a shortest path, is saying that it's either equal to this quantity or equal to that quantity. There's an equality over here that is missing from this side. And that's the key observation.
Once you observe that, that need equality for one of these constituent quantities of the min, then you'll see that what you have to do is simply change this min to a max. So you say, well, how the heck did a min get changed into a max? And I'm not sure I'm going to convince every on of you in the next minute or so.
But the bottom line is, it comes down to I do have a min already in my inequalities, because I'm ANDing each of these inequalities, and I'm putting down each of those inequalities in there. So each of them is going to force me to find the best solution, because they're going to constrain me to not go via u2 if u1 is better, because the other constraint corresponding to u1 is going to force me down. So there's an additional min in there, because the ANDing of the less than or equal tos.
And then in order to actually force the equality for one of those, I need to push up as hard as I can, or as high as I can. So think about it. Play around with a couple of examples. Choose a simple example for starters. And you'll see that this is the correct formulation.
So you can see that it's not completely clear in some cases how to transform problems to LP. But even in those cases, sometimes you can. So there's just a ton of different problems, a good skill to have to be able to take combinatorial optimization engines, like LP or even max flow, and be able to translate problems to them. It's something that you'll probably do if you stick to algorithms in your careers or exploiting available algorithm packages.
So the last thing I want to do here for the rest of the time is to give you some sense for how an LP program is actually optimized. How can you possibly take the standard LP formulation, which is a general setting. You know nothing about shortest paths, let's assume, nothing about max flow. It's not about a specific problem. It's about the general setting.
How can we solve the general setting, because that was the theme here anyway. You had this engine, and you want to use this engine. But now, how do you build this engine?
So what we're going to do is look at a fairly simple example of the simplex algorithm. And this algorithm is in the textbook. And it'll be in my notes. So I'll get as far as I can. It's not that complicated to describe, especially from an example standpoint. But I may not get through all of the steps to get you the optimum for this particular example given how much time we have.
The most important concept in simplex is yet another form of representation for simplex, which says that you can represent the LP, not in standard form, but in slack form. So I'm going to tell you what slack form is. And then what we're going to do is, the flow of simplex, algorithmic flow, is to convert one slack form into and equivalent.
Obviously, you don't want to do something that's incorrect, but it has to be an equivalent slack form, whose objective value has not decreased and has likely increased. So you're guaranteed that the objective value has not decreased. You're not guaranteed that it's increased.
And then we're going to keep going till the optimal solution becomes obvious. And you might say, how is this obvious? That's the reason why I talked about the short certificate of optimality. It's definitely a relationship between the termination of simplex and the fact that you can now say, hey, I know I'm done here, it's kind of obvious that I can't do any better. And hopefully, you'll see that by the end of this lecture in this simple example.
So that's it. It's an iterative algorithm. It's exponential, unfortunately, because this takes m plus n, choose n iterations in the worst case, where n is the number of variables and m is the number of constraints. Most of the time, it does a lot better, but that's the only bound that you can actually prove in the worst case. And so you're stuck with an exponential algorithm if you're using simplex worst case.
We won't actually do much analysis on simplex. It's really out of scope for 046 in terms of the analysis. The actual algorithm is certainly within scope.
So what I want to do is give you some sense for what the slack form looks like. And we'll do a couple of iterations of simplex. And we'll get as far as we can before the end the lecture.
So we'll take a different example from our political example. It's similar in size. And I want to explain to you what the slack form is and why it's interesting.
So what I want to do is maximize 3x1 plus x2 plus x3 subject to the constraints that x1 plus x2 plus 3x2 is less than or equal to 30, 2x1 plus 2x2 plus 5x3 is less than or equal to 24, 4x1 plus x2 plus 2x3 less than or equal to 36, and then non-negativity constraints, x1, x2, x3 greater than or equal to 0. So that's our example problem. I'll leave it up there.
You're going to convert this to slack form. And so what is the slack form? We're going to introduce an additional number of variables that correspond to the number of equations that we have. So we're going to introduce, in this case, three new labels, because I have three equations.
And the slack for this problem looks like this. I'm going to have z equals 3x1 plus x2 plus x3, same as before. And then I'm going to have variables that represent-- these are called basic variables. And the original variables are called non-basic variables.
So I'm going to add three basic labels, x4, x5, and x6, corresponding to these three constraints. And they're going to represent slack in the sense that they're going to correspond to how much slack you have in the inequalities that you have in the original problem. So if x4 happens to be 0, then you're jammed up.
You have no slack, because x1 plus x2 plus 3x3 equals 30. And increasing any one of them will violate the constraint. So that's just simply the notion of slack. It's how much room do you have.
x5 is 24 minus 2x1 minus 2x2 minus 5x3. And the last one is 36 minus 4x1. This is very mechanical up to this point. And so I'll call this set of equations, equation I.
And what I'm going to do is I'm going to now work on a space that corresponds to x1, x2, x3, x4, x5, x6. So I'm going to have these solutions that now have six values associated with them, as opposed to just three values, because I've added three variables, the basic variables, to my non-basic set. So the original variables are non-basic, just to differentiate.
So so far it's just set up its definitions. We can think about up running through iterations of simplex. It takes about three iterations here in order to get to the point where the optimum is obvious. So you're going to convert through three slack forms. And then finally, when you get to this fourth slack form, you see that you have an optimal solution.
And how does that work? You're going to have the notion of a basic solution, where we set all non-basic variables to 0. So in this case, what we're going to have-- and then we're going to compute values of the basic variables.
So our objective function here is going to be 3 times 0 plus 1 times 0 plus 2 times 0, which is obviously 0. And the values x4, of course, is going to be-- because all of these are 0-- is going to be 30. x5 is going to be 24. And x6 is going to be 36.
So this is kind of a trivial starting point. So you can think of this as 0, 0, 0, the solution that you're looking at, 30 24, and 36. So that's our starting point, which doesn't really tell you much.
But now comes the key step, where we're going to do something that's called pivoting. And in pivoting, you're going to swap a basic able with a non-basic variable. It is a step that requires some intelligence. But the basic step is a swap.
So one of the basic variables is going to become a non-basic variable and vice versa. And how do we select this? Initially, you can kind of do this in an arbitrary way. It gets a little more subtle as you go along. You don't want to do things in a random way.
But let's just start with what pivoting actually does in a more generic setting. The basic step is select a non-basic variable-- let's call it x sub e-- who's coefficient in the objective function is positive. You can always redefine things if you can't find something like that. But we won't go there.
And then what we're going to do is we're going to increase the value of x as much as possible. And we always have constraints, of course. So we can do that without violating any of the constraints.
And then at this point, variable xe becomes basic. So it's going to turn into the left hand side. It's going to move over. You're going to swap-- the x1 might be over here, over here, and you got to rewrite these equations so the x1 becomes basic, for example, over to the left hand side, and the variable that it replaced goes over to the right hand side.
So you can think of this as Gaussian elimination, except with inequalities. There's definitely relationships there, if you recall your Gaussian elimination. If you don't, don't worry about it.
So xe becomes basic, and some other variable becomes non-basic. The values of other basic variables and the objective function may change. So we'll do one pivot step at least, so you get a sense for the algebra involved. And it becomes a little more concrete. To motivate you further, you'll be doing this in problem set 8 on a different example.
So what did I do here is we're going to select a non-basic variable. So let's just select x1, lexicographic order or numeric order, let's select x1 is selected. That's the non-basic variable. And what I want to do is increase the value of x1. So I want to increase it without violating constraints.
Now, which of these constraints do you think is going to cause trouble first with respect to increasing the value of x1? x1 is now 0. We're at ground level, we have all 0, things are feasible.
Now, as I start increasing x1, remember, you have non-negativity constraints associated with each of the x4 values as well. That's what these basic variables represent. So don't forget the fact that the constraints are violated. You need all the xi's to be greater than or equal to 0.
That was true for the non-basic variables. It's also true for the basic variables. So a violation of a constraint implies that one of the currently basic variables goes negative. That's exactly equivalent to the original inequality not being satisfied.
So which of these constraints, do you think, is going to cause trouble here? Just look at it, and should be able to look at the three equations and tell me, as I increase the value of x1, where am I going to hit my limit? x6, yes, absolutely correct.
So that's because of the minus 4 here. This is a big multiplier. I got a minus 1 here, a minus 2 here, and a minus 4 here. And if we just look at 4, the magnitudes, 4 is bigger than 2 and is bigger than 1, so it's going to be that third constraint.
So third constraint-- you can obviously compute this numerically or mechanically. It was just easier to do this in this example by eyeballing it. And so third constraint is the tightest one. And it limits how much we can increase x1. So I'm going to do my second step up there, which corresponds to rewriting x1 as these other variables.
And now, I got x1 on the left hand side and x6 on the right hand side. And now, it's just merely a matter of substitution. Once I've done this, I'm just going to jam through and go in and I'm going to rewrite the other equations with x6 on the right hand side. And that is, I'm going to replace x1 with the eyeball equation. And that's really a simple substitution.
So at this point, what's happened is, because the third constraint representing x6 was the one that was chosen, what's happened is that x1 and x6 are going to interchange roles. x1 was non-basic, it's going to become basic. x6 was basic and is going to become non-basic. And that's basically the essence of the simplex algorithm.
The iteration and then the convergence and all of that is, as I mentioned, going to require getting to a point where the optimality is obvious, but we won't be doing any proofs corresponding to conversions for simplex or any other specific LP techniques in this class. Maybe some constraint techniques, I take back what I said, but certainly not for simplex.
But I just want to give you a sense for the flow here. And so let's just go through this last thing here in terms of finishing off the pivot xi's. And I want to show you what the equations look like. And if you just keep doing that, at some point, you'll converge.
And so what happens here is you have z equals 27 plus x2 divided by 4 plus x3 divided by 2 minus 3x6 divided by 4. x1 equals-- so there's a bunch of algebra here that I'm obviously skipping over, but it's simple algebra. And I have x4 over here, 21 minus 3x2 divided by 4 minus 5x3 divided by 2 plus x6 divided by 4, and then one last thing here.
So that's my pivot step, that I've flipped x1 and x6. So now you ask, what was the point? What was the point of this? Well, the point of this was that you actually increased the objective function value and, in this particular case, quite significantly, while maintaining correctness.
And so let me just make these observations and conclude here, because then that gets us to the point where you've seen the details of one pivot step. And you can imagine applying it over and over to a convergence. And let's just look at the original basic solution, which as you recall was 0, 0, 0, 30, 24, and 36. And this is simply x1 through x6.
This original basic solution suddenly satisfies these equations-- equations II, if you just put them in there. And it makes sense that it will have the objective value of 0, given equivalents, but you can verify that by saying that you have 27-- the original had the objective value of 0, because all the x sub 1, x2, x3 were 0, so that was an easy check in the set of equations corresponding to the first set, which I've erased at this point. But no matter.
And if you look at what I have here, I have 36 equals 0. So the objective value here is 0. It matches what you had before. But the basic solution for II, I'm going to set the non-basic values.
So what is a solution here? The non-basic values are 0. So the solution is going to be 9, because 9 is non-basic. x1 is now non-basic. x2 and x3 are still basic. And then I have 21 and 6. And x6 now has become basic.
So the way I get this solution is simply by plugging in 0's on this aside. And I get 9, 21, and 6, because I just plugged in 0's on the right hand side. So that's how I got a new solution.
And if you look at the objective value for this, the objective value, you can look at this objective value simply by looking at the original problem. And the original problem had 3x1 plus x2 plus x3 as the objective value. And so if you go off and you see, well, that you had 0's for the other ones, but you have 3 times 9, so we have an objective value of 27.
So this flip of our pivot basically got you from an objective value of 0, while maintaining correctness, to an objective value of 27. And you can look at this in the notes or in CLRS, but you have to do two more pivots corresponding to two other variables-- the same grungy stuff that I went through here, substitution after selection. And the objective value is going to increase.
And you might ask, how do I know that I'm done? And so that was the last thing here, which is increase the value until it becomes obvious-- no, this was pivoting, I'm sorry-- increase the value pivot until it becomes obvious what the optimum is. And what ends up happening is you end up with an objective function.
In this case, the objective function mind you, this thing over here is the objective function. And notice that it has a negative coefficient on the variable that was actually the part of the first pivot. So x1 and x6 were a part of the first pivot. And this got a negative coefficient.
So what ends up happening here is you end up getting something like 30 minus something minus something minus something, where you have xi values over here. And when you set these to be 0, that's the best you can do, because these are all negative quantities. So I'll just leave you with that. Hopefully, you understood how we do the pivoting-- sub through it three times, and then you get that objective function, and the optimum value is 30. See you next time.