--- Video Title: Cook-Levin Theorem Description: Theory of Computation https://uvatoc.github.io/week11 25.3 Cook-Levin Theorem - Defining NP-Hard and NP-Complete - The Cook-Levin Theorem - NANDSAT - Proving NANDSAT is NP-Hard - Implications of P=NP David Evans and Nathan Brunelle University of Virginia --- Our definition of NP-hard, these are problems that are as hard as the hardest problems in NP are harder. These are problems that every problem in NP can be reduced to. If there's some problem that is NP-hard, every problem in NP can be reduced to it. G could be outside of NP. We could pick some problem that we know is in exponential time. Or something else, right? But it says every problem that's within NP can always be reduced to this problem. So that's what NP-hard means. It's a hardness property. And if you were that hard or harder, you're within class NP-hard. NP-completeness says it's NP-hard but also within class NP. To be NP-complete, this has to also be within class NP. So that means in order to prove this property, our goal is to prove that every problem within class NP can be reduced to 3SAT. Don't want to go through every problem. There are infinitely many. So if we can find one problem that is NP-hard, and so that that can be reduced to 3SAT, we're done. Because the definition of NP-hard says every problem within NP can be reduced to that problem. So if we have one problem that's NP-hard, and then we can show that that problem can be reduced to 3SAT, we're done. So here's the problem. The problem we'll call NANSAT. This proof is in the book. We're going to go through what I think is the key insight part of it. We're not going to go through all the technical steps. There are quite a lot of things to actually make the proof work. The problem that we're going to use is this one. Its input is a NANSERC program that has N inputs. It will output one if there is some string of length N. So the string is finite, but there's no constraint on N. Such that running the circuit on that string outputs one. Otherwise, the output of NANSAT is zero. The name of it as well as the ways defined should make you think, well, this is kind of like 3SAT. So 3SAT we said there's some assignment to variables that makes this formula true. So that's where the satisfiability name comes from. This is, there's some input to this program that makes it output one. If this was already an interesting problem, and we could show that this was NP-hard, we'd be pretty close to done. What we probably should know first is whether this is in NP. We're not going to get the property we want that helps us separate P and NP if our problem is not within NP. So do we think NANSAT is within NP? Two ways of thinking about class NP. And the one that's definitely should start to feel more useful to you, even if it's maybe not as intuitive, is is there a witness? So when the output of this function is true, is there some polynomial length witness that can convince you that it's true? Yeah. There's a reason W is the name for that. It's the witness. That's a witness, right? Our goal is to have something that if the output is one, convinces us the output's one. For any value where Q, Q is the input to NANSAT, any instance of this problem where. Q is a circuit that there's some input that makes it true, we can find a witness that proves that there is, and that's the input. Can we verify this in polynomial time? So we need some verification function that will take Q and the witness and tell us if it's valid. What does the verification have to do? Yeah. Just evaluate it. So it's going to evaluate the circuit that Q represents or the NANSARC problem. And if the output is one, it's a valid witness. If the output is zero, it's not a witness. So that works as a witness and it's easy to verify. Okay. So it is within NP. What we want to know is that it's NP-hard. Between those two, we know it's NP-complete. So to know that it's NP-hard, well, this is what we've got to do, right? This is our definition of NP-hard. So we've got to show for all functions that are within NP, there is some polynomial time reduction from that function to NANSAT. This sounds like this is hard to do. So you haven't had too many reductions yet, but you've seen a few, and they're all very specific to the function. Even the one we just talked about where you can use the generic function, well, use the function itself to do a polynomial time reduction between a polynomial time function, we need to know what the function was. So we're going to need to do something that only uses the definitions here. We can't make any other assumptions about F, but we do have one big thing we can do. It's in class NP. That means there is some verifier that runs in polynomial time on a polynomial length witness. So if we can show that that leads us to a reduction, now we can use that verifier and use NANSAT to verify any function, just because we know the verifier exists. We can't assume anything about it other than it exists. And this is what we want to find. So we want to know that there exists some reduction such that evaluating NANSAT on the output of that reduction on X will give us the value of F of X for any F that is in class NP. We need to formalize our definition a little more carefully to do this. Because in order to do this, we're going to need to use the verifier and use the witness to construct R. We've got to construct R somehow in terms of the verifier that we know exists. Let's formalize that a little bit more carefully. First, I should make the correction I made earlier. So this only works when the output is one, there should be a witness, and we should be able to verify it. So we know that we've got some verifier. What do we know about the verifier? Good, yeah. So the one thing we know about it, we know the verifier is NP. That's a requirement. That comes from the definition of class NP. That the verifier must be within P. And the verifier is a function from 0, 1 star to 0, or 1. It's a function that has this property that for all possible values of X, there exists some witness. We don't want the witness to be in 0, 1 star. Because we have some constraints on the witness. What's the constraint we have on the witness? Yeah. So we need a length constraint. We're going to give X a length. So X has length N. The length of the witness can be bigger than N, but it's got to be polynomial N. So the witness is in 0, 1, some polynomial in N. A is some natural number. So the witness is polynomial N, its length is bounded. The verifier is going to take an input of some length. The verifier gets two things. You can't just get the witness. It's got to get something else as well. What else does it get? You should be awake enough to answer this question. Don't trip on the stairs or if you want to do like jumping jacks. So what are the inputs to the verifier? It needs to be able to take a witness and know that that witness proves that the result is correct. That when the output of the function is one, like we saw the longest path one. When the output of the longest path is one, that means there is a path of length at least N. The witness is that path. What inputs does the verifier need to do that? You need X. You need the original input and you need the witness. Because it doesn't make sense to verify that the output is correct if you don't know what the input was. The inputs to the verifier are going to be X and W. And it's outputting a 0 or a 1. So that means if we think of encoding on a Turing machine, we're just going to paste those together. We may need some kind of separator or something else to deal with it. But it's going to be an input of length N plus N of A. It's a sum bounded length. I'll introduce a new variable to represent that. Some length, which is finite and is polynomial in the original input length N. So, how long does the verifier take? In general, all we know is it takes polynomial time. What about the one for NANSAT? So NANSAT, we said we can verify it by simulating a circuit. How long does that take? The number of lines in the program? We know that. The length of Q gives us that size. But in this case, we've got a verifier for some function in NP. All we know is it's polynomial time. But we do know it's polynomial in the size of the input. And what does it mean to be NP? There's a Turing machine that runs it in that number of steps. So can we simulate a Turing machine with a Boolean circuit? If you have an arbitrary Turing machine, can you simulate it with a Boolean circuit? Which is more powerful, a Turing machine or a Boolean circuit? Okay, good. The circuit only works on a fixed size input. So in general, we cannot simulate a Turing machine with a circuit. But this is not simulating any Turing machine. This is simulating a Turing machine that we know runs in some polynomial in L number of steps. The number of steps that takes the verifier to run is L to the C for some constant C. Can we simulate a Turing machine that runs for a bounded number of steps with a circuit? Yes, good. How do we know we can do that? Remember, we wrote our Turing machine execution model as a loop. So we definitely can't do loops and circuits. If we know a bound on the number of steps, what can we do to the loop? Yeah, we can unroll it, right? We know the maximum number of steps. If we know the maximum number of steps in the loop, we can unroll the whole loop. It's big. It's L to the C steps. But we know that there is some straight line program that does what that Turing machine does. Because we can just unroll it for that number of steps. And now, well, we've got that. So we've unrolled it. And we know the verifier exists. And we know it runs in that number of steps. So we can actually simulate the verifier with a NANSERC program of polynomial length. And if we can simulate it with a NANSERC program, this is what R of X needs to do is transform whatever F is, the verifier for F, which we know exists. We don't know what it is. But we know it exists and we know it can be simulated by a Turing machine that runs for some number of steps that is polynomial in the length of that input. So we can unroll that and we can make a circuit that's going to be polynomial size in the original input. And then we can use NANSAT to verify that. We can run the verifier that we needed on this input. And remember what NANSAT does? NANSAT outputs one if there's some input that makes that true. So if there's some input that makes the verifier true, what does that mean? It means there's a witness, right? There's some input that makes it true. So now we've solved that. So for cases where F of X is true, that means that the circuit that represents that has some input that makes it true because that's the witness that we would use to verify the output of fx when it's one. Okay, this is really the key step in the proof. And one of the things, the way the proof is in the book, and there's actually code in the repository, that actually does all these transformations. So there's lots of things you've got to be careful to get right to make those transformations. But the real key idea is that, because the verifies polynomial time, we can unroll it, and we can simulate it with a circuit. And then NANSAT can check if there's a witness that would verify for that input that there's a witness. And if there's a witness, then we've got a result that we know the output is one. Have we finished proving Cook-Levin theorem? Yeah. Good. So we haven't done anything about 3SAT. The Cook-Levin theorem was about 3SAT. What we've done is shown a problem, NANSAT, that is NP-hard. So what we would need to do to finish this would be to show that we can reduce 3SAT to NANSAT. And the way the book does this is actually introduce another problem, but it does it in two steps. So it reduces 3SAT to 3NAND, and then reduces that to NANSAT. And we've shown that NANSAT is harder than every problem in NP. And we can do the reductions to show that NANSAT and 3SAT are equivalent. We'd have to do them in both directions, but we could. And we only need to do it in one direction to show this property. Our goal is to show every function in NP. If we had a polynomial time-solver for 3SAT, we could use that to solve every other function in NP, or to compute every other function in NP. This means that 3SAT is NP-hard. It's also NP-complete, because we know it's within class NP. So it's as hard as the hardest problem within class NP. So now you should understand what it would mean. So let's assume for now that P is not equal to NP. We don't know if this is true. But if this is true, that means P is a subset of NP, proper subset. So what would NP-complete look like in this picture? So it's not drawn there yet. How should I draw NP-complete to add it to this picture? Is it going to be a circle? Is it going to be the complement of a circle? Is it going to be a square? An octagon? Octopus? What's it going to look like? This is NP. NP-complete is going to be some subset of NP. We could draw it any way we want, as long as it's a subset of NP that doesn't include anything NP. So it could be an octopus. What's important is it's entirely contained with NP, and none of it's with NP. If we think of it as, as you go further away, problems get harder. Thinking of a ring kind of works. What it means is every problem within NP can be reduced to one here. At least from everything we've seen so far, there's nothing that would tell you it couldn't actually include all of that space. It could be that every function is either, if it's within NP, it's either within P or it's NP-complete. There's no functions that are outside of P that are not as hard as the hardest functions within NP. We actually know this is not the case, that we know that if there is a separation between P and NP, there are some functions that are within NP and harder than P, but not as hard as the ones in NP-complete, which is actually kind of surprising. So what about this other universe? What if they're equal? So now, P equals NP. Those are the same set. What does NP-complete look like? NP-complete is problems within NP that are as hard with respect to polynomial time reductions as the hardest problems in NP. So if P equals NP, there is a polynomial time reduction between every problem in NP and every problem in NP. They're the same class. It's polynomial time. Everything within that class now can be done in a Turing machine in polynomial time. So there's a trivial polynomial reduction between those, which means they can all be mapped to every one of those problems. NP-complete is everything with a little hole in the middle. What's the little hole in the middle? It kind of came up a little while ago. There are two functions that don't work for these reductions. The two constant functions are not in NP-complete. And what is NP-hard? It's every function in the universe other than those two functions. All of those are as hard or harder. So everything is NP-hard. If P equals NP.