--- Video Title: Satisfiability Description: Theory of Computation https://uvatoc.github.io/week11 22.4 Satisfiability - Conjunctive Normal Form - Satisfiability Problem - 3SAT - Inefficient Algorithm for 3SAT - Efficient Algorithm for 2SAT Nathan Brunelle and David Evans University of Virginia --- The next two problems that we're going to be finding the running time of today, 2SAT and 3SAT problem. Before we can talk about that, we need to find something called 3CNF. CNF stands for Conjunctive Normal Form. This is basically just a certain format for writing down logical expressions, Boolean logic. Conjunctive... it sounds gross, I know. A conjunction is an and, as opposed to a disjunction, which is an or. So we typically call this like a conjunction, because con means together. And in order to make an and true, both things have to together be true. Whereas if it's a disjunction, that's kind of like disjoint. An or is a disjunction, because to make an or true, those can be separately true. One or the other could be true. So for Conjunctive Normal Form, the way that this works is that we're going to have a bunch of terms in our Boolean expression. The terms are going to be grouped in such a way that within each group, we can have only ors. So within each of these groups, which we call a clause, we can have only ors. Among groups, we can have only ands. Among all of our groups, we can have only ands. So anything in this format where all of our terms are grouped so that we have ors within a group and ands among groups, that is Conjunctive Normal Form. We can also talk about what is 3CNF. An expression in CNF, it was anything where you have ands of ors. 3CNF gives a limitation or a restriction on the number of terms I can see within each clause. So for 3CNF, this says that I must have exactly three terms within each clause. So in this case, I have x. I have the variable x. I have y in there. I have z. And I have u. So I have four different identities of variables within this expression down here. But because each clause has three terms in it, that makes this 3CNF. So the reason that we like this 3CNF is because typically when we write things down in this way, I want to know, can I assign true versus false to each of these variables so that if I evaluated that whole function, I ended up with true as the result. So this is a very common thing that we want to do. There's lots of different situations where we're interested in answering this sort of question. For instance, maybe you've got a chain of if statements and lf statements and while loops and for loops and those sorts of things. And you want to know if there's any way that you could output 5 in your program or something like that. Oftentimes, you can list that as some sort of Boolean formula where you can say all of these different Boolean expressions we can represent with one variable. And I want to know if there's any combination of those expressions that will cause some grander expression to be true, this larger expression to be true, so that you could tell if you ever output 5. So you can take that, like, can I navigate the if statements in this way and convert it into one of these problems of, can I assign true or false to a bunch of variables to make this whole expression true? And by writing this in CNF form, we have kind of an easy way to analyze these. In order to satisfy a CNF formula, we have to say every clause has one true thing. Has at least one true thing. So we can very easily satisfy this whole thing or very easily detect if this whole thing was satisfied by just checking for one true thing appearing in every single clause. So that's why we like this CNF. Because we have this quick way of kind of figuring out whether or not the whole thing evaluated to true. And we can have different values instead of 3 here. So we can have 5 here, which says you're allowed to have exactly 5 terms in every single clause. You can have 2 here, which says that you can have 2 terms. Exactly 2 terms in every single clause. And so forth. 3SAT, the 3 in 3SAT, says that we're going to be given a formula in 3CNF. So those 3s match one another. So this says if I'm given a formula in this 3CNF format, so such that I have clauses of variables which only have ors within a clause, and then those clauses are all combined together with ands, is there some way for me to assign true or false to each of the variable's values to make this whole thing evaluate to true? In this case, maybe this assignment would cause that whole expression to be true because here, this clause is satisfied since X was true. This clause was satisfied because X was true. This one was satisfied because U was true. And then this one would be satisfied because Y was false, making Y complement true. Making the negation of Y to be true. So since every single one of our clauses had at least one true thing in it, the whole expression evaluates to true. One way that we could go about solving this, so kind of a silly way that we could go about solving this, is if you gave me a 3CNF formula that had, let's say, invariables and M clauses. One way I could do this is try all combinations of true or false to each of those N variables, and then check to see if any combination caused the expression to be true. So in this case, every combination of true and false across the N things is a different one that I want to check. There are going to be 2 to the N such combinations. Because I could say all of them are false, or 1 is true and the rest are false, or 2 is true and the rest are false, and 3 is true and the rest are false. Every combination of trues and falses for all n of my variables is a different thing I'm going to check. So this is certainly going to take at least 2 to the n time in order for me to do this. This is another one of these situations where we don't really have methods that are substantially better than this method for solving this problem. At least not for our current model of computation. For this RAM model of computation, we don't know of any way where this is going to be more efficient than this 2 to the n. However, when we make this one smaller change, this one tiny change, where we say instead of having a 3-CNF formula that we're trying to evaluate to true, we're going to have a 2-CNF formula instead, we now get a little trick that we can use to make this a lot more efficient. So in this case, we're going to have a 2-CNF formula, where it's going to be the same as what we saw for 3-CNF, except we now have 2 variables per clause. And I want to figure out, is there some assignment of true or false to each variable in this formula in order to make the whole thing evaluate to true? That's our question for this one. The previous way of trying to solve this would work. We could do the same thing we did for 3-SAT, where we could try out all possible combinations of true or false and see which ones caused the thing to evaluate to true. So that says that 2-SAT certainly belongs to time of 2 to the n. So we mentioned that this time of T of n said, can we come up with a Turing machine that would solve that within T of n steps? So in this case, we saw that 3-SAT belonged to time 2 to the n. We saw that longest path, this belonged to time 2 to the n. And then we saw, like, shortest path, this belonged to time of, like, n or something like that. So we certainly know now that 2-SAT belongs to time 2 to the n, because the same algorithm as what worked for 3-SAT is going to work for 2-SAT as well. It turns out, though, we can make this in time let's say, just to be generous, n cubed. We can actually do 2-SAT in n cubed time. We can make this much, much faster. So here is the trick that we can do. So when I look at one of these two CNF formulas, because there's only two things in there, I only have two choices for which could be the thing that I make the true thing. That means that either X is going to be true, or if not X, then it must have been Y. I don't have any hope of satisfying this expression if both of these two terms were false. If it wasn't that X was the true thing, then it better have been Y. If it wasn't that Y was the true thing, then it better have been X. That's something that we can do for this 2-CNF that we can't do for 3-CNF. We're going to leverage this observation in order to kind of get a better idea for the structure of this formula. So what we're going to do is make a graph out of this 2-SAT thing. We're going to take our 2-CNF formula, and we're going to convert it into a graph where each directed edge in this graph is going to represent an implication. So here, with this formula, we have if X was not true, that must imply that Y is true if we're going to have any hope of satisfying this expression. So what I'm going to do is I'm going to then draw an edge from not X, that is from X being false, to Y being true. It says if X was false, you better hope Y was true. Or I'm requiring, I'm going to require that Y be true if X is false. I'm going to require that if Y is false, then X is true. And I can do this same thing for each one of the clauses. So each of the clauses is going to be two edges in this graph. And I'm going to have a node in this graph for every variable and its negation in my formula. So in this case, if there's going to be some way I can find a path through this graph so that I visit, like, every node and its negation, then that can represent a satisfying assignment. So something that will make this expression true. And if I have some cycle in the graph that contains a variable and its own negation, then that's not going to be possible to do. Since we're reading each of these arrows as a requirement, if Y requires, say, X to be true, and then X requires Z to be true, or something like that. I need to write this down. So maybe we have X is going to require that Y be true, and then Y being true requires that Z be false. And then Z being false maybe requires that Y be false. And then Y being false requires X be true. This is saying, if X is true, then Y is true, then Z is false, then Y is false. So we have this cycle that says that if ever I required Y to be true, If I had a path that went from, say, a simple path that went from Y to Y complement, then maybe there was going to be some other path that I could take that would work out to be a good assignment. So it does have to be a cycle. If we had this sort of cycle that had both Y and Y complement in it, then since this is strict requirement, this says there is some clause that says in order for that clause to be satisfied with Y being true, it must have been that Z was false. Which says that there is some other clause that says that whenever Z was false, Y must have been false. So that says that since we have this cycle, that says that one of these clauses that was used to draw this arrow is not going to be satisfiable for any assignment that we might pick. So, for instance, here is an assignment that is not going to be satisfiable. So, what we could do is we're going to have, like, here is x or y. So, I can have something that's like, whenever x was false, then that's going to imply that y must be true. So, this one says, whenever y was true, that must have meant that, yeah, x was true. So, whenever y was true, that was going to mean that x was true, because I couldn't satisfy this clause for y being true unless x was also true. If X is true, then that must have meant that Y was going to be false. If X was true, then that meant that Y must have been false. But now this one says, well, if Y was false, then that must have meant that X was going to be false. So now we have this cycle that has both X and X complements. We're not going to be able to come up with any assignment that's going to satisfy this. Basically, this is saying once I had satisfied these three clauses, There was not going to be any way I possibly could have satisfied that fourth clause. So now we have this little trick where essentially we can just build this graph. So we have n nodes in this graph that we're creating, and we're going to have two m edges. Because we have two edges for each of the clauses, we have m clauses here. And then we can just do a breadth-first search thing to figure out, is there any cycle that contains something and its negation? And that's going to be, say, n plus 2m. So this one, again, an efficient algorithm. We've only seen like four of these problems so far. But these four problems actually already start to show this strange pattern that we see for a lot of what we call natural problems. Natural problems are ones that we tried to solve because we were somehow motivated to solve them. It was something that appeared just naturally while we were going about our daily lives as something that we were interested in solving, as opposed to like an artificial problem which was contrived to have a certain property or something like that. But it turns out that most of the natural problems we see, they're either going to be solvable using a small degree polynomial, like n squared, n cubed, something like that. Or else they're going to require exponential time, like 2 to the n, or n factorial, or something like that. We don't see many things in the wild that are in between. We don't really know why. We don't have a super great intuition for why. But that seems to be how things go. So we're going to be exploring some of this relationship between polynomial time and exponential time. So that's where we're going next.