---
Video Title: Satisfiability
Description: Theory of Computation
https://uvatoc.github.io/week11

22.4 Satisfiability
- Conjunctive Normal Form
- Satisfiability Problem
- 3SAT
- Inefficient Algorithm for 3SAT
- Efficient Algorithm for 2SAT

Nathan Brunelle and David Evans
University of Virginia
---

﻿The next two problems that we're going to
be finding the running time of today,

2SAT and 3SAT problem.

Before we can talk about that,
we need to find something called 3CNF.

CNF stands for Conjunctive Normal Form.

This is basically just a certain format
for writing down logical expressions,

Boolean logic.

Conjunctive... it sounds gross,
I know.

A conjunction is an and, as opposed to a
disjunction, which is an or.

So we typically call this like a
conjunction, because con means together.

And in order to make an and true,
both things have to together be true.

Whereas if it's a disjunction,
that's kind of like disjoint.

An or is a disjunction, because to make an
or true, those can be separately true.

One or the other could be true.

So for Conjunctive Normal Form,
the way that this works is that we're

going to have a bunch of terms in our
Boolean expression.

The terms are going to
be grouped in such a way

that within each group,
we can have only ors.

So within each of these groups, which
we call a clause, we can have only ors.

Among groups, we can have only ands.

Among all of our groups, we can have only
ands.

So anything in this format where all of
our terms are grouped so that we have ors

within a group and ands among groups,
that is Conjunctive Normal Form.

We can also talk about what is 3CNF.

An expression in CNF, it was anything
where you have ands of ors.

3CNF gives a limitation
or a restriction on the

number of terms I can
see within each clause.

So for 3CNF, this says that I must have
exactly three terms within each clause.

So in this case, I have x.

I have the variable x.

I have y in there.

I have z.

And I have u.

So I have four different identities of
variables within this expression down here.

But because each clause has three terms in
it, that makes this 3CNF.

So the reason that we like this 3CNF is
because typically when we write things

down in this way, I want to know,
can I assign true versus false to each of

these variables so that
if I evaluated that whole

function, I ended up
with true as the result.

So this is a very common thing that we
want to do.

There's lots of different
situations where we're

interested in answering
this sort of question.

For instance, maybe you've got a chain of
if statements and lf statements and while

loops and for loops and those sorts of
things.

And you want to know if
there's any way that you

could output 5 in your
program or something like that.

Oftentimes, you can list that as some sort
of Boolean formula where you can say all

of these different Boolean expressions we
can represent with one variable.

And I want to know if there's any
combination of those expressions that will

cause some grander expression to be true,
this larger expression to be true,

so that you could tell if you ever output
5.

So you can take that, like, can I navigate
the if statements in this way and convert

it into one of these problems of,
can I assign true or false to a bunch of

variables to make this whole expression
true?

And by writing this in CNF form, we
have kind of an easy way to analyze these.

In order to satisfy a CNF formula, we have
to say every clause has one true thing.

Has at least one true thing.

So we can very easily satisfy this whole
thing or very easily detect if this whole

thing was satisfied by
just checking for one

true thing appearing
in every single clause.

So that's why we like this CNF.

Because we have this
quick way of kind of figuring

out whether or not the
whole thing evaluated to true.

And we can have different values instead
of 3 here.

So we can have 5 here,
which says you're allowed

to have exactly 5 terms
in every single clause.

You can have 2 here, which says that you
can have 2 terms.

Exactly 2 terms in every single clause.

And so forth.

3SAT, the 3 in 3SAT, says that we're going
to be given a formula in 3CNF.

So those 3s match one another.

So this says if I'm given a formula in
this 3CNF format, so such that I have

clauses of variables which only have ors
within a clause, and then those clauses

are all combined together with ands,
is there some way for me to assign true or

false to each of the variable's values to
make this whole thing evaluate to true?

In this case, maybe this assignment would
cause that whole expression to be true

because here, this clause is satisfied
since X was true.

This clause was satisfied because X was
true.

This one was satisfied because U was true.

And then this one would be satisfied because
Y was false, making Y complement true.

Making the negation of Y to be true.

So since every single one of our clauses
had at least one true thing in it,

the whole expression evaluates to true.

One way that we could go about solving
this, so kind of a silly way that we could

go about solving this, is if you gave me a
3CNF formula that had, let's say,

invariables and M clauses.

One way I could do this is try all
combinations of true or false to each of

those N variables, and
then check to see if any

combination caused
the expression to be true.

So in this case, every
combination of true and false across

the N things is a different
one that I want to check.

There are going to be 2 to the N such
combinations.

Because I could say all of them are false,
or 1 is true and the rest are false,

or 2 is true and the rest are false,
and 3 is true and the rest are false.

Every combination of
trues and falses for all n of

my variables is a different
thing I'm going to check.

So this is certainly
going to take at least

2 to the n time in
order for me to do this.

This is another one of these situations
where we don't really have methods that

are substantially better than this method
for solving this problem.

At least not for our current model of
computation.

For this RAM model of computation,
we don't know of any way where this is

going to be more efficient than this 2 to
the n.

However, when we make this one smaller
change, this one tiny change, where we say

instead of having a 3-CNF formula that
we're trying to evaluate to true,

we're going to have a 2-CNF
formula instead, we now get a

little trick that we can use to
make this a lot more efficient.

So in this case, we're going to have a
2-CNF formula, where it's going to be the

same as what we saw for 3-CNF,
except we now have 2 variables per clause.

And I want to figure out, is there some
assignment of true or false to each

variable in this formula in order to make
the whole thing evaluate to true?

That's our question for this one.

The previous way of trying to solve this
would work.

We could do the same thing we did for
3-SAT, where we could try out all possible

combinations of true or false and see which
ones caused the thing to evaluate to true.

So that says that 2-SAT certainly belongs
to time of 2 to the n.

So we mentioned that this time
of T of n said, can we come up

with a Turing machine that would
solve that within T of n steps?

So in this case, we saw that 3-SAT
belonged to time 2 to the n.

We saw that longest path, this belonged to
time 2 to the n.

And then we saw, like,
shortest path, this belonged

to time of, like, n or
something like that.

So we certainly know now that 2-SAT
belongs to time 2 to the n, because the

same algorithm as what worked for 3-SAT is
going to work for 2-SAT as well.

It turns out, though, we can make this in
time

let's say, just to be generous,
n cubed.

We can actually do 2-SAT in n cubed time.

We can make this much, much faster.

So here is the trick that we can do.

So when I look at one of these two CNF
formulas, because there's only two things

in there, I only have
two choices for which

could be the thing that
I make the true thing.

That means that either X is going to be
true, or if not X, then it must have been Y.

I don't have any
hope of satisfying this

expression if both of
these two terms were false.

If it wasn't that X was the true thing,
then it better have been Y.

If it wasn't that Y was the true thing,
then it better have been X.

That's something that we can do for this
2-CNF that we can't do for 3-CNF.

We're going to leverage this
observation in order to kind

of get a better idea for
the structure of this formula.

So what we're going to do is make a graph
out of this 2-SAT thing.

We're going to take our 2-CNF formula,
and we're going to convert it into a graph

where each directed edge in this graph is
going to represent an implication.

So here, with this formula, we have if X
was not true, that must imply that Y is

true if we're going to have any hope of
satisfying this expression.

So what I'm going to do is
I'm going to then draw an edge

from not X, that is from X
being false, to Y being true.

It says if X was false, you better hope Y
was true.

Or I'm requiring, I'm going to require
that Y be true if X is false.

I'm going to require that if Y is false,
then X is true.

And I can do this same thing for each one
of the clauses.

So each of the clauses is going to be two
edges in this graph.

And I'm going to have
a node in this graph for

every variable and its
negation in my formula.

So in this case, if there's going to be
some way I can find a path through this

graph so that I visit, like,
every node and its negation,

then that can represent
a satisfying assignment.

So something that will make this
expression true.

And if I have some cycle in the graph that
contains a variable and its own negation,

then that's not going to be possible to
do.

Since we're reading each of these arrows
as a requirement, if Y requires,

say, X to be true, and then X requires Z
to be true, or something like that.

I need to write this down.

So maybe we have X is
going to require that Y be

true, and then Y being
true requires that Z be false.

And then Z being false maybe requires that
Y be false.

And then Y being false requires X be true.

This is saying, if X is true, then Y is
true, then Z is false, then Y is false.

So we have this cycle that says that if
ever I required Y to be true,

If I had a path that went from,
say, a simple path that went from Y to Y

complement, then maybe there
was going to be some other path

that I could take that would
work out to be a good assignment.

So it does have to be a cycle.

If we had this sort of cycle that had both
Y and Y complement in it, then since this

is strict requirement, this says there is
some clause that says in order for that

clause to be satisfied with Y being true,
it must have been that Z was false.

Which says that there is some other clause
that says that whenever Z was false,

Y must have been false.

So that says that since we have this
cycle, that says that one of these clauses

that was used to draw
this arrow is not going to be

satisfiable for any
assignment that we might pick.

So, for instance, here is an assignment
that is not going to be satisfiable.

So, what we could do is we're going to
have, like, here is x or y.

So, I can have something
that's like, whenever x was

false, then that's going to
imply that y must be true.

So, this one says, whenever y was true,
that must have meant that, yeah, x was true.

So, whenever y was true, that was going to
mean that x was true, because I couldn't

satisfy this clause for y being true
unless x was also true.

If X is true, then that must have meant
that Y was going to be false.

If X was true, then that meant that Y must
have been false.

But now this one says,
well, if Y was false, then

that must have meant
that X was going to be false.

So now we have this cycle that has both X
and X complements.

We're not going to be
able to come up with

any assignment that's
going to satisfy this.

Basically, this is saying once I had
satisfied these three clauses,

There was not going to be any way I possibly
could have satisfied that fourth clause.

So now we have this little trick where
essentially we can just build this graph.

So we have n nodes
in this graph that we're

creating, and we're going
to have two m edges.

Because we have two edges for each of the
clauses, we have m clauses here.

And then we can just do a breadth-first
search thing to figure out, is there any

cycle that contains something and its
negation?

And that's going to be, say, n plus 2m.

So this one, again, an efficient
algorithm.

We've only seen like four of these
problems so far.

But these four problems actually already
start to show this strange pattern that we

see for a lot of what we call natural
problems.

Natural problems are
ones that we tried to solve

because we were somehow
motivated to solve them.

It was something that appeared just
naturally while we were going about our

daily lives as something that we were
interested in solving, as opposed to like

an artificial problem
which was contrived to

have a certain property
or something like that.

But it turns out that most of the natural
problems we see, they're either going to

be solvable using a small degree
polynomial, like n squared, n cubed,

something like that.

Or else they're going to require
exponential time, like 2 to the n,

or n factorial, or something like that.

We don't see many things in the wild that
are in between.

We don't really know why.

We don't have a super great intuition for
why.

But that seems to be how things go.

So we're going to be exploring
some of this relationship

between polynomial
time and exponential time.

So that's where we're going next.