
Lecture 5: Nondeterminism and Computational Complexity - CSCI 381 Goldberg

  • HW: Show that on the faceplate of a scientific calculator there are at least 8 different functions that are in fact partial.
  • Upcoming lecture: The reduction notation discussion (Problem B \leq Problem A, Karp’s paper) is for the next lecture. Bring Karp’s paper with you.

Our discussion of nondeterminism will be framed within the context of programming and follow the concepts described in the following sources.

Sources:

  • E. Dijkstra. (1976). A Discipline of Programming, Prentice-Hall Press.
  • R. Floyd. (1967). “Nondeterministic Algorithms”, Journal of the ACM, Vol. 14(4), pp. 636–644.

According to Dijkstra (building on Floyd), deterministic and nondeterministic computations can be understood within the rubric of standard programming languages; here we will use Java. Consider current Java, which is a deterministic language, and then consider a “slightly” modified version of Java, called Njava, for programming nondeterministic computations. In fact, the ONLY difference between these two language versions is a single command: Choose a value for x from a set of values X_1, ..., X_n such that P(x) is TRUE.

To understand why this concept is useful, consider the following scenario: Suppose you are faced with a problem where there are multiple possible approaches, but you don’t know ahead of time which one will lead to success. For instance, in physics research, two scientists might have two different hypotheses or methodologies to explore. One scientist follows one approach, another follows a different path—and only one of them leads to the correct answer (perhaps even a Nobel Prize discovery). In situations like this, a nondeterministic computer would be “intelligent enough” to simply choose the correct path. While we could employ random or probabilistic algorithms to make choices, the theoretical model of nondeterminism goes further: it guarantees that the computer will always make the correct choice from among the options provided.

All other commands in Njava are identical to those of Java: loops, methods, classes with inheritance, recursion, etc.
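As a hedged sketch of what the Choose command buys us — written in Python rather than Njava for brevity, with the helper name and example problem being my own illustration, not the lecture's — a deterministic machine can only simulate Choose by trying the candidates one at a time:

```python
# Hypothetical sketch: simulating Njava's "Choose x from X1..Xn such that P(x)"
# on a deterministic machine, by scanning the candidates until one works.
def choose(candidates, predicate):
    """Return some x in candidates with predicate(x) True.

    A true nondeterministic machine would pick a correct value in one step;
    this deterministic simulation may have to scan the entire set.
    """
    for x in candidates:
        if predicate(x):
            return x
    raise ValueError("GIGO: no correct value was supplied among the candidates")

# Example: choose a nontrivial divisor of 91.
print(choose(range(2, 91), lambda x: 91 % x == 0))  # 7
```

Note how the GIGO assumption surfaces directly: if no correct value is among X_1, ..., X_n, the simulation cannot manufacture one.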

Certain assumptions must be made in order for this special nondeterministic command to be meaningful to the nondeterministic program.

  • GIGO — Garbage In, Garbage Out

    Applied to the “Choose” command above:

    • a truly correct value must be included among the set of values X_1, ..., X_n
    • in theory, a nondeterministic programmer can have bad logic or other such errors in their code, but the nondeterministic computer/code CANNOT correct them.

    If a good option is not provided, then the nondeterministic computation will not correct the situation nor determine a correct answer.

  • The choice selected by the nondeterministic computation is always correct (again, assuming no GIGO.)

  • No matter how futuristic or powerful the user feels the nondeterministic computation is, the fact is that Turing (and Kleene at the lower automata level) realized that WHATEVER can be computed nondeterministically can ALSO be computed (simulated) DETERMINISTICALLY. In other words, nondeterminism does not add any extra functional power: anything it can describe, the normal deterministic computer/code could also provide. So, what is the advantage of nondeterminism?

  • Cost of the Choose Command — In essence, nondeterminism is not an issue of solvability, but rather an issue of complexity. How much time will such choices cost? The following is the fantastic assumption made in the literature about nondeterminism related to this issue:

    The COST of the (sole) nondeterministic Choose command is CONSTANT TIME, O(1) (cf. the GIGO assumption above).

    The theory needed a model to justify such an assumption. The literature associates such computation with consulting the prophetess/oracle at Delphi. In fact, the literature calls nondeterministic computing “oracle computing” or “relativized computing”. The term relativized stresses the fact that the decision is made based on an external source other than the Turing machine computing the basic aspects of a given function. In fact, this external source does not even have to be Turing computable. For example, we can assign to the external source the knowledge of the halting problem, even though it is unsolvable. Then, in this case, the decisions made by your code can be said to be relativized to the halting problem.



Turing already realized that speedups for nondeterministic computations can be obtained (simulated) by parallel processing.

If at some point of the computation a decision has to be made by the computer, and if enough processors are available to run in parallel, then this “decision” can be implemented by assigning each value to a different processor and letting each process run with its own “decided” value. Since we assume that one of these values is correct, the process that concludes the computation first reports that it found the answer, and the entire computation can halt.

(In OS terms, beyond having multiple processors for parallel computing, there is the question of whether each processor has its own independent memory pool or whether they all share an overall memory store. In the former case (separate, independent memory pools for each processor), the individual threads/processes are called heavyweight. In the latter case (multiple threads/processes running on different CPUs sharing a common memory store), they are called lightweight. Turing understood this principle and, though he focused on sequential computing, demonstrated how to simulate parallel processing using a multi-tape Turing machine with a round-robin policy: giving each possible choice a small amount of time to execute, then moving to the next choice, cycling through all possibilities until one produces a result.)
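The round-robin simulation can be sketched deterministically with generators standing in for the parallel branches. This is a minimal illustration in Python (the toy "is g the square root of the target?" problem and all names are my own, not the lecture's):

```python
# Hedged sketch: Turing-style round-robin simulation of parallel branches.
# Each branch is a generator that yields (gives up its time slice) after each
# unit of work and finally yields ("done", answer) if it succeeds.
def branch(guess, target):
    """Toy branch: do 'guess' steps of work, then succeed iff guess**2 == target."""
    for _ in range(guess):
        yield None                      # hand back the time slice
    if guess * guess == target:
        yield ("done", guess)

def round_robin(branches):
    """Give each active branch one step per cycle until some branch succeeds."""
    active = list(branches)
    while active:
        for b in list(active):
            try:
                step = next(b)
            except StopIteration:
                active.remove(b)        # this branch finished without an answer
                continue
            if step is not None and step[0] == "done":
                return step[1]          # first success halts the whole computation
    return None

# Which of 1..10 is the square root of 49?
print(round_robin(branch(g, 49) for g in range(1, 11)))  # 7
```

No branch is allowed to monopolize the machine: each gets one step per cycle, exactly the policy described above.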

But, Turing decided to only present sequential computing.

There is a little irony here: in the analysis of algorithms, the complexity of an algorithm drops all constants, and if the Choose command is considered a built-in command assumed to run in O(1) constant time, then the complexity of a nondeterministic code may not actually be based on the Choose command itself.

Colloquially, the difference between deterministic and nondeterministic computing is as follows:

The complexity of a deterministic code is based on the number of steps/instructions that the code requires in order to solve a problem. Whereas, the complexity of a nondeterministic code is based on the number of decisions that the code requires in order to solve a problem. Because the main contribution of nondeterminism in computer theory ends up being on how fast the decision is made, Cook (1971) and Karp (1972) explore an important class of problems from deterministic and nondeterministic perspectives.

Jack Edmonds in the 1960s differentiated “easy” vs. “hard” according to whether the time complexity is polynomial (“easy”) or exponential (“hard”). This distinction was accepted by Cook and Karp (for the space of deterministic and nondeterministic polynomial-time algorithms). The question becomes: P = NP?

Is "P", the set of problems that can be solved deterministically in a polynomial amount of time, the exact same set of problems that a nondeterministic computer could solve in a polynomial amount of time ("NP")?

An interesting argument about what is easy vs. hard was proposed by J. Hartmanis but was NOT accepted by the literature community.

He suggested that a major factor in what is easy or hard should be whether you actually have enough time and space resources available to support the computation. So although n^{100} is technically polynomial, and according to Edmonds (above) would be considered “easy”, practically no one would be able to supply the extreme amount of resources necessary to support it, so Hartmanis would call it “hard”.


“Who is a member” is governed by the membership predicate.

A predicate implies that both Boolean values True/False are available. In other words, we can answer both the question of who IS a member of the set as well as who is NOT a member of the set. However, computer theory disputes this notion of membership and shows that there can exist sets for which the question of who IS a member can be answered, but the question of who is NOT a member is unsolvable — such a question cannot be answered. To differentiate itself from the discrete-mathematics notion of membership, computer theory calls this a “characteristic function” instead of a membership predicate.
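As an illustration (my own example, not from the lecture) of such a one-sided “characteristic function”: take the set of starting numbers whose Collatz sequence reaches 1. The test below can confirm membership by running the sequence, but for a hypothetical non-member it would simply never return — it can say YES, never NO.

```python
# Hedged illustration of a one-sided membership test.
# S = { n : the Collatz sequence starting at n eventually reaches 1 }.
# (Whether S contains ALL positive integers is a famous open conjecture.)
def in_S(n):
    while n != 1:                  # may, in principle, never terminate
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return True                    # the only answer this test can ever give

print(in_S(27))  # True (a famously long trajectory, but it does reach 1)
```

Nothing in this procedure can ever report "not a member"; refuting membership would require a different (and here, unknown) method.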

Before diving deeper, it’s worth noting the historical origins of two key terms: In the early days of computing, when memory was implemented using light bulbs (with a lit bulb representing 1 and an off bulb representing 0), actual insects would be attracted to the heat and light, sometimes causing bulbs to burn out or crack. When a bulb failed, what the computer thought was a 1 bit would become a 0 bit, causing logical errors. The process of fixing these issues became known as “debugging”—literally removing bugs from the hardware. Over time, the term evolved to mean fixing errors in code. Similarly, early disk storage systems used large spinning platters (like record players) stacked on top of each other. These platters were heavy and could become wobbly, and when one platter would physically crash into another, it would cause a cascading hardware failure and the entire machine would halt. This is the origin of the term “computer crash.”

So, what happens to the programming code if the input causes an unsolvable situation to your program code? Mathematically, one might expect the program to simply crash or produce an error. However, Computer Theory states it must go through an infinite loop. (It was defined this way in order not to cause a problem with the Unsolvability of the Halting Problem.)

Unsolvability of the Halting Problem proves that no algorithm/program can exist that determines, for a given code and provided input, whether or not the code will enter an infinite loop on that input. Consider the set, defined for a given program, of all inputs that cause the program to enter an infinite loop. By the Unsolvability of the Halting Problem, the “membership” predicate for this set cannot answer both who IS a member and who is NOT a member.

Clearly, to test whether an input does NOT cause a given program to enter an infinite loop is decidable/solvable/effective: simply run the program on that input, and if the program halts then clearly there was no infinite loop. Based on this, it cannot be that an input that causes an infinite loop, when given to a Halting Problem testing program, would cause Tester(input, program code) to crash. Crashing is a response: if the “program code” does not enter an infinite loop on the “input” provided, then Tester halts. So the fact that Tester crashed in this scenario would tell us that the input does cause an infinite loop. But Unsolvability guarantees that you can never know this.

So, the only possible response of this Tester is that it too has to go through an infinite loop, and in essence you will never find out whether or not it is in an infinite loop. Maybe if you wait just a little longer, the program will halt and you will find out the answer. This recalls the Seinfeld episode “The Chinese Restaurant”, where the characters Elaine, Kramer, Jerry, and George wait endlessly for a table, hoping that if they just wait a little longer, the maitre d’ will call their name (“Seinfeld, 4!”) — but they never know for certain when (or if) their turn will come.

So, based on this, computer theorists do not use the term “membership” as a predicate, since, as the discussion above indicates, there are sets where you cannot determine both who is and who is not a member. Therefore, computer theorists prefer the name “characteristic function” instead of “membership”.


We now present an interesting case which seems to exhibit both deterministic and nondeterministic properties. How to categorize it?

In order to identify this quasi case, which can be thought of as having both deterministic and nondeterministic aspects, we have to start with the notion of mappings.

Let us consider mapping elements of a set A to elements of a set B. We will consider three such mappings for our quasi situation involving nondeterminism: Relational mapping, Partial mapping, Total mapping.

A            B
┌──────┐     ┌──────┐
│  A1  │     │  B1  │
│  A2  │ --> │  B2  │
│  .   │     │  .   │
│  .   │     │  .   │
│  .   │     │  .   │
│  Am  │     │  Bn  │
└──────┘     └──────┘

(Note: although drawn here the same size, m does not need to equal n.)

In the case of Relations, any element of A can be mapped to any element(s) of B.

No restrictions.

In the case of Partial mapping, any element of A is mapped to at most one element of B.

At most one (but possibly zero).

In the case of Total mapping, every element of A has to be mapped to a single element of B.

Exactly one.

(In a future class, we will calculate the upper bound on the search spaces formed by these various mappings.)
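A minimal sketch of how the three mappings differ, using Python dictionaries (the element names are my own toy choices). The partial case also previews the “error state” discussion below: a computing device turns “undefined” into the same predictable error every time.

```python
# Hedged sketch of the three mappings from A = {0, 1, 2} into B = {"b1", "b2"}.
total    = {0: "b1", 1: "b2", 2: "b1"}             # every input, exactly one output
partial  = {0: "b1", 2: "b2"}                      # input 1 is undefined
relation = {0: {"b1", "b2"}, 1: set(), 2: {"b1"}}  # any number of outputs per input

def apply_partial(f, x):
    # A device typically turns "undefined" into a predictable error message,
    # which is why the partial function can be argued to be deterministic.
    if x not in f:
        raise ValueError(f"undefined for input {x}")  # the 'error state'
    return f[x]

print(apply_partial(partial, 0))  # b1
# apply_partial(partial, 1) raises ValueError -- the same response every time
```

Note that the relation's value type (a set of outputs, possibly empty) is exactly the powerset-valued shape of the nondeterministic mapping discussed below.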

Deterministic vs. Nondeterministic Mappings


The notion of a deterministic computation is based on the model of the Total mapping:

  • every input has exactly one output
  • the code/mapping always knows what to do every step of the way

The same response always occurs given the same input; this follows from the first property but is stated separately for emphasis.

A Deterministic mapping is F_d: N \rightarrow N

reads as follows: “Function whose name is F_d (here d is for deterministic) has the following map (“:”): one member of N (the natural numbers) is mapped to (“→”) another member of N (the natural numbers).”

Deterministic adds that the mapping is Total, i.e. well defined, as explained above.

Mathematically, the nondeterministic computation is modeled by the Relation, because at any decision point the output (i.e. the next step to take, the response) can be any one of a number of output elements. (“don’t care state”)

A Nondeterministic mapping is F_{nd}: N \rightarrow PS(N), where PS is the powerset operation.

What about the Partial Function? It can now be argued that it is closer to F_{nd} (nondeterministic computation), since the Empty Set is always a member of the Powerset, and an empty output set means that the function/mapping is undefined for the given input.

However, partial functions on computing devices tend to give an “error” message. This is in fact a predictable response. For example, divide by 0 and you will always get the same error message (“error state”). From a computational (computer) perspective, it can be argued that the partial function is a deterministic computation.

So which one is it? Authors of automata textbooks had to deal with this when considering an incompletely specified automaton design: in a given state, a given input character is received, and the transition table for the automaton does not indicate which state to go to. Do we consider this an “error” condition (deterministic) or a “don’t care” situation (nondeterministic)? The author Sudkamp and others answered that it really can be either, depending on the application. As such, they label partial mappings as “partially deterministic”.


NP-Hardness, Satisfiability, and Reductions


The first problem that was shown to be NP-hard was Satisfiability: find a sequence of true/false (i.e. 0/1) values that will cause a given Boolean predicate to be True.
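A brute-force sketch of Satisfiability (the particular three-variable formula is an illustrative choice of mine). The deterministic search tries all 2^n assignments, whereas the nondeterministic Choose command would pick a satisfying one in a single step:

```python
# Hedged sketch: brute-force Satisfiability -- try every 0/1 assignment.
# This costs up to 2^n evaluations; nondeterministically, one Choose suffices.
from itertools import product

def satisfiable(predicate, n_vars):
    """Return a satisfying tuple of booleans, or None if none exists."""
    for assignment in product([False, True], repeat=n_vars):
        if predicate(*assignment):
            return assignment
    return None

# phi = (x OR y) AND (NOT x OR z) AND (NOT y OR NOT z)
phi = lambda x, y, z: (x or y) and (not x or z) and (not y or not z)
print(satisfiable(phi, 3))  # (False, True, False)
```

The satisfying assignment found is itself the easy-to-verify certificate that puts Satisfiability in NP.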

Cook (1971) proved this by showing that ALL problems in NP must “reduce” to Satisfiability. Cook defines reduction in one direction, from your Problem/Algorithm to Satisfiability.

Karp argued that to show equivalent hardness, one must reduce the two problems to each other. Yet in Karp’s paper he shows only one direction: namely, Satisfiability reduces to other problems (and he continues for a total of 20 other problems). This paradox is readily resolved once you realize that Cook already SHOWED the other direction, so Karp only needed to show the second direction; but in terms of definitions, Karp held that two problems must reduce to each other in order to prove that they both have similar hardness in complexity.

A third definition of reduction was given by Levin, inspired by the fact that nondeterminism is relativized computing (defined above). Levin said that reduction is about seeking an external piece of information or source that will help you solve the second problem given a first problem. The hardness is then the cost (complexity) of obtaining that missing piece of information.


In the literature, if we write

Problem B \leq Problem A

this notation is read as that B “reduces to” A, but the actual meaning (in Karp’s paper) is the following:

  • B is known to be hard and A is unknown
  • A solves B’s problem (that is consistent with both cases above)
  • B is the parent considered in the reduction tree (cf. Karp, p. 96, Reduction Tree Summary)
  • A is the child (of B) considered in the reduction tree
  • the root of the entire tree is SATisfiability (Karp #1)
  • in this context, it is the CHILD (A) that solves the PARENT’s (B) problem
  • B reduces TO A
  • A reduces FROM B

Another notation for reduction, consistent with our entire discussion, is the symbol \propto. Think of it as an 8 lying sideways with the rightmost tip cut off.

To emphasize the need to transform between the NP problems within a reasonable amount of time, many put a “p” at the foot of the "\leq", writing "\leq_p".

Let us take a pair of problems from Karp’s Reduction Tree Summary (Karp, p. 96) and derive these eight pieces of information as they apply to these problems. For example,

Problem #1 (“B”): Chromatic Number (KARP #12)

Problem #2 (“A”): Clique Cover (KARP #13)

Then, at the point of the tree where Chromatic Number connects to Clique Cover:

  • Chromatic Number is known to be hard (at this point), whereas Clique Cover is not yet known.
  • Clique Cover is A and Chromatic Number is B (i.e. B reduces to A)
  • Chromatic Number is the parent
  • Clique Cover is the child
  • The root of the tree is SAT
  • Clique Cover solves Chromatic Number
  • Chromatic Number reduces TO Clique Cover
  • Clique Cover reduces FROM Chromatic Number
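To my understanding, the transformation at this particular edge of Karp's tree is graph complementation: a color class in G is an independent set, i.e. a clique in the complement of G, so G is k-colorable iff the complement of G has a clique cover of size k. A minimal sketch of the transformation, assuming that construction (vertex/edge representation is my own):

```python
# Hedged sketch of Chromatic Number -> Clique Cover via graph complementation:
# partitioning G into k color classes is the same as partitioning the
# complement graph into k cliques.
def complement(vertices, edges):
    """Edges given as a set of frozensets; return the complement's edge set."""
    return {frozenset((u, v))
            for u in vertices for v in vertices
            if u < v and frozenset((u, v)) not in edges}

# Triangle K3 needs 3 colors...
V = {1, 2, 3}
E = {frozenset((1, 2)), frozenset((1, 3)), frozenset((2, 3))}
print(complement(V, E))  # set() -- edgeless complement: cover by 3 singleton cliques
```

The transformation itself is clearly polynomial (quadratic in the number of vertices), which is what the "\leq" with a "p" at its foot demands.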

(21 NP-Complete problems; historically Cook-1; Karp-20)

In order to define each of the problems presented as an NP-Complete problem — even if the problem is inherently an optimization problem and would more easily be classified as NP-Hard (at the very beginning we mentioned that optimization problems generally do not have an easy verification process short of an exhaustive search, and therefore typically can only be shown to be Hard and not Complete) — Karp adds an extra parameter that converts the underlying optimization question into a decision question: can a solution be found whose output value is \leq (or \geq) the supplied parameter? The decision version can then be NP-Complete.

For example, Job Scheduling/Sequencing (KARP #19) asks for a permutation of a list of tasks such that they (preferably) all meet internal deadlines. There is a penalty parameter (other definitions of this problem speak in terms of profit) for each task not completed by its internal deadline. Instead of the optimization question — which permutation minimizes overall tardiness and incurs the least total penalty? — Karp asks: can a permutation of the tasks exist such that the total penalty due to missed deadlines is \leq k, where k is a user-supplied parameter for lateness-cost tolerance? So, by adding this extra parameter k, an optimization problem is converted into a decision problem and thus can be proven to be NP-Complete in addition to being NP-Hard.
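A hedged sketch of this decision version, assuming unit-time tasks (the instance data and names are my own, and the brute force over permutations is for illustration only — it is exactly the exponential search the nondeterministic Choose would avoid):

```python
# Hedged sketch of Karp's decision version of Job Sequencing: given deadlines
# and penalties for unit-time tasks, does SOME permutation incur total
# penalty <= k?
from itertools import permutations

def sequencing_decision(deadlines, penalties, k):
    n = len(deadlines)
    for order in permutations(range(n)):
        penalty = sum(penalties[j]
                      for t, j in enumerate(order, start=1)
                      if t > deadlines[j])      # task j finished after its deadline
        if penalty <= k:
            return True                          # a witness permutation exists
    return False

# Three unit-time tasks; tasks 0 and 1 both want to finish by time 1.
deadlines = [1, 1, 3]
penalties = [5, 2, 4]
print(sequencing_decision(deadlines, penalties, 2))  # True: run task 0 first, pay 2
print(sequencing_decision(deadlines, penalties, 1))  # False: penalty 2 is unavoidable
```

Note the role of k: a YES answer comes with a short certificate (the witness permutation), which is what makes the decision version verifiable and hence eligible for NP-Completeness.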