Skip to content

Lecture 12 - KARP 21 Problems Continued (Node Cover, Hitting Set, Steiner Tree, 3D Matching) and NP-Hardness - CSCI 381 Goldberg

  • Exam reference material: For exams, students are responsible for Karp’s original definitions of the NP-hard problems. Do not rely on alternative formulations found elsewhere, as variations exist but the course uses Karp’s paper for consistency across exams.

  • Recommended reading (informal homework): The professor encouraged students to read Karp’s original paper on their own, particularly the section on the Steiner Tree problem. This will not be formally checked, but is strongly recommended for solidifying definitions.

These problems don’t have a single category to unify them. They are as follows:

Find a set of mm nodes (user decides on mm) such that each edge of the entire graph has some vertex in common with a vertex found in the node cover. Note: any number of edges are allowed to be connected to the same vertex of the node cover. As such the mapping is surjective and not necessary to be injective.

For example: consider the following Star Graph

LaTeX diagram

V5 alone is a node cover for this graph since all edges touch V5.

Note: This problem can be technically be grouped with any of the graph theory problems listed above, but then the category would be too generic, whereas the above categories are more specific (such as cycles or cliques).

Consider n>1n > 1 subsets of a set of numbers (or items). Construct a set WW such that WW has only exactly one element in common with each of the nn subsets.

The user provided a collection of subsets UiU_i from some Universal set UU. (Universal set is the set of all possible elements under consideration.) You are tasked to find a representative set (termed “hitting set”) WW such that WUi=1|W \cap U_i| = 1

For example, consider the following set of numbers U={1,2,3,4,5,6,7,8,9,10,11,12}U = \{1,2,3,4,5,6,7,8,9,10,11,12\} with the following four subsets constructed.

SubsetElementsCommon with W
U1U_1{2,3,4,5}\{2,3,4,5\}Element 5 in common with WW chosen below. Only one element in common.
U2U_2{1,2,3,4}\{1,2,3,4\}Element 1 in common with WW chosen below. Only one element in common.
U3U_3{5,6,7,8}\{5,6,7,8\}Element 5 in common with WW chosen below. Only one element in common.
U4U_4{9,10,11,12}\{9,10,11,12\}Element 9 in common with WW chosen below. Only one element in common.

NOTE: It is quite acceptable that the common element should be shared by other subsets as well: such as, element 5 is in common with both Subsets U1U_1 and U3U_3.

What is not alright is that 2 or more elements of the hitting set are in common with a given subset (such as subset XX below).

W={1,5,9}W = \{1,5,9\}

This is a Hitting Set for subsets U1U_1-U4U_4.

X={2,5,9}X = \{2,5,9\}

is not a Hitting Set for subsets U1U_1-U4U_4 since 2 and 5 are both present in Subset U1U_1.

Defined in a worksheet tab “Tweaked P problems” starting at line 38.

(It has a tweaked version “Minimum Spanning Tree” which is in PP.)

[Note: An example of both of these are in separate tabs.]

Let T={1,2,3,4,5,6,7,8,9,10}T = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\} with T=10|T| = 10 numbers.

T3=T×T×T={1,1,1,1,1,2,,10,10,10}T^3 = T \times T \times T = \{\langle 1,1,1 \rangle, \langle 1,1,2 \rangle, \ldots, \langle 10,10,10 \rangle\}

where T3=1000|T^3| = 1000 Points.

The user chooses a subset UU of T3T^3. UU is the User’s Universal set.

Problem: Find a 3D Match WW from the points in UU such that W=T|W| = |T|.

  • Here, T=10|T| = 10 points will be chosen from the user’s UU.
  • In essence the solution will be a listing of T|T| points (obtained from UU) such that in each dimension, the values found there are a permutation of the values found in the original TT.
  • Note: The permutation of each dimension will be assumed independently so there could be three different permutations used, one for each dimension.

For example, the following 10 points were chosen from a supposed User’s subset of points, UU.

These points are a valid 3D Matching:

PointXYZ
P1123
P2678
P3999
P4214
P5551
P6432
P7787
P8101010
P9865
P10346

Looking at the X column: {1,6,9,2,5,4,7,10,8,3}\{1, 6, 9, 2, 5, 4, 7, 10, 8, 3\} — this is a permutation of {1,2,3,4,5,6,7,8,9,10}\{1,2,3,4,5,6,7,8,9,10\}

Looking at the Y column: {2,7,9,1,5,3,8,10,6,4}\{2, 7, 9, 1, 5, 3, 8, 10, 6, 4\} — this is a permutation of {1,2,3,4,5,6,7,8,9,10}\{1,2,3,4,5,6,7,8,9,10\}

Looking at the Z column: {3,8,9,4,1,2,7,10,5,6}\{3, 8, 9, 4, 1, 2, 7, 10, 5, 6\} — this is a permutation of {1,2,3,4,5,6,7,8,9,10}\{1,2,3,4,5,6,7,8,9,10\}

Remember that the columns are independent for the permutations.

The above is a “3D-Matching” and ironically because the numbers that appear in each given dimension (1st, 2nd, 3rd positions) do NOT have to match at all the other numbers in that dimension (position).

2D Matching (in P): If we were to limit the points to 2D as T2=T×TT^2 = T \times T with T2=100|T^2| = 100, then in PP.

An application of a weighted graph which involves a tree.

Minimum Spanning Tree (MST) — A subgraph that has no cycles. All nodes are present in this subgraph tree.

The cost of a Spanning Tree is the sum of the costs of all the edges involved. Choose the one that has Minimum cost.

Weighted Graph Gw=(V,E,w)Gw = (V,E,w) where w:ER+w:E \to \mathbb{R}^+

//R+ means positive real. Weighted does not require positive but most applications do.

Note: Greedy does not always guarantee that the true solution is found. But, in our case Kruskal proves that it works.

Start with a weighted graph and we find a subgraph that involves:

  • ALL the original nodes (not all of the original edges): Spanning
  • A further restriction is that the subgraph be a TREE.

There can be many such “skeleton” Spanning Trees.

Which do you choose? The one with the smallest (Minimum) total cost where cost of the tree is defined by simply adding together the costs of each edge in that tree.

  • Sort all edges from smallest to largest cost (weight) associated with each edge.
  • Of the remaining edges not yet chosen, choose the smallest cost edge available now (that is what makes this greedy) such that when added into the “tree” you are growing, a cycle does not result.
  • Stop when all vertices have been reached by the edges of your chosen tree.

Kruskal proves that this Greedy (choose what is best now without consideration of future) approach actually works.

Worked out example of Kruskal’s algorithm.

Section titled “Worked out example of Kruskal’s algorithm.”

This weighted graph has nine vertices number V0..V9 and 14 edges.

The edges in sorted order from smallest to largest is as follows:

FROMTOWEIGHTSSelectedOrder ChosenNotes
V7V611st
V2V822nd
V6V523rd
V0V144th
V2V545th
V6V86This would form a cycle: V6-V8-V2-V5-V6
V2V376th
V7V87This would form a cycle: V7-V8-V2-V5-V6-V7
V0V787th
V1V28This would form a cycle: V1-V2-V5-V6-V7-V1
V3V498thAlgorithm stops: all vertices reached (spanning)
V5V410Algorithm stopped: all vertices reached (spanning)
V1V711Algorithm stopped: all vertices reached (spanning)
V5V314Algorithm stopped: all vertices reached (spanning)

Cost of the MST = 37

Worked out example of Kruskal’s algorithm:

Step 1: Initial weighted graph with all edges:

LaTeX diagram

Step 2: Selection process with valid edges highlighted in red (invalid edges in gray):

LaTeX diagram

Step 3: Final MST (cost = 37):

LaTeX diagram

In this example, the user is only interested in the region filled in.

The vertex AA is outside the region. Within the region, the shortest distance between nodes BB and CC has weight = 17. However, if one considering neighboring vertices outside the region such vertex AA, then a Cheaper weight is obtained for traversing from BB (indirectly) to CC. Namely, BA=7B-A = 7 and AC=8A-C = 8; 7+8=157+8 = 15.

This extra vertex outside the region is called a Steiner/Ghost point.

alt text

Understanding Karp’s 21 Problems: Application and Relationships

Section titled “Understanding Karp’s 21 Problems: Application and Relationships”

We will be presenting Karp’s 21 Problems from an application point of view. This will make understanding problems easier, but does not necessarily follow Karp’s order. Karp’s order was based on the order of reduction necessary to prove that any given problem is also NP-Hard/Complete.

Although the 21 problems found in Karp’s paper comprise what colloquially is called “Karp-21”, the fact is that Karp only introduced 20 of these problems. The first problem was introduced by Stephen Cook a year earlier. Cook (1971); Karp (1972).

The first problem which is also the root of the entire reduction tree (Karp, p.96) is SATisfiability and all other problems are proved to be NP-Hard or Complete through a chain of reductions from SAT to problems underneath it till you reach a given problem interested in.

Tweaking Parameters and Complexity Classes

Section titled “Tweaking Parameters and Complexity Classes”

Karp emphasizes earlier in the paper that modifying details about the 21 problems can cause a NP problem to now be part of PP.

Note: Tweaking parameters can change PP into NP problem and vice versa. Tweaking means slight changes to parameters.

cf. Karp, p. 89-90 (PP) as compared to p. 94-97 (NP). Karp calls these problems their “close relatives”.

NP-Hard Problem Variants and Their P Counterparts

Section titled “NP-Hard Problem Variants and Their P Counterparts”

The following table shows how various NP-hard problems relate to their polynomial-time solvable variants through parameter modifications:

From NPKARP(NP#) PHow tweaked?NOTES
SAT-311 | SAT-2At most TWO variables per clause
STEINER TREE16 | MINIMUM SPANNING TREE (MST)Limiting the weight of the edges chosen to a specific regionRelaxation
MAX-CUT21 | MINIMUM CUTMinimizing vs. Maximizing
NODE COVER5 | ARC COVERSet of Edges to Set of Vertices
CHROMATIC NUMBER12 | 2-COLORINGIf it can be colored with 2 colors then aka Bipartite Checking the graph is bipartite. Can easily be solved using a BFS.
3D MATCHING17 | BIPARTITE (2D) MATCHINGLowering the dimensions by oneMatching actual here means NOT ie no duplicates
JOB SEQUENCING19 | SEQUENCING WITH DEADLINESRemoving the penalty parametersLater authors rewrite the NP problem in terms of profit, instead of penalty.
0/1 INTEGER PROGRAMMING2 | SOLVABILITY OF LINEAR EQUATIONSThe variables are reals instead of BooleanGaussian Elimination; Linear Programming is in PP (1984)
KNAPSACK18 | FRACTIONAL KNAPSACKThe variables are reals instead of BooleanConsider sales of spices for example; Karp did not mention this variation.
UNDIRECTED HAMILTONIAN CIRCUIT10 | SHORTEST PATHShortest path determines minimum length, Hamiltonian Circuit tries to find the theoretical maximum.
FEEDBACK ARC SET8 | ARC DELETIONKarp uses Arc Deletion to verify the solution to Feedback Arc Set.

The above problems and their tweaked variants are elaborated on below.

Section titled “The above problems and their tweaked variants are elaborated on below.”

SAT-3 (KARP#11): Identical to SAT but the Predicate has a “normal form” ie a certain format. Each logic clause (represented by parentheses) can only contain upto 3 variables that are connected disjunctively (OR) and the clauses are connected conjunctively (AND). Intraclause (within) are the OR operators and interclause (between) are the AND operators. NOT (negation) can be applied to any variable without restriction, but cannot be applied to the clause (cf. DeMorgan’s law from discrete mathematics).

Karp showed that any Boolean Predicate can be written in SAT-3 normal form.

SAT-2 (P): Identical to SAT-3 but the number of variables that appear in any clause is at most 2.

STEINER TREE (KARP#16) and MINIMUM SPANNING TREE (MST) (P)

Section titled “STEINER TREE (KARP#16) and MINIMUM SPANNING TREE (MST) (P)”

STEINER TREE (KARP#16): Is similar to MST but asks a decision problem of whether the Spanning tree has a sum of weights <= k (threshold provided by user). The key issue is Spanning Tree of which Subgraph. The user chooses a set of vertices for G’ (ie subgraph of G) but the problem allows you to select additional vertices not in the user’s set but in the original graph G. Suppose original graph had V(G)=100|V(G)| = 100. The user has an interest in a subgraph G’ of the original graph G which also include the vertices V.

See next worksheet (“Steiner”) for an example of this. The weighted graph there is referenced to here. The idea is that in the user’s region for example, the cost of edge from Vi to Vj is 17 but there is another vertex Vk which is in G but NOT amongst the vertices chosen by the user such that the cost Vi -> Vk -> Vj is only 15! Steiner wants the program to find THOSE vertices. In modern literature these nodes not selected or specified originally but chosen when constructing the Steiner Tree are called Steiner/Ghost points (vertices). The MST however can ONLY use vertices / edges in the user’s SELECTED region.

MINIMUM SPANNING TREE(P). (MST): Analysis of algorithms provides Dijkstra, Kruskal, Boruvka algorithms to solve this problem.

Graph G=(VG = (V is the set of Vertices and EE is the set of Edges

EE is a subset of the Cartesian Product V×VV \times V.

The problem involves a weighted graphs:

Graph Gw=(V,E,w)Gw = (V,E,w) as above where w(eight)w(\text{eight}) function maps E(dges)E(\text{dges}) to R(eals)\mathbb{R}(\text{eals}) — weight aka cost

Find a subgraph of GwGw such that

  • It is a connected TREE (no cycles).
  • ALL vertices are present in this tree. (Spanning; Cover)
    • (So, the subgraph is a subgraph of the edges, but not a subgraph of the nodes.)
  • The output will be the subgraph that has properties 1 and 2 and in addition, the sum of the costs/weights of all the edges present in the subgraph chosen is minimal when compared to any other such spanning tree.

MAX-CUT (KARP#21): Consider a weighted Graph Gw=(V,E,w)Gw = (V,E,w) where ww maps from E(dges)E(\text{dges}) to R(eal numbers)\mathbb{R}(\text{eal numbers}).

Divide the vertex set into two distinct subsets (termed a cut and no vertices in common between the subsets) such that the sum of the weights of the edges between these two sets (any edge with one vertex end in one of the sets and the other end vertex end in the other vertex set) is at least a user supplied threshold WW.

MIN CUT (P): Similar to Max-Cut but trying to divide the original vertex set into two disjoint subsets such that the cut (the edges between these disjoint subsets) has a total weight of edges <= to a user supplied threshold WW.

NODE COVER (KARP#5): Find a set of mm nodes (user decides on mm) such that each edge of the entire graph has some vertex in common with a vertex found in the node cover. Note: any number of edges are allowed to be connected to the same vertex of the node cover. As such the mapping is surjective and not necessary to be injective.

ARC COVER (P): Same problem but find a set of edges (arcs) such that each node of the graph is connected to some edge of the arc cover. Again, the user decides a priori a size restriction on the cover set.

CHROMATIC NUMBER (KARP#12) and 2-COLORING (P)

Section titled “CHROMATIC NUMBER (KARP#12) and 2-COLORING (P)”

CHROMATIC NUMBER (KARP#12): Given a Graph GG with nn vertices. Choose kk (chromatic) colors and assign each color to a subset of vertices such that only one color is assigned per vertex and no edge has its two endpoints having the same color.

The verification is straightforward once the nodes have been “painted” (assigned a color).

2-COLORING (P): Same problem but k=2k = 2 specifically. The decision of whether and if so, how this can be done, can simply be accomplished by a breadth-first search. For any node with color ii, its neighbors will be given color jj. If a vertex has been colored with color ii and then it turns out that the neighbor has already been colored also with color ii, then the Graph cannot be colored as such. (Also known as Bipartite Checking.)

3D MATCHING (KARP#17) and BIPARTITE (2D) MATCHING (P)

Section titled “3D MATCHING (KARP#17) and BIPARTITE (2D) MATCHING (P)”

3D MATCHING (KARP#17): Start with a finite set TT. Consider the search space universe T3=T×T×TT^3 = T \times T \times T where x is the Cartesian Product. (Simply, consider a lattice structure of points in a 3d space.) The user selects a subset UU of these 3d points and further selects a subset WW of this subset UU such that W=T|W| = |T| and amongst these T|T| points in WW, focusing on each dimension separately, no two points have the same number in any of the dimensions.

BIPARTITE (2D) MATCHING (P): Identical to 3D matching, except that the original search space universe is T2=T×TT^2 = T \times T, instead of T3T^3.

JOB SEQUENCING (KARP#19) and SEQUENCING WITH DEADLINES (P)

Section titled “JOB SEQUENCING (KARP#19) and SEQUENCING WITH DEADLINES (P)”

JOB SEQUENCING (KARP#19): Given tasks (jobs) J1,…,Jn each requiring a specific number Ti days respectively to complete and each must be completed by their respective calendric deadline Di. If a Task Tj misses its deadline Dj, then Penalty Pj is deducted from the total payment. The problem is to determine whether a sequence (permuted order) of these jobs can be found so that the total amount of penalty accrued is <= a tolerance threshold.

SEQUENCING WITH DEADLINES (P): The same problem without penalties. The decision problem is to find such a sequence of jobs so that no more than k jobs miss their deadline.

0/1 INTEGER PROGRAMMING (KARP#2) and SOLVABILITY OF LINEAR EQUATIONS (P)

Section titled “0/1 INTEGER PROGRAMMING (KARP#2) and SOLVABILITY OF LINEAR EQUATIONS (P)”

0/1 INTEGER PROGRAMMING (KARP#2): Given a system of mm linear equations consisting nn variables whose main coefficients appear in a matrix AZ[m×n]A \in \mathbb{Z}^{[m \times n]} where Z\mathbb{Z} is the set of integers. AA is a two dimensional matrix of dimensions consisting of mm rows and nn columns. nn is also the number of variables to be solved for but these variables can only have binary {0,1}\{0,1\} values. This system is fully described by Ax=bAx = b where AA is as mentioned and bZ[m]b \in \mathbb{Z}^{[m]}, the target output of each of the mm rows (equations). The problem is to find nn binary values for XiX_i such that Ax=bAx=b.

SOLVABILITY OF LINEAR EQUATIONS (P): Same problem but every number can be real numbers. (This is directly solved by Gaussian Elimination algorithm.)

If Karp wrote this paper today, he probably would have said linear programming which Karmarkar(1984) showed is in PP. The word “programming” was first used by mathematicians (most typically numerical analysts) and simply means a system of equations to be solved. Linear programming is to optimize a function f(Xvec)f(Xvec) where XvecXvec is a vector of variables subject to a set of linear equations (linear means that the variables in the constraints are not to raised to any power other than 1 (ie the default).

KNAPSACK (KARP#18) and FRACTIONAL KNAPSACK (P)

Section titled “KNAPSACK (KARP#18) and FRACTIONAL KNAPSACK (P)”

KNAPSACK (KARP#18): Mathematically, given a sequence of items whose weights are aia_i, find binary values for xix_i such that SUM(aixi)=b\text{SUM}(a_i*x_i) = b.

xi=1x_i = 1 means packing item; xi=0x_i = 0 means don’t.

All aia_i and bb are positive integers and xix_i are binary values. (For example, pack a suitcase with no more than 50 lbs. on airplane.) In Prose (avoiding mathematical formulas), this can be reworded as the following: Given a set of nn items whose weights are aia_i, select a subset of them such that the total weight of the items selected is bb. In the literature, this formulation is called the Subset Sum Problem.

FRACTIONAL KNAPSACK (P): The same problem with values 0<=xi<=10 <= x_i <= 1. (For example, spices salesperson with a partially filled bottle as a sample.)

UNDIRECTED HAMILTONIAN CIRCUIT (KARP#10) and SHORTEST PATH (P)

Section titled “UNDIRECTED HAMILTONIAN CIRCUIT (KARP#10) and SHORTEST PATH (P)”

UNDIRECTED HAMILTONIAN CIRCUIT (KARP#10): Given graph G=(V,E)G = (V,E) where V=n|V| = n (directed in #9; undirected in #10), find a cycle that traverses (“goes through”) all nodes of the graph (termed Hamiltonian). The exhaustive search will generate all permutations of the nodes. A cycle is a permutation of the nodes such that each node is connected to the node in front of it. Verification is simply validating that each node is connected to the next node in the proposed permutation and the last node is connected to the first. The solution to the problem will be store in an array of size n.

SHORTEST PATH (P): Given Graph(V,E,w)\text{Graph}(V,E,w) where w(E)Rw(E) \to \mathbb{R} (each edge has a weight or cost w). Find a path between two vertices in a weighted graph such that the sum of the weights of its constituent edges is minimized. Undirected Hamiltonian Circuit is not a weighted graph per se, but the connection between the two can easily be achieved by simply considering an unweighted graph as if each edge has weight or cost = 1. There is a major distinction however in that the Hamiltonian Circuit of a simply weighted graph is the longest such path (max) whereas shortest path is shortest (min). I assume that Karp considered this a tweak because max vs min cut is P vs NP and is mentioned in this paper (lines 61-68 above). One last note to this discussion is that longest path is actually in P for directed acyclic graphs. In addition, a longest path problem can be converted to a shortest path problem if negative lengths are allowed. (They are dual problems of each other.) cf. Bulterman et al (2002) on trees and others have expanded to directed acyclic graphs based on Topological Sorting (Kahn, 1962).

FEEDBACK ARC SET (KARP#8) and ARC DELETION (P)

Section titled “FEEDBACK ARC SET (KARP#8) and ARC DELETION (P)”

FEEDBACK ARC SET (KARP#8): Given a graph G=(V,E)G = (V,E) and a bound kk on the size of the Feedback set,

The user selects a subset (cardinality k) of edges such that every cycle C in graph G shares (at least) one member of this feedback set with cycle C. How do we verify the perfect solution? There is no simple algorithm for generating all cycles found in a graph.

ARC DELETION (P): If a group of k arcs break all cycles of a graph, then delete these from the original graph.

If the proposed feedback set is correct, then the subgraph remaining after deleting the feedback set will not contain any cycles, and there polynomial timed algorithms that will verify that the resulting subgraph does not contain any cycles.

Conclusion: Karp’s 21 Problems and Tweaked Variants

Section titled “Conclusion: Karp’s 21 Problems and Tweaked Variants”

This concludes the list of NP-Complete problems mentioned in KARP-21 which have a “tweaked” version that is in PP.

The next worksheet groups a number of problems based on some commonality (the commonalities are summarized here.) Some other “groupings” for the KARP-21.

KARP#TITLESIn Common
1SatisfiabilityThe solution is expressed in terms of finding n boolean (0/1) variables.
20/1 Integer Programming
11SAT-3
18Knapsack
cf. line 105
7Feedback Node SetThe problem definitions of these all involve cycles.
8Feedback Arc Set
9Directed Hamiltonian Circuit
10Undirected Hamiltonian Circuit
3CliqueThese problems seek cliques for their solutions.
13Clique Cover
9Directed Hamiltonian CircuitThe solutions are finding a permutation of the elements of an original set.
10Undirected Hamiltonian Circuit
19Job Scheduling
12Chromatic NumberThe solutions are finding a permutation of the elements of an original set.
20Partition
21Max Cut
4Set PackingThese are all related to the mathematical (discrete math) notion of a partition.
6Set Covering
14Exact Cover
20Partition
5Node CoverThese are all the remaining problems not listed above.
15Hitting Set
16Steiner Tree
173D Matching

Now, we concentrate on the actual problem definitions and which problems are related in definition or in problem solving.

See the next worksheet tab for Karp 21 Problem Definitions and the elaboration on the above groupings.

We have concluded Karp’s paper and have discusses extensively the notion of NP-Completeness and 21 examples of this. Analogously, we can define

P-Completeness: A decision problem is P-complete (complete for the complexity class P) if it is in P and every problem in P can be reduced to it by an appropriate reduction.

(1) Circuit Value Problem

Given a hardware boolean circuit comprised of NOT, AND, OR circuits,

What will the boolean output be of a given boolean circuit for a given provided input.

(2) Step Counter Predicate

Given a Turing Machine TM implementing an algorithm and an input x for that machine, with a number T (as TIME) written in unary,

does that machine halt on that input within the first T steps?

Although beyond the scope of this course, one of the reasons that P-completeness is explored so much is that it helps understanding complexity on parallel processing machines.

cf. P-complete

cf. Circuit Value Problem