--- Video Title: Shortest and Longest Paths Description: Theory of Computation https://uvatoc.github.io/week11 22.3 Shortest and Longest Paths - Graph Review - Shortest Path Problem - Efficient Algorithm for Shortest Path - Longest Path Problem - Inefficient (Exponential Time) Algorithm for Longest Path Nathan Brunelle and David Evans University of Virginia --- So the next thing we're going to do is I'm going to talk about a few different problems, talk about some of the preliminary things we need to know, some of the preliminary definitions we need for these various problems, and we're going to talk about finding running times for those problems. We're going to look at four different problems today that are kind of in two pairs. The two in each pair are only very slightly different from one another, so they're very similar problems with One slight change to each. But we're going to see that that slight change makes actually a huge difference in the running time of that algorithm and how efficiently we can implement a solution to that question, that problem, that function. The first thing we're going to look at is looking at the shortest path in a graph. So given a graph and two nodes, I want to find what is the shortest path in terms of number of hops between those two nodes. And then I'm going to make a slight modification where I'm going to look at the longest path instead. Same problem, but now I want to find the longest path from point A to point B rather than the shortest path from point A to point B. The next pair of problems we're going to be looking at are what are called satisfiability problems. With these satisfiability problems, the idea for these is that we're given some Boolean expressions over some variables. And I want to know, can I assign true or false values to these variables in order to cause the whole expression to evaluate to true? I'm going to very quickly go through sort of a recap on what graphs look like. A graph is defined by a pair of sets, where the first set is going to be a set of what we call vertices or nodes. And then the second set is going to be edges that go between two nodes. Those can be ordered, those can be unordered. If we care about the order that those edges appear, we call that a directed graph. If we don't care about the order, we call that an undirected graph. So in this case, we can see that the nodes are A through I. Edges are going to be potentially ordered pairs of nodes. In this case, the way that I drew this, this was an undirected graph. So the edge that goes between A and B should have been the same as the edge that goes between B and A, because order doesn't matter. Typically, if you draw arrows on this, then now you're saying that order matters. So this says AB is a different edge from BA. So you could be able to, say, traverse from A to B, but not in the opposite direction. We can also, if we have something called a weighted graph, have numbers that are associated with each of these edges that tell us what is the weight or the cost of that edge in some sense. And there are several different ways that we can represent these graphs. If we wanted to write them down as data structures, one way we might do this is with an adjacency list. We say that two nodes are adjacent to one another if they share an edge. So we can have this adjacency list representation where we kind of say, for every single node that's in our graph, I'm just going to have a separate list of everything adjacent to it. So this says that A is adjacent to B and C. So we have edges that go from A to B and A to C. B has edges to A and to C and to E and so forth. So that is one way we could represent our graph, write down the definition of our graph. We could also have this adjacency matrix representation of our graph. Where this adjacency matrix representation says, I'm going to have this big table where there's going to be a row and a column for every node in this graph. And I'm going to write a one in every cell that represents an edge that exists. So here I have an edge from A to B. So the cell AB has a one in it. And then zeros are going every place else. I left those out just so that it was more readable. Since there is no edge from A to I, we're going to have a zero there from A to I. This is an adjacency matrix representation. Each one has its own advantages, disadvantages. For the purposes of what we're talking about in this class, those advantages and disadvantages aren't going to make a huge amount of difference. So the next thing we can talk about is a path. A path is a sequence of nodes where any two things that are next to each other in the sequence have to have an edge in between them. So here we have this path A, B, E, G, F, D. So that is that sequence of nodes A, B, E, G, F, D. And in order to be a valid path, we have to have an edge from each thing in the path to the next thing in the path. So that is we need to be able to follow edges to traverse that sequence of nodes. If I can't go directly from one node to the next one in the path or in that sequence, then that is not a valid path. So for instance, A, I is not a valid path because there is no edge that goes from A to I. Any such sequence we call a path, we call it a simple path if each node that appears appears at most once. That is, we're never visiting the same node twice. If we never visit the same node twice, that's a simple path. If we start and end at the same location, that's called a cycle. So we can have paths that are neither simple nor cycles. For instance, a path that looks something like that would be neither simple nor a cycle because we visited that node twice. A cycle is something that has to look like this. It has to start and end at the same place. A simple path is one that never crosses the same node twice. Something that is neither simple nor a cycle is like a path that has maybe an interior cycle or something like that. So the first problem we're going to look at and try to figure out the runtime for is this shortest path problem. The shortest path problem says if I have a graph and you gave me a start node, let's call that S, and an end node, let's call that T, how long is the shortest path from S to T? So in this case, our input is three things. A graph, a start node, and an end node. And our output is one thing, which is a number. That number being the length of that longest path. We're not interested in the identity of that path, just the length of it. In this case, if I wanted to find the length of the shortest path from A to E, that's going to be two, because I can do this in one, two hops. There are other paths that go from A to E as well. I could go A to C, C to D, D to E. So that's a path of length three, but that is longer than this path of length two. So our answer is two. So that is the shortest path, the length of the shortest path. So what's the runtime of this? I only care about the number of hops. I don't care about the weights of the edges, if that graph has weighted edges. So instead, I can use a much simpler algorithm to solve this. So I can use something called a breadth-first search, where the idea of a breadth-first search is we can start with our start node and sort of ripple outward in terms of distance until we find our end node. So if we wanted to look at some pseudocode for this, we could have all of our nodes in some sort of queue. And what we're going to do is add our start node to the queue. And then over and over and over again, until I have run out of nodes to visit, we're going to remove something from the queue and then add all of its unvisited neighbors to the queue. Over and over and over again. In this case, the first thing I'm going to add to the queue is A, because that's what I'm starting with. So I'm going to have A in the queue. And then I'm going to remove A from the queue and then add all of A's neighbors to the queue, B and C. So I'm going to remove A from the queue, and then add B and C to the queue. And the next thing I do is I remove B from the queue, and then I'm going to add CDE to the queue. C was already there, so I didn't re-add it. But here I'm going to add CD, and when I add E, I'm going to say, oh, that was my destination. I could get there in two hops. So every single time we are kind of going one further distance away, we're going to keep track of that was a node that was one farther away. As soon as we see our end node, then we knew our answer for this breadth first search. So it's basically how many ripples we had to do outward before we found our destination is our answer. So this super easy algorithm... This is basically going to be linear time by the size of the graph. It's the number of nodes in the graph plus the number of edges in the graph. This is a super easy algorithm, so this is going to be roughly a linear time algorithm in order to solve this one. So this is something that's generally considered to be a pretty efficient algorithm, or a very efficient algorithm. Linear time algorithms are generally understood to be efficient algorithms for solving various problems. So we're going to make now a very minor change, where instead of talking about the shortest path through our graph from point A to point B, I want to talk about the longest path in our graph from point A to point B. So if I'm trying to find the longest path from, say, A to E, then maybe that's going to be like A to C to D to F to G to I to H to E. So in this case, this would be like the longest simple path is what we're after. Let me add that in here. How long is the longest simple path from the start node to the end node? Turns out this one minor change of just changing it from a minimization problem to a maximization problem makes it a substantially more difficult problem to solve. One way that we could do this is we could enumerate all of the possible sequences of the nodes, of which there's going to be n factorial of them. Really n minus 1 factorial of them, since we knew which node was going to be our start node. So we could just enumerate all n minus 1 factorial different sequences of nodes, and then check whether or not each one of those was a path. And then once we have all of those that were valid paths, we just return the length of the longest one. So this would certainly give us our answer. But this is very, very slow. This is like an exponential time algorithm. This is more than 2 to the power n. Turns out we don't really know a way of doing substantially better than this. We don't know of any way to do this in better than an exponential time algorithm. So just this one minor change to a very simple problem makes this a very challenging problem to solve. We don't know a way to solve it kind of faster than this. It's substantially faster than this sort of silly brute force method. This, by the way, is something that we would call an inefficient algorithm. It's going to take like years to solve this for even graphs of like 20 nodes. So this is an inefficient algorithm.