---
Video Title: Common Misuses of Asymptotic Notation
Description: Theory of Computation https://uvatoc.github.io/week5
11.4: Common Misuses of Asymptotic Notation - Characterizing the Running Time of a Java Program
David Evans and Nathan Brunelle
University of Virginia
---

Now, hopefully, what big O means is clear. Here's how it's often used when talking about the efficiency of programs. You might have a slide that says: here's some Java code, it computes the sum of all the numbers up to n, where n is some int that got a value somewhere, and that segment of code has running time in big O of n. Are we happy with that? You've seen things kind of like this before, maybe exactly like this.

What's the actual running time of this code? Certainly the actual clock time is going to depend on a lot of things. What this claim is trying to do is avoid all those details and capture the idea that the running time scales linearly with n, which is probably true to some degree. But how consistent is that with the mathematical notion of big O of n?

What's the longest this code could actually run for, for real, if we're talking about a concrete Java program? If you had written this code and started running it on your laptop when I put the slide up, would it already be done, for any value of n you could pick? Yeah. The int type has a maximum, and it's actually a small maximum: it's only 32 bits, so about two billion. So the real running time of this code, based on that maximum integer, is constant. It does not scale asymptotically. The maximum total running time is some constant.

The other thing I want to point out, which I forgot to say earlier, is that any time we have actual code, we could use big O, but we should be able to get a tight bound. The places where we need the loose bounds like big O and big Omega are when we're talking about problems. When we're talking about a problem, we don't necessarily know the most efficient circuit or the most efficient code for it, so bounds are the best we can do. When we actually have the code, we can say big O, but we should always be able to get a tight bound, because we've got the code: we know exactly what's going on and exactly what we're counting. There's no reason not to get a tight bound. So that's often a misuse of big O. It's not incorrect; it's just not telling us very much. It would certainly also be correct to say this code has running time in big O of n to the n to the n to the n. I guess if you gave that answer in CS 2110, you might not get full credit on the asymptotic analysis questions, but it would always be correct. So for code, you should always be able to get a tight bound.

If you think of this as actual Java code, int is 32 bits, it has some maximum value, so it's constant time: we know the maximum time it can run, and it can never run longer than that, because n is not unbounded. Well, maybe that's not quite fair. Of course that's not what someone means when they look at this code; they're thinking of what it would mean in some more abstract, mythical version of Java that doesn't have this bounded int type. Then are we okay with it? If we forget the int type and assume some unbounded integer type, are we okay with saying it has running time in theta of n?

So how long does each iteration take? Each iteration does one add of 1 and one add of i. How long does it take to add two numbers? Probably, when you analyze a loop like the one sketched below, you say: add is simple, it's constant time.
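The exact snippet from the slide isn't reproduced in this transcript, but it was presumably something like the following sketch (the class and method names here are just illustrative): a loop that adds 1 to the counter and adds i to the running sum on each iteration.

```java
// A plausible reconstruction of the snippet on the slide (illustrative only):
// sum the integers 1..n, typically labeled as running in O(n).
class SumExample {
    static int sumUpTo(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {  // one "add 1" per iteration
            sum = sum + i;              // one "add i" per iteration
        }
        return sum;
    }
}
```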
Can addition really be constant time? What would it mean if you could do addition in constant time? Instead of Java ints, suppose we have an add function that takes in two numbers and outputs their sum. If that were constant time, could you even look at the inputs? If this is constant time, you would have to know the sum without even looking at every bit of the inputs, because just looking at every bit of the input is not constant time: the time it takes to look at all the bits depends on how big the numbers are. So this cannot be constant time. To be constant time, you have to be able to produce the output without looking at the whole input, and if the size of the input can increase, and with unbounded numbers it certainly can, that's impossible. A reasonable add is going to be linear in the length of its inputs.

If add is on 32-bit ints, then add is pretty close to constant; it's certainly bounded by some constant. For integer adds, it probably does take exactly the same amount of time for any two numbers. More complicated operations like floating point do take different amounts of time, which is another way timing can leak information about what you're computing.

So we've got a problem. Either we're talking about unbounded ints, and then it's probably n squared, because our natural way of doing add is linear in... actually, no, it's not going to be n squared. What is it going to be? Here n is the value of n, and each add is linear in the length of n, so the total is going to be more like n log n. Which is probably not the correct answer, or at least not what would be deemed the correct answer for this question in most contexts. If you're assuming add is constant, your integers have to be bounded, and then the whole thing is constant time. Or add is non-constant, and you've got to actually analyze those operations carefully, and each of them is going to cost something like the log of the magnitude of n.

So the O(n) claim is correct, but only because the running time is actually constant, and that's probably not the answer that's desired when people look at this code. At least with our definitions, and if we're careful about models of computing, those are the answers we should give.
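To make the unbounded case concrete, here is a minimal sketch of the same loop written with java.math.BigInteger as the unbounded integer type (the class and method names are illustrative, not from the lecture). Each BigInteger addition has to process every bit of its operands, so one add costs time roughly proportional to their bit length, about log n here, and over n iterations that gives on the order of n log n bit operations rather than theta of n.

```java
import java.math.BigInteger;

// Sketch only: the same summation loop with an unbounded integer type.
// BigInteger.add walks over all the bits of its operands, so each addition
// costs time proportional to their bit length (roughly log n), not O(1).
// Over n iterations, the total is on the order of n log n bit operations.
class UnboundedSumExample {
    static BigInteger sumUpTo(BigInteger n) {
        BigInteger sum = BigInteger.ZERO;
        for (BigInteger i = BigInteger.ONE; i.compareTo(n) <= 0; i = i.add(BigInteger.ONE)) {
            sum = sum.add(i);   // cost grows with the number of bits in sum and i
        }
        return sum;
    }
}
```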