Scott, I don’t dispute what you say. I just suggest that the confusing term “in the worst case” be replaced by the more accurate phrase “supposing that the environment is an adversarial superintelligence who can perfectly read all of your mind except bits designated ‘random’”.
While this is an accurate characterization of what the term technically means, it does not really capture all the reasons why worst-case complexity is studied. I would say worst-case complexity is studied because:
It allows more general statements, since a worst-case bound does not have to be conditioned on any particular input distribution.
For many problems, efficiently solvable ones as well as “hard” ones, the average-case complexity is not really different from the worst-case complexity.
Note that complexity theory studies the complexity of problems, not really the complexity of specific algorithms. So while it is true that the naive implementation of quicksort takes n log n time on average but n^2 time in the worst case, what complexity theory studies is the hardness of sorting, and that (for comparison-based sorting) is n log n in both the worst case and the average case (assuming the uniform distribution over inputs). A small experiment illustrating the quicksort gap appears below.
Many problems believed to be hard are provably equally hard in the worst case and the average case. For example, an algorithm that computes discrete logarithms efficiently on a uniformly random instance can be converted into a randomized algorithm that computes them efficiently on every instance; a sketch of the standard self-reduction appears below.
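To make the quicksort point concrete, here is a toy Python sketch (purely illustrative; the function names and parameters are my own) that counts the comparisons made by a naive first-element-pivot quicksort on a random permutation versus an already-sorted input of the same length:

    import random
    import sys

    # Count comparisons made by naive quicksort with the first element as pivot.
    def quicksort_comparisons(a):
        if len(a) <= 1:
            return 0
        pivot, rest = a[0], a[1:]
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        # len(rest) comparisons happen at this level of the recursion
        return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

    n = 2000
    sys.setrecursionlimit(5000)  # sorted input drives the recursion depth to ~n

    print(quicksort_comparisons(random.sample(range(n), n)))  # ~2 n ln n, a few tens of thousands
    print(quicksort_comparisons(list(range(n))))              # n(n-1)/2 = 1,999,000

Of course this says nothing about the sorting problem itself, which needs on the order of n log n comparisons either way; it only shows how a particular algorithm’s average-case and worst-case behavior can diverge.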
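Similarly, here is a minimal sketch of the discrete-log self-reduction, again just toy Python with a tiny prime and a brute-force stand-in for the hypothetical average-case solver. A worst-case instance y is re-randomized to y * g^r, handed to the average-case solver, and the shift r is subtracted off at the end:

    import random

    p = 1019   # small prime, purely for illustration; 2 generates Z_p^*
    g = 2

    def avg_case_dlog(h):
        # Stand-in for a solver that is only assumed to work when h is
        # uniformly distributed; brute force here so the demo runs.
        for x in range(p - 1):
            if pow(g, x, p) == h:
                return x

    def worst_case_dlog(y):
        # Re-randomize: y * g^r is uniform in the group when r is uniform,
        # so the average-case solver sees a "typical" instance.
        r = random.randrange(p - 1)
        x_shifted = avg_case_dlog((y * pow(g, r, p)) % p)
        return (x_shifted - r) % (p - 1)   # undo the shift

    x = 345
    y = pow(g, x, p)
    print(worst_case_dlog(y) == x)   # True

Since y * g^r is uniformly distributed whenever r is, a solver that only works for random instances already suffices for every instance, which is exactly the sense in which the average case is as hard as the worst case here.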
Randomness provably never helps in average-case complexity (i.e., where you fix the probability distribution over inputs) -- since given any ensemble of strategies, by convexity there must be at least one deterministic strategy in the ensemble that does at least as well as the average.
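For what it’s worth, the convexity step in that statement is easy to check numerically on a toy example. The sketch below (all costs and probabilities invented for illustration) fixes a distribution over three inputs and a randomized algorithm that mixes three deterministic strategies, and confirms that some deterministic strategy in the mix does at least as well as the mixture:

    # Averaging argument: with the input distribution fixed, the best
    # deterministic strategy in the ensemble is at least as good as the
    # randomized mixture. All numbers below are made up.
    inputs = ["a", "b", "c"]
    input_prob = {"a": 0.5, "b": 0.3, "c": 0.2}        # fixed distribution over inputs

    cost = {                                           # cost[strategy][input]
        "S1": {"a": 4, "b": 9, "c": 1},
        "S2": {"a": 7, "b": 2, "c": 6},
        "S3": {"a": 5, "b": 5, "c": 5},
    }
    strategy_prob = {"S1": 0.2, "S2": 0.5, "S3": 0.3}  # the randomized algorithm

    def expected_cost(s):
        return sum(input_prob[x] * cost[s][x] for x in inputs)

    mixture = sum(strategy_prob[s] * expected_cost(s) for s in cost)
    best_deterministic = min(expected_cost(s) for s in cost)
    assert best_deterministic <= mixture
    print(best_deterministic, mixture)   # roughly 4.9 vs 5.13 with these numbers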
The claim that randomness never helps in the average-case setting actually does not seem obvious, and I’m not sure it can be proved. It is true that for a given problem size there must exist some particular random string (indeed, most of them will do) that works, in the sense that the randomized algorithm, run with that one fixed string, has average running time (over the input distribution) within a constant factor of the original average. But it is not easy to see how one can efficiently find such a string with a deterministic algorithm.
This is the same sort of problem one encounters when derandomizing BPP. It can be proved (this is essentially Adleman’s argument) that if you have a BPP algorithm then, after standard amplification, for every input size n there exists a random string (in fact, all but an exponentially small fraction will work) with which the algorithm answers correctly on every input of length n; hard-wiring that string gives a deterministic algorithm that runs in time polynomial in n. However, only existence is known, and there is no known efficient way to find such a string given n. So the argument only shows that BPP is contained in P/poly, not that BPP = P.
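To see the existence argument in miniature, here is a toy randomized algorithm (entirely made up for illustration) that decides whether an n-bit string contains a 1 by probing k random positions. With this much amplification the per-input error is well below 2^-n, so by a union bound almost every random seed is correct on all 2^n inputs simultaneously; but notice that certifying a particular seed requires enumerating all 2^n inputs, which is exactly the step nobody knows how to make efficient in general:

    import itertools, random

    n = 6    # input length, kept tiny so all 2^n inputs can be enumerated
    k = 60   # number of random probes; enough to push the per-input error below 2^-n

    def decide(x, seed):
        # x: tuple of n bits; seed: the k positions this run will probe
        return any(x[i] for i in seed)

    def seed_is_good(seed):
        # Brute-force check over every length-n input: exponential in n.
        return all(decide(x, seed) == any(x)
                   for x in itertools.product([0, 1], repeat=n))

    seeds = [[random.randrange(n) for _ in range(k)] for _ in range(200)]
    print(sum(seed_is_good(s) for s in seeds), "of 200 random seeds work for every input")

Typically all 200 seeds pass, which is the “all but an exponentially small fraction” phenomenon; it is the brute-force verification, not the existence, that blocks concluding BPP = P from this argument.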