I just want to do some machine learning! Is there an interpretation of probability that doesn’t have inconsistencies, loopholes, etc., and is always reliable, not just under certain conditions like “when there are enough bits of randomness”?
I sympathize.
Most important thing: when I say it “works very well even when things are small”, I mean that I personally have used Jaynes-style probability on prediction problems with very small data and results have generally made sense, and lots of other people have too. There’s empirical evidence that it works.
So we can only rely on the plausibility interpretation asymptotically, i.e. as the number of data points ⟶ ∞ or as the number of variables ⟶ ∞?
No; smoothness assumptions should be able to substitute for infinite limits.
Here’s a conceptual analogy: suppose we’re trying to estimate some continuous function on the interval [0, 1), and we have (noiseless) measurements of the function at the grid points [0, 1/2^n, 2/2^n, 3/2^n, ..., (2^n − 1)/2^n]. Problem is, we have no idea what the function does between those points; it could go all over the place! Two possible ways to get around this:
1. Take a limit as n → ∞, so that our “grid” covers the whole interval.
2. Use a smoothness assumption: if we assume some bound on how much the function varies within the little window between grid points, then we can get reasonably-tight estimates of the function values between grid points (see the sketch just below).
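To make option 2 concrete, here’s a minimal sketch of my own (not anything from Cox or Halpern): assuming the function satisfies a Lipschitz bound |f(x) − f(y)| ≤ L·|x − y|, every grid value pins f(x) inside an interval at each off-grid point, and intersecting those intervals gives tight bounds. The helper name lipschitz_bounds and the choice of L are just for illustration.

```python
import numpy as np

def lipschitz_bounds(x, grid_x, grid_f, L):
    """Given noiseless samples grid_f = f(grid_x) and an assumed Lipschitz
    constant L, return the tightest lower/upper bounds on f(x) implied by
    all grid points together."""
    x = np.atleast_1d(x)
    # Each grid point i forces f(x) into [f_i - L|x - x_i|, f_i + L|x - x_i|];
    # intersecting those intervals over all i gives the tightest bounds.
    dist = np.abs(x[:, None] - grid_x[None, :])   # shape (len(x), len(grid))
    lower = np.max(grid_f[None, :] - L * dist, axis=1)
    upper = np.min(grid_f[None, :] + L * dist, axis=1)
    return lower, upper

# Example: f(x) = sin(2*pi*x) on a coarse grid (n = 3, so 8 grid points).
n = 3
grid_x = np.arange(2**n) / 2**n
grid_f = np.sin(2 * np.pi * grid_x)
L = 2 * np.pi  # |f'| <= 2*pi for this f -- the "smoothness assumption"

lo, hi = lipschitz_bounds(np.array([0.3, 0.55]), grid_x, grid_f, L)
# lo and hi bracket the true off-grid values, even though we never
# measured the function there.
```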
I haven’t looked into the details, but the Cox loophole should be basically-similar to this. The whole issue is that we have “too few grid-points” in e.g. Halpern’s example, so smoothness assumptions should be able to patch over that problem, at least approximately (which is all we need in practice).
A different way to frame it: we should be able to weaken the need for “lots of grid-points” by instead strengthening the continuity assumption to impose smoothness bounds even at not-infinitesimal distances.
Note that this is the move we almost always make when dealing with “continuous functions” in the real world. We almost never have an actually-infinitely-fine “grid”; we only have some finite precision, and the function doesn’t “move around too much” at finer scales. In general, when using math about “continuous functions” in the real world, we’re almost always substituting a smoothness assumption for the infinite limit.
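To put a number on that (my own worked bound, assuming a global Lipschitz constant L supplied as part of the smoothness assumption): if |f(x) − f(y)| ≤ L·|x − y| for all x, y in [0, 1), not just infinitesimally close ones, then every x lies within 1/2^n of some grid point x_i, so |f(x) − f(x_i)| ≤ L/2^n. That error shrinks either by sending n ⟶ ∞ (the limit move) or by knowing L is small at a fixed, finite n (the smoothness move).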
What do you think about the alternative axiomatisations that have been proposed for patching up Jaynes-Cox probability theory?
I haven’t looked into these, but I expect they’re generally pretty similar for most practical purposes.