[SEQ RERUN] Occam’s Razor
Today’s post, Occam’s Razor, was originally published on 26 September 2007. A summary (taken from the LW wiki):
To a human, Thor feels like a simpler explanation for lightning than Maxwell’s equations, but that is because we don’t see the full complexity of an intelligent mind. However, if you try to write a computer program to simulate Thor and a computer program to simulate Maxwell’s equations, one will be much easier to accomplish. This is how the complexity of a hypothesis is measured in the formalisms of Occam’s Razor.
Discuss the post here (rather than in the comments to the original post).
This post is part of the Rerunning the Sequences series, where we’ll be going through Eliezer Yudkowsky’s old posts in order so that people who are interested can (re-)read and discuss them. The previous post was Einstein’s Arrogance, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.
Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day’s sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.
I would really like to see some solid estimates here, not just the usual hand-waving. Maybe someone better qualified can critique the following.
By “a computer program to simulate Maxwell’s equations” EY presumably means a linear PDE solver for initial boundary value problems. The same general type of code should be able to handle the Schrödinger equation. There are a number of those available online, most written in Fortran or C, with the relevant code size about a megabyte. The Kolmogorov complexity of a solution produced by such a solver is probably of the same order as its code size (since the solver effectively describes the strings it generates), so, say, about 10^6 “complexity units”. It might be much lower, but this is clearly an upper bound.
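To make the “complexity units” concrete, here is a minimal sketch (purely illustrative): the length of a losslessly compressed copy of the solver’s source is a crude upper bound on the Kolmogorov complexity of what it generates, up to the fixed size of the decompressor. The toy_solver_source string below is a made-up stand-in, not real solver code.

```python
import zlib

def complexity_upper_bound(description: bytes) -> int:
    """Length of a losslessly compressed description: a crude upper bound
    on Kolmogorov complexity, up to the fixed size of the decompressor."""
    return len(zlib.compress(description, 9))

# Stand-in for a real ~1 MB Fortran/C solver source tree concatenated into
# one byte string; the real thing would give a bound on the order of
# 10^5-10^6 bytes, i.e. the ~10^6 "complexity units" quoted above.
toy_solver_source = b"do i=1,n\n  u(i) = u(i) + dt*laplacian(u, i)\nend do\n" * 100
print(complexity_upper_bound(toy_solver_source))
```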
One wrinkle is that the initial and boundary conditions also have to be given, and the size of the relevant data depends heavily on the desired precision (you have to give the Dirichlet or Neumann boundary conditions at each point of a 3D grid, and the grid size can be 10^9 points or larger). On the other hand, the Kolmogorov complexity of this initial data set should be much lower than that, as the values for the points on the grid are generated by a piece of code usually much smaller than the main engine. So, to a first approximation, we can assume that it does not add significantly to the overall complexity.
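A toy illustration of that last point, with a deliberately trivial boundary condition as the stand-in: the raw boundary data grows with the grid, but anything produced by a short rule compresses down to roughly the size of the rule, so it adds little to the total.

```python
import zlib

# Boundary data for a fine 3D grid can be huge in raw bytes, but if it is
# produced by a short rule (here a constant Dirichlet value on one face),
# its Kolmogorov complexity is bounded by the rule, not by the grid size.
n = 1000                                # hypothetical points per edge of one face
boundary_face = bytes([100]) * (n * n)  # 10^6 bytes of "data" from a one-line rule

print("raw bytes:       ", len(boundary_face))
print("compressed bytes:", len(zlib.compress(boundary_face, 9)))  # a few kB at most
```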
Things get dicier if we try to estimate in a similar way the complexity of models like General Relativity, the Navier-Stokes equations, or Quantum Field Theory, due to their non-linearity and a host of other issues. When no general-purpose solver is available, how does one estimate the complexity? Currently, a lot of heuristics are used, effectively hiding part of the algorithm in the human mind, thus making any estimate unreliable, as the human mind (or “Thor’s mind”) is rather hard to simulate.
One can argue that the equations themselves for each of these theories are pretty compact, so the complexity cannot be that high, but then, as Feynman noted, all physical laws can be written simply as A = 0, where A hides all the gory details. We still have to specify the algorithm that generates the predictions, and that brings us back to numerical solvers.
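For concreteness, the construction Feynman had in mind (in the Lectures he calls the quantity the “unworldliness”) is roughly the following; the particular terms shown are just examples:

```latex
% Feynman's "one equation" trick: write each law as a residual, square, and sum.
% All the actual content hides in the definitions of the individual terms.
A \;=\; \bigl(\mathbf{F} - m\mathbf{a}\bigr)^{2}
   \;+\; \bigl(\nabla \cdot \mathbf{E} - \rho/\varepsilon_{0}\bigr)^{2}
   \;+\; \dotsb \;=\; 0
```

The statement A = 0 is trivially short, but unpacking it back into predictions requires spelling out every term, and that algorithmic content is exactly what a message-length count has to include.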
I also cannot resist noting, yet again, that all interpretations of QM that rely on solving the Schrödinger equation have exactly the same complexity, as estimated above, and so cannot be distinguished by Occam’s razor. This applies, in particular, to MWI vs Copenhagen.
It is entirely possible that my understanding of how to calculate the Kolmogorov complexity of a physical theory is flawed, so I welcome any feedback on the matter. But no hand-waving, please.
That way of formalizing Occam’s razor, anyway.
I would be happy to see a different formalization of it… What I have tried to use here is something like the Minimum Message Length formalism (only without the “Minimum” part, since it is not easily computable), which is, admittedly, a rather simplistic approach. Hopefully someone can do better.
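Here is a minimal sketch of the two-part message-length idea I am gesturing at, with zlib standing in for the ideal (uncomputable) coder, so both terms are only upper bounds; the model strings and residuals are made up for illustration.

```python
import os
import zlib

def two_part_length(model_source: bytes, residuals: bytes) -> int:
    """MML-style score: bits to state the model plus bits to encode the data
    given the model (here, the model's residuals). zlib is a computable
    stand-in for the ideal coder, so this is only an upper bound."""
    return len(zlib.compress(model_source, 9)) + len(zlib.compress(residuals, 9))

# Hypothetical comparison: a theory that fits leaves near-zero residuals
# (highly compressible); a theory that doesn't leaves noise-like residuals.
score_good = two_part_length(b"u_tt = c**2 * u_xx", b"\x00" * 10_000)
score_bad = two_part_length(b"u = const", os.urandom(10_000))
print(score_good, score_bad)  # the smaller total message length wins
```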
Very interesting comment, upvoted. I can’t tell whether what you are saying makes sense, but I welcome and endorse your examination and your attempt to reduce the formalization of Occam’s razor to practice.
One thing has always bothered me about this post. Its main thrust appears to be that one should use (something like) a Kolmogorov prior rather than one’s intuitive prior because that is what the universe looks like, except we then adjust this prior with the same evidence we used to determine what the prior should be. That seems like double-counting evidence.
I really wish there were a post that was more advanced than this one but more intuitive than Shane Legg’s explanation. This post doesn’t really convey a full technical understanding of how to actually apply Solomonoff Induction or MML, and Legg’s version just shows a succinct derivation.
Perhaps the following review article can be of some help here: A Philosophical Treatise of Universal Induction