Occam’s Razor: In need of sharpening?
In the first half of the 14th century, the Franciscan friar and logician William of Occam proposed a heuristic for deciding between alternative explanations of physical observables. As William put it: “Entities should not be multiplied without necessity”. Or, as Einstein reformulated it 600 years later: “Everything should be made as simple as possible, but not simpler”.
Occam’s Razor, as it became known, was enthusiastically adopted by the scientific community and remains the unquestioned criterion for deciding between alternative hypotheses to this day. In my opinion, its success is traceable to two characteristics:
o Utility: OR is not a logical deduction. Neither is it a statement about which hypothesis is most likely. Instead, it is a procedure for selecting a theory which makes further work as easy as possible. And by facilitating work, we can usually advance further and faster.
o Combinability: OR is fully compatible with each of the epistemological stances which have been adopted within science from time to time (empiricism, rationalism, positivism, falsifiability, etc.)
It is remarkable that such a widely applied principle is exercised with so little thought to its interpretation. I thought of this recently upon reading an article claiming that the multiverse interpretation of quantum mechanics is appealing because it is so simple. Really?? The multiverse explanation proposes the creation of an infinitude of new universes at every instant. To me, that makes it an egregiously complex hypothesis. But if someone decides that it is simple, I have no basis for refutation, since the notion of what it means for a theory to be simple has never been specified.
What do we mean when we call something simple? My naive notion is to begin by counting parts and features. A milling device made up of two stones, one stationary, one mobile, fitted with a stick for rotation by hand becomes more complex when we add devices to capture and transmit water power for setting the stone in motion. And my mobile phone becomes more complex each time I add a new app. But these notions don’t serve to answer the question of whether Lagrange’s formulation of classical mechanics, based on action, is simpler than the equivalent formulation by Newton, based on his three laws of forces.
Isn’t it remarkable that scientists, so renowned for their exactitude, have been relying heavily on so vague a principle for 700 years?
Can we do anything to make it more precise?
This specific topic has also been addressed quite extensively in the sequences. See the complete quantum mechanics sequence: https://lesswrong.com/s/Kqs6GR7F5xziuSyGZ
Help me out here, habryka.
I’ve read part way through the article. The first paragraph seemed to be carrying on a continuing conversation (John Searle comes to mind). Then it seemed to change direction abruptly, addressing a problem in mechanism design, namely how to assign payoffs so as to incentivise an agent in a certain game to be honest about his predictions.
These are interesting topics, but I struggle to see the relevance.
EY’s article is also very long. I haven’t read it to the end. Can you point out where to look or, better, summarise the point you were making?
Thanks a lot!
I or Habryka might be able to summarize the key points sometime later, but one of the important bits here is that LessWrong is generally a site where people are expected to have read through the sequences (not necessarily meaning that you have to right away, they are indeed super long. But if you’re going to pose questions that are answered in the sequences, longterm users will probably ask that you read them before putting in time clarifying misunderstandings)
(I realize this is a huge ask, but the site is sort of built around the notion that we build knowledge over time, rather than rehashing the same arguments over and over. This does mean accumulating more and more background reading, alas. We do have projects underway to distill the background reading into smaller chunks but it’s an ongoing process)
Raemon, I understand your remark. But I’ve detected another problem. I’ve dropped the ball by posting my reply to the wrong remark. So, I’m going to have to do some cutting and pasting. Please bear with me.
The EY article really is super long (but interesting) and seems to go all over the place. I’d like to do habryka the courtesy of an answer reasonably promptly. I hope I’m not out of order by asking habryka for guidance about what is on his mind.
Hi Raemon,
I’m gratified to see my humble contribution receive attention, including from you. I’m learning. So thanks.
This is my first independent posting (I’ve commented before) and I didn’t notice it appearing on the front page “latest posts”. I understand you are a LW organiser. Can you help me understand the trigger criteria for an article to appear under “latest posts”? Thanks a lot! JH
Latest Posts follows a hackernews algorithm, where things appear in Latest Posts based on how much karma they have, and how recent they are. Your post has relatively low karma (most likely because the topic wasn’t that novel for LW readers), so it probably appeared for a few hours in the Latest Posts column and then eventually moved off the bottom of the page.
(By now, 4 days later, it most likely is appearing in Latest Posts but you have to click ‘load more’ many times before it’ll show up)
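For the curious, here is a minimal sketch of the general shape of that kind of ranking rule. The gravity exponent and the two-hour offset below are the classic Hacker News values; LessWrong’s actual constants and tweaks are an assumption on my part, not documented fact.

```python
# Hacker-News-style ranking: score grows with karma and decays with age.
# gravity = 1.8 and the +2 hour offset are the classic HN values;
# LessWrong's real parameters may differ.
def ranking_score(karma: float, age_hours: float, gravity: float = 1.8) -> float:
    return karma / (age_hours + 2) ** gravity

print(ranking_score(karma=5, age_hours=3))    # ~0.28: visible for a while
print(ranking_score(karma=5, age_hours=96))   # ~0.0013: four days later, far down the list
```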
hi habryka,
It wasn’t my purpose to open a discussion of interpretation of quantum mechanics. I only took this as an example.
My point is something else entirely: scientists have been leaning very heavily on William of Occam for a long while now. But try to pin down what they mean by the relative complexity of an explanation, and they shrug their shoulders.
It’s not even the case that scientists disagree on which metric to apply. (That would just be normal business!) But, as far as I know, no one has made a serious effort to define a metric. Maybe because they can’t?
A very unscientific behaviour indeed!
Yes, and the sequence (as well as the post I linked below) tries to define a complexity measure based on Solomonoff Induction, which is a formalization of Occam’s Razor.
I have the impression that Solomonoff Induction provides a precise procedure to a very narrow set of problems with little practical applicability elsewhere.
How would you use Solomonoff Induction to choose between the two alternative theories mentioned in the article: one based on Newton’s Force Laws, the other based on the principle of least action? (Both theories have the same range of validity and produce identical results.)
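To make the “identical results” claim concrete, here is a minimal sympy sketch (my own toy example, using a particle in uniform gravity as the test system): the Euler-Lagrange equation of the least-action formulation reduces to exactly Newton’s equation of motion.

```python
import sympy as sp

t, m, g = sp.symbols('t m g', positive=True)
y = sp.Function('y')
q, v = sp.symbols('q v')  # placeholder position and velocity

# Lagrangian for a particle in uniform gravity, L = T - V, written in (q, v)
L = sp.Rational(1, 2) * m * v**2 - m * g * q

# Euler-Lagrange: d/dt(dL/dv) - dL/dq = 0, evaluated along the path y(t)
dL_dv = sp.diff(L, v).subs(v, sp.diff(y(t), t))
dL_dq = sp.diff(L, q).subs(q, y(t))
euler_lagrange = sp.diff(dL_dv, t) - dL_dq        # m*y'' + m*g

# Newton: m*y'' = F = -m*g  ->  m*y'' + m*g = 0
newton = m * sp.diff(y(t), t, 2) + m * g

print(sp.simplify(euler_lagrange - newton))        # 0: identical equation of motion
```

Of course, agreeing on a toy system says nothing about which formulation is “simpler”, which is exactly the question at issue.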
But it isn’t very successful, because if you cast SI in terms of a linear string of bits, as is standard, you are building in a kind of single universe assumption.
First, I assume you mean a sequential string of bits. “Linear” has a well defined meaning in math that doesn’t make sense in the context you used it.
Second, can you explain what you mean by that? It doesn’t sound correct. I mean, an agent can only make predictions about its observable universe, but that’s true of humans too. We can speculate about multiverses and how they may shape our observations (e.g. the many worlds interpretation of QFT), but so could an SI agent.
I think your example of interpreting quantum mechanics gets pretty close to the heart of the matter. It’s one thing to point at Solomonoff induction and say, “there’s your formalization”. It’s quite another to understand how Occam’s Razor is used in practice.
Nobody actually tries to convert the Standard Model to the shortest possible computer program, count the bits, and compare it to the shortest possible computer program for string theory or whatever.
What you’ll find, however, is that some theories amount to other theories but with an extra postulate or two (e.g. many worlds vs. Copenhagen). So they are strictly more complex. If it doesn’t explain more than the simpler theory, the extra complexity isn’t justified.
A lot of the progression of science over the last few centuries has been toward unifying diverse theories under less complex, general frameworks. Special relativity helped unify theories about the electric and magnetic forces, which were then unified with the weak nuclear force and eventually the strong nuclear force. A lot of that work has helped explain the composition of the periodic table and the underlying mechanisms to chemistry. In other words, where there used to be many separate theories, there are now only two theories that explain almost every phenomenon in the observable universe. Those two theories are based on surprisingly few and surprisingly simple postulates.
Over the 20th century, the trend was towards reducing postulates and explaining more, so it was pretty clear that Occam’s razor was being followed. Since then, we’ve run into a bit of an impasse with GR and QFT not nicely unifying and discoveries like dark energy and dark matter.
There is a substantial philosophical literature on Occam’s Razor and related issues:
https://plato.stanford.edu/entries/simplicity/
The Many Worlds interpretation of Quantum Mechanics is considered simple because it takes the math at face value and adds nothing more. There is no phenomenon of wave-function collapse. There is no special perspective of some observer. There is no pilot wave. There are no additional phenomena or special frames of reference imposed on the math to tell a story. You just look at the equations and that’s what they say is happening.
The complexity of a theory is related to the number of postulates you have to make. For instance: Special Relativity is actually based on two postulates:
the laws of physics are invariant (i.e. identical) in all inertial frames of reference (i.e. non-accelerating frames of reference); and
the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer.
The only way to reconcile those two postulates are if space and time become variables.
The rest is derived from those postulates.
Quantum Field Theory is based on Special Relativity and the Principle of Least Action.
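As a small worked illustration of “the rest is derived from those postulates” (my own numeric sketch, quoting the standard light-clock result rather than re-deriving it): the two postulates force moving clocks to run slow by the Lorentz factor.

```python
import math

# Time dilation, one standard consequence of the two postulates:
# t = gamma * t0, with gamma = 1 / sqrt(1 - v^2 / c^2)
c = 299_792_458.0      # speed of light in m/s
v = 0.6 * c            # an arbitrary example speed

gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
print(f"At v = 0.6c, gamma = {gamma:.3f}")   # 1.250
print(f"1 s on the moving clock reads as {gamma:.3f} s to the stationary observer")
```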
The idea of counting postulates is attractive, but it harbours a problem which reminds me of a story. There once was an editor assigned to review an article. The editor was conscientious and raised 15 questions. But his boss thought this was too many and would only permit five questions. Now the editor cared about his points, so he kept them all by joining them into five with generous application of the conjunction “and”.
We could come up with formal requirements to avoid anything as crude as the editor’s behaviour. But I think we’d still find that each postulate encapsulates many concepts, and that a fair comparison between competing theories should consider the relative complexity of the concepts as well. So we are still far away from assigning each theory a numerical complexity score.
A more serious problem is that a postulate count differs from what we usually mean by complexity, which generally reflects in some sense the heterogeneity and volume of considerations that go into applying a theory. Ptolemy’s and Newton’s model of the solar system give similar results. It’s true that Ptolemy’s theory is more complex in its expression. But even if its expression were simpler, I’d still label Newton’s theory simpler, since the Ptolemaic theory requires many more steps to apply.
Yes, I agree. A simple postulate count is not sufficient. That’s why I said complexity is *related* to it rather than the number itself. If you want a mathematical formalization of Occam’s Razor, you should read up on Solomonoff’s Inductive Inference.
To address your point about the “complexity” of the “Many Worlds” interpretation of quantum field theory (QFT): The size of the universe is not a postulate of the QFT or General Relativity. One could derive what a universe containing only two particles would look like using QFT or GR. It’s not a fault of the theory that the universe actually contains ~ 10^80 particles†.
People used to think the solar system was the extent of the universe. Just over a century ago, the Milky Way Galaxy was thought to be the extent of the universe. Then it grew by a factor of over 100 Billion when we found that there were that many galaxies. That doesn’t mean that our theories got 100 Billion times more complex.
† Now we know that the observable universe may only be a tiny fraction of the universe at large, which may be infinite. In fact, there are several different types of multiverse that could exist simultaneously.
The formalisation used in the Sequences (and in algorithmic information theory) is that the complexity of a hypothesis is the length of the shortest computer program that can specify that hypothesis.
An illustrative example is that, when explaining lightning, Maxwell’s equations are simpler in this sense than the hypothesis that Thor is angry, because the shortest computer program that implements Maxwell’s equations is much shorter than an emulation of a humanlike brain and its associated emotions.
In the case of many-worlds vs. Copenhagen interpretation, a computer program that implemented either of them would start with the same algorithm (Schrodinger’s equation), but (the claim is) that the computer program for Copenhagen would have to have an extra section that specified how collapse upon observation worked that many-worlds wouldn’t need.
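A toy sketch of that claim, with source-code length in Python as a crude stand-in for algorithmic complexity (the two-state system and the particular unitary are arbitrary illustrative choices, not the real theories): both “programs” share the same evolution step, and only the Copenhagen one carries an extra collapse section.

```python
import inspect
import numpy as np

def evolve(state, theta=0.3):
    """Shared core: one step of unitary (Schrodinger-style) evolution of a 2-state system."""
    U = np.array([[np.cos(theta), -1j * np.sin(theta)],
                  [-1j * np.sin(theta), np.cos(theta)]])
    return U @ state

def many_worlds(state, steps=5):
    """'Many worlds' program: just keep applying the evolution."""
    for _ in range(steps):
        state = evolve(state)
    return state

def copenhagen(state, steps=5, seed=0):
    """'Copenhagen' program: same evolution, plus an extra collapse rule at each observation."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        state = evolve(state)
        probs = np.abs(state) ** 2
        outcome = rng.choice(len(state), p=probs / probs.sum())
        state = np.zeros_like(state)
        state[outcome] = 1.0          # collapse and renormalise
    return state

# Crude proxy for "length of the shortest program": characters of source code.
for f in (many_worlds, copenhagen):
    print(f.__name__, len(inspect.getsource(evolve)) + len(inspect.getsource(f)))
```

Character counts of Python source are obviously not Kolmogorov complexity; the sketch only illustrates the shape of the argument.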
I just realized that this argument, long accepted on LW, seems to be wrong. Once you’ve observed a chunk of binary tape that has at least one humanlike brain (you), it shouldn’t take that many bits to describe another (Thor). The problem with Thor isn’t that he’s humanlike—it’s that he has supernatural powers, something you’ve never seen. These supernatural powers, not the humanlike brain, are the cause of the complexity penalty. If something non-supernatural happens, e.g. you find your flower vase knocked over, it’s fine to compare hypotheses “the wind did it” vs “a human did it” without penalizing the latter for humanlike brain complexity.
(I see Peter de Blanc and Abram Demski already raised this objection in the comments to Eliezer’s original post, and then everyone including me cheerfully missed it. Ouch.)
I originally agreed with this comment, but after thinking about it for two more days I disagree. Just because you see a high-level phenomenon, doesn’t mean you have to have that high-level phenomenon as a low-level atom in your model of the world.
Humans might not be a low-level atom, but obviously we have to privilege the hypothesis ‘something human-like did this’ if we’ve already observed a lot of human-like things in our environment.
Suppose I’m a member of a prehistoric tribe, and I see a fire in the distance. It’s fine for me to say ‘I have a low-ish prior on a human starting the fire, because (AFAIK) there are only a few dozen humans in the area’. And it’s fine for me to say ‘I’ve never seen a human start a fire, so I don’t think a human started this fire’. But it’s not fine for me to say ‘It’s very unlikely a human started that fire, because human brains are more complicated than other phenomena that might start fires’, even if I correctly intuit how and why humans are more complicated than other phenomena.
The case of Thor is a bit more complicated, because gods are different from humans. If Eliezer and cousin_it disagree on this point, maybe Eliezer would say ‘The complexity of the human brain is the biggest reason why you shouldn’t infer that there are other, as-yet-unobserved species of human-brain-ish things that are very different from humans’, and maybe cousin_it would say ‘No, it’s pretty much just the differentness-from-observed-humans (on the “has direct control over elemental forces” dimension) that matters, not the fact that it has a complicated brain.’
If that’s a good characterization of the disagreement, then it seems like Eliezer might say ‘In ancient societies, it was much more reasonable to posit mindless “supernatural” phenomena (i.e., mindless physical mechanisms wildly different from anything we’ve observed) than to posit intelligent supernatural phenomena.’ Whereas the hypothetical cousin-it might say that ancient people didn’t have enough evidence to conclude that gods were any more unlikely than mindless mechanisms that were similarly different from experience. Example question: what probability should ancient people have assigned to
vs.
Yeah, that’s a good summary of my view (except maybe I wouldn’t even persist into the fourth paragraph). Thanks!
This seems right, though something about this still feels confusing to me in a way I can’t yet put into words. Might write a comment at a later point in time.
Maxwell’s Equations don’t contain any such chunk of tape. In current physical theories (the Standard Model and General Relativity), the brains are not described in the math, rather brains are a consequence of the theories carried out under specific conditions.
Theories are based on postulates which are equivalent to axioms in mathematics. They are the statements from which everything else is derived but which can’t be derived themselves. Statements like “the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer.”
At the turn of the 20th century, scientists were confused by the apparent contradiction between Galilean Relativity and the implication from Maxwell’s Equations and empirical observation that the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer. Einstein formulated Special Relativity by simply asserting that both were true. That is: the postulates of SR are:
the laws of physics are invariant (i.e. identical) in all inertial frames of reference (i.e. non-accelerating frames of reference); and
the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer.
The only way to reconcile those two statements is if time and space become variables. The rest of SR is derived from those two postulates.
Quantum Field Theory is similarly derived from only a few postulates. None of them postulate that some intelligent being just exists. Any program that would describe such a postulate would be relatively enormous.
Yeah. But not sure you got the point of my argument. If your brain is a consequence of theory+conditions, why should the hypothesis of another humanlike brain (Thor) be penalized for excessive complexity under the same theory+conditions?
You’re trying to conflate theory, conditions, and what they entail in a not so subtle way. Occam’s razor is about the complexity of a theory, not conditions, not what the theory and conditions entail. Just the theory. The Thor hypothesis puts Thor directly in the theory. It’s not derived from the theory under certain conditions. In the case of the Thor theory, you have to assume more to arrive at the same conclusion.
It’s really not that complicated.
Thor isn’t quite as directly in the theory :-) In Norse mythology he’s a creature born to a father and mother, a consequence of initial conditions just like you.
Sure, you’d have to believe that initial conditions were such that would lead to Thor. But if I told you I had a neighbor named Bob, you’d have no problem believing that initial conditions were such that would lead to Bob the neighbor. You wouldn’t penalize the Bob hypothesis by saying “Bob’s brain is too complicated”, so neither should you penalize the Thor hypothesis for that reason.
The true reason you penalize the Thor hypothesis is because he has supernatural powers, unlike Bob. Which is what I’ve been saying since the first comment.
Tetraspace Grouping’s original post clearly invokes Thor as an alternate hypothesis to Maxwell’s equations to explain the phenomenon of electromagnetism. They’re using Thor as a generic stand-in for the God hypothesis.
Now you’re calling them “initial conditions”. This is very different from “conditions” which are directly observable. We can observe the current conditions of the universe, come up with theories that explain the various phenomena we see and use those theories to make testable predictions about the future and somewhat harder to test predictions about the past. I would love to see a simple theory that predicts that the universe not only had a definite beginning (hint: your High School science teacher was wrong about modern cosmology) but started with sentient beings given the currently observable conditions.
Which would be a lineage of Gods that begins with some God that created everything and is either directly or indirectly responsible for all the phenomena we observe according to the mythology.
I think you’re the one missing Tetraspace Grouping’s point. They weren’t trying to invoke all of Norse mythology, they were trying to compare the complexity of explaining the phenomenon of electromagnetism by a few short equations vs. saying some intelligent being does it.
The existence of Bob isn’t a hypothesis; it’s not used to explain any phenomenon. Thor is invoked as the cause of, not consequence of, a fundamental phenomenon. If I noticed some loud noise on my roof every full moon, and you told me that your friend Bob likes to do parkour on rooftops in my neighborhood in the light of the full moon, that would be a hypothesis for a phenomenon that I observed, and I could test that hypothesis and verify that the noise is caused by Bob. If you posited that Bob was responsible for some fundamental forces of the universe, that would be much harder for me to swallow.
No. The supernatural doesn’t just violate Occam’s Razor: it is flat-out incompatible with science. The one assumption in science is naturalism. Science is the best system we know for accumulating information without relying on trust. You have to state how you performed an experiment and what you observed so that others can recreate your result. If you say, “my neighbor picked up sticks on the sabbath and was struck by lightning” others can try to repeat that experiment.
It is, indeed, possible that life on Earth was created by an intelligent being or a group of intelligent beings. They need not be supernatural. That theory, however, is necessarily more complex than any abiogenesis theory, because you then have to explain how the intelligent designer(s) came about, which would eventually involve some form of abiogenesis.
Yeah, I agree it’s unlikely that the equations of nature include a humanlike mind bossing things around. I was arguing against a different idea—that lightning (a bunch of light and noise) shouldn’t be explained by Thor (a humanlike creature) because humanlike creatures are too complex.
You’re right. I think I see your point more clearly now. I may have to think about this a little deeper. It’s very hard to apply Occam’s razor to theories about emergent phenomena, especially those several steps removed from basic particle interactions. There are, of course, other ways to weigh one theory against another. One of which is falsifiability.
If the Thor theory must be constantly modified so as to explain why nobody can directly observe Thor, then it gets pushed towards un-falsifiability. It gets ejected from science because there’s no way to even test the theory, which in turn means it has no predictive power.
As I explained in one of my replies to Jimdrix_Hendri, though there is a formalization of Occam’s razor, Solomonoff induction isn’t really used. It’s usually more like: individual phenomena are studied and characterized mathematically, then links between them are found that explain more with fewer and less complex assumptions.
In the case of Many Worlds vs. Copenhagen, it’s pretty clear cut. Copenhagen has the same explanatory power as Many Worlds and shares all the postulates of Many Worlds, but adds some extra assumptions, so it’s a clear violation of Occam’s razor. I don’t know of a *practical* way to handle situations that are less clear cut.
I made a kind of related point in: https://www.lesswrong.com/posts/3xnkw6JkQdwc8Cfcf/is-the-human-brain-a-valid-choice-for-the-universal-turing
There has been some discussion in the community about whether you want to add memory or runtime-based penalties as well. At least Paul comments on it a bit in “What does the Universal Prior actually look like?”
If the program running the SWE outputs information about all worlds on a single output tape, they are going to have to be concatenated or interleaved somehow. Which means that to make use of the information, you have to identify the subset of bits relating to your world. That’s extra complexity which isn’t accounted for because it’s being done by hand, as it were.
Whichever interpretation you hold to, you need some way of discarding unobserved results, even for SU&C.
That’s not how algorithmic information theory works. The output tape is not a factor in the complexity of the program. Just the length of the program.
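A toy sketch of the point being made here: program length and output size come apart completely. The generator below is a few lines long, yet its output tape grows without bound (and eventually contains every finite binary string).

```python
from itertools import count, product

def all_binary_strings():
    """A very short program whose output is unboundedly large."""
    for n in count(1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

gen = all_binary_strings()
print([next(gen) for _ in range(10)])   # ['0', '1', '00', '01', '10', '11', '000', ...]
```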
If you take the Many Worlds interpretation and decide to follow the perspective of a single particle as though it were special, Copenhagen is what falls out. You’re left having to explain what makes that perspective so special.
And that’s the problem! You want the shortest programme that predicts your observations, but the output of a TM that just runs the SWE doesn’t predict your, and only your, observations. You have to manually perform an extra operation to extract them, and that’s extra complexity that isn’t part of the “complexity of the programme”. The argument that MWI is algorithmically simple cheats by hiding some complexity outside the programme.
That’s not relevant to my argument.
Operationally, something like Copenhagen, i.e. neglect of unobserved predictions and renormalisation, has to occur, because otherwise you can’t make predictions. Hence my comment about SU&C. Different interpretations add some extra baggage about what that means (occurred in a different branch versus didn’t occur), but the operation still needs to occur.
Thinking this through some more, I think the real problem is that S.I. is defined in the perspective of an agent modeling an environment, so the assumption that Many Worlds has to put any un-observable on the output tape is incorrect. It’s like stating that Copenhagen has to output all the probability amplitudes onto the output tape and maybe whatever dice god rolled to produce the final answer as well. Neither of those are true.
Well, you’ve got to test that the programme is at least correct so that you can go on to find the simplest correct programme. How would you do that?
First, can you define “SWE”? I’m not familiar with the acronym.
Second, why is that a problem? You should want a theory that requires as few assumptions as possible to explain as much as possible. The fact that it explains more than just your point of view (POV) is a good thing. It lets you make predictions. The only requirement is that it explains at least your POV.
The point is to explain the patterns you observe.
It most certainly is. If you try to run the Copenhagen interpretation in a Turing machine to get output that matches your POV, then it has to output the whole universe and you have to find your POV on the tape somewhere.
The problem is: that’s not how theories are tested. It’s not like people are looking for a theory that explains electromagnetism and why they’re afraid of clowns and why their uncle “Bob” visited so much when they were a teenager and why there’s a white streak in their prom photo as though a cosmic ray hit the camera when the picture was taken, etc. etc.
The observations we’re talking about are experiments where a particular phenomenon is invoked with minimal disturbance from the outside world (if you’re lucky enough to work in a field like Physics which permits such experiments). In a simple universe that just has an electron traveling toward a double-slit wall and a detector, what happens? We can observe that and we can run our model to see what it predicts. We don’t have to run the Turing machine with input of 10^80 particles for 13.8 billion years then try to sift through the output tape to find what matches our observations.
Same thing for the Many Worlds interpretation. It explains the results of our experiments just as well as Copenhagen, it just doesn’t posit any special phenomenon like observation, observation is just what entanglement looks like from the perspective of one of the entangled particles (or system of particles if you’re talking about the scientist).
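For concreteness, here is a minimal numpy sketch of the double-slit setup described above: sum the amplitudes of the two paths and square. Every number (wavelength, geometry, screen positions) is an arbitrary illustrative choice, not a real experiment.

```python
import numpy as np

wavelength = 1.0
k = 2 * np.pi / wavelength           # wavenumber
slit_separation = 5.0
screen_distance = 100.0

x = np.linspace(-40, 40, 9)          # a few positions on the screen
r1 = np.sqrt(screen_distance**2 + (x - slit_separation / 2) ** 2)   # path length from slit 1
r2 = np.sqrt(screen_distance**2 + (x + slit_separation / 2) ** 2)   # path length from slit 2

amplitude = np.exp(1j * k * r1) / r1 + np.exp(1j * k * r2) / r2     # sum of the two path amplitudes
intensity = np.abs(amplitude) ** 2                                  # what the detector records

print(np.round(intensity / intensity.max(), 2))   # interference fringes across the screen
```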
First of all: of course you can use many worlds to make predictions. You do it every time you use the math of QFT. You can make predictions about entangled particles, can’t you? The only thing is: while the math of probability is about weighted sums of hypothetical paths, in MW you take it quite literally as paths that are actually being traversed. That’s what you’re trading for the magic dice machine in non-deterministic theories.
Secondly: Just because Many Worlds says those worlds exist, doesn’t mean you have to invent some extra phenomenon to justify renormalization. At the end of the day the unobservable universe is still unobservable. When you’re talking about predicting what you might observe when you run experiment X, it’s fine to ultimately discard the rest of the multiverse. You just don’t need to make up some story about how your perspective is special and you have some magic power to collapse waveforms that other particles don’t have.
Please stop introducing obscure acronyms without stating what they mean. It makes your argument less clear. More often than not it results in *more* typing because of the confusion it causes. I have no idea what this sentence means. SU&C = Single Universe and Collapse? Like objective collapse? “Different” what?
S.I. is an inept tool for measuring the relative complexity of CI and MWI because it is a bad match for both. It’s a bad match for MWI because of the linear (or, if you prefer, sequential) nature of the output tape, and it’s a bad match for CI because it’s deterministic and CI isn’t. You can simulate collapse with a PRNG, but it won’t give you the right random numbers. Also, CI’ers think collapse is a fundamental process, so it loads the dice to represent it with a multi-step PRNG. It should be just a call to one RAND instruction to represent their views fairly.
SWE=Schroedinger Wave Equation. SU&C=Shut Up and Calculate.
The topic is using S.I. to quantify O’s R, and S.I. is not a measure on assumptions; it is a measure of algorithmic complexity.
Explaining just my POV doesn’t stop me making predictions. In fact, predicting the observations of one observer is exactly how S.I. is supposed to work. It also prevents various forms of cheating. I don’t know why you are using “explain” rather than “predict”. Deutsch favours explanation over prediction, but the very relevant point here is that how well a theory explains is an unquantifiable human judgement. Predicting observations, on the other hand, is definite and quantifiable: that’s the whole point of using S.I. as a mechanistic process to quantify O’s R.
Predicting every observer’s observations is a bad thing from the POV of proving that MWI is simple, because if you allow one observer to pick out their observations from a morass of data, then the easiest way of generating data that contains any substring is a PRNG. You basically end up proving that “everything random” is the simplest explanation. Private Messaging pointed that out, too.
How do you do that with S.I?
No. I run the TM with my experimental conditions as the starting state, and I keep deleting unobserved results, renormalising and re-running. That’s how physics is done any way—what I have called Shut Up and Calculate.
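Here is a minimal sketch of that operational step on a toy two-outcome state (the amplitudes are arbitrary): keep only the observed branch, renormalise, and carry on.

```python
import numpy as np

state = np.array([0.6, 0.8j])                  # amplitudes for outcomes 0 and 1
observed = 1                                   # suppose outcome 1 is what we actually saw

projected = np.zeros_like(state)
projected[observed] = state[observed]          # delete the unobserved result

state = projected / np.linalg.norm(projected)  # renormalise, then re-run the evolution from here
print(state)                                   # [0.+0.j 0.+1.j]
```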
If you perform the same operations with S.I set up to emulate MW you’ll get the same results. That’s just a way of restating the truism that all interpretations agree on results. But you need a difference in algorithmic complexity as well.
You seem to be saying that MWI is a simpler ontological picture now. I dispute that, but it’s beside the point, because what we are discussing is using SI to quantify O’s R via algorithmic complexity.
I didn’t say MW can’t make predictions at all. I am saying that operationally, prediction-making is the same under all interpretations, and that neglect of unobserved outcomes always has to occur.
The point about predicting my observations is that they are the only ones I can test. It’s operational, not metaphysical.
Incidentally, this was pointed out before:-
https://www.lesswrong.com/posts/Kyc5dFDzBg4WccrbK/an-intuitive-explanation-of-solomonoff-induction#ceq7HLYhx4YiciKWq
That’s a link to somebody complaining about how someone else presented an argument. I have no idea what point you think it makes that’s relevant to this discussion.
A significant fraction of the sequences (also known as Rationality: AI to Zombies) deals with exactly this issue. This article basically answers your question: https://lesswrong.com/posts/afmj8TKAqH6F2QMfZ/a-technical-explanation-of-technical-explanation
It seems to me that that piece has to do a lot of scaffolding because it doesn’t use the compression of ‘degrees of freedom’. E.g. your explanans has to have fewer degrees of freedom than the explanandum.
There are some posts on complexity measures, which make note of a) degrees of freedom, b) free variables, ways of penalizing them, and how to calculate these measures. They probably rely on formalization though.
Links: Model Comparison Sequence, What are principled ways for penalising complexity in practice?*, Complexity Penalties in Statistical Learning.
*This might be someone asking the same question as you. While the other links might hold an answer, this one has multiple.
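As one concrete instance of the kind of penalty those links discuss (my own toy example, using the Bayesian Information Criterion on made-up data), the penalty term k·ln(n) is what pushes back against adding parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(0, 0.5, x.size)   # data secretly generated by a quadratic

def bic(degree):
    """BIC = k*ln(n) - 2*ln(L) for a polynomial fit with Gaussian residuals."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    n, k = x.size, degree + 1
    sigma2 = np.mean(resid**2)
    log_likelihood = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return k * np.log(n) - 2 * log_likelihood

for d in (1, 2, 5, 9):
    print(f"degree {d}: BIC = {bic(d):.1f}")   # the quadratic should come out lowest (best)
```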
I’d like to point out a source of confusion around Occam’s Razor that I see you’re falling for; dispelling it will make things clearer: “entities should not be multiplied without necessity!” This means that Occam’s Razor helps decide between competing theories if and only if they have the same explanatory and predictive power. But in the history of science, it was almost never the case that competing theories had the same power. Maybe it happened a couple of times (epicycles, the Copenhagen interpretation), but in all other instances a theory was selected not because it was simpler, but because it was much more powerful.
Contrary to popular misconception, Occam’s razor gets to be used very, very rarely.
We do have, anyway, a formalization of that principle in algorithmic information theory: Solomonoff induction. An agent that, to predict the next element of a sequence, places the highest probabilities on the shortest compatible programs will eventually outperform every other class of predictor (see the toy sketch below). The catch here is the word ‘eventually’: every measure of complexity carries a constant offset due to the choice of the reference universal Turing machine. Different reference machines will assign different complexities to the same short programs, but all measures will converge after a finite amount of data.
This is also why I think that the problem of explaining thunder with “Thor vs. clouds” is such a poor example of Occam’s razor: Solomonoff induction is a formalization of Occam’s razor for theories, not explanations. Due to the aforementioned constant, you cannot have an absolutely simpler model of a finite sequence of events. There’s no such thing; it will always depend on the complexity of the starting Turing machine. However, you can have eventually simpler models of infinite sequences of events (infinite-sequence predictors are equivalent to programs). In that case, the natural-causes program will prevail because it will allow better control of the outcomes.
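A toy sketch of the weighting idea from the comment above (not real Solomonoff induction, which ranges over all programs and is uncomputable; the hypothesis class and the “description lengths” below are invented purely for illustration):

```python
# Hypotheses over binary sequences: (made-up description length, rule giving the next bit).
hypotheses = {
    "all zeros":  (3, lambda hist: 0),
    "all ones":   (3, lambda hist: 1),
    "alternate":  (5, lambda hist: len(hist) % 2),
    "repeat 110": (8, lambda hist: [1, 1, 0][len(hist) % 3]),
}

def predict_next_is_one(history):
    """Keep hypotheses consistent with the data, weight each by 2^-length, mix their predictions."""
    weights = {name: 2.0 ** -length
               for name, (length, rule) in hypotheses.items()
               if all(rule(history[:i]) == bit for i, bit in enumerate(history))}
    total = sum(weights.values())
    return sum(w for name, w in weights.items()
               if hypotheses[name][1](history) == 1) / total

print(predict_next_is_one([1, 1, 0, 1]))   # 1.0: only "repeat 110" survives the data
```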