The formalisation used in the Sequences (and in algorithmic information theory) is that the complexity of a hypothesis is the length of the shortest computer program that can specify that hypothesis.
An illustrative example is that, when explaining lightning, Maxwell’s equations are simpler in this sense than the hypothesis that Thor is angry, because the shortest computer program that implements Maxwell’s equations is much shorter than an emulation of a humanlike brain and its associated emotions.
In the case of the many-worlds vs. Copenhagen interpretations, a computer program that implemented either of them would start with the same algorithm (Schrödinger’s equation), but (the claim is) that the computer program for Copenhagen would need an extra section specifying how collapse upon observation works, which many-worlds wouldn’t need.
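Concretely (glossing over the choice of universal machine $U$), the complexity of a hypothesis $h$ is its Kolmogorov complexity, and the corresponding prior weights hypotheses by program length:

$$K(h) = \min\{\,|p| \;:\; U(p) = h\,\}, \qquad P(h) \propto 2^{-K(h)}$$

where $|p|$ is the length of program $p$ in bits.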
> An illustrative example is that, when explaining lightning, Maxwell’s equations are simpler in this sense than the hypothesis that Thor is angry, because the shortest computer program that implements Maxwell’s equations is much shorter than an emulation of a humanlike brain and its associated emotions.
I just realized that this argument, long accepted on LW, seems to be wrong. Once you’ve observed a chunk of binary tape that has at least one humanlike brain (you), it shouldn’t take that many bits to describe another (Thor). The problem with Thor isn’t that he’s humanlike—it’s that he has supernatural powers, something you’ve never seen. These supernatural powers, not the humanlike brain, are the cause of the complexity penalty. If something non-supernatural happens, e.g. you find your flower vase knocked over, it’s fine to compare hypotheses “the wind did it” vs “a human did it” without penalizing the latter for humanlike brain complexity.
(I see Peter de Blanc and Abram Demski already raised this objection in the comments to Eliezer’s original post, and then everyone including me cheerfully missed it. Ouch.)
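The point about describing a second humanlike mind cheaply is essentially a claim about conditional description length. A crude illustration, using an off-the-shelf compressor as a stand-in for the ideal description length (zlib is only a weak proxy for a universal machine, and the data here is purely illustrative):

```python
import random
import zlib

random.seed(0)
# Stand-in for "a chunk of tape that already contains one humanlike brain":
# 20 kB of effectively incompressible data.
you = bytes(random.getrandbits(8) for _ in range(20_000))

# "Thor": another humanlike mind, i.e. mostly the same structure with small tweaks.
thor = bytearray(you)
for i in random.sample(range(len(thor)), 50):
    thor[i] ^= 0xFF

cost_one = len(zlib.compress(you))
cost_both = len(zlib.compress(you + bytes(thor)))
print(cost_one, cost_both - cost_one)  # the second mind adds only a small fraction of the first one's cost
```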
I originally agreed with this comment, but after thinking about it for two more days I disagree. Just because you see a high-level phenomenon doesn’t mean you have to have that high-level phenomenon as a low-level atom in your model of the world.
Humans might not be a low-level atom, but obviously we have to privilege the hypothesis ‘something human-like did this’ if we’ve already observed a lot of human-like things in our environment.
Suppose I’m a member of a prehistoric tribe, and I see a fire in the distance. It’s fine for me to say ‘I have a low-ish prior on a human starting the fire, because (AFAIK) there are only a few dozen humans in the area’. And it’s fine for me to say ‘I’ve never seen a human start a fire, so I don’t think a human started this fire’. But it’s not fine for me to say ‘It’s very unlikely a human started that fire, because human brains are more complicated than other phenomena that might start fires’, even if I correctly intuit how and why humans are more complicated than other phenomena.
The case of Thor is a bit more complicated, because gods are different from humans. If Eliezer and cousin_it disagree on this point, maybe Eliezer would say ‘The complexity of the human brain is the biggest reason why you shouldn’t infer that there are other, as-yet-unobserved species of human-brain-ish things that are very different from humans’, and maybe cousin_it would say ‘No, it’s pretty much just the differentness-from-observed-humans (on the “has direct control over elemental forces” dimension) that matters, not the fact that it has a complicated brain.’
If that’s a good characterization of the disagreement, then it seems like Eliezer might say ‘In ancient societies, it was much more reasonable to posit mindless “supernatural” phenomena (i.e., mindless physical mechanisms wildly different from anything we’ve observed) than to posit intelligent supernatural phenomena.’ Whereas the hypothetical cousin_it might say that ancient people didn’t have enough evidence to conclude that gods were any more unlikely than mindless mechanisms that were similarly different from experience. Example question: what probability should ancient people have assigned to
> The regular motion of the planets is due to a random process plus a mindless invisible force, like the mindless invisible force that causes recently-cooked food to cool down all on its own.
vs.
> The regular motion of the planets is due to deliberate design / intelligent intervention, like the intelligent intervention that arranges and cooks food.
Yeah, that’s a good summary of my view (except maybe I wouldn’t even persist into the fourth paragraph). Thanks!
This seems right, though something about this still feels confusing to me in a way I can’t yet put into words. Might write a comment at a later point in time.
> Once you’ve observed a chunk of binary tape that has at least one humanlike brain (you), it shouldn’t take that many bits to describe another (Thor).
Maxwell’s Equations don’t contain any such chunk of tape. In current physical theories (the Standard Model and General Relativity), brains are not described in the math; rather, brains are a consequence of the theories carried out under specific conditions.
Theories are based on postulates which are equivalent to axioms in mathematics. They are the statements from which everything else is derived but which can’t be derived themselves. Statements like “the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer.”
At the turn of the 20th century, scientists were confused by the apparent contradiction between Galilean Relativity and the implication from Maxwell’s Equations and empirical observation that the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer. Einstein formulated Special Relativity by simply asserting that both were true. That is, the postulates of SR are:
the laws of physics are invariant (i.e. identical) in all inertial frames of reference (i.e. non-accelerating frames of reference); and
the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source or observer.
The only way to reconcile those two statements is if time and space become variables. The rest of SR is derived from those two postulates.
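For reference, the coordinate transformation forced by those two postulates (for frames in relative motion at speed $v$ along $x$) is the Lorentz transformation, from which time dilation and length contraction follow:

$$x' = \gamma\,(x - vt), \qquad t' = \gamma\left(t - \frac{vx}{c^2}\right), \qquad \gamma = \frac{1}{\sqrt{1 - v^2/c^2}}$$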
Quantum Field Theory is similarly derived from only a few postulates. None of them postulate that some intelligent being just exists. Any program that would describe such a postulate would be relatively enormous.
> In current physical theories (the Standard Model and General Relativity), brains are not described in the math; rather, brains are a consequence of the theories carried out under specific conditions.
Yeah. But I’m not sure you got the point of my argument. If your brain is a consequence of theory+conditions, why should the hypothesis of another humanlike brain (Thor) be penalized for excessive complexity under the same theory+conditions?
You’re trying to conflate theory, conditions, and what they entail in a not-so-subtle way. Occam’s razor is about the complexity of a theory, not conditions, not what the theory and conditions entail. Just the theory. The Thor hypothesis puts Thor directly in the theory. It’s not derived from the theory under certain conditions. In the case of the Thor theory, you have to assume more to arrive at the same conclusion.
It’s really not that complicated.
Thor isn’t quite as directly in the theory :-) In Norse mythology he’s a creature born to a father and mother, a consequence of initial conditions just like you.
Sure, you’d have to believe that initial conditions were such that would lead to Thor. But if I told you I had a neighbor named Bob, you’d have no problem believing that initial conditions were such that would lead to Bob the neighbor. You wouldn’t penalize the Bob hypothesis by saying “Bob’s brain is too complicated”, so neither should you penalize the Thor hypothesis for that reason.
The true reason you penalize the Thor hypothesis is because he has supernatural powers, unlike Bob. Which is what I’ve been saying since the first comment.
> Thor isn’t quite as directly in the theory :-) In Norse mythology...
Tetraspace Grouping’s original post clearly invokes Thor as an alternate hypothesis to Maxwell’s equations to explain the phenomenon of electromagnetism. They’re using Thor as a generic stand-in for the God hypothesis.
> In Norse mythology he’s a creature born to a father and mother, a consequence of initial conditions just like you.
Now you’re calling them “initial conditions”. This is very different from “conditions” which are directly observable. We can observe the current conditions of the universe, come up with theories that explain the various phenomena we see and use those theories to make testable predictions about the future and somewhat harder to test predictions about the past. I would love to see a simple theory that predicts that the universe not only had a definite beginning (hint: your High School science teacher was wrong about modern cosmology) but started with sentient beings given the currently observable conditions.
> Sure, you’d have to believe that initial conditions were such that would lead to Thor.
Which would be a lineage of Gods that begins with some God that created everything and is either directly or indirectly responsible for all the phenomena we observe according to the mythology.
I think you’re the one missing Tetraspace Grouping’s point. They weren’t trying to invoke all of Norse mythology, they were trying to compare the complexity of explaining the phenomenon of electromagnetism by a few short equations vs. saying some intelligent being does it.
> You wouldn’t penalize the Bob hypothesis by saying “Bob’s brain is too complicated”, so neither should you penalize the Thor hypothesis for that reason.
The existence of Bob isn’t a hypothesis; it’s not used to explain any phenomenon. Thor is invoked as the cause of, not a consequence of, a fundamental phenomenon. If I noticed some loud noise on my roof every full moon, and you told me that your friend Bob likes to do parkour on rooftops in my neighborhood in the light of the full moon, that would be a hypothesis for a phenomenon that I observed, and I could test that hypothesis and verify that the noise is caused by Bob. If you posited that Bob was responsible for some fundamental forces of the universe, that would be much harder for me to swallow.
> The true reason you penalize the Thor hypothesis is because he has supernatural powers, unlike Bob. Which is what I’ve been saying since the first comment.
No. The supernatural doesn’t just violate Occam’s Razor: it is flat-out incompatible with science. The one assumption in science is naturalism. Science is the best system we know for accumulating information without relying on trust. You have to state how you performed an experiment and what you observed so that others can recreate your result. If you say, “my neighbor picked up sticks on the sabbath and was struck by lightning,” others can try to repeat that experiment.
It is, indeed, possible that life on Earth was created by an intelligent being or a group of intelligent beings. They need not be supernatural. That theory, however, is necessarily more complex than any abiogenesis theory, because you have to then explain how the intelligent designer(s) came about, which would eventually involve some form of abiogenesis.
Yeah, I agree it’s unlikely that the equations of nature include a humanlike mind bossing things around. I was arguing against a different idea—that lightning (a bunch of light and noise) shouldn’t be explained by Thor (a humanlike creature) because humanlike creatures are too complex.
Well, the original comment was about explaining lightning.
You’re right. I think I see your point more clearly now. I may have to think about this a little deeper. It’s very hard to apply Occam’s razor to theories about emergent phenomena, especially those several steps removed from basic particle interactions. There are, of course, other ways to weigh one theory against another, one of which is falsifiability.
If the Thor theory must be constantly modified so as to explain why nobody can directly observe Thor, then it gets pushed towards unfalsifiability. It gets ejected from science because there’s no way to even test the theory, which in turn means it has no predictive power.
As I explained in one of my replies to Jimdrix_Hendri, though there is a formalization for Occam’s razor, Solomonoff induction isn’t really used. It’s usually more like: individual phenomena are studied and characterized mathematically, then links between them are found that explain more with fewer and less complex assumptions.
In the case of Many Worlds vs. Copenhagen, it’s pretty clear cut. Copenhagen has the same explanatory power as Many Worlds and shares all the postulates of Many Worlds, but adds some extra assumptions, so it’s a clear violation of Occam’s razor. I don’t know of a *practical* way to handle situations that are less clear cut.
I made a kind of related point in: https://www.lesswrong.com/posts/3xnkw6JkQdwc8Cfcf/is-the-human-brain-a-valid-choice-for-the-universal-turing
There has been some discussion in the community about whether you want to add memory or runtime-based penalties as well. At least Paul comments on it a bit in “What does the Universal Prior actually look like?”
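For reference, one standard way to fold runtime into the complexity measure is Levin’s $Kt$ (Schmidhuber’s speed prior is in a similar spirit); this is a sketch of the idea, not necessarily what that post proposes:

$$Kt(x) = \min_{p \,:\, U(p) = x} \big(\,|p| + \log_2 t(p)\,\big)$$

where $t(p)$ is the number of steps $p$ takes to produce $x$.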
> but (the claim is) that the computer program for Copenhagen would need an extra section specifying how collapse upon observation works, which many-worlds wouldn’t need.
If the program running the SWE outputs information about all worlds on a single output tape, they are going to have to be concatenated or interleaved somehow. Which means that to make use of the information, you have to identify the subset of bits relating to your world. That’s extra complexity which isn’t accounted for because it’s being done by hand, as it were.
Whichever interpretation you hold to, you need some way of discarding unobserved results, even for SU&C.
That’s not how algorithmic information theory works. The output tape is not a factor in the complexity of the program. Just the length of the program.
The size of the universe is not a postulate of the QFT or General Relativity. One could derive what a universe containing only two particles would look like using QFT or GR. It’s not a fault of the theory that the universe actually contains ~ 10^80 particles†.
People used to think the solar system was the extent of the universe. Just over a century ago, the Milky Way Galaxy was thought to be the extent of the universe. Then it grew by a factor of over 100 Billion when we found that there were that many galaxies. That doesn’t mean that our theories got 100 Billion times more complex.
If you take the Many Worlds interpretation and decide to follow the perspective of a single particle as though it were special, Copenhagen is what falls out. You’re left having to explain what makes that perspective so special.
† Now we know that the observable universe may only be a tiny fraction of the universe at large, which may be infinite. In fact, there are several different types of multiverse that could exist simultaneously.
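As a toy illustration of the point above that the theory’s description length doesn’t scale with the particle count (the `step` law here is hypothetical and purely illustrative):

```python
import inspect

def step(positions, velocities, dt=1.0):
    """One tick of a toy dynamical law (not real physics)."""
    positions = [x + v * dt for x, v in zip(positions, velocities)]
    return positions, velocities

# The "theory" is the source code of the law; its length is fixed.
law_bits = 8 * len(inspect.getsource(step))

for n in (2, 1_000, 1_000_000):
    state = ([0.0] * n, [1.0] * n)   # the particle count only enters via the state
    state = step(*state)
    print(f"{n:>9} particles, law is still ~{law_bits} bits")
```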
> That’s not how algorithmic information theory works. The output tape is not a factor in the complexity of the program. Just the length of the program.
And that’s the problem! You want the shortest programme that predicts your observations, but the output of a TM that just runs the SWE doesn’t predict your and only your observations. You have to manually perform an extra operation to extract them, and that’s extra complexity that isn’t part of the “complexity of the programme”. The argument that MWI is algorithmically simple cheats by hiding some complexity outside the programme.
> The size of the universe is not a postulate of the QFT or General Relativity.
That’s not relevant to my argument.
> If you take the Many Worlds interpretation and decide to follow the perspective of a single particle as though it were special, Copenhagen is what falls out.
Operationally, something like Copenhagen, i.e. neglect of unobserved predictions, and renormalisation, has to occur, because otherwise you can’t make predictions. Hence my comment about SU&C. Different adds some extra baggage about what that means—occurred in a different branch versus didn’t occur—but the operation still needs to occur.
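For concreteness, the “discard unobserved outcomes and renormalise” operation referred to here is just a projection followed by rescaling; a minimal numpy sketch (toy entangled qubit and apparatus, purely illustrative):

```python
import numpy as np

# Qubit entangled with a two-state apparatus: (|0,0> + |1,1>)/sqrt(2),
# written over the basis |system, apparatus> = |00>, |01>, |10>, |11>.
psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# The apparatus is read and shows outcome 0: keep that branch, drop the other.
P0 = np.diag([1, 0, 1, 0]).astype(complex)   # projector onto "apparatus reads 0"
post = P0 @ psi
post /= np.linalg.norm(post)                 # renormalise

print(np.abs(psi) ** 2)    # branch weights before the update: [0.5, 0, 0, 0.5]
print(np.abs(post) ** 2)   # state used for all further predictions: [1, 0, 0, 0]
```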
Thinking this through some more, I think the real problem is that S.I. is defined from the perspective of an agent modeling an environment, so the assumption that Many Worlds has to put anything unobservable on the output tape is incorrect. It’s like stating that Copenhagen has to output all the probability amplitudes onto the output tape, and maybe whatever dice God rolled to produce the final answer as well. Neither of those is true.
Well, you’ve got to test that the programme is at least correct so that you can go on to find the simplest correct programme. How would you do that?
> output of a TM that just runs the SWE doesn’t predict your and only your observations. You have to manually perform an extra operation to extract them, and that’s extra complexity that isn’t part of the “complexity of the programme”.
First, can you define “SWE”? I’m not familiar with the acronym.
Second, why is that a problem? You should want a theory that requires as few assumptions as possible to explain as much as possible. The fact that it explains more than just your point of view (POV) is a good thing. It lets you make predictions. The only requirement is that it explains at least your POV.
The point is to explain the patterns you observe.
>> The size of the universe is not a postulate of the QFT or General Relativity.
> That’s not relevant to my argument.
It most certainly is. If you try to run the Copenhagen interpretation in a Turing machine to get output that matches your POV, then it has to output the whole universe and you have to find your POV on the tape somewhere.
The problem is: that’s not how theories are tested. It’s not like people are looking for a theory that explains electromagnetism and why they’re afraid of clowns and why their uncle “Bob” visited so much when they were a teenager and why there’s a white streak in their prom photo as though a cosmic ray hit the camera when the picture was taken, etc., etc.
The observations we’re talking about are experiments where a particular phenomenon is invoked with minimal disturbance from the outside world (if you’re lucky enough to work in a field like Physics which permits such experiments). In a simple universe that just has an electron traveling toward a double-slit wall and a detector, what happens? We can observe that and we can run our model to see what it predicts. We don’t have to run the Turing machine with input of 10^80 particles for 13.8 billion years then try to sift through the output tape to find what matches our observations.
Same thing for the Many Worlds interpretation. It explains the results of our experiments just as well as Copenhagen; it just doesn’t posit any special phenomenon like observation. Observation is just what entanglement looks like from the perspective of one of the entangled particles (or system of particles, if you’re talking about the scientist).
> Operationally, something like Copenhagen, i.e. neglect of unobserved predictions, and renormalisation, has to occur, because otherwise you can’t make predictions.
First of all: Of course you can use many worlds to make predictions. You do it every time you use the math of QFT. You can make predictions about entangled particles, can’t you? The only thing is: while the math of probability is about weighted sums of hypothetical paths, in MW you take it quite literally as paths that are actually being traversed. That’s what you’re trading for the magic dice machine in non-deterministic theories.
Secondly: Just because Many Worlds says those worlds exist doesn’t mean you have to invent some extra phenomenon to justify renormalization. At the end of the day the unobservable universe is still unobservable. When you’re talking about predicting what you might observe when you run experiment X, it’s fine to ultimately discard the rest of the multiverse. You just don’t need to make up some story about how your perspective is special and you have some magic power to collapse waveforms that other particles don’t have.
> Hence my comment about SU&C. Different adds some extra baggage about what that means—occurred in a different branch versus didn’t occur—but the operation still needs to occur.
Please stop introducing obscure acronyms without stating what they mean. It makes your argument less clear. More often than not it results in *more* typing because of the confusion it causes. I have no idea what this sentence means. SU&C = Single Universe and Collapse? Like objective collapse? “Different” what?
S.I. is an inept tool for measuring the relative complexity of CI and MWI because it is a bad match for both. It’s a bad match for MWI because of the linear, or, if you prefer, sequential, nature of the output tape, and it’s a bad match for CI because it’s deterministic and CI isn’t. You can simulate collapse with a PRNG, but it won’t give you the right random numbers. Also, CI’ers think collapse is a fundamental process, so it loads the dice to represent it with a multi-step PRNG. It should be just a call to one RAND instruction to represent their views fairly.
SWE=Schroedinger Wave Equation. SU&C=Shut Up and Calculate.
> You should want a theory that requires as few assumptions as possible to explain as much as possible
The topic is using S.I. to quantify Occam’s Razor, and S.I. is not a measure on assumptions; it is a measure on algorithmic complexity.
> The fact that it explains more than just your point of view (POV) is a good thing. It lets you make predictions.
Explaining just my POV doesn’t stop me making predictions. In fact, predicting the observations of one observer is exactly how S.I. is supposed to work. It also prevents various forms of cheating. I don’t know why you are using “explain” rather than “predict”. Deutsch favours explanation over prediction, but the very relevant point here is that how well a theory explains is an unquantifiable human judgement. Predicting observations, on the other hand, is definite and quantifiable... that’s the whole point of using S.I. as a mechanistic process to quantify Occam’s Razor.
Predicting every observer’s observations is a bad thing from the POV of proving that MWI is simple, because if you allow one observer to pick out their observations from a morass of data, then the easiest way of generating data that contains any substring is a PRNG. You basically end up proving that “everything random” is the simplest explanation. Private Messaging pointed that out, too.
> The point is to explain the patterns you observe.
How do you do that with S.I.?
> It most certainly is. If you try to run the Copenhagen interpretation in a Turing machine to get output that matches your POV, then it has to output the whole universe and you have to find your POV on the tape somewhere.
No. I run the TM with my experimental conditions as the starting state, and I keep deleting unobserved results, renormalising and re-running. That’s how physics is done anyway—what I have called Shut Up and Calculate.
> Same thing for the Many Worlds interpretation. It explains the results of our experiments just as well as Copenhagen; it just doesn’t posit any special phenomenon like observation. Observation is just what entanglement looks like from the perspective of one of the entangled particles (or system of particles, if you’re talking about the scientist).
If you perform the same operations with S.I. set up to emulate MW, you’ll get the same results. That’s just a way of restating the truism that all interpretations agree on results. But you need a difference in algorithmic complexity as well.
> Same thing for the Many Worlds interpretation. It explains the results of our experiments just as well as Copenhagen; it just doesn’t posit any special phenomenon like observation. Observation is just what entanglement looks like from the perspective of one of the entangled particles (or system of particles, if you’re talking about the scientist).
You seem to be saying that MWI is a simpler ontological picture now. I dispute that, but it’s beside the point, because what we are discussing is using S.I. to quantify Occam’s Razor via algorithmic complexity.
> First of all: Of course you can use many worlds to make predictions...
I didn’t say MW can’t make predictions at all. I am saying that operationally, prediction-making is the same under all interpretations, and that neglect of unobserved outcomes always has to occur.
> You just don’t need to make up some story about how your perspective is special
The point about predicting my observations is that they are the only ones I can test. It’s operational, not metaphysical.
Incidentally, this was pointed out before:
https://www.lesswrong.com/posts/Kyc5dFDzBg4WccrbK/an-intuitive-explanation-of-solomonoff-induction#ceq7HLYhx4YiciKWq
That’s a link to somebody complaining about how someone else presented an argument. I have no idea what point you think it makes that’s relevant to this discussion.