then you want to use the simpler one—not because it’s more likely to be ‘true’, but because it allows you to think more clearly.
Congratulations, you have now officially broken with Bayesianism and become a heretic. Your degree of belief in (prior probability of) a hypothesis should not depend on how clearly it allows you to think. Surely you can imagine all manner of ugly scenarios if that were the case.
Preferring to use a simpler theory doesn’t require believing it to be more probable than it is. Expected utility maximization to the rescue.
When there is a testable physical difference between hypotheses, we want the one that makes the correct prediction.
When there is no testable physical difference between hypotheses, we want to use the one that makes it easiest to make the correct prediction. By definition, we can never get a prediction that wouldn’t have happened were we using the other hypothesis, but we’ll get that prediction quicker. Neither hypothesis can be said to be ‘the way the world really is’ because there’s no way to distinguish between them, but the simpler hypothesis is more useful.
Wha? Then you must order the equivalent theories by running time, not code length. The two are frequently opposed: for example, the fastest known algorithm for matrix multiplication (in the big-O sense) is very complex compared to the naive one. In short, I feel you’re only digging yourself deeper into the hole of heresy.
Well, firstly, who said I cared at all about ‘heresy’? I’m not replying here in order to demonstrate my adherence to the First Church Of Bayes or something...
And while there are, obviously, occasions where ordering by running time and ordering by code length are opposed, in general, when comparing two arbitrary programs which generate the same output, the longer one will also take longer. This is obvious when you consider it—if you have an arbitrary program X from the space of all programs that generate an output Y, there can only be a finite number of programs that generate that output more quickly. However, there are an infinite number of programs in the sample space that will generate the output more slowly, and that are also longer than X—just keep adding an extra ‘sleep 1’ before it prints the output, to take a trivial example.
In general, the longer the program, the more operations it performs and the longer it takes, when you’re sampling from the space of all possible programs. So while run time and code length aren’t perfectly correlated, they’re a very decent proxy for each other.
Amusingly, this statement (that only a finite number of programs can generate the output more quickly) is false. If a program Z is faster than X, then there exist infinitely many versions of Z that also run faster than X: just add some never-executed code under an if(false) branch. I’m not sure whether your overall argument can be salvaged.
You’re quite correct, there. I was only including code paths that can ever actually be executed, in the same way I wouldn’t count comments as part of the program. This seems to me to be the correct thing to do, and I believe one could come up with some more rigorous reasoning along the lines of my previous comment, but I’m too tired right now to do so. I’ll think about this...
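A minimal sketch of the two padding constructions discussed above, assuming Python as the illustration language; `base_program` and the padding helpers are invented for the example:

```python
import time

def base_program():
    """Stands in for an arbitrary program X that produces some output Y."""
    return sum(range(1000))

def padded_with_sleep(n_pads):
    """Longer AND slower: each pad adds both code length and run time."""
    def program():
        for _ in range(n_pads):
            time.sleep(0.01)  # extra work done before producing the same output
        return base_program()
    return program

def padded_with_dead_branch(n_pads):
    """Longer but NOT slower: the extra code sits behind a branch that never executes."""
    def program():
        if False:  # never taken, so the padding below is never run
            for _ in range(n_pads):
                time.sleep(0.01)
        return base_program()
    return program

# All three variants produce the same output Y, so they fall in the same equivalence class.
assert base_program() == padded_with_sleep(3)() == padded_with_dead_branch(3)()
```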
Wouldn’t a meta-algorithm that determines which paths are executable in a given algorithm necessarily fail for some possible algorithms, unless it were functionally equivalent to a halting oracle?
I’m not sure how problematic this is to your idea, but it’s one advantage that the simpler system of just counting total lines has.
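A minimal sketch of why that is so; `is_reachable` here is a hypothetical routine, and the point is only that if a total version of it existed, it would decide halting:

```python
def halts_normally(program_source: str, is_reachable) -> bool:
    """Hypothetical reduction: a total is_reachable(source, marker) that decides whether
    a given statement can ever execute would let us decide whether an arbitrary program
    terminates normally, which is undecidable. So no such total routine can exist."""
    wrapped = program_source + "\nprint('MARKER')\n"
    # The marker statement runs if and only if the original program terminates normally.
    return is_reachable(wrapped, "print('MARKER')")
```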
I think there’s a difference between looking at a theory as data versus looking at it as code.
You look at a theory as code when you need to use the theory to predict the future of something it describes. (E.g., will it rain.) For this purpose, theories that generate the same predictions are equivalent; you don’t care about their size. In fact, even theories with different predictions can be considered equivalent, as long as their predictions are close enough for your purpose. (See Newtonian vs. relativistic physics applied to predicting kitchen-sink performance.) You do care about how fast you can run them, though.
However, you look at a theory as data when you need to reason about theories, and “make predictions” about them, particularly unknown theories related to known ones. As long as two theories make exactly the same predictions, you don’t have much reason to reason about them. However, if they predict differently for something you haven’t tested yet, but will test in the future, and you need to take an action now that has different outcomes depending on the result of the future test (simple example: a bet), then you need to try to guess which is more likely.
You need something like a meta-theory that predicts which of the two is more likely to be true. Occam’s razor is one of those meta-theories.
Thinking about it more, this isn’t quite a disagreement with the post immediately above; it’s not immediately obvious to me that a simpler theory is easier to reason about (though intuition says it should be). But I don’t think Occam’s razor is about how easy it is to reason about theories; it just claims simpler ones are more likely. (Although one could justify it like this: take an incomplete theory; add one detail; add another detail; at each step you have to pick between many details you might add, so the more details you add, the more likely you are to pick a wrong one (remember you haven’t tested the successive theories yet); thus, the more complex your theory, the likelier you are to be wrong.)
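To put a toy number on that parenthetical argument (the probabilities are purely illustrative):

```python
# Each added detail is one more conjunct that has to be right. If each detail is
# independently right with probability p, the whole theory's probability shrinks
# geometrically with the number of details.
p = 0.9
for n_details in (1, 3, 10):
    print(n_details, p ** n_details)  # roughly 0.9, 0.73, 0.35
```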
The length of the program description is not really the measure of how easy it is to make a correct prediction. In fact, the shortest program for predicting is almost never the one you should use to make predictions in practice, precisely because it is normally quite slow. It is also very rarely the program which is easiest to manipulate mentally, since short programs tend to be very hard for humans to reason about.
Like PaulFChristiano said, the shortest accurate program isn’t particularly useful, but its predictive model is more a priori probable according to the universal / Occamian prior.
It’s really hard (and uncomputable) to discover, understand, and verify the shortest program that computes a certain input->prediction mapping. But we use the “shortest equivalent program” concept to judge which human-understandable program is more a priori probable.
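A toy version of that judgment, assuming we already have a shortest-known equivalent description length for each candidate; the theory names and bit counts are made up:

```python
# Occam-style prior: weight each predictive equivalence class by 2**(-description_length)
# and normalize, so shorter shortest-known descriptions get more prior mass.
candidates = {
    "theory_A": 120,  # bits in the shortest equivalent program we know of (hypothetical)
    "theory_B": 135,
    "theory_C": 150,
}
weights = {name: 2.0 ** -length for name, length in candidates.items()}
total = sum(weights.values())
prior = {name: w / total for name, w in weights.items()}
print(prior)  # theory_A dominates: its shortest known description is 15+ bits shorter
```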
Yes, we want to use the hypothesis that is easiest to use. But if we use it, does that commit us to ‘believing’ in it? In the case of no testable physical difference between hypotheses, I propose that someone has no obligation to believe (or admit they believe) that particular theory instead of another one with the same predictions.
I enthusiastically propose that we say we ‘have’ a belief only when we use or apply a belief for which there is an empirical difference in the predictions of the belief compared to the non-belief. Alternatively, we can use some other word instead of belief, that will serve to carry this more relevant distinction.
(Later: I realize this comment is actually directed at cousin_it, since he was the one that wrote, ‘your degree of belief in (prior probability of) a hypothesis should not depend on how clearly it allows you to think’. I also think I may have reiterated what Vladimir_Nesov wrote here.)
Never mind usefulness, it seems to me that “Evolution by natural selection occurs” and “God made the world and everything in it, but did so in such a way as to make it look exactly as if evolution by natural selection occurred” are not the same hypothesis, that one of them is true and one of them is false, that it is simplicity that leads us to say which is which, and that we do, indeed, prefer the simpler of two theories that make the same predictions, rather than calling them the same theory.
While my post was pretty misguided (I even wrote an apology for it), your comment looks even more misguided to me. In effect, you’re saying that between Lagrangian and Hamiltonian mechanics, at most one can be “true”. And you’re also saying that which of them is “true” depends on the programming language we use to encode them. Are you sure you want to go there?
We may even be able to observe which one. Actually, I am pretty sure that if I looked closely at QM and these two formulations, I would go with Hamiltonian mechanics.
Ah, but which Hamiltonian mechanics is the true one: the one that says real numbers are infinite binary expansions, or the one that says real numbers are Dedekind cuts? I dunno, your way of thinking makes me queasy.
Sorry—I wrote an incorrect reply and deleted it. Let me think some more.
That point of view has far-reaching implications that make me uncomfortable. Consider two physical theories that are equivalent in every respect, except they use different definitions of real numbers. So they have a common part C, and theory A is the conjunction of C with “real numbers are Dedekind cuts”, while theory B is the conjunction of C with “real numbers are infinite binary expansions”. According to your and Eliezer’s point of view as I understand it right now, at most one of the two theories can be “true”. So if C (the common part) is “true”, then ordinary logic tells us that at most one definition of the real numbers can be “true”. Are you really, really sure you want to go there?
I think there’s a distinction that should be made explicit between “a theory” and “our human mental model of a theory.” The theory is the same, but we rightfully try to interpret it in the simplest possible way, to make it clearer to think about.
Usually, two different mental models necessarily imply two different theories, so it’s easy to conflate the two, but sometimes (in mathematics especially) that’s just not true.
Hmmm. But the very first posting in the sequences says something about “making your beliefs pay rent in expected experience”. If you don’t expect different experiences in choosing between the theories, it seems that you are making an unfalsifiable claim.
I’m not totally convinced that the two theories do not make different predictions in some sense. The evolution theory pretty much predicts that we are not going to see a Rapture any time soon, whereas the God theory leaves the question open. Not exactly “different predictions”, but something close.
Both theories are trying to pay rent on the same house; that’s the problem here, which is quite distinct from neither theory paying rent at all.
Clever. But …
If theories A and B pay rent on the same house, then the theory (A OR B) pays enough rent so that the stronger theory A need pay no additional rent at all. Yet you seem to prefer A to B, and also to (A OR B).
(A OR B) is more probable than A, but if A is much more probable than B, then saying “(A OR B)” instead of “A” is leaving out information.
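In toy numbers (the priors are invented for illustration):

```python
import math

p_A, p_B = 0.60, 0.05  # mutually exclusive theories, illustrative priors
p_A_or_B = p_A + p_B   # the disjunction can only be at least as probable as A
print(p_A_or_B > p_A)  # True

def info_bits(p):
    """Surprisal of asserting a statement with probability p."""
    return -math.log2(p)

# Asserting "A" conveys more information than asserting the weaker "A OR B".
print(info_bits(p_A), info_bits(p_A_or_B))  # roughly 0.74 bits vs 0.62 bits
```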
Let’s say A = (MWI is correct) and B = (Copenhagen)
The equivalent of “A OR B” is the statement “either Copenhagen or MWI is correct”, and I’m sure everyone here assigns “A OR B” a higher prior than either A or B separately.
But that’s not really a theory; it’s a disjunction between two different theories, so of course we want to understand which of the two is actually the correct one. Not sure what your objection is here.
EDITED to correct a wrong term.
I’m not sure I have one. It is just a little puzzling how we might reconcile two things:
EY’s very attractive intuition that of two theories making the same predictions, one is true and the other … what? False? Wrong? Well, … “not quite so true”.
The tradition in Bayesianism and standard rationality (and logical positivism, for that matter) that the truth of a statement is to be found through its observable consequences.
ETA: Bayes’s rule only deals with the fraction of reality-space spanned by a sentence, never with the number of characters needed to express the sentence.
There’s a useful heuristic to solve tricky questions about “truths” and “beliefs”: reduce them to questions about decisions and utilities. For example, the Sleeping Beauty problem is very puzzling if you insist on thinking in terms of subjective probabilities, but becomes trivial once you introduce any payoff structure. Maybe we could apply this heuristic here? Believing in one formulation of a theory over a different equivalent formulation isn’t likely to win a Bayesian reasoner many dollars, no matter what observations come in.
Actually, it might help a reasoner saddled with bounded rationality. One theory might require less computation to get from theory to prediction, or it might require less memory to store. Having a fast, easy-to-use theory can be like money in the bank to someone who needs lots and lots of predictions.
It might be interesting to look at that idea someone here was talking about that merged ideas from Zadeh’s fuzzy logic with Bayesianism. Instead of simple Bayesian probabilities which can be updated instantaneously, we may need to think of fuzzy probabilities which grow sharper as we devote cognitive resources to refining them. But with a good, simple theory we can get a sharper picture quicker.
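A back-of-the-envelope version of the “money in the bank” point above; every number here is invented for illustration:

```python
# Two predictively equivalent theories: X is expensive to run, Y is cheap.
cost_per_prediction = {"theory_X": 5.0, "theory_Y": 0.1}  # seconds of compute (made up)
value_per_prediction = 1.0                                # utility per prediction (made up)
value_of_a_second_of_compute = 0.05                       # opportunity cost (made up)
n_predictions = 1000

for name, cost in cost_per_prediction.items():
    net = n_predictions * (value_per_prediction - cost * value_of_a_second_of_compute)
    print(name, net)
# A bounded reasoner who needs many predictions does better with the cheap-to-run
# formulation, even though both say exactly the same thing about the world.
```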
I don’t understand your point about bounded rationality. If you know theory X is equivalent to theory Y, you can believe in X more, but use Y for calculations.
That’s the definition of a free-floating belief, isn’t it? If you only have so many computational resources, even storing theory X in your memory is a waste of space.
I think cousin_it’s point was that if you have a preference both for quickly solving problems and for knowing the true nature of things, then if theory X tells you the true nature of things but theory Y is a hackjob approximation that nevertheless gives you the answer you need much faster (in computer terms, say, a full simulation of the actual event vs. a Monte Carlo run with the probabilities just plugged in), it might be positive utility even under bounded rationality to keep both theory X and theory Y.
edit: (the assumption is that we have at least mild preferences for both, and that the bounds on our rationality are sufficiently high that this is the preferred option for most of science).
It’s one thing if you want to calculate with a simpler theory because you don’t have a need for perfect accuracy. Newton is good enough for a large fraction of physics calculations, and so even though it is strictly wrong, I imagine most reasoners would need to keep it handy because it is simpler. But if you have two empirically equivalent and complete theories X and Y, and X is computationally simpler, so you rely on X for calculating predictions, it seems to me you believe X. What would saying “No, actually I believe in Y, not X” even mean in this context? The statement is unconnected to anticipated experience and any conceivable payoff structure.
Better yet, taboo “belief”. Say you are an agent with a program that allows you to calculate, based on your observations, what your observations will be in the future contingent on various actions. You have another program that ranks those futures according to a utility function. What would it mean to add “belief” to this picture?
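A minimal sketch of the agent just described; everything here is hypothetical scaffolding, only to show that the decision procedure has no separate slot for “belief”:

```python
from typing import Any, Callable, Iterable

class Agent:
    """Built from (1) a predictor mapping current observations and a candidate action to a
    predicted future, and (2) a utility function ranking futures. Nothing else is needed
    to choose actions, which is the point of the comment above."""

    def __init__(self,
                 predict: Callable[[Any, Any], Any],
                 utility: Callable[[Any], float]) -> None:
        self.predict = predict
        self.utility = utility

    def choose(self, observations: Any, actions: Iterable[Any]) -> Any:
        # Pick whichever action leads to the highest-ranked predicted future.
        return max(actions, key=lambda a: self.utility(self.predict(observations, a)))
```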
Your first paragraph looks misguided to me: does it imply we should “believe” matrix multiplication is defined by the naive algorithm for small n, and the Strassen and Coppersmith-Winograd algorithms for larger values of n? Your second paragraph, on the other hand, makes exactly the point I was trying to make in the original post: we can assign degrees of belief to equivalence classes of theories that give the same observable predictions.
Heh, I was just working on a post on that point.
I agree that that is true about equivalent formulations, literally isomorphic theories (as in this comment), but is that really the case about MWI vs. Copenhagen? Collapse is claimed as something that’s actually happening out there in reality, not just as another way of looking at the same thing. Doesn’t it have to be evaluated as a hypothesis on its own, such that the conjunction (MWI & Collapse) is necessarily less probable than just MWI?
Except the whole quantum suicide thing does create payoff structures. In determining whether or not to play a game of Quantum Russian Roulette, you take your estimated winnings for playing if MWI and quantum immortality are true and your estimated winnings if MWI or quantum immortality is false, and weigh them according to the probability you assign each theory.
(ETA: But this seems to be a quirky feature of QM interpretation, not a feature of empirically equivalent theories generally.)
(ETA 2: And it is a quirky feature of QM interpretation because MWI+Quantum Immortality is empirically equivalent to single-world theories in a really quirky way.)
IMO quantum suicide/immortality is so mysterious that it can’t support any definite conclusions about the topic we’re discussing. I’m beginning to view it as a sort of thread-killer, like “consciousness”. See a comment that mentions QI, collapse the whole thread because you know it’s not gonna make you happier.
I agree that neither we nor anyone else does a good job discussing it. It seems like a pretty important issue, though.
Since when is it the Bayesian tradition that the truth of a statement is to be found through its observable consequences? Citation needed.
Well, I guess I am taking “observable consequences” to be something closely related to P(E|H)/P(E). And I am taking “the truth of a statement” to have something to do with P(H|E) adjusted for any bias that might have been present in the prior P(H).
I’m afraid this explanation is all the citation I can offer. I would be happy to hear your opinion along the lines of “That ain’t ‘truth’. ‘Truth’ is to a Bayesian”
Observable consequences are part of what controls the plausibility of a statement, but not its truth. An unobservable truth can still be a truth. Things outside our past light cone exist despite being unobservable. Asking, of a claim about some unobservable, “Then how can we know whether it’s true?” is irrelevant to evaluating whether it is the sort of thing that could be a truth, because we’re not talking about ourselves. Confusing truths with beliefs — even carefully-acquired accurate beliefs — is mind projection.
I can’t speak for everyone who’d call themselves Bayesians, but I would say: There is a thing called reality, which causes our experiences and a lot of other things, characterized by its ability to not always do what we want or expect. A statement is true to the extent that it mirrors some aspect of reality (or some other structure if specified).
If we’re going to distinguish ‘truth’ from our ‘observations’ then we need to be able to define ‘reality’ as something other than ‘experience generator’ (or else decouple truth and reality).
Personally, I suspect that we really need to think of reality as something other than an experience generator. What we can extract out of reality is only half of the story. The other half is the stuff we put in so as to create reality.
This is not a fully worked out philosophical position, but I do have some slogans:
You can’t do QM with only kets and no bras.
You can’t do Gentzen natural deduction with rules of elimination, but no rules of introduction.
You can’t write a program with GOTOs, but no COMEFROMs.
(That last slogan probably needs some work. Maybe I’ll try something involving causes and effects.)
How do you adjudicate a wager without observable consequences?
“More Wrong”. :)
I can think of two circumstances under which two theories would make the same predictions (that is, where they’d systematically make the same predictions, under all possible circumstances under which they could be called upon to do so):
They are mathematically isomorphic — in this case I would say they are the same theory.
They contain isomorphic substructures that are responsible for the identical predictions. In this case, the part outside what’s needed to actually generate the predictions counts as extra detail, and by the conjunction rule, this reduces the probability of the “outer” hypothesis.
The latter is where collapse vs. MWI falls, and where “we don’t know why the fundamental laws of physics are what they are” vs. “God designed the fundamental laws of physics, and we don’t know why there’s a God” falls, etc.
Well, the second of those things already has very serious problems. See, for example, Quine’s Confirmation Holism. We’ve known for a long time that our theories are under-determined by our observations and that we need some other way of adjudicating empirically equivalent theories. This was our basis for preferring Special Relativity over Lorentz Ether Theory. Parsimony seems like one important criterion, but it involves two questions:
One man’s simple seems like another man’s complex. How do you rigorously identify the more parsimonious of two hypotheses? Lots of people think God is a very simple hypothesis. The most seemingly productive approach that I know of is the algorithmic complexity one that is popular here.
Is parsimony important because parsimonious theories are more likely to be ‘real’, or is the issue really one of developing clear and helpful prediction-generating devices?
The way the algorithmic probability stuff has been leveraged is by building candidates for universal priors. But this doesn’t seem like the right way to do it. Beliefs are about anticipating future experience, so they should take the form of “Sensory experience x will occur at time t” (or something reducible to this). Theories aren’t like this. Theories are frameworks that let us take some sensory experience and generate beliefs about our future sensory experiences.
So I’m not sure it makes sense to have beliefs distinguishing empirically identical theories. That seems like a kind of category error: a map-territory confusion. The question is, what do we do with this algorithmic complexity stuff that was so promising? I think we still have good reasons to be thinking cleanly about complicated science; the QM interpretation debate isn’t totally irrelevant. But it isn’t obvious that algorithmic simplicity is what we want out of our theories (nor is it clear that what we want is the same thing other agents might want out of their theories). (ETA: Though of course K-complexity might still be helpful in making predictions between two possible futures that are empirically distinct. For example, we can assign a low probability to finding evidence of a moon landing conspiracy, since the theory that would predict discovering such evidence is unparsimonious. But if that is the case, if theories can be ruled improbable on the basis of the structure of the theory alone, why can we only do this with empirically distinct theories? Shouldn’t all theories be understandable in this way?)
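One crude, computable stand-in for that kind of comparison is compressed size, which upper-bounds description length; the hypothesis strings below are placeholders rather than serious formalizations:

```python
import zlib

def description_cost(hypothesis_text: str) -> int:
    """Compressed size in bytes: a rough, computable upper bound on description length."""
    return len(zlib.compress(hypothesis_text.encode("utf-8"), 9))

h1 = "the world runs on dynamical laws L"                                            # placeholder
h2 = "the world runs on dynamical laws L, set up by an agent who never intervenes"   # placeholder
print(description_cost(h1), description_cost(h2))  # the second typically costs more
```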
Thanks, your comment is a very clear formulation of the reason why I wrote the post. Probably even better than the post itself.
I’m halfway tempted to write yet another post about complexity (maybe in the discussion area), summarizing all the different positions expressed here in the comments and bringing out the key questions. The last 24 hours have been a very educational experience for me. Or maybe let someone else do it, because I don’t want to spam LW.
“Bayes’s rule only deals with the fraction of reality-space spanned by a sentence”
Well, that’s the thing: reality-space doesn’t concern just our observations of the universe. If two different theories make the same predictions about our observations but disagree about which mechanism produces those events we observe, those are two different slices of reality-space.
It’s actually the disjunction.
Yes, apologies. Fixed above.
Making the same predictions means making the same assignments of probabilities to outcomes.
Which brings us back to an issue which I was debating here a couple of weeks ago: Is there a difference between an event being impossible, and an event being of measure zero?
Orthodox Bayesianism says there is no difference and strongly advises against thinking either to be the case. I’m wondering whether there isn’t some way to make the idea work that there is a distinction to be made—that some things are completely impossible given a theory, while other things are merely of infinitesimal probability.
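A small illustration of why ordinary probability alone can’t register the distinction; the uniform-on-[0,1] setup is just an example:

```python
from fractions import Fraction

def prob_interval(a: Fraction, b: Fraction) -> Fraction:
    """P(a <= X <= b) for X uniform on [0, 1]."""
    lo, hi = max(a, Fraction(0)), min(b, Fraction(1))
    return max(hi - lo, Fraction(0))

half = Fraction(1, 2)
print(prob_interval(half, half))                 # 0, yet X = 1/2 is a possible outcome
print(prob_interval(Fraction(2), Fraction(2)))   # 0, and X = 2 is flatly impossible
# The measure assigns zero to both, which is exactly the gap an account of
# infinitesimal probabilities would have to capture.
```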
There’s a proposal to use surreal numbers for utilities. Such an approach was used for Go by Conway.
It might be more accurate to say that surreal numbers are a subset of the numbers that were invented by Conway to describe the value of game positions.
Interesting suggestion. I ought to look into that. Thx.