Hmmm. But the very first posting in the sequences says something about “making your beliefs pay rent in expected experience”. If you don’t expect different experiences in choosing between the theories, it seems that you are making an unfalsifiable claim.
I’m not totally convinced that the two theories do not make different predictions in some sense. The evolution theory pretty much predicts that we are not going to see a Rapture any time soon, whereas the God theory leaves the question open. Not exactly “different predictions”, but something close.
Both theories are trying to pay rent on the same house; that’s the problem here, which is quite distinct from neither theory paying rent at all.
Clever. But …
If theories A and B pay rent on the same house, then the theory (A OR B) pays enough rent so that the stronger theory A need pay no additional rent at all. Yet you seem to prefer A to B, and also to (A OR B).
(A OR B) is more probable than A, but if A is much more probable than B, then saying “(A OR B)” instead of “A” is leaving out information.
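For concreteness, the probability bookkeeping behind that claim is just the sum rule (nothing here is specific to MWI or Copenhagen):

```latex
P(A \lor B) = P(A) + P(B) - P(A \land B) \;\ge\; P(A)
```

with equality only when B adds no probability mass beyond A, i.e. when P(B ∧ ¬A) = 0.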
Let’s say A = (MWI is correct) and B = (Copenhagen)
The equivalent of “A OR B” is the statement “either Copenhagen or MWI is correct”, and I’m sure everyone here assigns “A OR B” a higher prior than either A or B separately.
But that’s not really a theory; it’s a disjunction of two different theories, so of course we want to understand which of the two is actually correct. Not sure what your objection is here.
EDITED to correct a wrong term.
I’m not sure I have one. It is just a little puzzling how we might reconcile two things:
EY’s very attractive intuition that of two theories making the same predictions, one is true and the other … what? False? Wrong? Well, … “not quite so true”.
The tradition in Bayesianism and standard rationality (and logical positivism, for that matter) that the truth of a statement is to be found through its observable consequences.
ETA: Bayes’s rule only deals with the fraction of reality-space spanned by a sentence, never with the number of characters needed to express the sentence.
There’s a useful heuristic to solve tricky questions about “truths” and “beliefs”: reduce them to questions about decisions and utilities. For example, the Sleeping Beauty problem is very puzzling if you insist on thinking in terms of subjective probabilities, but becomes trivial once you introduce any payoff structure. Maybe we could apply this heuristic here? Believing in one formulation of a theory over a different equivalent formulation isn’t likely to win a Bayesian reasoner many dollars, no matter what observations come in.
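Not part of the original comment, but here is a minimal sketch of the “introduce a payoff structure” move for Sleeping Beauty; the function name and toy setup are mine. The same experiment yields a heads frequency of about 1/2 per experiment and 1/3 per awakening, so once you fix which of these a bet pays out on, the puzzle dissolves.

```python
import random

def sleeping_beauty(trials=100_000, seed=0):
    """Toy Sleeping Beauty: heads -> 1 awakening, tails -> 2 awakenings.
    Compare the frequency of heads per experiment vs. per awakening."""
    rng = random.Random(seed)
    heads_experiments = 0
    heads_awakenings = 0
    total_awakenings = 0
    for _ in range(trials):
        heads = rng.random() < 0.5
        awakenings = 1 if heads else 2
        heads_experiments += heads
        heads_awakenings += 1 if heads else 0
        total_awakenings += awakenings
    return heads_experiments / trials, heads_awakenings / total_awakenings

per_experiment, per_awakening = sleeping_beauty()
print(f"heads frequency per experiment ~ {per_experiment:.3f}")  # ~ 0.5
print(f"heads frequency per awakening  ~ {per_awakening:.3f}")   # ~ 0.333
```

A bet settled once per awakening breaks even at 2:1 odds against heads; a bet settled once per experiment breaks even at 1:1. The payoff structure, not a free-floating “credence”, settles which number matters.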
Actually, it might help a reasoner saddled with bounded rationality. One theory might require less computation to get from theory to prediction, or it might require less memory to store. Having a fast, easy-to-use theory can be like money in the bank to someone who needs lots and lots of predictions.
It might be interesting to look at the idea someone here was talking about that merges Zadeh’s fuzzy logic with Bayesianism. Instead of simple Bayesian probabilities which can be updated instantaneously, we may need to think of fuzzy probabilities which grow sharper as we devote cognitive resources to refining them. But with a good, simple theory we can get a sharper picture more quickly.
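One crude way to picture “probabilities that grow sharper as we devote cognitive resources”, assuming we model resources as a sample budget (a sketch of my own, not the proposal the comment alludes to): the estimate carries an interval that narrows as more samples are spent on it.

```python
import random

def refine_estimate(simulate, budgets, seed=0):
    """Estimate P(event) with progressively larger sample budgets,
    reporting a rough 95% interval that narrows as more compute is spent."""
    rng = random.Random(seed)
    for n in budgets:
        hits = sum(simulate(rng) for _ in range(n))
        p = hits / n
        half_width = 1.96 * (p * (1 - p) / n) ** 0.5  # normal approximation
        print(f"n={n:>6}: p ~ {p:.3f} +/- {half_width:.3f}")

# Toy event: the sum of two dice exceeds 8 (true probability 10/36).
refine_estimate(lambda rng: rng.randint(1, 6) + rng.randint(1, 6) > 8,
                budgets=[100, 1_000, 10_000, 100_000])
```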
I don’t understand your point about bounded rationality. If you know theory X is equivalent to theory Y, you can believe in X more, but use Y for calculations.
That’s the definition of a free-floating belief, isn’t it? If you only have so many computational resources, even storing theory X in your memory is a waste of space.
I think cousin_it’s point was that if you have a preference both for quickly solving problems and for knowing the true nature of things, and theory X tells you the true nature of things while theory Y is a hack-job approximation that nevertheless gives you the answer you need much faster (in computer terms, say, a full simulation of the actual event vs. a Monte Carlo run with the probabilities just plugged in), then it might be positive utility even under bounded rationality to keep both theory X and theory Y.
Edit: the assumption is that we have at least mild preferences for both, and that the bounds on our rationality are sufficiently high that this is the preferred option for most of science.
It’s one thing to calculate with a simpler theory because you don’t need perfect accuracy. Newton is good enough for a large fraction of physics calculations, and so even though it is strictly wrong I imagine most reasoners would want to keep it handy because it is simpler. But if you have two empirically equivalent and complete theories X and Y, and X is computationally simpler so you rely on X for calculating predictions, it seems to me you believe X. What would saying “No, actually I believe in Y, not X” even mean in this context? The statement is unconnected to anticipated experience and any conceivable payoff structure.
Better yet, taboo “belief”. Say you are an agent with a program that allows you to calculate, based on your observations, what your observations will be in the future contingent on various actions. You have another program that ranks those futures according to a utility function. What would it mean to add “belief” to this picture?
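A minimal sketch of the agent described above (all names and the toy payoff numbers are hypothetical): a predictor over futures plus a utility function over them, with no separate “belief” ingredient — whatever belief-talk remains is cashed out in the probabilities the predictor assigns.

```python
# A hypothetical, minimal agent: a predictive model plus a utility function.
# "Belief" appears only implicitly, as the probabilities the predictor assigns.

def predictor(history, action):
    """Return a list of (future_observation, probability) pairs.
    Toy model: action 'safe' is certain, action 'risky' is a gamble."""
    if action == "safe":
        return [("small_reward", 1.0)]
    return [("big_reward", 0.5), ("nothing", 0.5)]

def utility(future_observation):
    return {"small_reward": 1.0, "big_reward": 3.0, "nothing": 0.0}[future_observation]

def choose_action(history, actions=("safe", "risky")):
    """Pick the action with the highest expected utility under the predictor."""
    def expected_utility(action):
        return sum(p * utility(obs) for obs, p in predictor(history, action))
    return max(actions, key=expected_utility)

print(choose_action(history=[]))  # 'risky' (expected utility 1.5 > 1.0)
```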
Your first paragraph looks misguided to me: does it imply we should “believe” matrix multiplication is defined by the naive algorithm for small n, and the Strassen and Coppersmith-Winograd algorithms for larger values of n? Your second paragraph, on the other hand, makes exactly the point I was trying to make in the original post: we can assign degrees of belief to equivalence classes of theories that give the same observable predictions.
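On the matrix-multiplication analogy, the point can be made concrete: one mathematical definition of the product, several procedures for computing it, chosen by cost. A hedged sketch in Python, with NumPy’s optimized routine standing in for Strassen/Coppersmith–Winograd and a made-up crossover size:

```python
import numpy as np

def matmul_naive(a, b):
    """Textbook O(n^3) definition of the matrix product."""
    n, k, m = a.shape[0], a.shape[1], b.shape[1]
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            out[i, j] = sum(a[i, t] * b[t, j] for t in range(k))
    return out

def matmul(a, b, crossover=64):
    """Same mathematical function, different procedure depending on size."""
    if max(a.shape[0], a.shape[1], b.shape[1]) < crossover:
        return matmul_naive(a, b)
    return a @ b  # optimized BLAS path, standing in for the fancier algorithms

rng = np.random.default_rng(0)
a, b = rng.random((10, 10)), rng.random((10, 10))
assert np.allclose(matmul(a, b), a @ b)  # equivalent outputs, different costs
```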
Heh, I was just working on a post on that point.
I agree that that is true about equivalent formulations, literally isomorphic theories (as in this comment), but is that really the case about MWI vs. Copenhagen? Collapse is claimed as something that’s actually happening out there in reality, not just as another way of looking at the same thing. Doesn’t it have to be evaluated as a hypothesis on its own, such that the conjunction (MWI & Collapse) is necessarily less probable than just MWI?
Except the whole quantum suicide thing does create payoff structures. In determining whether or not to play a game of Quantum Russian Roulette, you take your estimated winnings if MWI and quantum immortality are true and your estimated winnings if MWI or quantum immortality is false, and weigh them according to the probability you assign each theory (a sketch of that calculation follows below).
(ETA: But this seems to be a quirky feature of QM interpretation, not a feature of empirically equivalent theories generally.)
(ETA 2: And it is a quirky feature of QM interpretation because MWI plus quantum immortality is empirically equivalent to single-world theories in a really quirky way.)
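A sketch of the expected-value bookkeeping described above, with placeholder numbers — the payoffs and the probability assigned to “MWI plus quantum immortality” are purely illustrative, not anyone’s actual estimates:

```python
def quantum_roulette_ev(p_mwi_qi, value_if_true, value_if_false):
    """Weigh the estimated winnings under each hypothesis by its probability."""
    return p_mwi_qi * value_if_true + (1 - p_mwi_qi) * value_if_false

# Hypothetical numbers: a prize worth 100 utils if MWI + quantum immortality holds
# (you only experience surviving branches), versus an expected -500 utils if it
# does not (a 5/6 chance of simply being dead).
print(quantum_roulette_ev(p_mwi_qi=0.1, value_if_true=100, value_if_false=-500))
# -440.0: under these made-up numbers, don't play.
```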
IMO quantum suicide/immortality is so mysterious that it can’t support any definite conclusions about the topic we’re discussing. I’m beginning to view it as a sort of thread-killer, like “consciousness”. See a comment that mentions QI, collapse the whole thread because you know it’s not gonna make you happier.
I agree that neither we nor anyone else does a good job discussing it. It seems like a pretty important issue, though.
Since when is that the Bayesian tradition? Citation needed.
Well, I guess I am taking “observable consequences” to be something closely related to P(E|H)/P(E). And I am taking “the truth of a statement” to have something to do with P(H|E) adjusted for any bias that might have been present in the prior P(H).
I’m afraid this explanation is all the citation I can offer. I would be happy to hear your opinion along the lines of “That ain’t ‘truth’. ‘Truth’ is to a Bayesian”
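Spelling out the relation between those two quantities, this is just Bayes’s theorem, with the “observable consequences” entering through the likelihood ratio:

```latex
P(H \mid E) \;=\; P(H)\,\frac{P(E \mid H)}{P(E)}
```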
Observable consequences are part of what controls the plausibility of a statement, but not its truth. An unobservable truth can still be a truth. Things outside our past light cone exist despite being unobservable. Asking of a claim about some unobservable, “Then how can we know whether it’s true?”, is irrelevant to evaluating whether it is the sort of thing that could be a truth, because we’re not talking about ourselves. Confusing truths with beliefs — even carefully-acquired accurate beliefs — is mind projection.
I can’t speak for everyone who’d call themselves Bayesians, but I would say: There is a thing called reality, which causes our experiences and a lot of other things, characterized by its ability to not always do what we want or expect. A statement is true to the extent that it mirrors some aspect of reality (or some other structure if specified).
If we’re going to distinguish ‘truth’ from our ‘observations’ then we need to be able to define ‘reality’ as something other than ‘experience generator’ (or else decouple truth and reality).
Personally, I suspect that we really need to think of reality as something other than an experience generator. What we can extract out of reality is only half of the story. The other half is the stuff we put in so as to create reality.
This is not a fully worked out philosophical position, but I do have some slogans:
You can’t do QM with only kets and no bras.
You can’t do Gentzen natural deduction with rules of elimination, but no rules of introduction.
You can’t write a program with GOTOs, but no COMEFROMs.
(That last slogan probably needs some work. Maybe I’ll try something involving causes and effects.)
How do you adjudicate a wager without observable consequences?
“More Wrong”. :)
I can think of two circumstances under which two theories would make the same predictions (that is, where they’d systematically make the same predictions, under all possible circumstances under which they could be called upon to do so):
They are mathematically isomorphic — in this case I would say they are the same theory.
They contain isomorphic substructures that are responsible for the identical predictions. In this case, the part outside what’s needed to actually generate the predictions counts as extra detail, and by the conjunction rule, this reduces the probability of the “outer” hypothesis.
The latter is where collapse vs. MWI falls, and where “we don’t know why the fundamental laws of physics are what they are” vs. “God designed the fundamental laws of physics, and we don’t know why there’s a God” falls, etc.
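The conjunction-rule step, written out (with “Collapse” standing for whatever extra machinery the outer hypothesis posits beyond the shared predictive core):

```latex
P(\text{MWI} \land \text{Collapse})
  \;=\; P(\text{MWI})\,P(\text{Collapse} \mid \text{MWI})
  \;\le\; P(\text{MWI})
```

with equality only if the extra detail is certain given the core, which it never is for a substantive posit.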
Well, the second of those things already has very serious problems. See, for example, Quine’s Confirmation Holism. We’ve known for a long time that our theories are under-determined by our observations and that we need some other way of adjudicating empirically equivalent theories. This was our basis for preferring Special Relativity over Lorentz Ether Theory. Parsimony seems like one important criterion, but it involves two questions:
One man’s simple is another man’s complex. How do you rigorously identify the more parsimonious of two hypotheses? Lots of people think God is a very simple hypothesis. The most seemingly productive approach that I know of is the algorithmic-complexity one that is popular here (a toy sketch follows after these two questions).
Is parsimony important because parsimonious theories are more likely to be ‘real’, or is the issue really one of developing clear and helpful prediction-generating devices?
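The toy sketch promised above: a cartoon of a Solomonoff-style simplicity prior. A real algorithmic-probability prior is uncomputable and depends on the choice of universal machine; the encodings below are invented purely to show the mechanics.

```python
def simplicity_prior(hypotheses):
    """Assign each hypothesis a weight of 2^(-description length), then normalize.
    `hypotheses` maps a name to some encoding of the theory, e.g. a bit string."""
    weights = {name: 2.0 ** -len(code) for name, code in hypotheses.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical encodings; only the relative lengths matter for the toy prior.
priors = simplicity_prior({
    "plain_laws": "0" * 20,                  # the laws of physics, stated directly
    "god_plus_laws": "0" * 20 + "1" * 15,    # the same laws plus an extra mechanism
})
print(priors)  # the shorter description gets 2^15 (~32768x) the weight of the longer
```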
The way the algorithmic probability stuff has been leveraged is by building candidates for universal priors. But this doesn’t seem like the right way to do it. Beliefs are about anticipating future experience, so they should take the form “Sensory experience x will occur at time t” (or something reducible to this). Theories aren’t like this. Theories are frameworks that let us take some sensory experience and generate beliefs about our future sensory experiences.
So I’m not sure it makes sense to have beliefs distinguishing empirically identical theories. That seems like a kind of category error, a map-territory confusion. The question is, what do we do with this algorithmic complexity stuff that was so promising? I think we still have good reasons to be thinking cleanly about complicated science; the QM interpretation debate isn’t totally irrelevant. But it isn’t obvious that algorithmic simplicity is what we want out of our theories (nor is it clear that what we want is the same thing other agents might want out of their theories). (ETA: Though of course K-complexity might still be helpful in choosing between two possible futures that are empirically distinct. For example, we can assign a low probability to finding evidence of a moon landing conspiracy, since the theory that would predict discovering such evidence is unparsimonious. But if that is the case, if theories can be ruled improbable on the basis of their structure alone, why can we only do this with empirically distinct theories? Shouldn’t all theories be understandable in this way?)
Thanks, your comment is a very clear formulation of the reason why I wrote the post. Probably even better than the post itself.
I’m halfway tempted to write yet another post about complexity (maybe in the discussion area), summarizing all the different positions expressed here in the comments and bringing out the key questions. The last 24 hours have been a very educational experience for me. Or maybe let someone else do it, because I don’t want to spam LW.
“Bayes’s rule only deals with the fraction of reality-space spanned by a sentence”
Well, that’s the thing: reality-space doesn’t concern just our observations of the universe. If two different theories make the same predictions about our observations but disagree about which mechanism produces those events we observe, those are two different slices of reality-space.
It’s actually the disjunction.
Yes, apologies. Fixed above.
Making the same predictions means making the same assignments of probabilities to outcomes.
Which brings us back to an issue which I was debating here a couple of weeks ago: Is there a difference between an event being impossible, and an event being of measure zero?
Orthodox Bayesianism says there is no difference and strongly advises against thinking either to be the case. I’m wondering whether there isn’t some way to make the distinction work: that some things are completely impossible given a theory, while other things merely have infinitesimal probability.
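A standard example of the gap being pointed at here, in my framing rather than the commenter’s: a continuous distribution assigns probability zero to individual points without ruling them out, whereas points outside the support are excluded by the theory itself.

```latex
X \sim \mathrm{Uniform}[0,1]:\qquad
P(X = 0.5) = 0 \ \text{(possible, measure zero)},\qquad
P(X = 2) = 0 \ \text{(impossible: outside the support)}
```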
There’s a proposal to use surreal numbers for utilities. Such an approach was used for go by Conway.
It might be more accurate to say that surreal numbers are a subset of the numbers that were invented by Conway to describe the value of game positions.
Interesting suggestion. I ought to look into that. Thx.