Hmm. Maybe. It depends on what you mean by “likelihood”, and by “selecting”.
Trivially, noticing a hypothesis, and noticing that it’s likely enough to justify being tested, absolutely makes it subjectively more likely than it was before. I consider that tautological.
If someone is looking at n hypotheses and then decides to pick the kth one to test (maybe at random, or maybe because they all need to be tested at some point, so why not start with the kth one), then I quite agree: that doesn’t change the likelihood of hypothesis #k.
But in my mind, it’s vividly clear that the process of plucking a likely hypothesis out of hypothesis space depends critically on moving probability mass around in said space. Any process that doesn’t do that is literally picking a hypothesis at random. (Frankly, I’m not sure a human mind even can do that.)
The core problem here is that most default human ways of moving probability mass around in hypothesis space (e.g. clever arguments) violate the laws of probability, whereas empirical tests aren’t nearly as prone to that.
So, if you mean to suggest that figuring out which hypothesis is worthy of testing does not involve altering our subjective likelihood that said hypothesis will turn out to be true, then I quite strongly disagree.
But if you mean that clever arguments can’t change what’s true even by a little bit, then of course I agree with you.
Perhaps you’re using a Frequentist definition of “likelihood” whereas I’m using a Bayesian one?
There’s a difference? Probability is probability.
So, if you mean to suggest that figuring out which hypothesis is worthy of testing does not involve altering our subjective likelihood that said hypothesis will turn out to be true, then I quite strongly disagree.
But if you mean that clever arguments can’t change what’s true even by a little bit, then of course I agree with you.
If you go about selecting a hypothesis by evaluating a space of hypotheses to see how they rate against your model of the world (whether you think they are true) and against each other (how much you stand to learn by testing them), you are essentially coming to reflective equilibrium regarding these hypotheses and your current beliefs. What I’m saying is that this shouldn’t change your actual beliefs: it will flush out some stale caching, or at best identify an inconsistent belief, including empirical data that you haven’t fully updated on. But it does not, by itself, constitute evidence.
So a clever argument might reveal an inconsistency in your priors, which in turn might make you want to seek out new evidence. But the argument itself is insufficient for drawing conclusions. Even if the hypothesis is itself hard to test.
Perhaps you’re using a Frequentist definition of “likelihood” whereas I’m using a Bayesian one?
There’s a difference? Probability is probability.
There very much is a difference.
Probability is a mathematical construct. Specifically, it’s a special kind of measure p on a measure space M such that p(M) = 1 and p obeys a set of axioms that we refer to as the axioms of probability (where an “event” from the Wikipedia page is to be taken as any measurable subset of M).
This is a bit like highlighting that Euclidean geometry is a mathematical construct based on following thus-and-such axioms for relating thus-and-such undefined terms. Of course, in normal ways of thinking we point at lines and dots and so on, pretend those are the things that the undefined terms refer to, and proceed to show pictures of what the axioms imply. Formally, mathematicians refer to this as building a model of an axiomatic system. (Another example of this is elliptic geometry, which is a type of non-Euclidean geometry, which you can model as doing geometry on a sphere.)
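The measure-theoretic definition can be made concrete on a finite space. Below is a minimal Python sketch that assumes a two-outcome coin space. The names M, point_mass, p, all_events, and satisfies_axioms are hypothetical and chosen for illustration. On a finite space every subset is measurable, so countable additivity reduces to the finite check shown here.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite sample space: one flip of a fair coin.
M = frozenset({"heads", "tails"})
point_mass = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}


def p(event):
    """Measure of an event E, i.e. any subset of M (all subsets are measurable here)."""
    return sum((point_mass[x] for x in event), Fraction(0))


def all_events(space):
    """The power set of a finite space, used as its sigma-algebra."""
    items = sorted(space)
    return [
        frozenset(c)
        for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    ]


def satisfies_axioms(space):
    """Check non-negativity, p(M) = 1, and additivity over disjoint events."""
    events = all_events(space)
    non_negative = all(p(e) >= 0 for e in events)
    normalized = p(space) == 1
    additive = all(
        p(a | b) == p(a) + p(b) for a in events for b in events if not (a & b)
    )
    return non_negative and normalized and additive
```

Nothing in this code says what the number 1/2 means. That interpretive question is the one the two models answer differently.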
The Frequentist and Bayesian models of probability theory are relevantly different. They both think of M as the space of possible results (usually called the “sample space” but not always) and a measurable subset E ⊆ M as an “event”. But they use different models of p:
Frequentists suggest that were you to look at how often all of the events in M occur, the one we’re looking at (i.e., E) would occur at a certain frequency, and that’s how we should interpret p(E). E.g., if M is the set of results from flipping a fair coin and E is “heads”, then it is a property of the setup that p(E) = 0.5. A different way of saying this is that Frequentists model p as describing a property of that which they are observing—i.e., that probability is a property of the world.
Bayesians, on the other hand, model p as describing their current state of confidence about the true state of the observed phenomenon. In other words, Bayesians model p as being a property of mental models, not of the world. So if M is again the results from flipping a fair coin and E is “heads”, then to a Bayesian the statement p(E) = 0.5 is equivalent to saying “I expect heads on this coin flip exactly as much as I expect not-heads.” To a Bayesian, it doesn’t make sense to ask what the “true” probability is that their subjective probability is estimating; the very question violates the model of p by trying to sneak in a Frequentist presumption.
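The contrast can be made concrete with a minimal Python sketch. The function names and the uniform Beta(1, 1) prior are assumptions chosen for illustration, not anything specified above. The first function treats p(heads) as the frequency that repeated flipping tends toward. The second treats p(heads) as a degree of belief about the next flip that shifts as flips are observed. Both give 1/2 for a coin believed to be fair, but they answer different questions.

```python
import random
from fractions import Fraction


def long_run_frequency(n_flips, p_heads=0.5, seed=0):
    """Frequentist reading: p(heads) is what the observed frequency of heads tends to."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips


def bayesian_predictive(heads, tails, prior_a=1, prior_b=1):
    """Bayesian reading: p(heads) is a degree of belief about the next flip.

    With a Beta(prior_a, prior_b) prior over the coin's bias, the predictive
    probability of heads after the observed flips is
    (prior_a + heads) / (prior_a + prior_b + heads + tails).
    """
    return Fraction(prior_a + heads, prior_a + prior_b + heads + tails)
```

Before any flips, bayesian_predictive(0, 0) returns 1/2. After three heads and one tail, bayesian_predictive(3, 1) returns 2/3, because the belief has moved while the coin has not changed.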
Now let’s suppose that M is a hypothesis space, including some sector for hypotheses that haven’t yet been considered. When we say that a given hypothesis H is “likely”, we’re working within a partial model, but we haven’t yet said what “likely” means. The formalism is easy: we require that H ⊆ M is measurable, and the statement that “it’s likely” means that p(H) is larger than most other measurable subsets of M (and often we mean something stronger, like p(H) > 0.5). But we haven’t yet specified in our model what p(H) means. This is where the difference between Frequentism and Bayesianism matters. A Frequentist would say that the probability is a property of the hypothesis space, and noticing H doesn’t change that. (I’m honestly not sure how a Frequentist thinks about iterating over a hypothesis space to suggest that H in fact would occur at a frequency of p(H) in the limit; maybe by considering the frequency in counterfactual worlds?) A Bayesian, by contrast, will say that p(H) is their current confidence that H is the right hypothesis.
What I’m suggesting, in essence, is that figuring out which hypothesis H ⊆ M is worth testing is equivalent to moving from p to p’ in the space of probability measures on M in a way that causes p’(H) > p(H). This is coming from using a Bayesian model of what p is.
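One way to picture the move from p to p’ is a Bayes update over a discrete hypothesis space. Below is a minimal Python sketch. The hypothesis names, the catch-all “unconsidered” bucket, and the likelihood numbers are all hypothetical. Whether reasoning alone can legitimately supply such likelihoods is exactly what is in dispute in this thread. Picking a hypothesis by index leaves the measure unchanged, while reweighting by fit raises the probability of H2 from 1/4 to 5/8.

```python
from fractions import Fraction


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def bayes_update(prior, likelihood):
    """Return p' with p'(h) proportional to p(h) * P(observation | h)."""
    return normalize({h: prior[h] * likelihood[h] for h in prior})


# Hypothetical hypothesis space, with a sector for hypotheses not yet considered.
prior = {
    "H1": Fraction(1, 4),
    "H2": Fraction(1, 4),
    "H3": Fraction(1, 4),
    "unconsidered": Fraction(1, 4),
}

# Picking the kth hypothesis to test, with no reweighting, leaves p as it was.
p_after_picking = dict(prior)

# Noticing that H2 fits what is already known better moves mass toward it.
likelihood = {
    "H1": Fraction(1, 10),
    "H2": Fraction(1, 2),
    "H3": Fraction(1, 10),
    "unconsidered": Fraction(1, 10),
}
posterior = bayes_update(prior, likelihood)
```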
Of course, if you’re using a Frequentist model of p, then “most likely hypothesis” actually refers to a property of the hypothesis space, though I’m not sure how you would find out the frequency at which hypotheses turn out to be true the way you figure out the frequency at which a coin comes up heads. But that could just be my not being as familiar with thinking in terms of the Frequentist model.
I’ll briefly note that although I find the Bayesian model more coherent with my sense of how the world works on a day-by-day basis, I think the Frequentist model makes more sense when thinking about quantum physics. The type of randomness we find there isn’t just about confidence, but is in fact a property of the quantum phenomena in question. In this case a well-calibrated Bayesian has to give a lot of probability mass to the hypothesis that there is a “true probability” in some quantum phenomena, which makes sense if we switch the model of p to be Frequentist.
But in short:
Yes, there’s a difference.
And things like “probability” and “belief” and “evidence” mean different things depending on what model you use.
What I’m saying is that this shouldn’t change your actual beliefs—it will flush out some stale caching, or at best identify an inconsistent belief, including empirical data that you haven’t fully updated on. But it does not, by itself, constitute evidence.
Yep, we disagree.
I think the disagreement is on two fronts. One is based on using different models of probability, which is basically not an interesting disagreement. (Arguing over which definition to use isn’t going to make either of us smarter.) But I think the other is substantive. I’ll focus on that.
In short, I think you underestimate the power of noticing implications of known facts. I think that if you look at a few common or well-known examples of incomplete deduction, it becomes pretty clear that figuring out how to finish thinking would be intensely powerful:
Many people make resolutions to exercise, be nicer, eat more vegetables, etc. And while making those resolutions, they often really think they mean it this time. And yet, there’s often a voice of doubt in the back of the mind, as though saying “Come on. You know this won’t work.” But people still quite often spend a bunch of time and money trying to follow through on their new resolution—often failing for reasons that they kind of already knew would happen (and yet often feeling guilty for not sticking to their plan!).
Religious or ideological deconversion often comes from letting in facts that are already known. E.g., I used to believe that the results of parapsychological research suggested some really important things about how to survive after physical death. I knew all the pieces of info that finally changed my mind months before my mind actually changed. I had even done experiments to test my hypotheses and it still took months. I’m under the impression that this is normal.
Most people reading this already know that if they put a ton of work into emptying their email inbox, they’ll feel good for a little while, and then it’ll fill up again, complete with the sense of guilt for not keeping up with it. And yet, somehow, it always feels like the right thing to do to go on an inbox-emptying flurry, and then get around to addressing the root cause “later” or maybe try things that will fail after a month or two. This is an agonizingly predictable cycle. (Of course, this isn’t how it goes for everyone, but it’s common enough that well over half the people who attend CFAR workshops seem to relate to it.)
Most of Einstein’s work in bringing special relativity into consideration consisted of saying “Let’s take the Michelson-Morley result at face value and see where it goes.” Note that he is now considered the archetypal example of a brilliant person primarily for his ability to highlight worthy hypotheses by running with the implications of what is already known or supposed.
Ignaz Semmelweis found that hand-washing dramatically reduced mortality from childbed fever in hospital maternity wards. He was ignored, criticized, and committed to an insane asylum where guards beat him to death. At a cultural level, the fact that whether Semmelweis was right was (a) testable and (b) independent of opinion failed to propagate until after Louis Pasteur gave the medical community justification to believe that hand-washing could matter. This is a horrendous embarrassment, and thousands of people died unnecessarily because of a cultural inability to finish thinking. (Note that this also honors the need for empiricism; but the point here is that the ability to finish thinking was a prerequisite for empiricism mattering in this case.)
I could keep going. Hopefully you could too.
But my point is this:
Please note that there’s a baby in that bathwater you’re condemning as dirty.
Those are not different models. They are different interpretations of the utility of probability in different classes of applications.
though I’m not sure how you would find out the frequency at which hypotheses turn out to be true the way you figure out the frequency at which a coin comes up heads. But that could just be my not being as familiar thinking in terms of the Frequentist model
You do it exactly the same as in your Bayesian example.
I’m sorry, but this Bayesian vs Frequentist conflict is for the most part non-existent. If you use probability to model the outcome of an inherently random event, people have called that “frequentist.” If instead you model the event as deterministic, but your knowledge over the outcome as uncertain, then people have applied the label “bayesian.” It’s the same probability, just used differently.
It’s like how if you apply your knowledge of mechanics to bridge and road building, it’s called civil engineering, but if you apply it to buildings it is architecture. It’s still mechanics either way, just applied differently.
One of the failings of the sequences is the amount of emphasis that is placed on “Frequentist” vs “Bayesian” interpretations. The conflict between the two exists mostly in Yudkowsky’s mind. Actual statisticians use probability to model events and knowledge of events simultaneously.
Regarding the other points, every single example you gave involves using empirical data that had not sufficiently propagated, which is exactly the sort of use I am in favor of. So I don’t know what it is that you disagree with.
I’m sorry, but this Bayesian vs Frequentist conflict is for the most part non-existent.
[…]
One of the failings of the sequences is the amount of emphasis that is placed on “Frequentist” vs “Bayesian” interpretations. The conflict between the two exists mostly in Yudkowsky’s mind. Actual statisticians use probability to model events and knowledge of events simultaneously.
I know a fellow who has a Ph.D. in statistics and works for the Department of Defense on cryptography. I think he largely agrees with your point: professional statisticians need to use both methods fluidly in order to do useful work. But he also doesn’t claim that they’re both secretly the same thing. He says that strong Bayesianism is useless in some cases that Frequentism gets right, and vice versa, though his sympathies lie more with the Frequentist position on pragmatic grounds (i.e. that methods that are easier to understand in a Frequentist framing tend to be more useful in a wider range of circumstances in his experience).
I think the debate is silly. It’s like debating which model of hyperbolic geometry is “right”. Different models highlight different intuitions about the formal system, and they make different aspects of the formal theorems more or less relevant to specific cases.
I think Eliezer’s claim is that as a matter of psychology, using a Bayesian model of probability lets you think about the results of probability theory as laws of thought, and from that you can derive some useful results about how one ought to think and what results from experimental psychology ought to capture one’s attention. He might also be claiming somewhere that Frequentism is in fact inconsistent and therefore is simply a wrong model to adopt, but honestly if he’s arguing that then I’m inclined to ignore him because people who know a lot more about Frequentism than he does don’t seem to agree.
But there is a debate, even if I think it’s silly and quite pointless.
And also, the axiomatic models are different, even if statisticians use both.
Regarding the other points, every single example you gave involves using empirical data that had not sufficiently propagated, which is exactly the sort of use I am in favor of. So I don’t know what it is that you disagree with.
The concern about AI risk is also the result of an attempt to propagate implications of empirical data. It just goes farther than what I think you consider sensible, and I think you’re encouraging an unnecessary limitation on human reasoning power by calling such reasoning unjustified.
I agree, it should itch that there haven’t been empirical tests of several of the key ideas involved in AI risk, and I think there should be a visceral sense of making bullshit up attached to this speculation unless and until we can find ways to do those empirical tests.
But I think it’s the same kind of stupid to ignore these projections as it is to ignore that you already know how your New Year’s Resolution isn’t going to work. It’s not obviously as strong a stupidity, but the flavor is exactly the same.
If we could banish that taste from our minds, then even without better empiricism we would be vastly stronger.
I’m concerned that you’re underestimating the value of this strength, and viewing its pursuit as a memetic hazard.
I don’t think we have to choose between massively improving our ability to make correct clever arguments and massively improving the drive and cleverness with which we ask nature its opinion. I think we can have both, and I think that getting AI risk and things like it right requires both.
But just as measuring everything about yourself isn’t really a fully mature expression of empiricism, I’m concerned about the memes you’re spreading in the name of mature empiricism retarding the art of finishing thinking.
I don’t think that they have to oppose.
And I’m under the impression that you think otherwise.
But the argument itself is insufficient for drawing conclusions.
This seems like it would be true only if you’d already propagated all logical consequences of all observations you’ve made. But an argument can help me to propagate. Which means it can make me update my beliefs.
For example, is 3339799 a prime number?
One ought to assign some prior probability to it being a prime. A naive estimate might say, well, there are two options, so let’s assign it 50% probability.
You could also make a more sophisticated argument about the distribution of prime numbers spreading out as you go towards infinity. Given that only 25 of the first 100 numbers are prime, the chance that a randomly selected number in the millions is prime is less than 25%, and probably much lower.
I claim that in a case like this it is totally valid to update your beliefs on the basis of an argument. No additional empirical test required before updating.
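The successive priors in this argument can be checked numerically with a minimal Python sketch. The function names are hypothetical. The prime number theorem is a standard result swapped in here to make “probably much lower” concrete. It says the density of primes near n is roughly 1/ln n, which near 3339799 gives about 6.7%. The sketch stops short of asserting whether 3339799 itself is prime. Running is_prime on it settles the question directly.

```python
import math


def is_prime(n):
    """Deterministic trial division; fast enough for numbers in the millions."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def primes_below(n):
    """Sieve of Eratosthenes: all primes strictly less than n."""
    if n < 3:
        return []
    sieve = bytearray([1]) * n
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n - 1) + 1):
        if sieve[i]:
            # Zero out every multiple of i starting at i*i.
            sieve[i * i::i] = bytes(len(range(i * i, n, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def density_near(n, width=100_000):
    """Fraction of integers in [n - width//2, n + width//2) that are prime."""
    lo = n - width // 2
    primes = primes_below(n + width // 2)
    return sum(1 for q in primes if q >= lo) / width


N = 3339799
naive_prior = 0.5                                   # "two options, so 50%"
small_sample_prior = len(primes_below(101)) / 100   # 25 of the first 100 integers
pnt_prior = 1 / math.log(N)                         # roughly 0.067
```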
I think the definition of ‘experiment’ gets tricky and confusing when you are talking about math specifically. When you talk about finding the distribution of prime numbers and using that to arrive at a more accurate model for your prior probability of 3339799 being prime, that is an experiment.
Math is unique in that regard though. For questions about the real world we must seek evidence that is outside of our heads.
[...] this shouldn’t change your actual beliefs [...] it does not, by itself, constitute evidence [...] the argument itself is insufficient for drawing conclusions. Even if the hypothesis is itself hard to test.
Is that a conclusion or a hypothesis? I don’t believe there is a fundamental distinction between “actual beliefs”, “conclusions” and “hypotheses”. What should it take to change my beliefs about this?
I’ll think about how this can be phrased differently such that it might sway you. Given that you are not Valentine, is there a difference of opinion between his posts above and your views?
That part you pulled out and quoted is essentially what I was writing about in the OP. There is a philosophy-over-hard-subjects which is pursued here, in the sequences, at FHI, and is exemplified in the conclusions drawn by Bostrom in Superintelligence, and Yudkowsky in the later sequences. Sometimes it works, e.g. the argument in the sequences about the compatibility of determinism and free will works because it essentially shows how non-determinism and free will are incompatible—it exposes a cached thought that free-will == non-deterministic choice which was never grounded in the first place. But over new subjects where you are not confused in the first place—e.g. the nature and risk of superintelligence—people seem to be using thought experiments alone to reach ungrounded conclusions, and not following up with empirical studies.
That is dangerous. If you allow yourself to reason from thought experiments alone, I can get you to believe almost anything. I can’t get you to believe the sky is green, unless you’ve never seen the sky, but anything you yourself don’t have available experimental evidence for or against, I can sway you in either way. E.g. that consciousness is in information being computed and not the computational process itself. That an AI takeoff would be hard, not soft, and basically uncontrollable. That boxing techniques are foredoomed to failure regardless of circumstances. That intelligence and values are orthogonal under all circumstances. That cryonics is an open-and-shut case. On these sorts of questions we need more, not less, experimentation.
When you hear a clever thought experiment that seems to demonstrate the truth of something you previously thought to have low probability, then (1) check if your priors here are inconsistent with each other; then (2) check if there is empirical data here that you have not fully updated on. If neither of those approaches resolves the issue, then (3) notice you are confused, and seek an experimental result to resolve the confusion. If you are truly unable to find an experimental test you can perform now, then (4) operate as if you do not know which of the possible theories is true.
You do not say “that thought experiment seemed convincing, so until I know otherwise I’ll update in favor of it.” That is the sort of thinking which led the ancients to believe that “All things come to rest eventually, so the natural state is a lack of motion. Planets continue in clockwork motion, so they must be a separate magisterium from earthly objects.” You may think we as rationalists are above that mistake, but history has shown otherwise. Hindsight bias makes the Greeks seem a lot stupider than they actually were.
Take a concrete example: the physical origin of consciousness. We can rule out the naïve my-atoms-constitute-my-consciousness view from biological arguments. However I have been unable to find or construct for myself an experiment which would definitively rule out either the information-identity or computational-process theories, both of which are supported by available empirical evidence.
How is this relevant? Some are arguing for brain preservation instead of cryonics. But this only achieves personal longevity if the information-identity theory is correct as it is destructive of the computational process. Cryonics on the other hand achieves personal longevity by preserving the computational substrate itself, which achieves both information- and computational-preservation. So unless there is a much larger difference in success likelihood than appears to be the case, my money (and my life) is on cryonics. Not because I think that computational-process theory is correct (although I do have other weak evidence that makes it more likely), but because I can’t rule it out as a possibility so I must consider the case where destructive brain preservation gets popularized but at the cost of fewer cryopreservations, and it turns out that personal longevity is only achieved with the preservation of computational processes. So I do not support the Brain Preservation Foundation.
To be clear, I think that arguing for destructive brain preservation at this point in time is a morally unconscionable thing to do, even though (exactly because!) we don’t know the nature of consciousness and personal identity, and there is an alternative which is likely to work no matter how that problem is resolved.
My point is that the very statements you are making, that we are all making all the time, are also very theory-loaded, “not followed up with empirical studies”. This includes the statements about the need to follow things up with empirical studies. You can’t escape the need for experimentally unverified theoretical judgement, and it does seem to work, even though I can’t give you a well-designed experimental verification of that. Some well-designed studies even prove that ghosts exist.
The degree to which discussion of familiar topics is closer to observations than discussion of more theoretical topics is unclear, and the distinction should be cashed out as uncertainty on a case-by-case basis. Some very theoretical things are crystal clear math, more certain than the measurement of the charge of an electron.
That is dangerous.
Being wrong is dangerous. Not taking theoretical arguments into account can result in error. This statement probably wouldn’t be much affected by further experimental verification. What specifically should be concluded depends on the problem, not on a vague outside measure of the problem like the degree to which it’s removed from empirical study.
[...] anything you yourself don’t have available experimental evidence for or against, I can sway you in either way. E.g. that consciousness is in information being computed and not the computational process itself.
Before considering the truth of a statement, we should first establish its meaning, which describes the conditions for judging its truth. For a vague idea, there are many alternative formulations of its meaning, and it may be unclear which one is interesting, but that’s separate from the issue of thinking about any specific formulation clearly.
Ghosts specifically seem like too complicated a hypothesis to extract from any experimental results I’m aware of. If we didn’t already have a concept of ghosts, I doubt any parapsychology experiments that have taken place would have caused us to develop one.
People select hypotheses for testing because they have previously weakly updated in the direction of them being true. Seeing empirical data produces a later, stronger update.
Except that when the hypothesis space is large, people test hypotheses because they strongly updated in the direction of them being true, and seeing empirical data produces a later, weaker update. An example of ‘strongly updating’ could be going from 9,999,999:1 odds against a hypothesis to 99:1 odds, and an example of ‘weakly updating’ could be going from 99:1 odds against the hypothesis to 1:99. The former update requires about 17 bits of evidence, while the latter update requires about 13 bits of evidence.
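The bit counts follow from Bayes’ rule in odds form: posterior odds equal prior odds times the likelihood ratio. The evidence needed is therefore log2 of the ratio of the final odds to the initial odds. Below is a minimal Python sketch, with a hypothetical function name, that computes this for the two updates in the example. It prints roughly 16.6 and 13.3 bits, so the first update does need more evidence than the second.

```python
import math


def bits_of_evidence(odds_before, odds_after):
    """Bits needed to move between odds in favor, i.e. log2 of the likelihood ratio."""
    return math.log2(odds_after / odds_before)


# 9,999,999:1 against  ->  99:1 against
strong = bits_of_evidence(1 / 9_999_999, 1 / 99)
# 99:1 against  ->  1:99 against (that is, 99:1 in favor)
weak = bits_of_evidence(1 / 99, 99)

print(f"{strong:.1f} bits, {weak:.1f} bits")
```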
Interesting point. I guess my intuitive notion of a “strong update” has to do with absolute probability mass allocation rather than bits of evidence (probability mass is what affects behavior?), but that’s probably not a disagreement worth hashing out.
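Measured by probability mass instead of bits, the ranking of the two updates from the odds example flips. Below is a minimal Python sketch with a hypothetical helper name. The first update moves only about 0.01 of probability mass onto the hypothesis, while the second moves about 0.98.

```python
def probability(odds_in_favor):
    """Convert odds in favor (for:against, as a single number) to a probability."""
    return odds_in_favor / (1 + odds_in_favor)


# 9,999,999:1 against -> 99:1 against: many bits, little probability mass.
strong_mass = probability(1 / 99) - probability(1 / 9_999_999)   # about 0.01

# 99:1 against -> 99:1 in favor: fewer bits, most of the probability mass.
weak_mass = probability(99) - probability(1 / 99)                 # about 0.98
```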
Thanks! Paul Graham is my hero when it comes to writing and I try to pack ideas as tightly as possible. (I recently reread this essay of his and got amazed by how many ideas it contains; I think it has more intellectual content than most published nonfiction books, in just 10 pages or so. I guess the downside of this style is that readers may not go slow enough to fully absorb all the ideas. Anyway, I’m convinced that Paul Graham is the Ben Franklin of our era.)
Selecting a likely hypothesis for consideration does not alter that hypothesis’ likelihood. Do we agree on that?
Hmm. Maybe. It depends on what you mean by “likelihood”, and by “selecting”.
Trivially, noticing a hypothesis and that it’s likely enough to justify being tested absolutely is making it subjectively more likely than it was before. I consider that tautological.
If someone is looking at n hypotheses and then decided to pick the kth one to test (maybe at random, or maybe because they all need to be tested at some point so why not start with the kth one), then I quite agree, that doesn’t change the likelihood of hypothesis #k.
But in my mind, it’s vividly clear that the process of plucking a likely hypothesis out of hypothesis space depends critically on moving probability mass around in said space. Any process that doesn’t do that is literally picking a hypothesis at random. (Frankly, I’m not sure a human mind even can do that.)
The core problem here is that most default human ways of moving probability mass around in hypothesis space (e.g. clever arguments) violate the laws of probability, whereas empirical tests aren’t nearly as prone to that.
So, if you mean to suggest that figuring out which hypothesis is worthy of testing does not involve altering our subjective likelihood that said hypothesis will turn out to be true, then I quite strongly disagree.
But if you mean that clever arguments can’t change what’s true even by a little bit, then of course I agree with you.
Perhaps you’re using a Frequentist definition of “likelihood” whereas I’m using a Bayesian one?
There’s a difference? Probability is probability.
If you go about selecting a hypothesis by evaluating a space of hypotheses to see how they rate against your model of the world (whether you think they are true) and against each other (how much you stand to learn by testing them), you are essentially coming to reflective equilibrium regarding these hypothesis and your current beliefs. What I’m saying is that this shouldn’t change your actual beliefs—it will flush out some stale caching, or at best identify an inconsistent belief, including empirical data that you haven’t fully updated on. But it does not, by itself, constitute evidence.
So a clever argument might reveal an inconsistency in your priors, which in turn might make you want seek out new evidence. But the argument itself is insufficient for drawing conclusions. Even if the hypothesis is itself hard to test.
There very much is a difference.
Probability is a mathematical construct. Specifically, it’s a special kind of measure p on a measure space M such that p(M) = 1 and p obeys a set of axioms that we refer to as the axioms of probability (where an “event” from the Wikipedia page is to be taken as any measurable subset of M).
This is a bit like highlighting that Euclidean geometry is a mathematical construct based on following thus-and-such axioms for relating thus-and-such undefined terms. Of course, in normal ways of thinking we point at lines and dots and so on, pretend those are the things that the undefined terms refer to, and proceed to show pictures of what the axioms imply. Formally, mathematicians refer to this as building a model of an axiomatic system. (Another example of this is elliptic geometry, which is a type of non-Euclidean geometry, which you can model as doing geometry on a sphere.)
The Frequentist and Bayesian models of probability theory are relevantly different. They both think of M as the space of possible results (usually called the “sample space” but not always) and a measurable subset E ≤ M as an “event”. But they use different models of p:
Frequentists suggest that were you to look at how often all of the events in M occur, the one we’re looking at (i.e., E) would occur at a certain frequency, and that’s how we should interpret p(E). E.g., if M is the set of results from flipping a fair coin and E is “heads”, then it is a property of the setup that p(E) = 0.5. A different way of saying this is that Frequentists model p as describing a property of that which they are observing—i.e., that probability is a property of the world.
Bayesians, on the other hand, model p as describing their current state of confidence about the true state of the observed phenomenon. In other words, Bayesians model p as being a property of mental models, not of the world. So if M is again the results from flipping a fair coin and E is “heads”, then to a Bayesian the statement p(E) = 0.5 is equivalent to saying “I equally expect getting a heads to not getting a heads from this coin flip.” To a Bayesian, it doesn’t make sense to ask what the “true” probability is that their subjective probability is estimating; the very question violates the model of p by trying to sneak in a Frequentist presumption.
Now let’s suppose that M is a hypothesis space, including some sector for hypotheses that haven’t yet been considered. When we say that a given hypothesis H is “likely”, we’re working within a partial model, but we haven’t yet said what “likely” means. The formalism is easy: we require that H ⊆ M is measurable, and the statement that “it’s likely” means that p(H) is larger than the probability of most other measurable subsets of M (and often we mean something stronger, like p(H) > 0.5). But we haven’t yet specified in our model what p(H) means. This is where the difference between Frequentism and Bayesianism matters. A Frequentist would say that the probability is a property of the hypothesis space, and noticing H doesn’t change that. (I’m honestly not sure how a Frequentist thinks about iterating over a hypothesis space to suggest that H in fact would occur at a frequency of p(H) in the limit; maybe by considering the frequency in counterfactual worlds?) A Bayesian, by contrast, will say that p(H) is their current confidence that H is the right hypothesis.
What I’m suggesting, in essence, is that figuring out which hypothesis H ⊆ M is worth testing is equivalent to moving from p to p’ in the space of probability measures on M in a way that causes p’(H) > p(H). This is coming from using a Bayesian model of what p is.
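One way to picture moving from p to p’ is a toy discrete hypothesis space that includes a sector for unconsidered hypotheses: singling out H shifts mass out of that sector and into H, while the total stays 1. This is a sketch under assumptions; the hypothesis names, the numbers, and the `notice` helper are all hypothetical.

```python
from __future__ import annotations

def notice(p: dict[str, float], h: str, source: str, amount: float) -> dict[str, float]:
    """Return a new measure p' that moves `amount` of probability mass from
    the `source` region into hypothesis `h`. The input p is left unchanged."""
    if amount > p[source]:
        raise ValueError("cannot move more mass than the source region holds")
    q = dict(p)
    q[source] -= amount
    q[h] = q.get(h, 0.0) + amount
    return q

# Illustrative numbers only: two named hypotheses plus an "unconsidered" sector.
p = {"H1": 0.30, "H2": 0.20, "unconsidered": 0.50}

# Thinking that picks out a new hypothesis H raises p'(H) above p(H) = 0
# without any new observation; the mass comes from the unconsidered sector.
p_prime = notice(p, "H", "unconsidered", 0.35)
print(p_prime)
```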
Of course, if you’re using a Frequentist model of p, then “most likely hypothesis” actually refers to a property of the hypothesis space—though I’m not sure how you would find out the frequency at which hypotheses turn out to be true the way you figure out the frequency at which a coin comes up heads. But that could just be my not being as familiar thinking in terms of the Frequentist model.
I’ll briefly note that although I find the Bayesian model more coherent with my sense of how the world works on a day-by-day basis, I think the Frequentist model makes more sense when thinking about quantum physics. The type of randomness we find there isn’t just about confidence, but is in fact a property of the quantum phenomena in question. In this case a well-calibrated Bayesian has to give a lot of probability mass to the hypothesis that there is a “true probability” in some quantum phenomena, which makes sense if we switch the model of p to be Frequentist.
But in short:
Yes, there’s a difference.
And things like “probability” and “belief” and “evidence” mean different things depending on what model you use.
Yep, we disagree.
I think the disagreement is on two fronts. One is based on using different models of probability, which is basically not an interesting disagreement. (Arguing over which definition to use isn’t going to make either of us smarter.) But I think the other is substantive. I’ll focus on that.
In short, I think you underestimate the power of noticing implications of known facts. I think that if you look at a few common or well-known examples of incomplete deduction, it becomes pretty clear that figuring out how to finish thinking would be intensely powerful:
Many people make resolutions to exercise, be nicer, eat more vegetables, etc. And while making those resolutions, they often really think they mean it this time. And yet, there’s often a voice of doubt in the back of the mind, as though saying “Come on. You know this won’t work.” But people still quite often spend a bunch of time and money trying to follow through on their new resolution—often failing for reasons that they kind of already knew would happen (and yet often feeling guilty for not sticking to their plan!).
Religious or ideological deconversion often comes from letting in facts that are already known. E.g., I used to believe that the results of parapsychological research suggested some really important things about how to survive after physical death. I knew all the pieces of info that finally changed my mind months before my mind actually changed. I had even done experiments to test my hypotheses and it still took months. I’m under the impression that this is normal.
Most people reading this already know that if they put a ton of work into emptying their email inbox, they’ll feel good for a little while, and then it’ll fill up again, complete with the sense of guilt for not keeping up with it. And yet, somehow, it always feels like the right thing to do to go on an inbox-emptying flurry, and then get around to addressing the root cause “later” or maybe try things that will fail after a month or two. This is an agonizingly predictable cycle. (Of course, this isn’t how it goes for everyone, but it’s common enough that well over half the people who attend CFAR workshops seem to relate to it.)
Most of Einstein’s work in bringing special relativity into consideration consisted of saying “Let’s take the Michelson-Morley result at face value and see where it goes.” Note that he is now considered the archetypal example of a brilliant person primarily for his ability to highlight worthy hypotheses via running with the implications of what is already known or supposed.
Ignaz Semmelweis found that hand-washing dramatically reduced deaths from childbed fever in hospital maternity wards. He was ignored, criticized, and committed to an insane asylum where guards beat him to death. At a cultural level, the fact that whether Semmelweis was right was (a) testable and (b) independent of opinion failed to propagate until after Louis Pasteur gave the medical community justification to believe that hand-washing could matter. This is a horrendous embarrassment, and thousands of people died unnecessarily because of a cultural inability to finish thinking. (Note that this also honors the need for empiricism, but the point here is that the ability to finish thinking was a prerequisite for empiricism mattering in this case.)
I could keep going. Hopefully you could too.
But my point is this:
Please note that there’s a baby in that bathwater you’re condemning as dirty.
Those are not different models. They are different interpretations of the utility of probability in different classes of applications.
You do it exactly the same as in your Bayesian example.
I’m sorry, but this Bayesian vs Frequentist conflict is for the most part non-existent. If you use probability to model the outcome of an inherently random event, people have called that “frequentist.” If instead you model the event as deterministic, but your knowledge over the outcome as uncertain, then people have applied the label “bayesian.” It’s the same probability, just used differently.
It’s like how if you apply your knowledge of mechanics to bridge and road building, it’s called civil engineering, but if you apply it to buildings it is architecture. It’s still mechanical engineering either way, just applied differently.
One of the failings of the sequences is the amount of emphasis that is placed on “Frequentist” vs “Bayesian” interpretations. The conflict between the two exists mostly in Yudkowsky’s mind. Actual statisticians use probability to model events and knowledge of events simultaneously.
Regarding the other points, every single example you gave involves using empirical data that had not sufficiently propagated, which is exactly the sort of use I am in favor of. So I don’t know what it is that you disagree with.
That’s what a model is in this case.
How sure are you of that?
I know a fellow who has a Ph.D. in statistics and works for the Department of Defense on cryptography. I think he largely agrees with your point: professional statisticians need to use both methods fluidly in order to do useful work. But he also doesn’t claim that they’re both secretly the same thing. He says that strong Bayesianism is useless in some cases that Frequentism gets right, and vice versa, though his sympathies lie more with the Frequentist position on pragmatic grounds (i.e. that methods that are easier to understand in a Frequentist framing tend to be more useful in a wider range of circumstances in his experience).
I think the debate is silly. It’s like debating which model of hyperbolic geometry is “right”. Different models highlight different intuitions about the formal system, and they make different aspects of the formal theorems more or less relevant to specific cases.
I think Eliezer’s claim is that as a matter of psychology, using a Bayesian model of probability lets you think about the results of probability theory as laws of thought, and from that you can derive some useful results about how one ought to think and what results from experimental psychology ought to capture one’s attention. He might also be claiming somewhere that Frequentism is in fact inconsistent and therefore is simply a wrong model to adopt, but honestly if he’s arguing that then I’m inclined to ignore him because people who know a lot more about Frequentism than he does don’t seem to agree.
But there is a debate, even if I think it’s silly and quite pointless.
And also, the axiomatic models are different, even if statisticians use both.
The concern about AI risk is also the result of an attempt to propagate implications of empirical data. It just goes farther than what I think you consider sensible, and I think you’re encouraging an unnecessary limitation on human reasoning power by calling such reasoning unjustified.
I agree, it should itch that there haven’t been empirical tests of several of the key ideas involved in AI risk, and I think there should be a visceral sense of making bullshit up attached to this speculation unless and until we can find ways to do those empirical tests.
But I think it’s the same kind of stupid to ignore these projections as it is to ignore that you already know how your New Year’s Resolution isn’t going to work. It’s not obviously as strong a stupidity, but the flavor is exactly the same.
If we could banish that taste from our minds, then even without better empiricism we would be vastly stronger.
I’m concerned that you’re underestimating the value of this strength, and viewing its pursuit as a memetic hazard.
I don’t think we have to choose between massively improving our ability to make correct clever arguments and massively improving the drive and cleverness with which we ask nature its opinion. I think we can have both, and I think that getting AI risk and things like it right requires both.
But just as measuring everything about yourself isn’t really a fully mature expression of empiricism, I’m concerned about the memes you’re spreading in the name of mature empiricism retarding the art of finishing thinking.
I don’t think that they have to oppose.
And I’m under the impression that you think otherwise.
This seems like it would be true only if you’d already propagated all logical consequences of all observations you’ve made. But an argument can help me to propagate. Which means it can make me update my beliefs.
For example, is 3339799 a prime number?
One ought to assign some prior probability to it being a prime. A naive estimate might say, well, there are two options, so let’s assign it 50% probability.
You could also make a more sophisticated argument about the distribution of prime numbers spreading out as you go towards infinity, and given that only 25 of the first 100 numbers are prime, the chance that a randomly selected number in the millions should be prime is less than 25% and probably much lower.
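Both arguments can be checked with a short Python sketch. It uses the prime number theorem (the density of primes near n is roughly 1/ln n), a standard result swapped in here to make “probably much lower” concrete, and ends with trial division, which settles the question by computation alone. The helper names are hypothetical.

```python
from __future__ import annotations
import math

def primes_below(n: int) -> list[int]:
    """Sieve of Eratosthenes: every prime p with 2 <= p < n."""
    if n < 3:
        return []
    sieve = bytearray([1]) * n
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n - 1) + 1):
        if sieve[i]:
            # Mark multiples of i, starting at i*i, as composite.
            sieve[i * i :: i] = bytearray(len(range(i * i, n, i)))
    return [i for i in range(n) if sieve[i]]

def is_prime(k: int) -> bool:
    """Trial division up to sqrt(k)."""
    if k < 2:
        return False
    return all(k % d for d in range(2, math.isqrt(k) + 1))

n = 3_339_799

# Argument from the text: 25 of the first 100 numbers are prime.
small_density = len(primes_below(100)) / 100

# Sharper argument (prime number theorem): density near n is about 1/ln(n).
pnt_estimate = 1 / math.log(n)

print(f"density of primes below 100: {small_density:.2f}")   # 0.25
print(f"prior that {n} is prime: {pnt_estimate:.3f}")         # 0.067
print(f"trial division says prime: {is_prime(n)}")
```

Each step changes the probability assigned to “3339799 is prime” without any new observation of the world, which is the kind of update the comment is pointing at.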
I claim that in a case like this it is totally valid to update your beliefs on the basis of an argument. No additional empirical test required before updating.
Do you agree?
I think the definition of ‘experiment’ gets tricky and confusing when you are talking about math specifically. When you talk about finding the distribution of prime numbers and using that to arrive at a more accurate model for your prior probability of 3339799 being prime, that is an experiment.
Math is unique in that regard though. For questions about the real world we must seek evidence that is outside of our heads.
Is that a conclusion or a hypothesis? I don’t believe there is a fundamental distinction between “actual beliefs”, “conclusions” and “hypotheses”. What should it take to change my beliefs about this?
I’ll think about how this can be phrased differently such that it might sway you. Given that you are not Valentine, is there a difference of opinion between his posts above and your views?
That part you pulled out and quoted is essentially what I was writing about in the OP. There is a philosophy-over-hard-subjects which is pursued here, in the sequences, at FHI, and is exemplified in the conclusions drawn by Bostrom in Superintelligence, and Yudkowsky in the later sequences. Sometimes it works, e.g. the argument in the sequences about the compatibility of determinism and free will works because it essentially shows how non-determinism and free will are incompatible—it exposes a cached thought that free-will == non-deterministic choice which was never grounded in the first place. But over new subjects where you are not confused in the first place—e.g. the nature and risk of superintelligence—people seem to be using thought experiments alone to reach ungrounded conclusions, and not following up with empirical studies.
That is dangerous. If you allow yourself to reason from thought experiments alone, I can get you to believe almost anything. I can’t get you to believe the sky is green (unless you’ve never seen the sky), but on anything you yourself don’t have experimental evidence for or against, I can sway you either way. E.g. that consciousness is in the information being computed and not the computational process itself. That an AI takeoff would be hard, not soft, and basically uncontrollable. That boxing techniques are foredoomed to failure regardless of circumstances. That intelligence and values are orthogonal under all circumstances. That cryonics is an open-and-shut case. On these sorts of questions we need more, not less, experimentation.
When you hear a clever thought experiment that seems to demonstrate the truth of something you previously thought to have low probability, then (1) check if your priors here are inconsistent with each other; then (2) check if there is empirical data here that you have not fully updated on. If neither of those approaches resolves the issue, then (3) notice you are confused, and seek an experimental result to resolve the confusion. If you are truly unable to find an experimental test you can perform now, then (4) operate as if you do not know which of the possible theories is true.
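The four steps read as a decision procedure. Below is a Python sketch under the assumption that each check can be summarized as a yes/no finding; the `CheckResults` fields and the function name are hypothetical labels, not part of the original argument.

```python
from dataclasses import dataclass

@dataclass
class CheckResults:
    """Hypothetical record of what the checks turned up."""
    priors_check_resolves: bool    # (1) inconsistent priors explain the surprise
    data_check_resolves: bool      # (2) unpropagated empirical data explains it
    experiment_available: bool     # (3) a test can be performed now

def respond_to_thought_experiment(r: CheckResults) -> str:
    """Return the step of the procedure that applies. No branch updates toward
    the conclusion merely because the argument felt convincing."""
    if r.priors_check_resolves:
        return "1: fix the inconsistency among priors"
    if r.data_check_resolves:
        return "2: update on the empirical data already in hand"
    if r.experiment_available:
        return "3: notice confusion and run the experiment"
    return "4: act as if you do not know which theory is true"

print(respond_to_thought_experiment(CheckResults(False, False, False)))
```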
You do not say “that thought experiment seemed convincing, so until I know otherwise I’ll update in favor of it.” That is the sort of thinking which led the ancients to believe that “All things come to rest eventually, so the natural state is a lack of motion. Planets continue in clockwork motion, so they must be a separate magisterium from earthly objects.” You may think we as rationalists are above that mistake, but history has shown otherwise. Hindsight bias makes the Greeks seem a lot stupider than they actually were.
Take a concrete example: the physical origin of consciousness. We can rule out the naïve my-atoms-constitute-my-consciousness view from biological arguments. However I have been unable to find or construct for myself an experiment which would definitively rule out either the information-identity or computational-process theories, both of which are supported by available empirical evidence.
How is this relevant? Some are arguing for brain preservation instead of cryonics. But this only achieves personal longevity if the information-identity theory is correct as it is destructive of the computational process. Cryonics on the other hand achieves personal longevity by preserving the computational substrate itself, which achieves both information- and computational-preservation. So unless there is a much larger difference in success likelihood than appears to be the case, my money (and my life) is on cryonics. Not because I think that computational-process theory is correct (although I do have other weak evidence that makes it more likely), but because I can’t rule it out as a possibility so I must consider the case where destructive brain preservation gets popularized but at the cost of fewer cryopreservations, and it turns out that personal longevity is only achieved with the preservation of computational processes. So I do not support the Brain Preservation Foundation.
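The reasoning here is an expected-value comparison under uncertainty about which identity theory is true. A minimal Python sketch, assuming exactly one of the two theories holds and using made-up success rates (none of these numbers come from the comment); `p_longevity` is a hypothetical helper.

```python
def p_longevity(p_method_works: float, preserves_process: bool,
                p_info_theory: float) -> float:
    """Probability of personal longevity for a preservation method.

    Assumes exactly one theory is true: information-identity with probability
    p_info_theory, otherwise computational-process. A method that destroys the
    computational process succeeds only under the information-identity theory."""
    p_theory_ok = 1.0 if preserves_process else p_info_theory
    return p_method_works * p_theory_ok

# Hypothetical inputs: both theories equally plausible, equal success rates.
p_info = 0.5
cryo = p_longevity(0.3, preserves_process=True, p_info_theory=p_info)
brain = p_longevity(0.3, preserves_process=False, p_info_theory=p_info)
print(cryo, brain)

# Success-rate ratio brain preservation would need just to break even.
print("break-even ratio:", 1 / p_info)
```

With equal success rates, the process-destroying method is discounted by the probability that the information-identity theory is correct, so it would need to be about 1 / p_info times as likely to work before the comparison tips.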
To be clear, I think that arguing for destructive brain preservation at this point in time is a morally unconscionable thing to do, even though (exactly because!) we don’t know the nature of consciousness and personal identity, and there is an alternative which is likely to work no matter how that problem is resolved.
My point is that the very statements you are making, that we are all making all the time, are also very theory-loaded, “not followed up with empirical studies”. This includes the statements about the need to follow things up with empirical studies. You can’t escape the need for experimentally unverified theoretical judgement, and it does seem to work, even though I can’t give you a well-designed experimental verification of that. Some well-designed studies even prove that ghosts exist.
The degree to which discussion of familiar topics is closer to observations than discussion of more theoretical topics is unclear, and the distinction should be cashed out as uncertainty on a case-by-case basis. Some very theoretical things are crystal clear math, more certain than the measurement of the charge of an electron.
Being wrong is dangerous. Not taking theoretical arguments into account can result in error. This statement probably wouldn’t be much affected by further experimental verification. What specifically should be concluded depends on the problem, not on a vague outside measure of the problem like the degree to which it’s removed from empirical study.
Before considering the truth of a statement, we should first establish its meaning, which describes the conditions for judging its truth. For a vague idea, there are many alternative formulations of its meaning, and it may be unclear which one is interesting, but that’s separate from the issue of thinking about any specific formulation clearly.
I’m not aware of any such studies about ghosts; Scott talks about telepathy and precognition studies.
Ghosts specifically seem like too complicated a hypothesis to extract from any experimental results I’m aware of. If we didn’t already have a concept of ghosts, I doubt any parapsychology experiments that have taken place would have caused us to develop one.
People select hypotheses for testing because they have previously weakly updated in the direction of them being true. Seeing empirical data produces a later, stronger update.
Except that when the hypothesis space is large, people test hypotheses because they strongly updated in the direction of them being true, and seeing empirical data produces a later, weaker update. An example of ‘strongly updating’ could be going from 9,999,999:1 odds against a hypothesis to 99:1 odds, and an example of ‘weakly updating’ could be going from 99:1 odds against the hypothesis to 1:99. The former update requires about 17 bits of evidence, while the latter requires about 13 bits.
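The bit counts come from the base-2 logarithm of the ratio between prior and posterior odds. A quick Python check of the two example updates (the function name is hypothetical):

```python
import math

def bits_of_evidence(prior_odds_against: float, posterior_odds_against: float) -> float:
    """Bits needed to move between two 'N:1 against' odds: log2 of the
    likelihood ratio that takes the prior odds to the posterior odds."""
    return math.log2(prior_odds_against / posterior_odds_against)

strong = bits_of_evidence(9_999_999, 99)   # 9,999,999:1 against -> 99:1 against
weak = bits_of_evidence(99, 1 / 99)        # 99:1 against -> 1:99 against
print(f"{strong:.1f} bits vs {weak:.1f} bits")   # 16.6 bits vs 13.3 bits
```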
Interesting point. I guess my intuitive notion of a “strong update” has to do with absolute probability mass allocation rather than bits of evidence (probability mass is what affects behavior?), but that’s probably not a disagreement worth hashing out.
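Measured as absolute probability mass instead of bits, the same two updates rank the other way around, as this short Python sketch shows (the helper name is hypothetical):

```python
def prob_from_odds_against(odds_against: float) -> float:
    """Convert 'N:1 against' odds into the probability that the hypothesis is true."""
    return 1 / (odds_against + 1)

# 9,999,999:1 against -> 99:1 against (more bits, little mass moved)
strong_shift = prob_from_odds_against(99) - prob_from_odds_against(9_999_999)
# 99:1 against -> 1:99 against (fewer bits, most of the mass moved)
weak_shift = prob_from_odds_against(1 / 99) - prob_from_odds_against(99)
print(f"{strong_shift:.4f} vs {weak_shift:.4f}")   # 0.0100 vs 0.9800
```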
I like your way of saying it. It’s much more efficient than mine!
Thanks! Paul Graham is my hero when it comes to writing and I try to pack ideas as tightly as possible. (I recently reread this essay of his and got amazed by how many ideas it contains; I think it has more intellectual content than most published nonfiction books, in just 10 pages or so. I guess the downside of this style is that readers may not go slow enough to fully absorb all the ideas. Anyway, I’m convinced that Paul Graham is the Ben Franklin of our era.)