I’m going to focus on the overuse of the inside view, and the relative disuse of base rates and outside view. And it’s why I think Eliezer’s views on AI doom are probably not rational, and instead the product of a depression spiral, to quote John Maxwell.
On base rates of predictions of extinction, the obvious answer is that no extinction events happened out of 172 predicted ones, and while that’s not enough of a sample to draw strong conclusions, it does imply that very high confidence in doom by a specific date is not very rational, unless you believe that you have something special that changes this factor.
The issue is that LWers generally assume that certain things are entirely new every time and that everything is special. I think this assumption is overused both on LW and in the broader world, which probably leads to overvaluing your own special inside view compared to others’ outside views.
This is not sound reasoning because of selection bias. If any of those predictions had been correct, you would not be here to see it. Thus, you cannot use their failure as evidence.
I notice I’m a bit confused about that. Let’s say the only thing I know about the sun is “That bright yellow thing that provides heat”, and “The sun is really really old”, so I have no knowledge about how the sun mechanistically does what it does.
I want to know “How likely is the sun to explode in the next hour” because I’ve got a meeting to go to and it sure would be inconvenient for the sun to explode before I got there. My reasoning is “Well, the sun hasn’t exploded for billions of years, so it’s not about to explode in the next hour, with very high probability.”
Is this reasoning wrong? If so, what should my probability be? And how do I differentiate between “The sun will explode in the next hour” and “The sun will explode in the next year”?
Yes, IMO the reasoning is wrong: if you definitely cannot survive an event, then observing that the event did not happen is not evidence at all that it will not happen in the future, and it continues to not be evidence for as long as you continue to observe its non-occurrence.
Since we can survive the sudden complete darkening of the sun at least for a little while, the sun’s not having gone dark is evidence that it will not go dark in the future, but it is weaker evidence than it would be if we could survive the darkening of the sun indefinitely.
The law of the conservation of expected evidence requires us to take selection effects like those into account—and the law is a simple consequence of the axioms of probability, so to cast doubt on it is casting doubt on the validity of the whole idea of probability (in which case, Cox’s theorems would like to have a word with you).
This is not settled science: there is not widespread agreement among scholars or on this site on this point, but its counter-intuitiveness is not by itself a strong reason to disbelieve it because there are parts of settled science that are as counterintuitive as this is: for example, the twin paradox of special relativity and “particle identity in quantum physics”.
When you believe that the probability of a revolution in the US is low because the US government is 230 or so years old and hasn’t had a revolution yet, you are doing statistical reasoning. In contrast, noticing that if the sun exploded violently enough, we would immediately all die and consequently we would not be having this conversation—that is causal reasoning. Judea Pearl makes this distinction in the intro to his book Causality. Taking into account selection effects is using causal reasoning (your knowledge of the causal structure of reality) to modify a conclusion of statistical reasoning. You can still become confident that the sun will explode soon if you have a refined-enough causal model of the sun.
Off topic, but I’d just like to say this “good/bad comment” vs “I agree/disagree” voting distinction is amazing.
It allows us to separate our feeling on the content of the comment from our feeling on the appropriateness of the comment in the discussion. We can vote to disagree with a post without insulting the user for posting it. On reddit, this is sorely lacking, and it’s one (of many) reasons every sub is an unproductive circle jerk.
I upvoted both of your comments, while also voting to disagree. Thanks for posting them. What a great innovation to stimulate discussion.
So, I notice that still doesn’t answer the actual question of what my probability should actually be. To make things simple, let’s assume that, if the sun exploded, I would die instantly. In practice it would have to take at least eight minutes, but as a simplifying assumption, let’s assume it’s instantaneous.
In the absence of relevant evidence, it seems to me like Laplace’s Law of Succession would say the odds of the sun exploding in the next hour is 1⁄2. But I could also make that argument to say the odds of the sun exploding in the next year is also 1⁄2, which is nonsensical. So...what’s my actual probability, here, if I know nothing about how the sun works except that it has not yet exploded, the sun is very old (which shouldn’t matter, if I understand you correctly) and that if it exploded, we would all die?
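To make the puzzle concrete, here is a minimal sketch of Laplace’s rule in Python. The function name and argument names are made up for illustration. If past non-explosions are ruled out as evidence, the rule is being applied with zero trials, and it then returns 1⁄2 regardless of how long a single trial is supposed to last:

```python
from fractions import Fraction


def rule_of_succession(events: int, trials: int) -> Fraction:
    """Laplace's estimate of P(event on the next trial), given `events`
    occurrences in `trials` past trials (uniform prior on the rate)."""
    if not 0 <= events <= trials:
        raise ValueError("need 0 <= events <= trials")
    return Fraction(events + 1, trials + 2)


# With no admissible observations the estimate is 1/2, and nothing in the
# formula knows whether one "trial" is an hour or a year.
print(rule_of_succession(0, 0))  # 1/2
```

The formula only becomes sensitive to the length of the interval once some trials are counted, which is exactly what the selection-effect argument says I am not allowed to do.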
“In practice it would have to take at least eight minutes”
We don’t need to consider that here because any evidence of the explosion would also take at least eight minutes to arrive, so there is approximately zero minutes during which you are able to observe the evidence of the explosion before you are converted into a plasma that has no ability to update on anything. That is when observational selection effects are at their strongest: namely, when you are vanishingly unlikely to be in one of those intervals between your having observed an event and that event’s destroying your ability to maintain any kind of mental model of reality.
We 21st-century types have so much causal information about reality that I have been unable during this reply to imagine any circumstance where I would resort to Laplace’s law of succession to estimate any probability in anger where observational selection effects also need to be considered. It’s not that I doubt the validity of the law; it’s just that I have been unable to imagine a situation that meets both of these conditions: (1) the causal information I have about an “event” does not trump the statistical information I have about how many times the event has been observed to occur in the past, and (2) I have enough causal information to entertain real doubts about my ability to survive if the event goes the wrong way while remaining confident in my survival if the event goes the right way.
Certainly we can imagine ourselves in the situation of the physicists of the 1800s who had no solid guess as to the energy source keeping the sun shining steadily. But even they had the analogy with fire. (The emission spectra of the sun and of fire are both, I believe, well approximated as blackbody radiation, and the 1800s had prisms and consequently at least primitive spectrographs.) A fire doesn’t explode unless you suddenly give it fuel, and not just any fuel will do: adding logs to a fire will not cause an explosion, but adding enough gasoline will. “Where would the fuel come from that would cause the sun to explode?” the physicists of the 1800s could ask. Planets are made mostly of rocks, which don’t burn, and comets aren’t big enough. Merely what I have written in this short paragraph would IMO be enough to trump statistical considerations of how many days the sun has gone without exploding.
If I found myself in a Star Trek episode in which every night during sleep I find myself transported into some bizarre realm of “almost-pure sensation” where none of my knowledge of reality seems to apply and where a sun-like thing rises and sets, then yeah, I can imagine using the law of succession. But for observational selection effects to enter the calculation, I’d have to have enough causal information about this sun-like thing (and about my relationship to the bizarre realm) to doubt my ability to survive if it sets and never rises again, and that seems to contradict the assumption that none of my knowledge of reality applies to the bizarre realm.
My probability of the sun’s continuing to set and rise without exploding is determined exclusively by (causal) knowledge created by physicists and passed down to me in books, etc: how many times the sun has risen so far is in comparison of negligible importance. This knowledge is solid and “settled” enough that it is extremely unlikely that any sane physicist would announce that, well, actually, the sun is going to explode—probably within our lifetimes! But if a sane physicist did make such an announcement, I would focus on the physicist’s argument (causal knowledge) and pay almost no attention to the statistical information of how long there have been reliable observations of the sun’s not exploding—and this is true even if I were sure I could survive if the sun exploded—because the causal model is so solid (and the facts the model depends on, e.g., the absorption spectra of hydrogen and helium, are so easily checked). Consequently, the explosion of the sun is not a good example of where observational selection effects become important.
By the way, observational selection effects are hairy enough that I basically cannot calculate anything about them. Suppose for example that if Russia attacked the US with nukes, I would survive with p = .4 (which seems about right). (I live in the US.) Suppose further that my causal model of Russian politics would put my probability that Russia will attack the US with nukes some time in the next 365 days at .003 if Russia had deployed nukes for the first time today (i.e., if Russia didn’t have any nukes till right now). How should I adjust my probability (i.e., the .003) to take into account the fact that Russia’s nukes were in fact deployed starting in 1953 (year?) and so far Russia has never attacked the US with nukes? I don’t know! (And I have practical reasons for wanting to do this particular calculation, so I’ve thought a lot about it over the years. I do know that my probability should be greater than it should be if I and my ability to reason were impervious to nuclear attacks. In contrast to the solar-explosion situation, here is a situation in which the causal knowledge is uncertain enough that it would be genuinely useful to employ the statistical knowledge we have; it is just that I don’t know how to employ it in a calculation.) But when it comes to observational selection effects, things that are almost certain to end my life are much easier to reason about than something that has a .4 chance of ending my life.
In particular, most of the expected negative utility from AGI research stems from scenarios in which without warning (more precisely, without anything that the average person would recognize as a warning) an AGI kills every one of us. The observational selection effects around such a happening are easier to reason about than those around a nuclear attack: specifically, the fact that the predicted event hasn’t happened yet is not evidence at all that it will not happen in the future. If a powerful magician kills everyone who tries to bring you the news that the Red Sox have won the World Series of Baseball, and if that magician is extremely effective at his task, then your having observed that the Yankees win the World Series every time it occurs (which is strangely not every year, but some years have no World Series as far as you have heard) is not evidence at all about how often the Red Sox have won the World Series.
And the fact that Eliezer has been saying for at least a few months now that AGI could kill us all any day now (that the probability that it will happen 15 years from now is greater than the probability that it will happen today, but the probability it will happen today is nothing to scoff at) is very weak evidence against what he’s been saying, if it is evidence against it at all. A sufficiently rational person will assign what he has been saying the same or very nearly the same probability he would have if Eliezer had started saying it today. In both cases, a sufficiently rational person will focus almost entirely on Eliezer’s argument (complicated though it is) and counterarguments and will give almost no weight to how long Eliezer’s been saying it or how long AGIs have been in existence. Or more precisely, that is what a sufficiently rational person would do if he or she believed that he or she is unlikely to receive any advance warning of a deadly strike by the AGI beyond the warnings given so far by Eliezer and other AGI pessimists.
Eliezer’s argument is more complicated than the reasoning that tells us that the sun will not explode any time soon. More complicated means more likely to contain a subtle flaw. Moreover, it has been reviewed by fewer experts than the solar argument. Consequently, here is a situation in which it would be genuinely useful to use statistical information (e.g., the fact that research labs have been running AGIs for years (ChatGPT is an AGI for example) combined with the fact that we are still alive) but the statistical information is in fact IMO useless because of the extremely strong observational selection effects.
I’m at a local convenience store. A thief routinely robs me. He points a gun at me, threatens me, but never shoots, even when I push back a little. At this point, it’s kind of like we both know what’s happening, even though, technically, there’s a chance of physical danger.
Had this guy shot me, I wouldn’t be alive to reason about his next visit.
Now consider a different thief comes in, also armed. What is my probability of getting shot, as compared with the first thief?
Much, much higher with the second thief. My past experiences with the first thief act as evidence that I’m less likely to be shot. With this new thief, I don’t have that evidence, so my probability of being shot is just the base rate based on my read of the situation.
I believe updating on the non-fatal encounters with the first thief is correct, and it seems to me analogous to updating on the sun not having exploded. Thoughts?
Because a person has a significant chance of surviving a bullet wound—or more relevantly, of surviving an assault with a gun—your not having been assaulted by the first thief is evidence that you will not be assaulted in future encounters with him, but it is weaker evidence than it would be if you could be certain of your ability to survive (and your ability to retain your rationality skills and memories after) every encounter with him.
Humans are very good at reading the “motivational states” of the other people in the room with them. If for example the thief’s eyes are glassy and he looks like he is staring at something far away even though you know it is unlikely that there is anything of interest in his visual field far away, well, that is a sign he is in a dissociated state, which makes it more likely he’ll do something unpredictable and maybe violent. If when he looks at you he seems to look right through you, that is a sign of a coldness that also makes it more likely he will be violent if he can thereby benefit himself personally. So, what is actually doing most of the work of lowering your probability about the danger to you posed by the first thief? The mere fact that you escaped all the previous encounters without having been assaulted, or your observations of his body language, tone of voice and other details that give clues about his personality and his mental state?
If I know nothing about the boxes except that they have the same a priori probability of exploding and killing me, then I am indifferent between the two black boxes.
It is not terribly difficult to craft counter-intuitive examples of the principle. I anticipated I would be presented with such examples (because this is not my first time discussing this topic), which is why in my original comment I wrote, “its counter-intuitiveness is not by itself a strong reason to disbelieve it,” and the rest of that paragraph.
Let each black box have some probability to kill you, uniformly chosen from a set of possible probabilities. Let’s start with a simple one: that probability is 0 or 1.
The a priori chance to kill you is .5.
After the box doesn’t kill you, you update, and now the chance is 0.
What about if we use a uniform distribution from [0,1)? Some boxes are .3 to kill you, others .78.
Far more of the experiences of not dying are from the low p-kill boxes than from the high p-kill ones. When people select the same box, instead of a new one, after not being killed, that brings the average kill rate of selected boxes down. Run this experiment for long enough, and the only boxes still being selected are the extremely low p-kill boxes that haven’t killed all their subjects yet.
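A minimal simulation sketch of the two setups above. The function names, the trial count and the seed are illustrative choices, not anything specified earlier in the thread. It only counts frequencies among people who survive the first use, comparing “reuse the same box” with “switch to a freshly drawn box”; it does not by itself settle whether the survivor is entitled to update on those frequencies.

```python
import random


def simulate(draw_p_kill, trials=200_000, seed=0):
    """Estimate second-use death rates among first-use survivors.

    draw_p_kill: function taking a random.Random and returning one
    box's per-use kill probability.
    Returns (rate_if_same_box, rate_if_new_box).
    """
    rng = random.Random(seed)
    survivors = deaths_same = deaths_new = 0
    for _ in range(trials):
        p_first = draw_p_kill(rng)
        if rng.random() < p_first:
            continue  # killed on the first use; no second choice to make
        survivors += 1
        # Option A: reuse the box that just failed to kill us.
        if rng.random() < p_first:
            deaths_same += 1
        # Option B: switch to a freshly drawn box.
        if rng.random() < draw_p_kill(rng):
            deaths_new += 1
    return deaths_same / survivors, deaths_new / survivors


def zero_or_one(rng):
    """Each box either never kills or always kills, with equal chance."""
    return rng.choice([0.0, 1.0])


def uniform(rng):
    """Each box's kill probability is uniform on [0, 1)."""
    return rng.random()


if __name__ == "__main__":
    for name, dist in [("0-or-1", zero_or_one), ("uniform", uniform)]:
        same, new = simulate(dist)
        print(f"{name}: same box {same:.3f}, new box {new:.3f}")
```

In both setups the surviving population’s death rate on the second use should come out lower for the reused box than for a new one (zero in the 0-or-1 case, and roughly a third versus a half in the uniform case).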
This time, could you make a stronger objection, that’s more directly addressed at my counter-example?
In your new scenario, if I understand correctly, you have postulated that one box always explodes and one never explodes; I must undergo 2 experiences: the first experience is with one of the boxes, picked at random; then I get to choose whether my second experience is with the same box or whether it is with the other box. But I don’t need to know the outcome of the first experience to know that I want to limit my exposure to just one of these dangerous boxes: I will always choose to undergo the second experience with the same box as I underwent the first one with. Note that I arrived at this choice without doing the thing that I have been warning people not to do, namely, to update on observation X when I know it would have been impossible for me to survive (or more precisely for my rationality, my ability to have and to refine a model of reality, to survive) the observation not X.
That takes care of the first of your two new scenarios. In your second new scenario, I have a .5 chance of dying during my first experience. Then I may choose whether my second experience is with the same box or a new one. Before I make my choice, I would dearly love to experiment with either box in a setting in which I could survive the box’s exploding. But by your postulate as I understand it, that is not possible, so I am indifferent about which box I have my second experience with: either way I choose, my probability that I will die during the second experience is .5.
Note that in your previous comment, in which there was some P such that each time a box is used it has a probability P of exploding, there is no benefit to my being able to experiment with a box in a setting in which I could survive an explosion, but in the scenario we are considering now there is a huge benefit.
Suppose my best friend is observing the scenario from a safe distance: he can see what is happening, but is protected from any exploding box. My surviving the first experience changes his probability that the box used in the first experience will explode the next time it is used from .5 to .333. Actually, I am not sure of that number (because I am not sure the law of succession applies here—it has been a long time since I read my E.T. Jaynes) but I am sure that his probability changes from .5 to something less than .5. And my best friend can communicate that fact to me: “Richard,” he can say, “stick with the same box used in your first experience.” But his message has the same defect that my directly observing the behavior of the box has: namely, since I cannot survive the outcome that would have led him to increase his probability that the box will explode the next time it is used, I cannot update on the fact that his probability has decreased.
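For what it’s worth, the .333 figure does come out of the law of succession under one assumption, which is mine and simply matches the uniform-on-[0,1) setup: the observer’s prior on the box’s per-use explosion probability p is uniform, and uses are independent given p. A sketch of the calculation:

```latex
P(\text{explodes on use 2} \mid \text{no explosion on use 1})
  = \frac{\int_0^1 p\,(1-p)\,dp}{\int_0^1 (1-p)\,dp}
  = \frac{1/6}{1/2}
  = \frac{1}{3}
```

This is the outside observer’s number; whether the person standing next to the box may adopt it is the point in dispute.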
Students of E.T. Jaynes know that observer A’s probability of hypothesis H can differ from observer B’s probability: this happens when A has seen evidence for or against H that B has not seen yet. Well, here we have a case where A’s probability can differ from B’s even though A and B have seen the same sequence of evidence about H: namely, that happens when one of the observers could not have survived having observed a sequence of events (different from the sequence that actually happened) that the other observer could have survived.
TropicalFruit and I have taken this discussion private (in order to avoid flooding this comment section with discussion on a point only very distantly related to the OP.) However if you have any interest in the discussion, ask one of us for a copy. (We have both agreed to provide a copy to whoever asks.)
It seems to me there is a distinction to be made: It is one thing to conclude that, 1) Eliezer doesn’t know how to predict the date of AI Doom. That’s different from asserting that 2) AI Doom is not going to happen. 1 is not evidence for 2.
I think it’s appropriate to draw some better lines through concept space for apocalyptic predictions, when determining a base rate, than just “here’s an apocalyptic prediction and a date.” They aren’t all created equal.
Herbert W Armstrong is on this list 4 times… each time with a new incorrect prediction. So you’re counting this guy who took 4 guesses, all wrong, as 4 independent samples on which we should form a base rate.
And by using this guy in the base rate, you’re implying Eliezer’s prediction is in the same general class as Armstrong’s, which is a stretch to say the least.
A pretty simple class distinction is: how accurate are other predictions the person has made? How has Eliezer’s prediction record been? How have his AI timeline predictions been?
I don’t know the answers to these questions. Maybe they really have been bad, but I’m assuming they’re pretty good. If that’s the case, then clearly Eliezer’s prediction doesn’t deserve to be classified with the predictions listed on that page.
The list of predicted apocalyptic dates referred to above is here:
https://en.m.wikipedia.org/wiki/List_of_dates_predicted_for_apocalyptic_events
The issue is that LWers generally assume that certain things are entirely new every time and that everything is special, and I think this assumption is overused in both LW and the broader world, which probably leads to the problem of overvaluing your own special inside view compared to others outside views.
This is not sound reasoning because of selection bias. If any of those predictions had been correct, you would not be here to see it. Thus, you cannot use their failure as evidence.
I notice I’m a bit confused about that. Let’s say the only thing I know about the sun is “That bright yellow thing that provides heat”, and “The sun is really really old”, so I have no knowledge about how the sun mechanistically does what it does.
I want to know “How likely is the sun to explode in the next hour” because I’ve got a meeting to go to and it sure would be inconvenient for the sun to explode before I got there. My reasoning is “Well, the sun hasn’t exploded for billions of years, so it’s not about to explode in the next hour, with very high probability.”
Is this reasoning wrong? If so, what should my probability be? And how do I differentiate between “The sun will explode in the next hour” and “The sun will explode in the next year”?
Yes, IMO the reasoning is wrong: if you you definitely cannot survive an event, then observing that the event did not happened is not evidence at all that it will not explode in the future—and it continues to not be evidence as long as you continue to observe the non-explosion.
Since we can survive at least for a little while the sudden complete darkening of the sun the sun’s not having gone dark is evidence that it will not go dark in the future, but it is less strong evidence than it would be if we could survive the darkening of the sun indefinitely.
The law of the conservation of expected evidence requires us to take selection effects like those into account—and the law is a simple consequence of the axioms of probability, so to cast doubt on it is casting doubt on the validity of the whole idea of probability (in which case, Cox’s theorems would like to have a word with you).
This is not settled science: there is not widespread agreement among scholars or on this site on this point, but its counter-intuitiveness is not by itself a strong reason to disbelieve it because there are parts of settled science that are as counterintuitive as this is: for example, the twin paradox of special relativity and “particle identity in quantum physics”.
When you believe that the probability of a revolution in the US is low because the US government is 230 or so years old and hasn’t had a revolution yet, you are doing statistical reasoning. In contrast, noticing that if the sun exploded violently enough, we would immediately all die and consequently we would not be having this conversation—that is causal reasoning. Judea Pearl makes this distinction in the intro to his book Causality. Taking into account selection effects is using causal reasoning (your knowledge of the causal structure of reality) to modify a conclusion of statistical reasoning. You can still become confident that the sun will explode soon if you have a refined-enough causal model of the sun.
Off topic, but I’d just like to say this “good/bad comment” vs “I agree/disagree” voting distinction is amazing.
It allows us to separate our feeling on the content of the comment from our feeling on the appropriateness of the comment in the discussion. We can vote to disagree with a post without insulting the user for posting it. On reddit, this is sorely lacking, and it’s one (of many) reasons every sub is an unproductive circle jerk.
I upvoted both of your comments, while also voting to disagree. Thanks for posting them. What a great innovation to stimulate discussion.
So, I notice that still doesn’t answer the actual question of what my probability should actually be. To make things simple, let’s assume that, if the sun exploded, I would die instantly. In practice it would have to take at least eight minutes, but as a simplifying assumption, let’s assume it’s instantaneous.
In the absence of relevant evidence, it seems to me like Laplace’s Law of Succession would say the odds of the sun exploding in the next hour is 1⁄2. But I could also make that argument to say the odds of the sun exploding in the next year is also 1⁄2, which is nonsensical. So...what’s my actual probability, here, if I know nothing about how the sun works except that it has not yet exploded, the sun is very old (which shouldn’t matter, if I understand you correctly) and that if it exploded, we would all die?
We don’t need to consider that here because any evidence of the explosion would also take at least eight minutes to arrive, so there is approximately zero minutes during which you are able to observe the evidence of the explosion before you are converted into a plasma that has no ability to update on anything. That is when observational selection effects are at their strongest: namely, when you are vanishingly unlikely to be in one of those intervals between your having observed an event and that event’s destroying your ability to maintain any kind of mental model of reality.
We 21st-century types have so much causal information about reality that I have been unable during this reply to imagine any circumstance where I would resort to Laplace’s law of succession to estimate any probability in anger where observational selection effects also need to be considered. It’s not that I doubt the validity of the law; its just that I have been unable to imagine a situation in which the causal information I have about an “event” does not trump the statistical information I have about how many times the event has been observed to occur in the past and I also have enough causal information to entertain real doubts about my ability to survive if the event goes the wrong way while remaining confident in my survival if the event goes the right way.
Certainly we can imagine ourselves in the situation of the physicists of the 1800s who had no solid guess as to the energy source keeping the sun shining steadily. But even they had the analogy with fire. (The emissions spectra of the sun and of fire are both I believe well approximated as blackbody radiation and the 1800s had prisms and consequently at least primitive spectrographs.) A fire doesn’t explode unless you suddenly give it fuel—and not any fuel will do: adding logs to a fire will not cause an explosion, but adding enough gasoline will. “Where would the fuel come from that would cause the sun to explode?” the 1800s can ask. Planets are made mostly of rocks, which don’t burn, and comets aren’t big enough. Merely what I have written in this short paragraph would be enough to trump IMO statistical considerations of how many days the sun has gone without exploding.
If I found myself in a star-trek episode in which every night during sleep I find myself transported into some bizarre realm of “almost-pure sensation” where none of my knowledge of reality seems to apply and where a sun-like thing rises and sets, then yeah, I can imagine using the law of succession, but then for observational selection effects to enter the calculation, I’d have to have enough causal information about this sun-like thing (and about my relationship to the bizarre realm) to doubt my ability to survive if it sets and never rises again, but that seems to contradict the assumption that none of my knowledge of reality applies to the bizarre realm.
My probability of the sun’s continuing to set and rise without exploding is determined exclusively by (causal) knowledge created by physicists and passed down to me in books, etc: how many times the sun has risen so far is in comparison of negligible importance. This knowledge is solid and “settled” enough that it is extremely unlikely that any sane physicist would announce that, well, actually, the sun is going to explode—probably within our lifetimes! But if a sane physicist did make such an announcement, I would focus on the physicist’s argument (causal knowledge) and pay almost no attention to the statistical information of how long there have been reliable observations of the sun’s not exploding—and this is true even if I were sure I could survive if the sun exploded—because the causal model is so solid (and the facts the model depends on, e.g., the absorption spectra of hydrogen and helium, are so easily checked). Consequently, the explosion of the sun is not a good example of where observational selection effects become important.
By the way, observational selection effects are hairy enough that I basically cannot calculate anything about them. Suppose for example that if Russia attacked the US with nukes, I would survive with p = .4 (which seems about right). (I live in the US.) Suppose further that my causal model of Russian politics makes my probability that Russia will attack the US with nukes some time in the next 365 days as .003 if Russia had deployed nukes for the first time today (i.e., if Russia didn’t have any nukes till right now). How should I adjust my probability (i.e., the .003) to take into account that fact that Russia’s nukes were in fact deployed starting in 1953 (year?) and so far Russia has never attacked the US with nukes? I don’t know! (And I have practical reasons for wanting to do this particular calculation, so I’ve thought a lot about it over the years. I do know that my probability should be greater than it should be if I and my ability to reason were impervious to nuclear attacks. In contrast to the solar-explosion situation, here is a situation in which the causal knowledge is uncertain enough that it would be genuinely useful to employ the statistical knowledge we have; it is just that I don’t know how to employ it in a calculation.) But things that are almost certain to end my life are much easier to reason about—when it comes to observational selection effects—than something that has a .4 chance of ending my life.
In particular, most of the expected negative utility from AGI research stems from scenarios in which, without warning (more precisely, without anything that the average person would recognize as a warning), an AGI kills every one of us. The observational selection effects around such a happening are easier to reason about than those around a nuclear attack: specifically, the fact that the predicted event hasn’t happened yet is not evidence at all that it will not happen in the future. If a powerful magician kills everyone who tries to bring you the news that the Red Sox have won the World Series of Baseball, and if that magician is extremely effective at his task, then your having observed that the Yankees win the World Series every time it occurs (which is strangely not every year, but some years have no World Series as far as you have heard) is not evidence at all about how often the Red Sox have won the World Series.
And the fact that Eliezer has been saying for at least a few months now that AGI could kill us all any day now (that the probability that it will happen 15 years from now is greater than the probability that it will happen today, but the probability it will happen today is nothing to scoff at) is very weak evidence against what he’s been saying, if it is evidence against it at all. A sufficiently rational person will assign what he has been saying the same or very nearly the same probability he would have if Eliezer had started saying it today. In both cases, a sufficiently rational person will focus almost entirely on Eliezer’s argument (complicated though it is) and counterarguments and will give almost no weight to how long Eliezer’s been saying it or how long AGIs have been in existence. Or more precisely, that is what a sufficiently rational person would do if he or she believed that he or she is unlikely to receive any advance warning of a deadly strike by the AGI beyond the warnings given so far by Eliezer and other AGI pessimists.
Eliezer’s argument is more complicated than the reasoning that tells us that the sun will not explode any time soon. More complicated means more likely to contain a subtle flaw. Moreover, it has been reviewed by fewer experts than the solar argument. Consequently, here is a situation in which it would be genuinely useful to use statistical information (e.g., the fact that research labs have been running AGIs for years (ChatGPT is an AGI for example) combined with the fact that we are still alive) but the statistical information is in fact IMO useless because of the extremely strong observational selection effects.
Counterpoint:
I’m at a local convenience store. A thief routinely robs me. He points a gun at me, threatens me, but never shoots, even when I push back a little. At this point, it’s kind of like we both know what’s happening, even though, technically, there’s a chance of physical danger.
Had this guy shot me, I wouldn’t be alive to reason about his next visit.
Now consider that a different thief comes in, also armed. What is my probability of getting shot, as compared with the first thief?
Much, much higher with the second thief. My past experiences with the first thief act as evidence towards the update that I’m less likely to be shot. With this new thief, I don’t have that evidence, so my probability of being shot is just the base rate based on my read of the situation.
I believe updating on the non-fatal encounters with the first thief is correct, and it seems to me analogous to updating on the sun not having exploded. Thoughts?
Because a person has a significant chance of surviving a bullet wound—or more relevantly, of surviving an assault with a gun—your not having been assaulted by the first thief is evidence that you will not be assaulted in future encounters with him, but it is weaker evidence than it would be if you could be certain of your ability to survive (and your ability to retain your rationality skills and memories after) every encounter with him.
Humans are very good at reading the “motivational states” of the other people in the room with them. If for example the thief’s eyes are glassy and he looks like he is staring at something far away even though you know it is unlikely that there is anything of interest in his visual field far away, well, that is a sign he is in a dissociated state, which makes it more likely he’ll do something unpredictable and maybe violent. If when he looks at you he seems to look right through you, that is a sign of a coldness that also makes it more likely he will be violent if he can thereby benefit himself personally by doing so. So, what is actually doing most of the work of lowering your probability about the danger to you posed by the first thief? The mere fact that you escaped all the previous encounters without having been assaulted, or your observations of his body language, tone of voice and other details that give clues about his personality and his mental state?
Replace thief with a black box that either explodes and kills you, or doesn’t. It has some chance to kill you, but you don’t know what that chance is.
I was put in a room with black-box-one 5 times. Each time it didn’t explode.
Now, I have a choice: I can go back in the room with black-box-one, or I can go to a room with black-box-two.
I’ll take black-box-one, based on prior evidence.
If I know nothing about the boxes except that they have the same a priori probability of exploding and killing me, then I am indifferent between the two black boxes.
It is not terribly difficult to craft counter-intuitive examples of the principle. I anticipated I would be presented with such examples (because this is not my first time discussing this topic), which is why in my original comment I wrote, “its counter-intuitiveness is not by itself a strong reason to disbelieve it,” and the rest of that paragraph.
Okay but I just don’t agree.
Let each black box have some probability to kill you, uniformly chosen from a set of possible probabilities. Let’s start with a simple one: that probability is 0 or 1.
The a priori chance to kill you is .5.
After the box doesn’t kill you, you update, and now the chance is 0.
What about if we use a uniform distribution from [0,1)? Some boxes are .3 to kill you, others .78.
Far more of the experiences of not dying are from the low p-kill boxes than from the high p-kill ones. When people select the same box, instead of a new one, after not being killed, that brings the average kill rate of selected boxes down. Run this experiment for long enough, and the only boxes still being selected are the extremely low p-kill boxes that haven’t killed all their subjects yet.
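A minimal simulation sketch of this setup, assuming each box's kill probability is drawn once from a uniform distribution on [0, 1) and stays fixed across uses. The function name, trial count and seed are illustrative choices, not anything from the discussion. It only counts frequencies among subjects who survive the first exposure; whether a survivor is entitled to update on that frequency is the point in dispute here, and the code does not settle it.

```python
import random

def simulate(trials=200_000, seed=0):
    """Compare 'same box again' with 'fresh box' among first-round survivors."""
    rng = random.Random(seed)
    survivors = 0
    same_box_deaths = 0
    new_box_deaths = 0
    for _ in range(trials):
        # Assumption: the box's kill probability is uniform on [0, 1) and fixed.
        p_first = rng.random()
        if rng.random() < p_first:
            continue  # killed on the first exposure; no second choice to make
        survivors += 1
        # Option A: second exposure to the same box.
        if rng.random() < p_first:
            same_box_deaths += 1
        # Option B: second exposure to a fresh box with its own uniform probability.
        p_new = rng.random()
        if rng.random() < p_new:
            new_box_deaths += 1
    return same_box_deaths / survivors, new_box_deaths / survivors

same_rate, new_rate = simulate()
print(f"same box: {same_rate:.3f}   new box: {new_rate:.3f}")
```

Under these assumptions the death rate among survivors who stay with the same box should land near 1/3, and the rate among survivors who switch should land near 1/2.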
This time, could you make a stronger objection, that’s more directly addressed at my counter-example?
In your new scenario, if I understand correctly, you have postulated that one box always explodes and one never explodes; I must undergo 2 experiences: the first experience is with one of the boxes, picked at random; then I get to choose whether my second experience is with the same box or whether it is with the other box. But I don’t need to know the outcome of the first experience to know that I want to limit my exposure to just one of these dangerous boxes: I will always choose to undergo the second experience with the same box as I underwent the first one with. Note that I arrived at this choice without doing the thing that I have been warning people not to do, namely, to update on observation X when I know it would have been impossible for me to survive (or more precisely for my rationality, my ability to have and to refine a model of reality, to survive) the observation not X.
That takes care of the first of your two new scenarios. In your second new scenario, I have a .5 chance of dying during my first experience. Then I may choose whether my second experience is with the same box or a new one. Before I make my choice, I would dearly love to experiment with either box in a setting in which I could survive the box’s exploding. But by your postulate as I understand it, that is not possible, so I am indifferent about which box I have my second experience with: either way I choose, my probability that I will die during the second experience is .5.
Note that in your previous comment, in which there was some P such that each time a box is used, it has a probability P of exploding, there is no benefit to my being able to experiment with a box in a setting in which I could survive an explosion, but in the scenario we are considering now there is a huge benefit.
Suppose my best friend is observing the scenario from a safe distance: he can see what is happening, but is protected from any exploding box. My surviving the first experience changes his probability that the box used in the first experience will explode the next time it is used from .5 to .333. Actually, I am not sure of that number (because I am not sure the law of succession applies here—it has been a long time since I read my E.T. Jaynes) but I am sure that his probability changes from .5 to something less than .5. And my best friend can communicate that fact to me: “Richard,” he can say, “stick with the same box used in your first experience.” But his message has the same defect that my directly observing the behavior of the box has: namely, since I cannot survive the outcome that would have led him to increase his probability that the box will explode the next time it is used, I cannot update on the fact that his probability has decreased.
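For what it is worth, a short derivation suggests the .333 figure is what the rule of succession gives for the outside observer, at least under two assumptions that are mine rather than anything stated in the scenario: a uniform prior on the box's per-use explosion probability p, and independence between uses given p.

```latex
% Assumptions: uniform prior on p (per-use explosion probability),
% and uses of the box are independent given p.
\begin{align*}
\pi(p \mid \text{survived } n \text{ uses})
  &= \frac{(1-p)^n}{\int_0^1 (1-q)^n \, dq} = (n+1)(1-p)^n, \\
P(\text{explodes next} \mid \text{survived } n \text{ uses})
  &= \int_0^1 p \,(n+1)(1-p)^n \, dp = \frac{1}{n+2}.
\end{align*}
```

With n = 0 this gives the prior value 1/2, and with n = 1 it gives 1/3, matching the .5 and .333 above.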
Students of E.T. Jaynes know that observer A’s probability of hypothesis H can differ from observer B’s probability: this happens when A has seen evidence for or against H that B has not seen yet. Well, here we have a case where A’s probability can differ from B’s even though A and B have seen the same sequence of evidence about H: namely, that happens when one of the observers could not have survived having observed a sequence of events (different from the sequence that actually happened) that the other observer could have survived.
TropicalFruit and I have taken this discussion private (in order to avoid flooding this comment section with discussion on a point only very distantly related to the OP.) However if you have any interest in the discussion, ask one of us for a copy. (We have both agreed to provide a copy to whoever asks.)
I would like a copy of the discussion.
It seems to me there is a distinction to be made: It is one thing to conclude that, 1) Eliezer doesn’t know how to predict the date of AI Doom. That’s different from asserting that 2) AI Doom is not going to happen. 1 is not evidence for 2.
I think it’s appropriate to draw some better lines through concept space for apocalyptic predictions, when determining a base rate, than just “here’s an apocalyptic prediction and a date.” They aren’t all created equal.
Herbert W Armstrong is on this list 4 times… each time with a new incorrect prediction. So you’re counting this guy who took 4 guesses, all wrong, as 4 independent samples on which we should form a base rate.
And by using this guy in the base rate, you’re implying Eliezer’s prediction is in the same general class as Armstrong’s, which is a stretch to say the least.
A pretty simple class distinction is: how accurate are other predictions the person has made? How has Eliezer’s prediction record been? How have his AI timeline predictions been?
I don’t know the answers to these questions; maybe they really have been bad, but I’m assuming they’re pretty good. If that’s the case, then clearly Eliezer’s prediction doesn’t deserve to be classified with the predictions listed on that page.