One of the problems here is that, as well as disagreeing about underlying world models and about the likelihoods of some pre-AGI events, Paul and Eliezer often just make predictions about different things by default. But they do (and must, logically) predict some of the same world events differently.
My very rough model of how their beliefs flow forward is:
Paul
Low initial confidence on truth/coherence of ‘core of generality’
→
Human Evolution tells us very little about the ‘cognitive landscape of all minds’ (if that’s even a coherent idea) - it’s simply a loosely analogous individual historical example. Natural selection wasn’t intelligently aiming for powerful world-affecting capabilities, and so stumbled on them relatively suddenly with humans. Therefore, we learn very little about whether there will/won’t be a spectrum of powerful intermediately general AIs from the historical case of evolution—all we know is that it didn’t happen during evolution, and we’ve got good reasons to think it’s a lot more likely to happen for AI. For other reasons (precedents already exist—MuZero is insect-brained but better at chess or go than a chimp, plus that’s the default with technology we’re heavily investing in), we should expect there will be powerful, intermediately general AIs by default (and our best guess of the timescale should be anchored to the speed of human-driven progress, since that’s where it will start) - No core of generality
Then, from there:
No core of generality and extrapolation of quantitative metrics for things we care about and lack of common huge secrets in relevant tech progress reference class → Qualitative prediction of more common continuous progress on the ‘intelligence’ of narrow AI and prediction of continuous takeoff
Eliezer
High initial confidence on truth/coherence of ‘core of generality’
→
Even though there are some disanalogies between Evolution and AI progress, the exact details of how closely analogous the two situations are don’t matter that much. Rather, we learn a generalizable fact about the overall cognitive landscape from human evolution—that there is a way to reach the core of generality quickly. This doesn’t make it certain that AGI development will go the same way, but it’s fairly strong evidence. The disanalogies between evolution and ML are indeed a slight update in Paul’s direction and suggest that AI could in principle take a smoother route to general intelligence, but we’ve never historically seen this smoother route (and it has to be not just technically ‘smooth’ but sufficiently smooth to give us a full 4-year economic doubling) or these intermediate powerful agents, so this correction is weak compared to the broader knowledge we gain from evolution. In other words, all we know is that there is a fast route to the core of generality but that it’s imaginable that there’s a slow route we’ve not yet seen—Core of generality
Then, from there:
Core of generality and very common presence of huge secrets in relevant tech progress reference class → Qualitative prediction of less common continuous progress on the ‘intelligence’ of narrow AI and prediction of discontinuous takeoff
Eliezer doesn’t have especially divergent views about benchmarks like perplexity, because he thinks they’re not informative; where he differs from Paul is on qualitative predictions of how smoothly various practical capabilities/signs of ‘intelligence’ will emerge. He gets his qualitative predictions ultimately from interrogating his ‘cognitive landscape’ abstraction, while Paul gets his from trend extrapolation on measures of practical capabilities, which he then translates into qualitative predictions. These are very different origins, but they do eventually give different predictions about the likelihood of the same real-world events.
Since they only reach the point of discussing the same things at a very vague, qualitative level of detail, getting to a bet requires back-tracking from both of their qualitative predictions of how likely the sudden emergence of various types of narrow intelligent behaviour is, finding some clear metric for that narrow intelligent behaviour which can be applied fairly; then there should be a difference in their beliefs about the world before AI takeoff.
Updates on this after reflection and discussion (thanks to Rohin):
Human Evolution tells us very little about the ‘cognitive landscape of all minds’ (if that’s even a coherent idea) - it’s simply a loosely analogous individual historical example
Saying Paul’s view is that the cognitive landscape of minds might be simply incoherent isn’t quite right—at the very least you can talk about the distribution over programs implied by the random initialization of a neural network.
I could have just said ‘Paul doesn’t see this strong generality attractor in the cognitive landscape’, but it seems to me that it’s not just a disagreement about the abstraction: he also trusts claims made on the basis of these sorts of abstractions less than Eliezer does.
Also, on Paul’s view, it’s not that evolution is irrelevant as a counterexample. Rather, the specific fact of ‘evolution gave us general intelligence suddenly by evolutionary timescales’ is an unimportant surface fact, and the real truth about evolution is consistent with the continuous view.
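As an aside on the ‘distribution over programs’ point above: here is a minimal sketch (the architecture and numbers are mine, purely illustrative, not anything from the discussion) of what it means for a random neural-network initialization to induce a distribution over programs. Each draw of the weights is one ‘program’; sampling many draws and evaluating them on a fixed input exhibits the induced distribution over behaviours.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mlp(widths, rng):
    """Sample one 'program': a small MLP with random Gaussian weights."""
    weights = [rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
               for n_in, n_out in zip(widths[:-1], widths[1:])]

    def f(x):
        h = x
        for i, W in enumerate(weights):
            h = h @ W
            if i < len(weights) - 1:  # tanh on hidden layers only
                h = np.tanh(h)
        return h

    return f

# The initialization scheme induces a distribution over input->output
# functions: sample many nets and evaluate them on one fixed probe input.
x = np.ones((1, 4))
samples = np.array([random_mlp([4, 16, 1], rng)(x).item() for _ in range(2000)])
print(round(float(samples.mean()), 3), round(float(samples.std()), 3))
```

So even if ‘the cognitive landscape of all minds’ is a suspect abstraction, this much of it is perfectly well-defined.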
No core of generality and extrapolation of quantitative metrics for things we care about and lack of common huge secrets in relevant tech progress reference class
These two initial claims are connected in a way I didn’t make explicit: no core of generality and lack of common secrets in the reference class together imply that there are lots of paths to improving on practical metrics (not just those that give us generality), that we are putting lots of effort into improving such metrics, and that we tend to take the best ones first, so the metric improves continuously and trend extrapolation will be especially reliable.
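To make that inference explicit, here is a toy simulation (entirely my own construction, not something either party proposed): when many comparably sized improvements to a metric are available and effort greedily takes the best remaining one first, the metric moves smoothly; when one ‘huge secret’ dwarfs the incremental paths, the same greedy process produces a jump.

```python
def simulate_progress(n_steps, improvements):
    """Greedy toy model of tech progress on one practical metric.

    `improvements` lists the multiplicative gains available; at each step
    we take the largest remaining one (lots of effort, best paths first).
    """
    pool = sorted(improvements, reverse=True)
    metric, history = 1.0, [1.0]
    for _ in range(n_steps):
        if pool:
            metric *= 1.0 + pool.pop(0)
        history.append(metric)
    return history

def max_jump(history):
    """Largest single-step ratio: a crude measure of discontinuity."""
    return max(b / a for a, b in zip(history, history[1:]))

# Many comparably sized paths to improvement (no huge secrets):
smooth = simulate_progress(10, [0.10] * 10)
# One 'huge secret' that dwarfs the incremental paths:
jumpy = simulate_progress(10, [5.0] + [0.02] * 9)

print(max_jump(smooth), max_jump(jumpy))  # roughly 1.10 vs 6.0
```

The model leaves out everything interesting about *why* a huge secret would exist; it only illustrates why, given the two premises, continuous trend extrapolation falls out almost automatically.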
Core of generality and very common presence of huge secrets in relevant tech progress reference class
The first clause already implies the second (since “how to get the core of generality” is itself a huge secret), but Eliezer seems to use non-intelligence-related examples of sudden tech progress as evidence that huge secrets are common in tech progress in general, independent of the specific reason to think generality is one such secret.
… Eliezer was saying something like “the fact that humans go around doing something vaguely like weighting outcomes by possibility and also by attractiveness, which they then roughly multiply, is quite sufficient evidence for my purposes, as one who does not pay tribute to the gods of modesty”, while Richard protested something more like “but aren’t you trying to use your concept to carry a whole lot more weight than that amount of evidence supports?”.
And, ofc, at this point, my Eliezer-model is again saying “This is why we should be discussing things concretely! It is quite telling that all the plans we can concretely visualize for saving our skins, are scary-adjacent; and all the non-scary plans, can’t save our skins!”
Nate’s Summary

Nate’s summary brings up two points I more or less ignored in my summary because I wasn’t sure what I thought. One is: just what role do the considerations about an expected incompetent response, regulatory barriers, and mistakes in choosing alignment strategies play? Are they necessary for a high likelihood of doom, or just peripheral assumptions? Clearly, you have to posit some level of “civilization fails to do the x-risk-minimizing thing” if you want to argue doom, but how extreme are the scenarios Eliezer is imagining where success is likely?
The other is the role that the modesty worldview plays in Eliezer’s objections.
I feel confused/suspect we might have all lost track of what Modesty epistemology is supposed to consist of—I thought it was something like “overuse of the outside view, especially in a social cognition context”.
Which of the following is:
a) probably the product of a Modesty world-view?
b) no good reason to think it comes from a Modesty world-view, but still bad epistemology?
c) good epistemology?
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in a way that the theory’s proponents claim is more natural, but that you don’t understand, because that seems generally suspicious
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because you don’t trust your own assessments of naturalness that much in the absence of discriminating evidence
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because most powerful theories which cause conceptual revolutions also make new testable predictions, so it’s a bad sign if the newly proposed theory doesn’t.
As a general matter, accepting that there are lots of cases of theories which are knowably true, independent of any new testable predictions they make, because of features of the theory. Things like the implication of general relativity from the equivalence principle, or conservation of energy from Noether’s theorem, or many-worlds from QM are real, but you’ll only believe you’ve found a case like this if you’re walked through to the conclusion, so that you’re sure the underlying concepts are clear and applicable, or if there’s already a scientific consensus behind it.
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in a way that the theory’s proponents claim is more natural, but that you don’t understand, because that seems generally suspicious
My Eliezer-model doesn’t categorically object to this. See, e.g., Fake Causality:
[Phlogiston] feels like an explanation. It’s represented using the same cognitive data format. But the human mind does not automatically detect when a cause has an unconstraining arrow to its effect. Worse, thanks to hindsight bias, it may feel like the cause constrains the effect, when it was merely fitted to the effect.
[...] Thanks to hindsight bias, it’s also not enough to check how well your theory “predicts” facts you already know. You’ve got to predict for tomorrow, not yesterday.

And A Technical Explanation of Technical Explanation:
Nineteenth century evolutionism made no quantitative predictions. It was not readily subject to falsification. It was largely an explanation of what had already been seen. It lacked an underlying mechanism, as no one then knew about DNA. It even contradicted the nineteenth century laws of physics. Yet natural selection was such an amazingly good post facto explanation that people flocked to it, and they turned out to be right. Science, as a human endeavor, requires advance prediction. Probability theory, as math, does not distinguish between post facto and advance prediction, because probability theory assumes that probability distributions are fixed properties of a hypothesis.
The rule about advance prediction is a rule of the social process of science—a moral custom and not a theorem. The moral custom exists to prevent human beings from making human mistakes that are hard to even describe in the language of probability theory, like tinkering after the fact with what you claim your hypothesis predicts. People concluded that nineteenth century evolutionism was an excellent explanation, even if it was post facto. That reasoning was correct as probability theory, which is why it worked despite all scientific sins. Probability theory is math. The social process of science is a set of legal conventions to keep people from cheating on the math.
Yet it is also true that, compared to a modern-day evolutionary theorist, evolutionary theorists of the late nineteenth and early twentieth century often went sadly astray. Darwin, who was bright enough to invent the theory, got an amazing amount right. But Darwin’s successors, who were only bright enough to accept the theory, misunderstood evolution frequently and seriously. The usual process of science was then required to correct their mistakes.
My Eliezer-model does object to things like ‘since I (from my position as someone who doesn’t understand the model) find the retrodictions and obvious-seeming predictions suspicious, you should share my worry and have relatively low confidence in the model’s applicability’. Or ‘since the case for this model’s applicability isn’t iron-clad, you should sprinkle in a lot more expressions of verbal doubt’. My Eliezer-model views these as isolated demands for rigor, or as isolated demands for social meekness.
Part of his general anti-modesty and pro-Thielian-secrets view is that it’s very possible for other people to know things that justifiably make them much more confident than you are. So if you can’t pass the other person’s ITT / you don’t understand how they’re arriving at their conclusion (and you have no principled reason to think they can’t have a good model here), then you should be a lot more wary of inferring from their confidence that they’re biased.
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because you don’t trust your own assessments of naturalness that much in the absence of discriminating evidence
My Eliezer-model thinks it’s possible to be so bad at scientific reasoning that you need to be hit over the head with lots of advance predictive successes in order to justifiably trust a model. But my Eliezer-model thinks people like Richard are way better than that, and are (for modesty-ish reasons) overly distrusting their ability to do inside-view reasoning, and (as a consequence) aren’t building up their inside-view-reasoning skills nearly as much as they could. (At least in domains like AGI, where you stand to look a lot sillier to others if you go around expressing confident inside-view models that others don’t share.)
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because most powerful theories which cause conceptual revolutions also make new testable predictions, so it’s a bad sign if the newly proposed theory doesn’t.
My Eliezer-model thinks this is correct as stated, but thinks this is a claim that applies to things like Newtonian gravity and not to things like probability theory. (He’s also suspicious that modest-epistemology pressures have something to do with this being non-obvious — e.g., because modesty discourages you from trusting your own internal understanding of things like probability theory, and instead encourages you to look at external public signs of probability theory’s impressiveness, of a sort that could be egalitarianly accepted even by people who don’t understand probability theory.)