Some thinking-out-loud on how I’d go about looking for testable/bettable prediction differences here...
I think my models overlap mostly with Eliezer’s in the relevant places, so I’ll use my own models as a proxy for his, and think about how to find testable/bettable predictions with Paul (or Ajeya, or someone else in their cluster).
One historical example immediately springs to mind where something-I’d-consider-a-Paul-esque-model utterly failed predictively: the breakdown of the Phillips curve. The original Phillips curve was based on just fitting a curve to inflation-vs-unemployment data; Friedman and Phelps both independently came up with theoretical models for that relationship in the late sixties (’67-’68), and Friedman correctly forecasted that the curve would break down in the next recession (i.e. the “stagflation” of ’73-’75). This all led up to the Lucas Critique, which I’d consider the canonical case-against-what-I’d-call-Paul-esque-worldviews within economics. The main idea which seems transportable to other contexts is that surface relations (like the Phillips curve) break down under distribution shifts in the underlying factors.
So, how would I look for something analogous to that situation in today’s AI? We need something with an established trend, but where a distribution shift happens in some underlying factor. One possible place to look: I’ve heard that OpenAI plans to make the next generation of GPT not actually much bigger than the previous generation; they’re trying to achieve improvement through strategies other than Stack More Layers. Assuming that’s true, it seems like a naive Paul-esque model would predict that the next GPT is relatively unimpressive compared to e.g. the GPT-2 → GPT-3 delta? Whereas my models (or I’d guess Eliezer’s models) would predict that it’s relatively more impressive, compared to the expectations of Paul-esque models (derived by e.g. extrapolating previous performance as a function of model size and then plugging in the actual size of the next GPT)? I wouldn’t expect either view to make crisp high-certainty predictions here, but enough to get decent Bayesian evidence.
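A minimal sketch of the “naive extrapolation” baseline described above: fit a straight line to performance against log model size, then plug in the size of the next model. The parameter counts and scores below are invented placeholders, not real benchmark results, and the names (`fit_line`, `predicted_score`, `surprise`) are hypothetical. The only point is to show what quantity a bet would compare against.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# ASSUMPTION: made-up (parameter count, benchmark score) pairs for illustration.
param_counts = [1e8, 1e9, 1e10, 1e11]
scores = [20.0, 30.0, 40.0, 50.0]

slope, intercept = fit_line([math.log10(p) for p in param_counts], scores)

def predicted_score(param_count):
    """Score the size-only trend predicts for a model of this size."""
    return slope * math.log10(param_count) + intercept

def surprise(observed_score, param_count):
    """Positive when a model beats what its size alone would predict."""
    return observed_score - predicted_score(param_count)

# A next-generation model only twice as large as the biggest one in the fit.
print(f"size-only prediction: {predicted_score(2e11):.1f}")
```

On this framing, the disagreement is about the sign and size of `surprise` for a model that is not much bigger than its predecessor.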
Other than distribution shifts, the other major place I’d look for different predictions is in the extent to which aggregates tell us useful things. The post got into that in a little detail, but I think there’s probably still room there. For instance, I recently sat down and played with some toy examples of GDP growth induced by tech shifts, and I was surprised by how smooth GDP was even in scenarios with tech shifts which seemed very impactful to me. I expect that Paul would be even more surprised by this if he were to do the same exercise. In particular, this quote seems relevant:
the point is that housing and healthcare are not central examples of things that scale up at the beginning of explosive growth, regardless of whether it’s hard or soft
It is surprisingly difficult to come up with a scenario where GDP growth looks smooth AND housing+healthcare don’t grow much AND GDP growth accelerates to a rate much faster than now. If everything except housing and healthcare is getting cheaper, then housing and healthcare will likely play a much larger role in GDP (and together they’re 30-35% already), eventually dominating GDP. This isn’t a logical necessity; in principle we could consume so much more of everything else that the housing+healthcare share shrinks, but I think that would probably diverge from past trends (though I have not checked). What I actually expect is that as people get richer, they spend a larger fraction on things which have a high capacity to absorb marginal income, of which housing and healthcare are central examples.
If housing and healthcare aren’t getting cheaper, and we’re not spending a smaller fraction of income on them (by buying way way more of the things which are getting cheaper), then that puts a pretty stiff cap on how much GDP can grow.
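A rough sketch of the arithmetic behind that cap, assuming real GDP growth is approximately the expenditure-share-weighted average of sector growth rates (a standard approximation for chained indexes, but an assumption here, not something taken from the toy examples mentioned above). The 30% starting share comes from the figure quoted earlier; the 2% and 30% growth rates are hypothetical.

```python
def aggregate_growth(stagnant_share, stagnant_growth, progressive_growth):
    """Approximate real GDP growth for a two-sector economy.

    stagnant_share: fraction of spending on the slow sector (e.g. housing+healthcare).
    Growth rates are annual fractions, e.g. 0.02 for 2% per year.
    """
    if not 0.0 <= stagnant_share <= 1.0:
        raise ValueError("stagnant_share must be between 0 and 1")
    return (stagnant_share * stagnant_growth
            + (1.0 - stagnant_share) * progressive_growth)

# ASSUMPTION: slow sector grows 2%/yr, everything else grows 30%/yr.
for share in (0.30, 0.50, 0.80):
    growth = aggregate_growth(share, 0.02, 0.30)
    print(f"stagnant share {share:.0%}: aggregate growth {growth:.1%}")
```

As the slow sector’s share of spending rises, aggregate growth is pulled toward the slow sector’s own growth rate, however fast everything else is improving.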
Zooming out a meta-level, I think GDP is a particularly good example of a big aggregate metric which approximately-always looks smooth in hindsight, even when the underlying factors of interest undergo large jumps. I think Paul would probably update toward that view if he spent some time playing around with examples (similar to this post).
Similarly, I’ve heard that during training of GPT-3, while aggregate performance improves smoothly, performance on any particular task (like e.g. addition) is usually pretty binary—i.e. performance on any particular task tends to jump quickly from near-zero to near-maximum-level. Assuming this is true, presumably Paul already knows about it, and would argue that what matters-for-impact is ability at lots of different tasks rather than one (or a few) particular tasks/kinds-of-tasks? If so, that opens up a different line of debate, about the extent to which individual humans’ success today hinges on lots of different skills vs a few, and in which areas.
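A toy illustration of how that could happen, under stated assumptions: each task has a sharp sigmoid response to (log) scale, and the thresholds at which tasks “switch on” are spread evenly across the range. None of this is taken from actual GPT-3 training curves; the function names and constants are hypothetical.

```python
import math

def task_accuracy(log_scale, threshold, sharpness=200.0):
    """Near-binary task: accuracy jumps from ~0 to ~1 around `threshold`."""
    return 1.0 / (1.0 + math.exp(-sharpness * (log_scale - threshold)))

def aggregate_accuracy(log_scale, thresholds):
    """Mean accuracy across many tasks, each with its own threshold."""
    return sum(task_accuracy(log_scale, t) for t in thresholds) / len(thresholds)

# ASSUMPTION: 101 tasks with thresholds spread evenly over the scale range [0, 1].
thresholds = [i / 100 for i in range(101)]
scales = [i / 20 for i in range(21)]  # checkpoints at 0.00, 0.05, ..., 1.00

single = [task_accuracy(s, 0.525) for s in scales]
aggregate = [aggregate_accuracy(s, thresholds) for s in scales]

# Largest improvement between consecutive checkpoints.
max_step_single = max(b - a for a, b in zip(single, single[1:]))
max_step_aggregate = max(b - a for a, b in zip(aggregate, aggregate[1:]))

print(f"largest jump, single task: {max_step_single:.2f}")
print(f"largest jump, aggregate:   {max_step_aggregate:.2f}")
```

The single task gains almost all of its accuracy between two adjacent checkpoints, while the aggregate never moves by more than a few percentage points per checkpoint.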
The “continuous view” as I understand it doesn’t predict that all straight lines always stay straight. My version of it (which may or may not be Paul’s version) predicts that in domains where people are putting in lots of effort to optimize a metric, that metric will grow relatively continuously. In other words, the more effort put in to optimize the metric, the more you can rely on straight lines for that metric staying straight (assuming that the trends in effort are also staying straight).
In its application to AI, this is combined with a prediction that people will in fact be putting in lots of effort into making AI systems intelligent / powerful / able to automate AI R&D / etc, before AI has reached a point where it can execute a pivotal act. This second prediction comes for totally different reasons, like “look at what AI researchers are already trying to do” combined with “it doesn’t seem like AI is anywhere near the point of executing a pivotal act yet”.
(I think on Paul’s view the second prediction is also bolstered by observing that most industries / things that had big economic impacts also seemed to have crappier predecessors. This feels intuitive to me but is not something I’ve checked and so isn’t my personal main reason for believing the second prediction.)
One historical example immediately springs to mind where something-I’d-consider-a-Paul-esque-model utterly failed predictively: the breakdown of the Phillips curve.
I’m not very familiar with this (I’ve only seen your discussion and the discussion in IEM) but it does not seem like the sort of thing where the argument I laid out above would have had a strong opinion. Was the y-axis of the straight line graph a metric that people were trying to optimize? If so, did the change in policy not represent a change in the amount of effort put into optimizing the metric? (I haven’t looked at the details here, maybe the answer is yes to both, in which case I would be interested in looking at the details.)
Zooming out a meta-level, I think GDP is a particularly good example of a big aggregate metric which approximately-always looks smooth in hindsight, even when the underlying factors of interest undergo large jumps.
This seems plausible but it also seems like you can apply the above argument to a bunch of other topics besides GDP, like the ones listed in this comment, so it still seems like you should be able to exhibit a failure of the argument on those topics.
My version of it (which may or may not be Paul’s version) predicts that in domains where people are putting in lots of effort to optimize a metric, that metric will grow relatively continuously. In other words, the more effort put in to optimize the metric, the more you can rely on straight lines for that metric staying straight (assuming that the trends in effort are also staying straight).
This is super helpful, thanks. Good explanation.
With this formulation of the “continuous view”, I can immediately think of places where I’d bet against it. The first which springs to mind is aging: I’d bet that we’ll see a discontinuous jump in achievable lifespan of mice. The gears here are nicely analogous to AGI too: I expect that there’s a “common core” (or shared cause) underlying all the major diseases of aging, and fixing that core issue will fix all of them at once, in much the same way that figuring out the “core” of intelligence will lead to a big discontinuous jump in AI capabilities. I can also point to current empirical evidence for the existence of a common core in aging, which might suggest analogous types of evidence to look at in the intelligence context.
Thinking about other analogous places… presumably we saw a discontinuous jump in flight range when Sputnik entered orbit. That one seems extremely closely analogous to AGI. There it’s less about the “common core” thing, and more about crossing some critical threshold. Nuclear weapons and superconductors both stand out a-priori as places where we’d expect a critical-threshold-related discontinuity, though I don’t think people were optimizing hard enough in superconductor-esque directions for the continuous view to make a strong prediction there (at least for the original discovery of superconductors).
I agree that when you know about a critical threshold, as with nukes or orbits, you can and should predict a discontinuity there. (Sufficient specific knowledge is always going to allow you to outperform a general heuristic.) I think that (a) such thresholds are rare in general and (b) in AI in particular there is no such threshold. (According to me (b) seems like the biggest difference between Eliezer and Paul.)
Some thoughts on aging:
It does in fact seem surprising, given the complexity of biology relative to physics, if there is a single core cause and core solution that leads to a discontinuity.
I would a priori guess that there won’t be a core solution. (A core cause seems more plausible, and I’ll roll with it for now.) Instead, we see a sequence of solutions that intervene on the core problem in different ways, each of which leads to some improvement on lifespan, and discovering these at different times leads to a smoother graph.
That being said, are people putting in a lot of effort into solving aging in mice? Everyone seems to constantly be saying that we’re putting in almost no effort whatsoever. If that’s true then a jumpy graph would be much less surprising.
As a more specific scenario, it seems possible that the graph of mouse lifespan over time looks basically flat, because we were making no progress due to putting in ~no effort. I could totally believe in this world that someone puts in some effort and we get a discontinuity, or even that the near-zero effort we’re putting in finds some intervention this year (but not in previous years) which then looks like a discontinuity.
If we had a good operationalization, and people are in fact putting in a lot of effort now, I could imagine putting my $100 to your $300 on this (not going beyond 1:3 odds simply because you know way more about aging than I do).
I’m not particularly enthusiastic about betting at 75%, that seems like it’s already in the right ballpark for where the probability should be. So I guess we’ve successfully Aumann agreed on that particular prediction.
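For reference, the conversion from stakes to implied probability, as a small sketch (the function names are hypothetical). Staking $300 against $100 breaks even exactly when the staker’s probability of winning is 75%, which is where the “betting at 75%” figure comes from.

```python
def break_even_probability(own_stake, counterparty_stake):
    """Probability of winning at which staking `own_stake` to win
    `counterparty_stake` has zero expected profit."""
    return own_stake / (own_stake + counterparty_stake)

def expected_profit(p_win, own_stake, counterparty_stake):
    """Expected profit of the bet for someone who wins with probability p_win."""
    return p_win * counterparty_stake - (1.0 - p_win) * own_stake

# The side putting up $300 against $100 needs at least 75% confidence.
print(break_even_probability(300, 100))  # 0.75
# The side putting up $100 against $300 needs at least 25% confidence.
print(break_even_probability(100, 300))  # 0.25
```

If both parties’ probabilities for the discontinuity sit near 75%, neither side expects to profit at 1:3 odds, so there is no bet to make.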
presumably we saw a discontinuous jump in flight range when Sputnik entered orbit.
While I think orbit is the right sort of discontinuity for this, I think you need to specify ‘flight range’ in a way that clearly favors orbits for this to be correct, mostly because the manhole cover was launched/vaporized with a nuke about a month before.
[But in terms of something like “altitude achieved”, I think Sputnik is probably part of a continuous graph, and probably not the most extreme member of the graph?]
My understanding is that Sputnik was a big discontinuous jump in “distance which a payload (i.e. nuclear bomb) can be delivered” (or at least it was a conclusive proof-of-concept of a discontinuous jump in that metric). That metric was presumably under heavy optimization pressure at the time, and was the main reason for strategic interest in Sputnik, so it lines up very well with the preconditions for the continuous view.
So it looks like the R-7 (which launched Sputnik) was the first ICBM, and the range is way longer than the V-2s of ~15 years earlier, but I’m not easily finding a graph of range over those intervening years. (And the R-7 range is only about double the range of a WW2-era bomber, which further smooths the overall graph.)
[And, implicitly, the reason we care about ICBMs is because the US and the USSR were on different continents; if the distance between their major centers was comparable to England and France’s distance instead, then the same strategic considerations would have been hit much sooner.]
One of the problems here is that, as well as disagreeing about underlying world models and about the likelihoods of some pre-AGI events, Paul and Eliezer often just make predictions about different things by default. But they do (and must, logically) predict some of the same world events differently.
My very rough model of how their beliefs flow forward is:
Paul
Low initial confidence on truth/coherence of ‘core of generality’
→
Human Evolution tells us very little about the ‘cognitive landscape of all minds’ (if that’s even a coherent idea) - it’s simply a loosely analogous individual historical example. Natural selection wasn’t intelligently aiming for powerful world-affecting capabilities, and so stumbled on them relatively suddenly with humans. Therefore, we learn very little about whether there will/won’t be a spectrum of powerful intermediately general AIs from the historical case of evolution—all we know is that it didn’t happen during evolution, and we’ve got good reasons to think it’s a lot more likely to happen for AI. For other reasons (precedents already exist—MuZero is insect-brained but better at chess or go than a chimp, plus that’s the default with technology we’re heavily investing in), we should expect there will be powerful, intermediately general AIs by default (and our best guess of the timescale should be anchored to the speed of human-driven progress, since that’s where it will start) - No core of generality
Then, from there:
No core of generality and extrapolation of quantitative metrics for things we care about and lack of common huge secrets in relevant tech progress reference class → Qualitative prediction of more common continuous progress on the ‘intelligence’ of narrow AI and prediction of continuous takeoff
Eliezer
High initial confidence on truth/coherence of ‘core of generality’
→
Even though there are some disanalogies between Evolution and AI progress, the exact details of how closely analogous the two situations are don’t matter that much. Rather, we learn a generalizable fact about the overall cognitive landscape from human evolution—that there is a way to reach the core of generality quickly. This doesn’t make it certain that AGI development will go the same way, but it’s fairly strong evidence. The disanalogies between evolution and ML are indeed a slight update in Paul’s direction and suggest that AI could in principle take a smoother route to general intelligence, but we’ve never historically seen this smoother route (and it has to be not just technically ‘smooth’ but sufficiently smooth to give us a full 4-year economic doubling) or these intermediate powerful agents, so this correction is weak compared to the broader knowledge we gain from evolution. In other words, all we know is that there is a fast route to the core of generality but that it’s imaginable that there’s a slow route we’ve not yet seen—Core of generality
Then, from there:
Core of generality and very common presence of huge secrets in relevant tech progress reference class → Qualitative prediction of less common continuous progress on the ‘intelligence’ of narrow AI and prediction of discontinuous takeoff
Eliezer doesn’t have especially divergent views about benchmarks like perplexity because he thinks they’re not informative, but differs from Paul on qualitative predictions of how smoothly various practical capabilities/signs of ‘intelligence’ will emerge—he’s getting his qualitative predictions about this ultimately from interrogating his ‘cognitive landscape’ abstraction, while Paul is getting his from trend extrapolation on measures of practical capabilities and then translating those to qualitative predictions. These are very different origins, but they do eventually give different predictions about the likelihood of the same real-world events.
Since they only reach the point of discussing the same things at a very vague, qualitative level of detail, in order to get to a bet you have to back-track from both of their qualitative predictions of how likely the sudden emergence of various types of narrow intelligent behaviour is, and find some clear metric for the narrow intelligent behaviour that we can apply fairly. Then there should be a difference in beliefs about the world before AI takeoff.
Updates on this after reflection and discussion (thanks to Rohin):
Human Evolution tells us very little about the ‘cognitive landscape of all minds’ (if that’s even a coherent idea) - it’s simply a loosely analogous individual historical example
Saying Paul’s view is that the cognitive landscape of minds might be simply incoherent isn’t quite right—at the very least you can talk about the distribution over programs implied by the random initialization of a neural network.
I could have just said ‘Paul doesn’t see this strong generality attractor in the cognitive landscape’ but it seems to me that it’s not just a disagreement about the abstraction, but that he trusts claims made on the basis of these sorts of abstractions less than Eliezer.
Also, on Paul’s view, it’s not that evolution is irrelevant as a counterexample. Rather, the specific fact of ‘evolution gave us general intelligence suddenly by evolutionary timescales’ is an unimportant surface fact, and the real truth about evolution is consistent with the continuous view.
No core of generality and extrapolation of quantitative metrics for things we care about and lack of common huge secrets in relevant tech progress reference class
These two initial claims are connected in a way I didn’t make explicit—No core of generality and lack of common secrets in the reference class together imply that there are lots of paths to improving on practical metrics (not just those that give us generality), that we are putting in lots of effort into improving such metrics and that we tend to take the best ones first, so the metric improves continuously, and trend extrapolation will be especially correct.
Core of generality and very common presence of huge secrets in relevant tech progress reference class
The first clause already implies the second clause (since “how to get the core of generality” is itself a huge secret), but Eliezer seems to use non-intelligence related examples of sudden tech progress as evidence that huge secrets are common in tech progress in general, independent of the specific reason to think generality is one such secret.
… Eliezer was saying something like “the fact that humans go around doing something vaguely like weighting outcomes by possibility and also by attractiveness, which they then roughly multiply, is quite sufficient evidence for my purposes, as one who does not pay tribute to the gods of modesty”, while Richard protested something more like “but aren’t you trying to use your concept to carry a whole lot more weight than that amount of evidence supports?”
And, ofc, at this point, my Eliezer-model is again saying “This is why we should be discussing things concretely! It is quite telling that all the plans we can concretely visualize for saving our skins, are scary-adjacent; and all the non-scary plans, can’t save our skins!”
Nate’s summary brings up two points I more or less ignored in my summary because I wasn’t sure what I thought—one is, just what role do the considerations about expected incompetent response/regulatory barriers/mistakes in choosing alignment strategies play? Are they necessary for a high likelihood of doom, or just peripheral assumptions? Clearly, you have to posit some level of “civilization fails to do the x-risk-minimizing thing” if you want to argue doom, but how extreme are the scenarios Eliezer is imagining where success is likely?
The other is the role that the modesty worldview plays in Eliezer’s objections.
I feel confused/suspect we might have all lost track of what Modesty epistemology is supposed to consist of—I thought it was something like “overuse of the outside view, especially in a social cognition context”.
Which of the following is:
a) probably the product of a Modesty world-view?
b) no good reason to think comes from a Modesty world-view but still bad epistemology?
c) good epistemology?
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in a way that the theory’s proponents claim is more natural, but that you don’t understand, because that seems generally suspicious
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because you don’t trust your own assessments of naturalness that much in the absence of discriminating evidence
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because most powerful theories which cause conceptual revolutions also make new testable predictions, so it’s a bad sign if the newly proposed theory doesn’t.
As a general matter, accepting that there are lots of cases of theories which are knowably true independent of any new testable predictions they make because of features of the theory. Things like the implication of general relativity from the equivalence principle, or the second law of thermodynamics from Noether’s theorem, or many-worlds from QM are real, but you’ll only believe you’ve found a case like this if you’re walked through to the conclusion, so you’re sure that the underlying concepts are clear and applicable, or there’s already a scientific consensus behind it.
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in a way that the theory’s proponents claim is more natural, but that you don’t understand, because that seems generally suspicious
My Eliezer-model doesn’t categorically object to this. See, e.g., Fake Causality:
[Phlogiston] feels like an explanation. It’s represented using the same cognitive data format. But the human mind does not automatically detect when a cause has an unconstraining arrow to its effect. Worse, thanks to hindsight bias, it may feel like the cause constrains the effect, when it was merely fitted to the effect.
[...] Thanks to hindsight bias, it’s also not enough to check how well your theory “predicts” facts you already know. You’ve got to predict for tomorrow, not yesterday.
Nineteenth century evolutionism made no quantitative predictions. It was not readily subject to falsification. It was largely an explanation of what had already been seen. It lacked an underlying mechanism, as no one then knew about DNA. It even contradicted the nineteenth century laws of physics. Yet natural selection was such an amazingly good post facto explanation that people flocked to it, and they turned out to be right. Science, as a human endeavor, requires advance prediction. Probability theory, as math, does not distinguish between post facto and advance prediction, because probability theory assumes that probability distributions are fixed properties of a hypothesis.
The rule about advance prediction is a rule of the social process of science—a moral custom and not a theorem. The moral custom exists to prevent human beings from making human mistakes that are hard to even describe in the language of probability theory, like tinkering after the fact with what you claim your hypothesis predicts. People concluded that nineteenth century evolutionism was an excellent explanation, even if it was post facto. That reasoning was correct as probability theory, which is why it worked despite all scientific sins. Probability theory is math. The social process of science is a set of legal conventions to keep people from cheating on the math.
Yet it is also true that, compared to a modern-day evolutionary theorist, evolutionary theorists of the late nineteenth and early twentieth century often went sadly astray. Darwin, who was bright enough to invent the theory, got an amazing amount right. But Darwin’s successors, who were only bright enough to accept the theory, misunderstood evolution frequently and seriously. The usual process of science was then required to correct their mistakes.
My Eliezer-model does object to things like ‘since I (from my position as someone who doesn’t understand the model) find the retrodictions and obvious-seeming predictions suspicious, you should share my worry and have relatively low confidence in the model’s applicability’. Or ‘since the case for this model’s applicability isn’t iron-clad, you should sprinkle in a lot more expressions of verbal doubt’. My Eliezer-model views these as isolated demands for rigor, or as isolated demands for social meekness.
Part of his general anti-modesty and pro-Thielian-secrets view is that it’s very possible for other people to know things that justifiably make them much more confident than you are. So if you can’t pass the other person’s ITT / you don’t understand how they’re arriving at their conclusion (and you have no principled reason to think they can’t have a good model here), then you should be a lot more wary of inferring from their confidence that they’re biased.
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because you don’t trust your own assessments of naturalness that much in the absence of discriminating evidence
My Eliezer-model thinks it’s possible to be so bad at scientific reasoning that you need to be hit over the head with lots of advance predictive successes in order to justifiably trust a model. But my Eliezer-model thinks people like Richard are way better than that, and are (for modesty-ish reasons) overly distrusting their ability to do inside-view reasoning, and (as a consequence) aren’t building up their inside-view-reasoning skills nearly as much as they could. (At least in domains like AGI, where you stand to look a lot sillier to others if you go around expressing confident inside-view models that others don’t share.)
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because most powerful theories which cause conceptual revolutions also make new testable predictions, so it’s a bad sign if the newly proposed theory doesn’t.
My Eliezer-model thinks this is correct as stated, but thinks this is a claim that applies to things like Newtonian gravity and not to things like probability theory. (He’s also suspicious that modest-epistemology pressures have something to do with this being non-obvious — e.g., because modesty discourages you from trusting your own internal understanding of things like probability theory, and instead encourages you to look at external public signs of probability theory’s impressiveness, of a sort that could be egalitarianly accepted even by people who don’t understand probability theory.)
I don’t necessarily expect GPT-4 to do better on perplexity than would be predicted by a linear model fit to neuron count plus algorithmic progress over time; my guess for why they’re not scaling it bigger would be that Stack More Layers just basically stopped scaling in real output quality at the GPT-3 level. They can afford to scale up an OOM to 1.75 trillion weights, easily, given their funding, so if they’re not doing that, an obvious guess is that it’s because they’re not getting a big win from that. As for their ability to then make algorithmic progress, that depends on how good their researchers are, I expect; most algorithmic tricks you try in ML won’t work, but maybe they’ve got enough people trying things to find some? But it’s hard to outpace a field that way without supergeniuses, and the modern world has forgotten how to rear those.
While GPT-4 wouldn’t be a lot bigger than GPT-3, Sam Altman did indicate that it’d use a lot more compute. That’s consistent with Stack More Layers still working; they might just have found an even better use for compute.
(The increased compute-usage also makes me think that a Paul-esque view would allow for GPT-4 to be a lot more impressive than GPT-3, beyond just modest algorithmic improvements.)
I believe Sam Altman implied they’re simply training a GPT-3-variant for significantly longer for “GPT-4”. The GPT-3 model in prod is nowhere near converged on its training data.
Edit: changed to be less certain, pretty sure this follows from public comments by Sam, but he has not said this exactly
Say more about the source for this claim? I’m pretty sure he didn’t say that during the Q&A I’m sourcing my info from. And my impression is that they’re doing something more than this, both on priors (scaling laws say that optimal compute usage means you shouldn’t train to convergence, so why would they start now?) and based on what he said during that Q&A.
GPT-3 not being trained on even one pass of its training dataset
“Use way more compute” achieving larger gains through longer training than through most architectural modifications, for a fixed model size (while you’re correct that bigger model = faster training, you’re trading off against ease of deployment, and models much bigger than GPT-3 become increasingly difficult to serve at prod. Plus, we know it’s about the same size, from the Q&A)
Some experience with undertrained enormous language models underperforming relative to expectation
This is not to say that GPT-4 won’t have architectural changes. Sam mentioned a longer context at the least. But these sorts of architectural changes probably qualify as “small” in the parlance of the above conversation.
To be clear: Do you remember Sam Altman saying that “they’re simply training a GPT-3-variant for significantly longer”, or is that an inference from ~”it will use a lot more compute” and ~”it will not be much bigger”?
Because if you remember him saying that, then that contradicts my memory (and, uh, the notes that people took that I remember reading), and I’m confused.
While if it’s an inference: sure, that’s a non-crazy guess, and I take your point that smaller models are easier to deploy. I just want it to be flagged as a claimed deduction, not as a remembered statement.
(And I maintain my impression that something more is going on; especially since I remember Sam generally talking about how models might use more test-time compute in the future, and be able to think for longer on harder questions.)
Some thinking-out-loud on how I’d go about looking for testable/bettable prediction differences here...
I think my models overlap mostly with Eliezer’s in the relevant places, so I’ll use my own models as a proxy for his, and think about how to find testable/bettable predictions with Paul (or Ajeya, or someone else in their cluster).
One historical example immediately springs to mind where something-I’d-consider-a-Paul-esque-model utterly failed predictively: the breakdown of the Phillips curve. The original Phillips curve was based on just fitting a curve to inflation-vs-unemployment data; Friedman and Phelps both independently came up with theoretical models for that relationship in the late sixties (’67-’68), and Friedman correctly forecasted that the curve would break down in the next recession (i.e. the “stagflation” of ’73-’75). This all led up to the Lucas Critique, which I’d consider the canonical case-against-what-I’d-call-Paul-esque-worldviews within economics. The main idea which seems transportable to other contexts is that surface relations (like the Phillips curve) break down under distribution shifts in the underlying factors.
So, how would I look for something analogous to that situation in today’s AI? We need something with an established trend, but where a distribution shift happens in some underlying factor. One possible place to look: I’ve heard that OpenAI plans to make the next generation of GPT not actually much bigger than the previous generation; they’re trying to achieve improvement through strategies other than Stack More Layers. Assuming that’s true, it seems like a naive Paul-esque model would predict that the next GPT is relatively unimpressive compared to e.g. the GPT-2 → GPT-3 delta? Whereas my models (or I’d guess Eliezer’s models) would predict that it’s relatively more impressive, compared to the expectations of Paul-esque models (derived by e.g. extrapolating previous performance as a function of model size and then plugging in actual size of the next GPT)? I wouldn’t expect either view to make crisp high-certainty predictions here, but enough to get decent Bayesian evidence.
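To make the "extrapolate performance as a function of model size, then plug in the actual size" baseline concrete, here is a minimal sketch. It assumes performance is summarized by a single loss number and that the trend is a power law in parameter count; both are simplifying assumptions of mine, and the numbers below are hypothetical placeholders, not real measurements of any GPT model.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = a + b * log(size); returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_loss(a, b, size):
    """Loss the fitted size-only trend predicts for a model of the given size."""
    return math.exp(a + b * math.log(size))

def surprise(observed_loss, predicted_loss):
    """Positive when the new model beats the size-only trend."""
    return predicted_loss - observed_loss

# HYPOTHETICAL numbers, for illustration only (not real benchmark results).
past_sizes = [1e8, 1e9, 1e10, 1e11]       # parameter counts
past_losses = [4.0, 3.2, 2.56, 2.048]     # made-up losses on some fixed eval

a, b = fit_power_law(past_sizes, past_losses)
next_size = 2e11                           # "not much bigger" than the last model
trend_prediction = predict_loss(a, b, next_size)
```

Under this framing, the bet would be about the sign and size of `surprise(observed, trend_prediction)` once the next model is measured: the naive trend view expects it near zero, while the other view expects it to be noticeably positive.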
Other than distribution shifts, the other major place I’d look for different predictions is in the extent to which aggregates tell us useful things. The post got into that in a little detail, but I think there’s probably still room there. For instance, I recently sat down and played with some toy examples of GDP growth induced by tech shifts, and I was surprised by how smooth GDP was even in scenarios with tech shifts which seemed very impactful to me. I expect that Paul would be even more surprised by this if he were to do the same exercise. In particular, this quote seems relevant:
It is surprisingly difficult to come up with a scenario where GDP growth looks smooth AND housing+healthcare don’t grow much AND GDP growth accelerates to a rate much faster than now. If everything except housing and healthcare are getting cheaper, then housing and healthcare will likely play a much larger role in GDP (and together they’re 30-35% already), eventually dominating GDP. This isn’t a logical necessity; in principle we could consume so much more of everything else that the housing+healthcare share shrinks, but I think that would probably diverge from past trends (though I have not checked). What I actually expect is that as people get richer, they spend a larger fraction on things which have a high capacity to absorb marginal income, of which housing and healthcare are central examples.
If housing and healthcare aren’t getting cheaper, and we’re not spending a smaller fraction of income on them (by buying way way more of the things which are getting cheaper), then that puts a pretty stiff cap on how much GDP can grow.
Zooming out a meta-level, I think GDP is a particularly good example of a big aggregate metric which approximately-always looks smooth in hindsight, even when the underlying factors of interest undergo large jumps. I think Paul would probably update toward that view if he spent some time playing around with examples (similar to this post).
Similarly, I’ve heard that during training of GPT-3, while aggregate performance improves smoothly, performance on any particular task (like e.g. addition) is usually pretty binary—i.e. performance on any particular task tends to jump quickly from near-zero to near-maximum-level. Assuming this is true, presumably Paul already knows about it, and would argue that what matters-for-impact is ability at lots of different tasks rather than one (or a few) particular tasks/kinds-of-tasks? If so, that opens up a different line of debate, about the extent to which individual humans’ success today hinges on lots of different skills vs a few, and in which areas.
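A toy sketch of how near-binary per-task curves can still average into a smooth aggregate. The sigmoid shape, the evenly spaced thresholds, and the abstract "scale" axis are all assumptions made for illustration; this is not a model of actual GPT-3 training.

```python
import math

def task_accuracy(scale, threshold, sharpness=200.0):
    """Near-binary per-task accuracy: ~0 below the threshold, ~1 above it."""
    return 1.0 / (1.0 + math.exp(-sharpness * (scale - threshold)))

n_tasks = 20
# Assumption: each task "switches on" at a different point along the scale axis.
thresholds = [(i + 0.5) / n_tasks for i in range(n_tasks)]
scales = [i / n_tasks for i in range(n_tasks + 1)]

def aggregate_accuracy(scale):
    """Mean accuracy across all tasks at a given scale."""
    return sum(task_accuracy(scale, t) for t in thresholds) / n_tasks

def max_step(values):
    """Largest increase between consecutive checkpoints."""
    return max(b - a for a, b in zip(values, values[1:]))

single_task_curve = [task_accuracy(s, thresholds[10]) for s in scales]
aggregate_curve = [aggregate_accuracy(s) for s in scales]

max_task_jump = max_step(single_task_curve)      # close to 1: a sudden jump
max_aggregate_jump = max_step(aggregate_curve)   # about 1 / n_tasks: smooth
```

In this toy setup the single task goes from near-zero to near-maximum in one step, while the aggregate never moves by more than roughly one task's worth per step. Whether impact tracks the aggregate or a few specific tasks is exactly the open question in the paragraph above.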
The “continuous view” as I understand it doesn’t predict that all straight lines always stay straight. My version of it (which may or may not be Paul’s version) predicts that in domains where people are putting in lots of effort to optimize a metric, that metric will grow relatively continuously. In other words, the more effort put in to optimize the metric, the more you can rely on straight lines for that metric staying straight (assuming that the trends in effort are also staying straight).
In its application to AI, this is combined with a prediction that people will in fact be putting lots of effort into making AI systems intelligent / powerful / able to automate AI R&D / etc, before AI has reached a point where it can execute a pivotal act. This second prediction comes for totally different reasons, like “look at what AI researchers are already trying to do” combined with “it doesn’t seem like AI is anywhere near the point of executing a pivotal act yet”.
(I think on Paul’s view the second prediction is also bolstered by observing that most industries / things that had big economic impacts also seemed to have crappier predecessors. This feels intuitive to me but is not something I’ve checked and so isn’t my personal main reason for believing the second prediction.)
I’m not very familiar with this (I’ve only seen your discussion and the discussion in IEM) but it does not seem like the sort of thing where the argument I laid out above would have had a strong opinion. Was the y-axis of the straight line graph a metric that people were trying to optimize? If so, did the change in policy not represent a change in the amount of effort put into optimizing the metric? (I haven’t looked at the details here, maybe the answer is yes to both, in which case I would be interested in looking at the details.)
This seems plausible but it also seems like you can apply the above argument to a bunch of other topics besides GDP, like the ones listed in this comment, so it still seems like you should be able to exhibit a failure of the argument on those topics.
This is super helpful, thanks. Good explanation.
With this formulation of the “continuous view”, I can immediately think of places where I’d bet against it. The first which springs to mind is aging: I’d bet that we’ll see a discontinuous jump in achievable lifespan of mice. The gears here are nicely analogous to AGI too: I expect that there’s a “common core” (or shared cause) underlying all the major diseases of aging, and fixing that core issue will fix all of them at once, in much the same way that figuring out the “core” of intelligence will lead to a big discontinuous jump in AI capabilities. I can also point to current empirical evidence for the existence of a common core in aging, which might suggest analogous types of evidence to look at in the intelligence context.
Thinking about other analogous places… presumably we saw a discontinuous jump in flight range when Sputnik entered orbit. That one seems extremely closely analogous to AGI. There it’s less about the “common core” thing, and more about crossing some critical threshold. Nuclear weapons and superconductors both stand out a priori as places where we’d expect a critical-threshold-related discontinuity, though I don’t think people were optimizing hard enough in superconductor-esque directions for the continuous view to make a strong prediction there (at least for the original discovery of superconductors).
I agree that when you know about a critical threshold, as with nukes or orbits, you can and should predict a discontinuity there. (Sufficient specific knowledge is always going to allow you to outperform a general heuristic.) I think that (a) such thresholds are rare in general and (b) in AI in particular there is no such threshold. (According to me (b) seems like the biggest difference between Eliezer and Paul.)
Some thoughts on aging:
It does in fact seem surprising, given the complexity of biology relative to physics, if there is a single core cause and core solution that leads to a discontinuity.
I would a priori guess that there won’t be a core solution. (A core cause seems more plausible, and I’ll roll with it for now.) Instead, we see a sequence of solutions that intervene on the core problem in different ways, each of which leads to some improvement on lifespan, and discovering these at different times leads to a smoother graph.
That being said, are people putting a lot of effort into solving aging in mice? Everyone seems to constantly be saying that we’re putting in almost no effort whatsoever. If that’s true then a jumpy graph would be much less surprising.
As a more specific scenario, it seems possible that the graph of mouse lifespan over time looks basically flat, because we were making no progress due to putting in ~no effort. I could totally believe in this world that someone puts in some effort and we get a discontinuity, or even that the near-zero effort we’re putting in finds some intervention this year (but not in previous years) which then looks like a discontinuity.
If we had a good operationalization, and people are in fact putting in a lot of effort now, I could imagine putting my $100 to your $300 on this (not going beyond 1:3 odds simply because you know way more about aging than I do).
I’m not particularly enthusiastic about betting at 75%; that already seems like the right ballpark for where the probability should be. So I guess we’ve successfully Aumann agreed on that particular prediction.
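For readers wondering where the 75% comes from: staking $300 against $100 only breaks even if the $300 side wins three times out of four. A minimal sketch of that arithmetic (the function name is my own, chosen for illustration):

```python
def break_even_probability(my_stake, their_stake):
    """Win probability at which a bet has zero expected value for me.

    I gain their_stake with probability p and lose my_stake with
    probability 1 - p, so p * their_stake = (1 - p) * my_stake.
    """
    return my_stake / (my_stake + their_stake)

# The side putting up $300 against $100 needs at least this much confidence.
discontinuity_side = break_even_probability(300, 100)   # 0.75
# The side putting up $100 against $300 needs only this much.
continuity_side = break_even_probability(100, 300)      # 0.25
```

So the bet is only attractive to the $300 side if they put the probability of a discontinuity above 75%, which is why someone whose estimate is already around 75% has little reason to take it.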
While I think orbit is the right sort of discontinuity for this, I think you need to specify ‘flight range’ in a way that clearly favors orbits for this to be correct, mostly because about a month earlier a manhole cover was launched (or vaporized) by a nuke.
[But in terms of something like “altitude achieved”, I think Sputnik is probably part of a continuous graph, and probably not the most extreme member of the graph?]
My understanding is that Sputnik was a big discontinuous jump in “distance which a payload (i.e. nuclear bomb) can be delivered” (or at least it was a conclusive proof-of-concept of a discontinuous jump in that metric). That metric was presumably under heavy optimization pressure at the time, and was the main reason for strategic interest in Sputnik, so it lines up very well with the preconditions for the continuous view.
So it looks like the R-7 (which launched Sputnik) was the first ICBM, and the range is way longer than the V-2s of ~15 years earlier, but I’m not easily finding a graph of range over those intervening years. (And the R-7 range is only about double the range of a WW2-era bomber, which further smooths the overall graph.)
[And, implicitly, the reason we care about ICBMs is because the US and the USSR were on different continents; if the distance between their major centers was comparable to England and France’s distance instead, then the same strategic considerations would have been hit much sooner.]
One of the problems here is that, as well as disagreeing about underlying world models and about the likelihoods of some pre-AGI events, Paul and Eliezer often just make predictions about different things by default. But they do (and must, logically) predict some of the same world events differently.
My very rough model of how their beliefs flow forward is:
Paul
Low initial confidence on truth/coherence of ‘core of generality’
→
Human Evolution tells us very little about the ‘cognitive landscape of all minds’ (if that’s even a coherent idea) - it’s simply a loosely analogous individual historical example. Natural selection wasn’t intelligently aiming for powerful world-affecting capabilities, and so stumbled on them relatively suddenly with humans. Therefore, we learn very little about whether there will/won’t be a spectrum of powerful intermediately general AIs from the historical case of evolution—all we know is that it didn’t happen during evolution, and we’ve got good reasons to think it’s a lot more likely to happen for AI. For other reasons (precedents already exist—MuZero is insect-brained but better at chess or go than a chimp, plus that’s the default with technology we’re heavily investing in), we should expect there will be powerful, intermediately general AIs by default (and our best guess of the timescale should be anchored to the speed of human-driven progress, since that’s where it will start) - No core of generality
Then, from there:
No core of generality and extrapolation of quantitative metrics for things we care about and lack of common huge secrets in relevant tech progress reference class → Qualitative prediction of more common continuous progress on the ‘intelligence’ of narrow AI and prediction of continuous takeoff
Eliezer
High initial confidence on truth/coherence of ‘core of generality’
→
Even though there are some disanalogies between Evolution and AI progress, the exact details of how closely analogous the two situations are don’t matter that much. Rather, we learn a generalizable fact about the overall cognitive landscape from human evolution—that there is a way to reach the core of generality quickly. This doesn’t make it certain that AGI development will go the same way, but it’s fairly strong evidence. The disanalogies between evolution and ML are indeed a slight update in Paul’s direction and suggest that AI could in principle take a smoother route to general intelligence, but we’ve never historically seen this smoother route (and it has to be not just technically ‘smooth’ but sufficiently smooth to give us a full 4-year economic doubling) or these intermediate powerful agents, so this correction is weak compared to the broader knowledge we gain from evolution. In other words, all we know is that there is a fast route to the core of generality but that it’s imaginable that there’s a slow route we’ve not yet seen—Core of generality
Then, from there:
Core of generality and very common presence of huge secrets in relevant tech progress reference class → Qualitative prediction of less common continuous progress on the ‘intelligence’ of narrow AI and prediction of discontinuous takeoff
Eliezer doesn’t have especially divergent views about benchmarks like perplexity because he thinks they’re not informative, but differs from Paul on qualitative predictions of how smoothly various practical capabilities/signs of ‘intelligence’ will emerge—he’s getting his qualitative predictions about this ultimately from interrogating his ‘cognitive landscape’ abstraction, while Paul is getting his from trend extrapolation on measures of practical capabilities and then translating those to qualitative predictions. These are very different origins, but they do eventually give different predictions about the likelihood of the same real-world events.
Since they only reach the point of discussing the same things at a very vague, qualitative level of detail, in order to get to a bet you have to back-track from both of their qualitative predictions of how likely the sudden emergence of various types of narrow intelligent behaviour is, find some clear metric for the narrow intelligent behaviour that we can apply fairly, and then there should be a difference in beliefs about the world before AI takeoff.
Updates on this after reflection and discussion (thanks to Rohin):
Saying Paul’s view is that the cognitive landscape of minds might be simply incoherent isn’t quite right—at the very least you can talk about the distribution over programs implied by the random initialization of a neural network.
I could have just said ‘Paul doesn’t see this strong generality attractor in the cognitive landscape’ but it seems to me that it’s not just a disagreement about the abstraction, but that he trusts claims made on the basis of these sorts of abstractions less than Eliezer.
Also, on Paul’s view, it’s not that evolution is irrelevant as a counterexample. Rather, the specific fact of ‘evolution gave us general intelligence suddenly by evolutionary timescales’ is an unimportant surface fact, and the real truth about evolution is consistent with the continuous view.
These two initial claims are connected in a way I didn’t make explicit: no core of generality and lack of common secrets in the reference class together imply that there are lots of paths to improving on practical metrics (not just those that give us generality), that we are putting lots of effort into improving such metrics, and that we tend to take the best ones first, so the metric improves continuously, and trend extrapolation will be especially correct.
The first clause already implies the second clause (since “how to get the core of generality” is itself a huge secret), but Eliezer seems to use non-intelligence related examples of sudden tech progress as evidence that huge secrets are common in tech progress in general, independent of the specific reason to think generality is one such secret.
Nate’s Summary
Nate’s summary brings up two points I more or less ignored in my summary because I wasn’t sure what I thought—one is, just what role do the considerations about expected incompetent response/regulatory barriers/mistakes in choosing alignment strategies play? Are they necessary for a high likelihood of doom, or just peripheral assumptions? Clearly, you have to posit some level of “civilization fails to do the x-risk-minimizing thing” if you want to argue doom, but how extreme are the scenarios Eliezer is imagining where success is likely?
The other is the role that the modesty worldview plays in Eliezer’s objections.
I feel confused/suspect we might have all lost track of what Modesty epistemology is supposed to consist of—I thought it was something like “overuse of the outside view, especially in a social cognition context”.
Which of the following is:
a) probably the product of a Modesty world-view?
b) not clearly the product of a Modesty world-view, but still bad epistemology?
c) good epistemology?
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in a way that the theory’s proponents claim is more natural, but that you don’t understand, because that seems generally suspicious
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because you don’t trust your own assessments of naturalness that much in the absence of discriminating evidence
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because most powerful theories which cause conceptual revolutions also make new testable predictions, so it’s a bad sign if the newly proposed theory doesn’t.
As a general matter, accepting that there are lots of cases of theories which are knowably true independent of any new testable predictions they make because of features of the theory. Things like the implication of general relativity from the equivalence principle, or the second law of thermodynamics from Noether’s theorem, or many-worlds from QM are real, but you’ll only believe you’ve found a case like this if you’re walked through to the conclusion, so you’re sure that the underlying concepts are clear and applicable, or there’s already a scientific consensus behind it.
My Eliezer-model doesn’t categorically object to this. See, e.g., Fake Causality:
And A Technical Explanation of Technical Explanation:
My Eliezer-model does object to things like ‘since I (from my position as someone who doesn’t understand the model) find the retrodictions and obvious-seeming predictions suspicious, you should share my worry and have relatively low confidence in the model’s applicability’. Or ‘since the case for this model’s applicability isn’t iron-clad, you should sprinkle in a lot more expressions of verbal doubt’. My Eliezer-model views these as isolated demands for rigor, or as isolated demands for social meekness.
Part of his general anti-modesty and pro-Thielian-secrets view is that it’s very possible for other people to know things that justifiably make them much more confident than you are. So if you can’t pass the other person’s ITT / you don’t understand how they’re arriving at their conclusion (and you have no principled reason to think they can’t have a good model here), then you should be a lot more wary of inferring from their confidence that they’re biased.
My Eliezer-model thinks it’s possible to be so bad at scientific reasoning that you need to be hit over the head with lots of advance predictive successes in order to justifiably trust a model. But my Eliezer-model thinks people like Richard are way better than that, and are (for modesty-ish reasons) overly distrusting their ability to do inside-view reasoning, and (as a consequence) aren’t building up their inside-view-reasoning skills nearly as much as they could. (At least in domains like AGI, where you stand to look a lot sillier to others if you go around expressing confident inside-view models that others don’t share.)
My Eliezer-model thinks this is correct as stated, but thinks this is a claim that applies to things like Newtonian gravity and not to things like probability theory. (He’s also suspicious that modest-epistemology pressures have something to do with this being non-obvious — e.g., because modesty discourages you from trusting your own internal understanding of things like probability theory, and instead encourages you to look at external public signs of probability theory’s impressiveness, of a sort that could be egalitarianly accepted even by people who don’t understand probability theory.)
I don’t necessarily expect GPT-4 to do better on perplexity than would be predicted by a linear model fit to neuron count plus algorithmic progress over time; my guess for why they’re not scaling it bigger would be that Stack More Layers just basically stopped scaling in real output quality at the GPT-3 level. They can afford to scale up an OOM to 1.75 trillion weights, easily, given their funding, so if they’re not doing that, an obvious guess is that it’s because they’re not getting a big win from that. As for their ability to then make algorithmic progress, that depends on how good their researchers are, I expect; most algorithmic tricks you try in ML won’t work, but maybe they’ve got enough people trying things to find some? But it’s hard to outpace a field that way without supergeniuses, and the modern world has forgotten how to rear those.
While GPT-4 wouldn’t be a lot bigger than GPT-3, Sam Altman did indicate that it’d use a lot more compute. That’s consistent with Stack More Layers still working; they might just have found an even better use for compute.
(The increased compute-usage also makes me think that a Paul-esque view would allow for GPT-4 to be a lot more impressive than GPT-3, beyond just modest algorithmic improvements.)
If they’ve found some way to put a lot more compute into GPT-4 without making the model bigger, that’s a very different—and unnerving—development.
I believe Sam Altman implied they’re simply training a GPT-3-variant for significantly longer for “GPT-4”. The GPT-3 model in prod is nowhere near converged on its training data.
Edit: changed to be less certain, pretty sure this follows from public comments by Sam, but he has not said this exactly
Say more about the source for this claim? I’m pretty sure he didn’t say that during the Q&A I’m sourcing my info from. And my impression is that they’re doing something more than this, both on priors (scaling laws say that optimal compute usage means you shouldn’t train to convergence, so why would they start now?) and based on what he said during that Q&A.
This is based on:
The Q&A you mention
GPT-3 not being trained on even one pass of its training dataset
“Use way more compute” achieving larger gains through longer training than through most architectural modifications at a fixed model size (while you’re correct that bigger model = faster training, you’re trading off against ease of deployment, and models much bigger than GPT-3 become increasingly difficult to serve in prod; plus, we know it’s about the same size, from the Q&A)
Some experience with undertrained enormous language models underperforming relative to expectation
This is not to say that GPT-4 won’t have architectural changes. Sam mentioned a longer context at the least. But these sorts of architectural changes probably qualify as “small” in the parlance of the above conversation.
To be clear: Do you remember Sam Altman saying that “they’re simply training a GPT-3-variant for significantly longer”, or is that an inference from ~“it will use a lot more compute” and ~“it will not be much bigger”?
Because if you remember him saying that, then that contradicts my memory (and, uh, the notes that people took that I remember reading), and I’m confused.
While if it’s an inference: sure, that’s a non-crazy guess, and I take your point that smaller models are easier to deploy. I just want it to be flagged as a claimed deduction, not as a remembered statement.
(And I maintain my impression that something more is going on; especially since I remember Sam generally talking about how models might use more test-time compute in the future, and be able to think for longer on harder questions.)
Honestly, at this point, I don’t remember if it’s inferred or primary-sourced. Edited the above for clarity.
One way they could do that is by pitting the model against modified versions of itself, like they did in OpenAI Five (for Dota).
From the minimizing-X-risk perspective, it might be the worst possible way to train AIs.
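As Jeff Clune (Uber AI) put it:
[O]ne can imagine that some ways of configuring AI-GAs (i.e. ways of incentivizing progress) that would make AI-GAs more likely to succeed in producing general AI also make their value systems more dangerous. For example, some researchers might try to replicate a basic principle of Darwinian evolution: that it is ‘red in tooth and claw.’
If a researcher tried to catalyze the creation of an AI-GA by creating conditions similar to those on Earth, the results might be similar. We might thus produce an AI with human vices, such as violence, hatred, jealousy, deception, cunning, or worse, simply because those attributes make an AI more likely to survive and succeed in a particular type of competitive simulated world. Note that one might create such an unsavory AI unintentionally by not realizing that the incentive structure they defined encourages such behavior.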
Additionally, if you train a language model to outsmart millions of increasingly more intelligent copies of itself, you might end up with the perfect AI-box escape artist.
I was under the impression that GPT-4 would be gigantic, according to this quote from this Wired article:
Sam Altman explicitly contradicted that in a later Q&A, when someone asked him about that quote.