I feel like the biggest subjective thing is that I don’t feel like there is a “core of generality” that GPT-3 is missing.
I just expect it to gracefully glide up to a human-level foom-ing intelligence.
This is a place where I suspect we have a large difference of underlying models. What sort of surface-level capabilities do you, Paul, predict that we might get (or should not get) in the next 5 years from Stack More Layers? Particularly if you have an answer to anything that sounds like it’s in the style of Gwern’s questions, because I think those are the things that actually matter and which are hard to predict from trendlines and which ought to depend on somebody’s model of “what kind of generality makes it into GPT-3’s successors”.
If you give me 1 or 10 examples of surface capabilities I’m happy to opine. If you want me to name industries or benchmarks, I’m happy to opine on rates of progress. I don’t like the game where you say “Hey, say some stuff. I’m not going to predict anything and I probably won’t engage quantitatively with it since I don’t think much about benchmarks or economic impacts or anything else that we can even talk about precisely in hindsight for GPT-3.”
I don’t even know which of Gwern’s questions you think are interesting/meaningful. “Good meta-learning”: I don’t know what this means, but if you actually ask a real question I can guess. Qualitative descriptions: what is even a qualitative description of GPT-3? “Causality”: I think that’s not very meaningful and will be used to describe quantitative improvements at some level made up by the speaker. The spikes in capabilities Gwern talks about seem to be basically measurement artifacts, but if you want to describe a particular measurement I can tell you whether it will have similar artifacts. (How much economic value I can talk about, but you don’t seem interested.)
Mostly, I think the Future is not very predictable in some ways, and this extends to, for example, it being possible that 2022 is the year where we start Final Descent and by 2024 it’s over, because it so happened that although all the warning signs were Very Obvious In Retrospect they were not obvious in antecedent and so stuff just started happening one day. The places where I dare to extend out small tendrils of prediction are the rare exception to this rule; other times, people go about saying, “Oh, no, it definitely couldn’t start in 2022” and then I say “Starting in 2022 would not surprise me” by way of making an antiprediction that contradicts them. It may sound bold and startling to them, but from my own perspective I’m just expressing my ignorance. That’s one reason why I keep saying, if you think the world more orderly than that, why not opine on it yourself to get the Bayes points for it? Why wait for me to ask you?
If you ask me to extend out a rare tendril of guessing, I might guess, for example, that it seems to me that GPT-3’s current text prediction-hence-production capabilities are sufficiently good that it seems like somewhere inside GPT-3 must be represented a level of understanding which seems like it should also suffice to, for example, translate Chinese to English or vice-versa in a way that comes out sounding like a native speaker, and being recognized as basically faithful to the original meaning. We haven’t figured out how to train this input-output behavior using loss functions, but gradient descent on stacked layers the size of GPT-3 seems to me like it ought to be able to find that functional behavior in the search space, if we knew how to apply the amounts of compute we’ve already applied using the right loss functions.
So there’s a qualitative guess at a surface capability we might see soon—but when is “soon”? I don’t know; history suggests that even what predictably happens later is extremely hard to time. There are subpredictions of the Yudkowskian imagery that you could extract from here, including such minor and perhaps-wrong but still suggestive implications like, “170B weights is probably enough for this first amazing translator, rather than it being a matter of somebody deciding to expend 1.7T (non-MoE) weights, once they figure out the underlying setup and how to apply the gradient descent” and “the architecture can potentially look like somebody Stacked More Layers and like it didn’t need key architectural changes like Yudkowsky suspects may be needed to go beyond GPT-3 in other ways” and “once things are sufficiently well understood, it will look clear in retrospect that we could’ve gotten this translation ability in 2020 if we’d spent compute the right way”.
It is, alas, nowhere written in this prophecy that we must see even more un-Paul-ish phenomena, like translation capabilities taking a sudden jump without intermediates. Nothing rules out a long wandering road to the destination of good translation in which people figure out lots of little things before they figure out a big thing, maybe to the point of nobody figuring out until 20 years later the simple trick that would’ve gotten it done in 2020, a la ReLUs vs sigmoids. Nor can I say that such a thing will happen in 2022 or 2025, because I don’t know how long it takes to figure out how to do what you clearly ought to be able to do.
I invite you to express a different take on machine translation; if it is narrower, more quantitative, more falsifiable, and doesn’t achieve this just by narrowing its focus to metrics whose connection to the further real-world consequences is itself unclear, and then it comes true, you don’t need to have explicitly bet against me to have gained more virtue points.
I’m mostly not looking for virtue points, I’m looking for: (i) if your view is right then I get some kind of indication of that so that I can take it more seriously, (ii) if your view is wrong then you get some feedback to help snap you out of it.
I don’t think it’s surprising if a GPT-3 sized model can do relatively good translation. If we’re talking about this prediction, and if you aren’t happy just predicting numbers for overall value added from machine translation, I’d kind of like to get some concrete examples of mediocre translations or concrete problems with existing NMT that you are predicting can be improved.
It seems like Eliezer is mostly just more uncertain about the near future than you are, so it doesn’t seem like you’ll be able to find (ii) by looking at predictions for the near future.
It seems to me like Eliezer rejects a lot of important heuristics like “things change slowly” and “most innovations aren’t big deals” and so on. One reason he may do that is because he literally doesn’t know how to operate those heuristics, and so when he applies them retroactively they seem obviously stupid. But if we actually walked through predictions in advance, I think he’d see that actual gradualists are much better predictors than he imagines.
That seems a bit uncharitable to me. I doubt he rejects those heuristics wholesale. I’d guess that he thinks that e.g. recursive self-improvement is one of those things where these heuristics don’t apply, and that this is foreseeable because of e.g. the nature of recursion. I’d love to hear more about what sort of knowledge about “operating these heuristics” you think he’s missing!
Anyway, it seems like he expects things to seem more-or-less gradual up until FOOM, so I think my original point still applies: I think his model would not be “shaken out” of his fast-takeoff view due to successful future predictions (until it’s too late).
He says that things like AlphaGo or GPT-3 were really surprising to gradualists, suggesting he thinks that gradualism only works in hindsight.
I agree that after shaking out the other disagreements, we could just end up with Eliezer saying “yeah but automating AI R&D is just fundamentally unlike all the other tasks to which we’ve applied AI” (or “AI improving AI will be fundamentally unlike automating humans improving AI”) but I don’t think that’s the core of his position right now.
I agree we seem to have some kind of deeper disagreement here.
I think stack more layers + known training strategies (nothing clever) + simple strategies for using test-time compute (nothing clever, nothing that doesn’t use the ML as a black box) can get continuous improvements in tasks like reasoning (e.g. theorem-proving), meta-learning (e.g. learning to learn new motor skills), automating R&D (including automating executing ML experiments, or proposing new ML experiments), or basically whatever.
I think these won’t get to human level in the next 5 years. We’ll have crappy versions of all of them. So it seems like we basically have to get quantitative. If you want to talk about something we aren’t currently measuring, then that probably takes effort, and so it would probably be good if you picked some capability where you won’t just say “the Future is hard to predict.” (Though separately I expect to make somewhat better predictions than you in most of these domains.)
A plausible example is that I think it’s pretty likely that in 5 years, with mere stack more layers + known techniques (nothing clever), you can have a system which is clearly (by your+my judgment) “on track” to improve itself and eventually foom, e.g. that can propose and evaluate improvements to itself, whose ability to evaluate proposals is good enough that it will actually move in the right direction and eventually get better at the process, etc., but that it will just take a long time for it to make progress. I’d guess that it looks a lot like a dumb kid in terms of the kind of stuff it proposes and its bad judgment (but radically more focused on the task and conscientious and wise than any kid would be). Maybe I think that’s 10% unconditionally, but much higher given a serious effort. My impression is that you think this is unlikely without adding in some missing secret sauce to GPT, and that my picture is generally quite different from your criticality-flavored model of takeoff.
How much time do you see between “1 AI clearly on track to Foom” and “First AI to actually Foom”?
My weak guess is Eliezer would say “Probably quite little time”, but your model of the world requires the GWP to double over a 4 year period, and I’m guessing that period probably starts later than 2026.
I would be surprised if by 2027, I could point to an AI that for a full year had been on track to Foom, without Foom happening.
I think “on track to foom” is a very long way before “actually fooms.”