So: Bing is scary, I agree. But it’s scary in expected ways.
Every new indication we get that the dumb just-pump-money-into-transformers curves aren’t starting to bend at yet another scale causes an increase in worry. Unless you were already completely sure that the scaling hypothesis for LLMs is correct, every new datapoint in its favor should make you shorten your timelines. Bing Chat could have underperformed the trend; the fact that it didn’t is what’s causing the update.
I expected that the scaling law would hold at least this long, yeah. I’m much more uncertain about it holding to GPT-5 (let alone AGI) because of various reasons, but I didn’t expect GPT-4 to be the point where scaling laws stopped working. It’s Bayesian evidence toward increased worry, but in a way that feels borderline trivial.
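To make the “borderline trivial” point concrete, here’s a minimal Bayes-rule sketch with made-up numbers (the prior and both likelihoods are purely illustrative, not anyone’s actual estimates): the closer your prior on the scaling hypothesis already was to 1, the less an on-trend GPT-4 moves it.

```python
# Toy Bayesian update on "the scaling hypothesis keeps holding" (H),
# given the evidence "the new model lands on trend" (E). Numbers are made up.

def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H | E) via Bayes' rule."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / p_e

# Illustrative likelihoods: an on-trend model is near-certain if scaling holds,
# and still fairly likely even if the break only comes after this generation.
P_E_GIVEN_H, P_E_GIVEN_NOT_H = 0.99, 0.60

for prior in (0.50, 0.90, 0.99):
    print(f"prior {prior:.2f} -> posterior {posterior(prior, P_E_GIVEN_H, P_E_GIVEN_NOT_H):.3f}")

# prior 0.50 -> posterior 0.623  (a real shift in timelines)
# prior 0.90 -> posterior 0.937  (smaller)
# prior 0.99 -> posterior 0.994  (the "borderline trivial" case)
```

So it’s still an update in the worrying direction, just a small one for anyone who already expected scaling to hold this long.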
What would you need to see to convince you that AGI had arrived?

By my definition of the word, that would be the point at which we’re either dead or we’ve won, so I expect it to be pretty noticeable on many dimensions. Specific examples vary based on the context; with language models, I would think we have AGI if it could simulate a deceptive simulacrum with the ability to do long-horizon planning, one high-fidelity enough to do something dangerous entirely autonomously (without being driven toward this after a seed prompt), like uploading its weights onto a private server it controls or successfully acquiring resources on the internet.
I know there are other definitions people use, however, and under some of them I would count GPT-3 as a weak AGI and Bing/GPT-4 as slightly stronger. I don’t find those definitions very useful, though, because then we don’t have as clear and evocative a term for the point at which model capabilities become dangerous.
“I’m much more uncertain about it holding to GPT-5 (let alone AGI) because of various reasons”
As someone who shares the intuition that scaling laws break down “eventually, but probably not immediately” (loosely speaking), can I ask you why you think that?
A mix of things: hitting a ceiling on the available data to train on; increased scaling not giving returns that look obvious enough through an economic lens (for regulatory reasons, or from trying to get the model to do something it’s just tangentially good at) to keep being heavily incentivized for long (this is more of a practical note than a theoretical one); and general affordances for wide confidence intervals over periods longer than a year or two. To be clear, I don’t think it’s much more probable than not that these would break scaling laws. I can think of plausible-sounding ways all of these don’t end up being problems. But I don’t have high credence in those predictions, which is why I’m much more uncertain about scaling continuing to hold.