The reason he offered that bet was that Elon Musk had predicted that we’d likely have AGI by 2029, so you’re drawing the wrong conclusion from that. Other people joined in with Marcus to push the wager up to $500k, but Musk didn’t take the bet of course, so you might infer something from that!
The bet itself is quite insightful, and I would be very interested to hear your thoughts on its 5 conditions:
https://garymarcus.substack.com/p/dear-elon-musk-here-are-five-things
In fact, anyone thinking that AGI is imminent would do well to read it: it focusses the mind on specific capabilities and how you might build them, which I think is more useful than thinking in vague terms like ‘well, AI has this much smartness already, how much will it have in 20 / 80 years!’. I think it’s useful and necessary to understand at that level of detail; otherwise we might be watching someone building a taller and taller ladder, and somehow thinking that’s going to get us to the moon.
FWIW, I work in DL, and I agree with his analysis.
I didn’t actually update my timelines shorter in response to his bets, since I was aware his motivations were partially to poke Elon and maybe pick up some money that (from what I understand of his perspective) he sees as risk-free. I’d just be far more persuaded had he offered odds that actually approached his apparent beliefs. As it is, it’s uninformative.
His 5 tests are indeed a solid measure of capability, though some of them seem much harder than others. If an AI could do 3⁄5 of them, I would be inclined to say AGI is extremely close, if not present.
I would be surprised if we see the cook one before AGI, given the requirement that it works in an arbitrary kitchen. I expect physical world applications to lag purely digital applications just because of the huge extra layer of difficulty imposed by working in a real time environment, all the extra variables that are difficult to capture in a strictly digital context, and the reliability requirements.
The “read a book and talk about it” one seems absolutely trivial in comparison.
I would really like to see him make far more predictions on a bunch of different timescales. If he predicted things correctly about GPT-4, the state of {whatever architecture} in 2025, the progress on the MATH dataset by 2025, and explained how all of these things aren’t concerning and so on, I would be much more inclined to step towards his position. (I don’t expect him to get everything right; that would be silly. I just want to see evidence, and greater detail, of a generally functioning mental model.)
I agree it’s an attempt to poke Elon, although I suspect he knew that he’d never take the bet. I also agree that anything involving real-world robotics in unknown environments is massively more difficult. Having said that, the criteria from Effective Altruism here:
for any human who can do any job, there is a computer program (not necessarily the same one every time) that can do the same job for $25/hr or less
do say ‘any job’, and we often seem to forget how many jobs require insane levels of dexterity and dealing with the unknown. We could think about the difficulty of building a robot plasterer or car mechanic, for example, and see similar levels of complexity if we pay attention to all the tasks they actually have to do. So I think it fair to count that as part of AGI. I do agree that more detailed predictions would be hugely helpful. Marcus’s colleague, Rodney Brooks, has a fun scorecard of predictions for robotics and AI here:
https://rodneybrooks.com/predictions-scorecard-2022-january-01/
which I think is quite useful. As an aside, I had a fun 20-minute chat with GPT-3 today and convinced myself that it doesn’t have the slightest understanding of meaning at all! Can send the transcript if interested.
I’d agree with that; I just strongly suspect we can hit dangerous capability without running this experiment first, given how research proceeds. If there’s an AI system displaying other blatant signs of being an AGI (by this post’s definition, assuming a non-foom situation, and assuming we’re not dead yet), I won’t bother spending much time wondering whether it could be a cook.
Yup, GPT-3 is shallow in a lot of important ways. It often relies on what appears to be interpolation and memorization. The part that worries me is that architectures like it can still do very difficult reasoning tasks that many humans can’t, like the MATH dataset and Minerva. When I look at those accomplishments, I’m not thinking “wow, this ML architecture is super duper smart and amazing”; I think “uh oh, that part of reasoning is apparently easy if current transformers can do it, while simultaneously failing at trivial things.” We keep getting signals that more and more of our ineffable cognitive skills are… just not that hard.
As we push into architectures that rely more on generalization through explicit reasoning (or maybe even interpolation/memorization at sufficiently absurd scales), a lot of those goofy little mistakes are going to fall away. I’m really worried that an AI built for actual reasoning, with an architecture able to express algorithmically what reasoning entails, is going to be a massive discontinuity, and that it might show up in less than 2 years. It might not take us all the way to AGI in one step, but I’m not looking forward to it.
I really dislike that, as a byproduct of working on safety research, I keep coming up with what look like promising avenues of research for massive capability gain. They seem so much easier to find than good safety ideas, or good ideas in the other fields I work in. I’ve done enough research that I know they wouldn’t all pan out, but the apparent ease is unsettling.
I think you need to be sceptical about what kind of reasoning these systems are actually doing. My contention is that they are all shallow. A system that is trained on near-infinite training sets can look indistinguishable from one that can do deep reasoning, but is in fact just pattern-matching. Or might be. This paper is very pertinent, I think:
https://arxiv.org/abs/2205.11502
Short summary: train a deep network on examples from a logical reasoning task, obtain near-perfect validation accuracy, but find it hasn’t learnt the task at all! It has learned arbitrary statistical properties of the dataset, completely unrelated to the task. Which is what deep learning does by default. That isn’t going to go away with scale; if anything, it will get worse. And if we say we’ll fix it by adding ‘actual reasoning’, well… good luck! AI spent two decades trying to build symbolic reasoning systems; getting that to work is incredibly hard.
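To make that failure mode concrete, here is a minimal, hypothetical sketch of shortcut learning (not a reproduction of the paper’s experiments): a toy ‘reasoning’ task whose true rule is an XOR, plus distractor bits whose density happens to track the label only in the training distribution. The classifier, dataset sizes, and probabilities are all invented for illustration; the point is just that high in-distribution validation accuracy can coexist with chance-level performance once the statistical crutch is removed.

```python
# Toy sketch of shortcut learning, assuming numpy and scikit-learn are available.
# The "true" rule is XOR of the first two bits; the distractor bits carry a
# spurious statistic (their density) that tracks the label only in training.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, spurious=True):
    x01 = rng.integers(0, 2, size=(n, 2))      # the bits the rule actually uses
    y = x01[:, 0] ^ x01[:, 1]                  # true label: XOR (not linear)
    # Distractor bits: dense when y=1, sparse when y=0 -- but only if `spurious`.
    p = np.where(y == 1, 0.8, 0.2)[:, None] if spurious else 0.5
    rest = (rng.random((n, 18)) < p).astype(int)
    return np.hstack([x01, rest]), y

X_tr, y_tr = make_data(20_000, spurious=True)
X_val, y_val = make_data(5_000, spurious=True)     # same distribution as training
X_ood, y_ood = make_data(5_000, spurious=False)    # spurious correlation removed

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("i.i.d. validation accuracy:", clf.score(X_val, y_val))          # very high
print("accuracy once the shortcut is gone:", clf.score(X_ood, y_ood))  # ~0.5, chance
```

A linear model cannot represent the XOR, so all of its apparent skill comes from the distractor statistic; the paper’s finding about deep networks on logical reasoning is analogous in spirit, at much larger scale.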
Now I haven’t actually read up on the Minerva results yet, and will do so, but I do think we need to exercise caution before attributing reasoning to something, if there are dumber ways to get the same behaviour.
What all this says to me is that we need an entirely new paradigm to get anywhere close to AGI. That’s not impossible, but it makes me sufficiently confident that it’s going to be decades, if not a couple of centuries.
My contention is that they are all shallow. A system that is trained on near-infinite training sets can look indistinguishable from one that can do deep reasoning, but is in fact just pattern-matching.
I agree.
This is a big part of what my post is about.
We have AI that is obviously dumb, in the sense of failing on trivial tasks and having mathematically provable strict bounds.
That type of AI is eating progressively larger chunks of things we used to call “intelligence.”
The things we used to call intelligence are, apparently, easy.
We should expect (and have good reason to believe) that more of what we currently call intelligence will turn out to be easy, and it may very well be consumed by dumb architectures.
Less dumb architectures are being worked on, and do not require paradigm shifts.
Uh oh.
This is a statement mostly about the problem, not the problem solver. The problem we thought was hard just isn’t.
And if we say we’ll fix it by adding ‘actual reasoning’, well… good luck! AI spent two decades trying to build symbolic reasoning systems; getting that to work is incredibly hard.
Going to be deliberately light on details here again, sorry. When I say ‘actual reasoning,’ I mean AI that is trained in such a way that learning the capabilities provided by reasoning is a more direct byproduct, rather than a highly indirect feature that arises from its advantages in blind token prediction. (Though a sufficiently large dumb system might manage to capture way too much anyway.)
I’m not suggesting we need a new SHRDLU. There are paths fully contained within the current deep learning paradigm. There is empirical support for this.
That’s a very well-argued point. I have precisely the opposite intuition, of course, but I can’t deny the strength of your argument. I tend to be less interested in tasks that are well-bounded than in those that are open-ended and uncertain. I agree that much of what we call intelligence might be much simpler. But then I think common sense reasoning is much harder. I think maybe I’ll try to draw up my own list of tasks for AGI :)
Is this research into ‘actual reasoning’ that you’re deliberately being light on details about something that is out in the public (e.g. on arxiv), or is this something you’ve witnessed privately and anticipate will become public in the near future?
Here is a paper from January 2022 on arXiv that details the sort of generalization-hop we’re seeing models make.
Most of it is the latter, but to be clear, I do not have inside information about what any large organization is doing privately, nor have I seen an “oh no we’re doomed” proof of concept. Just some very obvious “yup that’ll work” stuff. I expect adjacent things to be published at some point soonishly just because the ideas are so simple and easily found/implemented independently. Someone might have already and I’m just not aware of it. I just don’t want to be the one to oops and push on the wrong side of the capability-safety balance.
Other people joined in with Marcus to push the wager up to $500k, but Musk didn’t take the bet of course, so you might infer something from that!
That Musk generally doesn’t let other people set the agenda? I don’t remember any time when someone challenged Musk publicly to a bet and he took it.
Quite possibly. I just meant: you can’t conclude from the bet that AGI is even more imminent.
Genuinely, I would love to hear people’s thoughts on Marcus’s 5 conditions, and hear their reasoning. For me, the one of having a robot cook that can work in pretty much anyone’s kitchen is a severe test, and a long way from current capabilities.
Very little human-written code that is 10,000 lines long is bug-free. Bug-freeness seems to me like too high a standard.
When it comes to kitchen work, the criterion matters a lot for the practical problem of taking the jobs of existing people. On the other hand, it has less relevance to whether or not the AI will speed up AI development.
Otherwise, I do agree that the other items are good ones to make predictions about. It would be worthwhile to make Metaculus questions for them.