Would it be fair to call this AGI, albeit not superintelligent yet?
👀
Yes. Sub-human-level AGI.
Seems questionable since it can’t do planning?
It can’t? Stacking and Atari require at least some of that.
But it was trained on stacking and Atari, right?
What I mean is that it cannot take a task it faces, simulate what would happen if it did various different things, and use this to expand its task capabilities. It “just” does mimicry.
It is a generative Transformer trained offline to predict tokens. Why can’t it?
Well, it could learn to do it. But that’d be like a human doing math to predict how a system works, rather than a human intuiting how the system works. The massive difference in speed means some other algorithm would probably go AGI first?
While I don’t dispute that it could learn to do it, the current trained model cannot do this.
I mean, in what sense has a Decision Transformer like Gato not already learned to do it by extensive 1-step prediction?
As we know perfectly well by now, frozen weights do not preclude runtime learning, and Gato is trained on meta-learning tasks (MetaWorld and Procgen, plus the real-world datasets, which are long-tailed and elicit meta-learning in GPT-3 etc.). They also mention adding Transformer-XL recurrent memory at runtime.
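(As a toy illustration of the “frozen weights do not preclude runtime learning” point: the rule below never updates any parameters at runtime, yet it adapts to a task it was never built for, purely from examples placed in its context. This is only an analogy for in-context/meta-learning, not Gato’s actual mechanism; all names and data are made up.)

```python
import numpy as np

def frozen_in_context_predictor(context_xs, context_ys, query_x):
    """A fixed rule with no parameters updated at runtime: predict the label
    of the nearest example supplied in the context. All of the 'learning'
    lives in the context itself."""
    dists = np.linalg.norm(np.asarray(context_xs) - np.asarray(query_x), axis=1)
    return context_ys[int(np.argmin(dists))]

# A "new task" the rule was never built for: classify points by the sign of
# their first coordinate. A few labelled examples in the context suffice.
context_xs = [[1.0, 0.2], [2.0, -0.5], [-1.5, 0.3], [-0.7, 1.0]]
context_ys = ["positive", "positive", "negative", "negative"]
print(frozen_in_context_predictor(context_xs, context_ys, [-2.0, 0.1]))  # -> negative
```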
I don’t think Gato does the sort of training-in-simulation that Dreamer does. And that training-in-simulation seems like a major part of intelligence. So I think Dreamer has a component needed[1] for AGI that Gato lacks.
Gato supports a sequence length of only 1024 tokens, which means that it cannot remember its meta-“learned” things for very long. Non-frozen weights would eliminate that problem.
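(Rough, back-of-the-envelope arithmetic on the memory-horizon point; the per-timestep token costs below are illustrative assumptions, not figures from the Gato paper.)

```python
# How many environment timesteps fit into a 1024-token context, for a few
# assumed per-timestep token costs (image patches + proprioception + action
# tokens). The per-timestep numbers are illustrative guesses.
CONTEXT_TOKENS = 1024

for task, tokens_per_timestep in [("vision-heavy control task", 200),
                                  ("proprioception-only task", 40),
                                  ("text-only task", 10)]:
    steps = CONTEXT_TOKENS // tokens_per_timestep
    print(f"{task}: roughly {steps} timesteps of in-context 'memory'")
```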
[1] Well, “needed”: you could perhaps brute-force your way to a solution to AGI without this component, but then the problem is that Gato does not have enough dakka to be generally intelligent.
I’m also a bit concerned we may be moving the goalposts here. I’m not sure if there’s a clear way to quantify how that’s being done; it’s just a general impression I’m getting.
I don’t agree that I’m moving the goalposts; these were the sorts of ingredients I was thinking about before seeing Gato, as I was inspired by e.g. Dreamer.
I’m curious to understand your view better. Would you predict that as we keep making bigger versions of this, trained longer on a wider range of tasks, eventually it could automate away all jobs in the economy, but only after literally being trained on all jobs, i.e. it wouldn’t be able to eventually start generalizing across jobs to stuff it hasn’t done before? Or would you predict that it wouldn’t even get that far, and the performance improvements would s-curve and plateau?
I’d predict that as you scale it up and train it on more and more things, it would continually improve its performance at a steady and predictable pace, but that different methods would eventually start improving faster than it, because they are able to exploit additional strategies that this one does not have built in and can at best simulate, at the cost of orders of magnitude of efficiency.
One could argue that I should call it an AGI since I do believe it could be generally intelligent when scaled up, but I wouldn’t agree with this. “When scaled up” would involve not just scaling up the network, but also scaling up e.g. the training demonstrations. It would be those demonstrations that would contain most of the intelligence that it would gain by scaling up, not the algorithm itself. Whereas an algorithm that would be capable of experimenting, planning in simulation, and adjusting itself to improve its performance would have the intelligence built-in in a more fundamental way.
(I should add that I don’t necessarily think these sorts of planning and other capabilities require much innovation. There are already AIs that I would label as capable of planning, e.g. Dreamer. The point is just that this AI doesn’t have those components and therefore doesn’t deserve to be called AGI. Dreamer of course has its own limitations.)
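To make the mimicry-versus-planning distinction concrete, here is a toy sketch (hypothetical code and environment, not any real system’s implementation, and only loosely in the spirit of Dreamer-style imagination rollouts): a policy that copies the nearest demonstration, versus one that rolls candidate actions forward through an internal model before acting.

```python
# Toy 1-D world: the state is a position on a line and the goal is position 0.
ACTIONS = [-1, 0, +1]

def reward(state):
    return -abs(state)  # closer to the goal is better

# --- "Mimicry": behaviour cloning from demonstrations ------------------------
# Copy whatever action the nearest demonstrated state used. All demonstrations
# happen to come from the positive side of the line.
demos = [(5, -1), (3, -1), (1, -1)]  # (state, action) pairs

def imitation_policy(state):
    _, demo_action = min(demos, key=lambda sa: abs(sa[0] - state))
    return demo_action

# --- Planning in simulation ---------------------------------------------------
# Roll candidate actions forward through an internal model of the dynamics
# (here just a copy of the true dynamics) and score each first action by the
# best goal-proximity reachable within the planning horizon.
def model_step(state, action):
    return state + action

def best_reachable(state, depth):
    if depth == 0:
        return reward(state)
    return max(best_reachable(model_step(state, a), depth - 1) for a in ACTIONS)

def planning_policy(state, horizon=3):
    return max(ACTIONS, key=lambda a: best_reachable(model_step(state, a), horizon - 1))

# From a state the demonstrations never covered (the negative side), imitation
# keeps applying the demonstrated action and walks away from the goal, while
# the planner simulates outcomes first and moves back toward it.
print("imitation from state -4:", imitation_policy(-4))  # -> -1
print("planning  from state -4:", planning_policy(-4))   # -> 1
```

The point of the toy is only that the planner can handle a state the demonstrations never covered, because it evaluates consequences in simulation rather than pattern-matching to past behaviour.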
Why do you think it can’t?
PS: Mimicry is a fine art too, check this out: https://www.deepmind.com/publications/creating-interactive-agents-with-imitation-learning
I mean for one, its architecture does not permit its weights to change without receiving training data, and it does not generate training data itself.
Mimicry is limited by the availability of demonstrations in various ways. E.g. it can’t much exceed the demonstrations or use radically different approaches from the ones demonstrated.
From the LessWrong docs: “An Artificial general intelligence, or AGI, is a machine capable of behaving intelligently over many domains. The term can be taken as a contrast to narrow AI, systems that do things that would be considered intelligent if a human were doing them, but that lack the sort of general, flexible learning ability that would let them tackle entirely new domains. Though modern computers have drastically more ability to calculate than humans, this does not mean that they are generally intelligent, as they have little ability to invent new problem-solving techniques, and their abilities are targeted in narrow domains.”
If we consider only the first sentence, then yes. The rest of the paragraph points to something like “being able to generalize to new domains”. Not sure if Gato counts. (NB: this is just a LW tag, not a full-fledged definition.)
If by “sort of general, flexible learning ability that would let them tackle entirely new domains” we include adding new tokenised vectors to the training set, then this fits the definition. Of course this is “cheating”, since the system is not learning purely by itself, but for the purpose of building a product or getting the tasks done this does not really matter.
And it’s not inconceivable to imagine self-supervised token generation to acquire more skills, and perhaps a K-means algorithm to make sure that the new embeddings do not interfere with previous knowledge. It’s a dumb way of getting smarter, but apparently it works thanks to scale effects!
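(A rough sketch of the kind of interference check described above, with made-up embeddings, cluster count, and threshold; it only illustrates the idea of clustering existing embeddings and flagging new ones that land inside already-occupied regions.)

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for embeddings of already-learned skills (made-up data).
existing = rng.normal(size=(500, 32))

# Cluster the existing embedding space.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(existing)

# Distance from each existing embedding to its own cluster centre, used to
# define a per-cluster "occupied" radius.
dist_to_own = kmeans.transform(existing)[np.arange(len(existing)), kmeans.labels_]
radius = np.array([dist_to_own[kmeans.labels_ == c].max() for c in range(8)])

def interferes(new_embedding, margin=0.9):
    """Flag a candidate new embedding that falls well inside an existing
    cluster's occupied region, i.e. it might clash with previous knowledge."""
    d = kmeans.transform(new_embedding.reshape(1, -1))[0]
    c = int(d.argmin())
    return d[c] < margin * radius[c]

print(interferes(existing[0]))            # near old knowledge -> True (typically)
print(interferes(rng.normal(10, 1, 32)))  # far from everything -> False
```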
I have always been cautious, but I would say yes this time.
With the caveat that it learns new tasks only from supervised data, rather than by reusing previous experience.
So perhaps “proto-AGI” is a better term for it. Not quite the full thing just yet, but it shows clear generality across a wide range of domains. If it can spread out further and become much larger, as well as gain recursivity (which might require an entirely different architecture), it could become what we’ve all been waiting for.
I would agree with “proto-AGI”. I might soon write a blog post on this, but ideally we could define a continuous value to track how close we are to AGI (a rough sketch follows the list below), one that increases when:
- the tasks to solve are very different from each other
- the tasks are complex
- the tasks are solved well
- little experience (or info) is fed to the system
- the experience is not directly related to the task
- the experience is very raw
- computation is done in few steps
Then we keep adding new tasks and changing the environment.
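A minimal sketch of how such a score could be computed; the factor names mirror the list above, while the numbers and the aggregation rule are arbitrary placeholders rather than a proposed definition:

```python
# Hypothetical factor scores in [0, 1] per task; the factor names mirror the
# bullet points above, and the aggregation is an arbitrary placeholder.
FACTORS = ["diversity", "complexity", "performance",
           "data_efficiency", "transfer", "rawness", "compute_efficiency"]

def agi_progress_score(tasks):
    """tasks: list of dicts mapping each factor name to a score in [0, 1].
    Per-task factor averages are summed (rather than averaged) over tasks, so
    that adding new tasks and environments pushes the score up."""
    return sum(sum(t[f] for f in FACTORS) / len(FACTORS) for t in tasks)

example_tasks = [
    {"diversity": 0.8, "complexity": 0.4, "performance": 0.9,
     "data_efficiency": 0.2, "transfer": 0.1, "rawness": 0.7,
     "compute_efficiency": 0.5},
    {"diversity": 0.8, "complexity": 0.6, "performance": 0.7,
     "data_efficiency": 0.3, "transfer": 0.2, "rawness": 0.9,
     "compute_efficiency": 0.4},
]
print(round(agi_progress_score(example_tasks), 2))  # -> 1.07
```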