I agree that will happen eventually, and the more nuanced version of my position is the one I outlined in my comment on CAIS:
Now I would say that there is some level of data, model capacity, and compute at which an end-to-end / monolithic approach outperforms a structured approach on the training distribution (this is related to, but not the same as, the bitter lesson). However, at low levels of all three, the structured approach will typically perform better. The levels at which the end-to-end approach pulls ahead depend on the particular task, and increase with task difficulty.
Since we expect all three of these factors to grow over time, I then expect an expanding Pareto frontier: at any given point the most complex tasks are performed by structured approaches, but as time progresses these are replaced by end-to-end / monolithic systems (while at the same time new, even more complex tasks are found that can be done in a structured way).
I think when we are first in the situation where AI systems are sufficiently competent to wrest control away from humanity if they wanted to, we would plausibly have robots that take in audiovisual input and can flexibly perform tasks that a human asks of them (think of e.g. a household robot butler). So in that sense I agree that eventually we’ll have agents that link together language, vision, and robotics.
The thing I’m not that interested in (from a “how scared should we be” or “timelines” perspective) is when you take a bunch of different tasks, shove them into a single “generic agent”, and the resulting agent is worse on most of the tasks and isn’t correspondingly better at some new task that none of the previous systems could do.
So if for example you could draw an arrow on an image showing what you wanted a robot to do, and the robot then did that, that would be a novel capability that couldn’t be done by previous specialized systems (probably), and I’d be interested in that. It doesn’t look like this agent does that.
Does that mean the Socratic Models result from a few weeks ago, which does involve connecting more specialized models together, is a better example of progress?
Yes