Early transformative AI systems will probably do impressive technological projects by being trained on smaller tasks with shorter feedback loops and then composing these abilities in the context of large collaborative projects (initially involving a lot of humans but over time increasingly automated). When Eliezer dismisses the possibility of AI systems performing safer tasks millions of times in training and then safely transferring to “build nanotechnology” (point 11 of list of lethalities) he is not engaging with the kind of system that is likely to be built or the kind of hope people have in mind.
It seems like Paul is imagining something CAIS-like, where you compose a bunch of AI abilities that are fairly robust in their behavior, and then conglomerate them into large projects that do big things, much like human organizations.
(Unless I’m misunderstanding, in which case the rest of this comment is obviated.)
It seems like whether this works depends on two factors:
First of all, it needs to be the case that conglomerations like this are competitive with giant models that are a single unified brain.
On first pass, this assumption seems pretty untrue? The communication bandwidth of people in an organization, and their ability to operate as a unit, is much, much lower than that of the sub-modules of a person's brain.
Second, it supposes that when you compose a bunch of AI systems to do something big and novel, like designing APM systems, each individual component will still be operating within its training distribution, as opposed to some AIs in the engineering project being fed inputs that are really weird and might produce unanticipated behavior.
This seems like a much weaker concern, though. For one thing, it seems like you ought to be able to put checks on whether a given AI component is being fed out-of-distribution inputs, and raise a flag for oversight whenever that happens (a rough sketch of what such a check could look like is below).
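To make that concrete, here is a minimal sketch of the "flag out-of-distribution inputs for oversight" idea, assuming a simple distance-based detector over a component's input embeddings. The names here (`OODGate`, `threshold`, the embedding shapes) are purely illustrative, not part of any actual system, and a real deployment would presumably use a much more sophisticated detector.

```python
# Minimal sketch: wrap an AI component's inputs in an out-of-distribution check
# and escalate to human oversight when an input looks too unusual.
# All names and thresholds here are illustrative assumptions.
import numpy as np


class OODGate:
    """Flags inputs that look far from the component's training distribution."""

    def __init__(self, train_embeddings: np.ndarray, threshold: float = 4.0):
        # Fit a simple Gaussian to training-set embeddings.
        self.mean = train_embeddings.mean(axis=0)
        self.cov_inv = np.linalg.pinv(np.cov(train_embeddings, rowvar=False))
        self.threshold = threshold

    def mahalanobis(self, x: np.ndarray) -> float:
        # Distance of x from the training distribution, in "standard deviations".
        d = x - self.mean
        return float(np.sqrt(d @ self.cov_inv @ d))

    def check(self, x: np.ndarray) -> bool:
        """Return True if the input looks in-distribution; otherwise raise a flag."""
        score = self.mahalanobis(x)
        if score > self.threshold:
            # In a real pipeline this would pause the component and route the
            # input to a human / oversight process rather than just printing.
            print(f"OOD flag raised (score={score:.2f}); escalating for oversight")
            return False
        return True


# Toy usage: fit on clustered training data, then probe with a far-away input.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=(1000, 8))
gate = OODGate(train)
gate.check(rng.normal(0.0, 1.0, size=8))  # in-distribution: usually passes
gate.check(np.full(8, 10.0))              # far from training data: flagged
```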