Nate [replying to Eric Drexler]: I expect that, if you try to split these systems into services, then you either fail to capture the heart of intelligence and your siloed AIs are irrelevant, or you wind up with enough AGI in one of your siloes that you have a whole alignment problem (hard parts and all) in there. Like, I see this plan as basically saying “yep, that hard problem is in fact too hard, let’s try to dodge it, by having humans + narrow AI services perform the pivotal act”. Setting aside how I don’t particularly expect this to work, we can at least hopefully agree that it’s attempting to route around the problems that seem to me to be central, rather than attempting to solve them.
GPT-Nate is confusing the features of the AI services model with the argument that “Collusion among superintelligent oracles can readily be avoided”. As it says on the tin, there’s no assumption that intelligence must be limited. It is, instead, an argument that collusion among (super)intelligent systems is fragile under conditions that are quite natural to implement.
I think, in an open agency architecture, the silo that gets “enough AGI” is in step 2, and it is pointed at the desired objective by having formal specifications and model-checking against them.
But I also wouldn’t object to the charge that an open agency architecture would “route around the central problem,” if you define the central problem as something like building a system that you’d be happy for humanity to defer to forever. In the long run, something like more ambitious value learning (or value discovery) will be needed, on pain of astronomical waste. This would be, in a sense, a compromise (or, if you’re optimistic, a contingency plan), motivated by short timelines and insufficient theoretical progress toward full normative alignment.
[I at most skimmed the post, but] IMO this is a more ambitious goal than the IMO central problem. IMO the central problem (phrased with more assumptions than strictly necessary) is more like “building a system that’s gaining a bunch of understanding you don’t already have, in whatever domains are necessary for achieving some impressive real-world task, without killing you”. So I’d guess that’s supposed to happen in step 1. It’s debatable how much you have to do that to end the acute risk period, for one thing because humanity collectively is already a really slow (too slow) version of that, but it’s a different goal than deferring permanently to an autonomous agent.
(I’d also flag this kind of proposal as being at risk of playing shell games with the generator of large effects on the world, though not particularly more than other proposals in a similar genre.)
I’d say the scientific understanding happens in step 1, but I think that would be mostly consolidating science that’s already understood. (And some patching up potentially exploitable holes where AI can deduce that “if this is the best theory, the real dynamics must actually be like that instead”. But my intuition is that there aren’t many of these holes, and that unknown physics questions are mostly underdetermined by known data, at least for quite a long way toward the infinite-compute limit of Solomonoff induction, and possibly all the way.)
Engineering understanding would happen in step 2, and I think engineering is more “the generator of large effects on the world,” the place where much-faster-than-human ingenuity is needed, rather than hoping to find new science.
(Although the formalization of the model of scientific reality is important for the overall proposal—to facilitate validating that the engineering actually does what is desired—and building such a formalization would be hard for unaided humans.)