I know of no persuasive argument for the superior value (or safety!) of powerful, unitary AI agents. Intellectual inertia, institutional inertia, convenient anthropomorphism (see below), and bragging rights are not good justifications for increasing existential risk.
You didn’t mention the biggest reason why the discussion of unitary agents is still very much relevant: due to their economic (and, later, military, political, and even romantic) attractiveness and power, unitary agents will be created anyway (as you yourself admit), and to counteract them, some people (including myself) think there should be aligned “guardian” unitary agents around who can spin their OODA loop as quickly as potentially misaligned/rogue agents. The OODA iteration of an open agency will take more time.
We can debate this, or whether the latency of an OODA cycle is that important in offense-defense balance in an AI conflict, but I don’t think these discussions are due to either “intellectual inertia, institutional inertia, convenient anthropomorphism, or bragging rights”.
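To make the latency point concrete, here is a toy back-of-the-envelope model (all stage timings are invented purely for illustration; only the structural difference matters): a unitary agent runs observe-orient-decide-act as one pipeline with no hand-offs, while an open agency adds review stages and serialisation/communication overhead between specialised services.

```python
# Toy model of OODA-cycle latency: a unitary agent vs. an "open agency"
# of specialised services. All timings below are invented for illustration.

def cycle_latency(stage_seconds, handoff_seconds=0.0):
    """Total time for one OODA iteration: the stages themselves plus the
    hand-off overhead between consecutive stages."""
    handoffs = handoff_seconds * max(len(stage_seconds) - 1, 0)
    return sum(stage_seconds) + handoffs

# Unitary agent: one model holds the whole picture, so there are no hand-offs.
unitary = cycle_latency([1.0, 0.5, 0.5, 1.0])

# Open agency: separate observe/orient/decide/act services plus a review stage,
# with serialisation/communication overhead at every hand-off.
agency = cycle_latency([1.0, 0.5, 0.5, 1.0, 2.0], handoff_seconds=0.5)

print(f"unitary agent cycle: {unitary:.1f}s")  # 3.0s
print(f"open agency cycle:   {agency:.1f}s")   # 7.0s
```

Whether a per-cycle gap of this kind actually decides the offense-defense balance is exactly the open question raised above.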
I think you should examine this claim in more detail because this is the crux of everything.
What you are trying to say, rephrased:
I am preparing for a war where I expect to have to attack rogue AIs.
I expect either that the rogues will be operating on the territory of weak countries, that I will be assisting allies with infections on their own territory, or that I will have to deal with rogues gaining control of hostile superpowers.
So my choices are:
1. I use conventional weapons.
2. I use AGI in very limited, controlled ways, such as to exponentially manufacture more semi-automated weapons with limited and safe onboard intelligence. (For example, a missile driven by low-level controllers is safe.)
3. I use AGI in advisory data-analysis roles, plus smarter weapons driven by limited onboard AI. (For example, a missile or drone able to recognize targets, but only after that control authority is unlocked once it has traveled to the target area.)
4. I use AGI, but in limited, clearly separated roles throughout the war machine.
5. I say yolo and assign the entire task of fighting the war to monolithic, self-modifying AGI systems, even though I know this is how rogue AI was created. Even in testing, the self-modification makes their behavior inconsistent, and they sometimes turn on their operators even in simulation.
The delta between 4 and 5 is vast. 5 is going to fail acceptance testing and is not consistent with conventional engineering practice, because a self-modifying system isn't static and you can't be certain the delivered product is the same as the tested one.
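To make the acceptance-testing point concrete: conventional practice certifies a specific, static artifact and then verifies that the fielded copy is bit-for-bit the one that passed testing, e.g. by comparing hashes. A self-modifying system defeats that check by construction. A minimal sketch (the file names are hypothetical placeholders):

```python
# Minimal sketch of acceptance testing against a static artifact: record the
# hash of the build that passed testing, then verify the fielded copy matches.
# The file paths are hypothetical placeholders.
import hashlib

def sha256_of(path):
    """Hash an artifact so the fielded copy can be compared to the certified one."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

certified = sha256_of("model_under_test.bin")    # recorded at acceptance testing
deployed = sha256_of("model_in_the_field.bin")   # what is actually running

if deployed != certified:
    # For a self-modifying system this branch is the expected outcome:
    # the thing you are operating is no longer the thing you tested.
    raise RuntimeError("Deployed system does not match the certified build.")
```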
You would have to already be losing a war with rogue AI before someone would resort to 5.
I think part of the gap here is that there's a large difference between what you might do to build an agent that helps someone with their homework or is used for social media, and what you would do when live bombs are involved. Engineering practices are very different and won't be thrown away simply because working AGI is new.
This is also true for construction, industry, and so on. What you need to do is very different.
I need to build the option for #5 as a deterrent. All it takes for someone else to gain a strategic advantage is to automate just a BIT more of their military than I do via AGI, and suddenly they can disrupt my OODA loop.
Because of this, I need the capability to always automate as much or more than them, which in the limit is full automation of all systems.
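The ratchet in this argument can be spelled out as a toy payoff sketch (the numbers are invented purely to illustrate the incentive structure, not estimated): whatever automation level the other side picks, matching or exceeding it looks better, which pushes both sides toward full automation.

```python
# Toy escalation game for military AGI automation. Payoff numbers are invented
# solely to illustrate the "match or exceed" incentive described above.
LEVELS = ["limited", "partial", "full"]  # increasing degrees of AGI automation

def payoff(mine, theirs):
    """More automation than the opponent wins the OODA race (+1), less loses it
    (-1); full automation also carries a rogue-AI risk cost (-0.5)."""
    edge = (LEVELS.index(mine) > LEVELS.index(theirs)) - (LEVELS.index(mine) < LEVELS.index(theirs))
    risk = -0.5 if mine == "full" else 0.0
    return edge + risk

def best_response(theirs):
    return max(LEVELS, key=lambda mine: payoff(mine, theirs))

for theirs in LEVELS:
    print(f"if they choose {theirs!r:>10}, my best response is {best_response(theirs)!r}")
# Best response to 'limited' is 'partial', to 'partial' is 'full', and to 'full'
# is 'full': under these toy payoffs the race ratchets toward full automation.
```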
On #5: I don't think self-modification is important here. What is important is keeping the full operational picture in a unified context [of a DNN, let's say even an LLM] and making decisions from that position.
Recursive self-improvement beyond something like an IQ 200 level of military, strategic, and cyber-security intelligence might not be useful in an AI conflict, because there is limited data to learn from, and even a modestly superhuman AI (say, IQ 200) may be able to build an optimal model from that data. The two remaining factors are the latency of the OODA loop and coherence across space (a coherent response on different fronts and in different domains: physical and cyber) and time (a coherent strategy). Both of these factors favor unitary agents, and both could be "practically saturated" by an AI that is not that far beyond human level.
Caveat: the above is not true for psychological warfare, where the minds of people and AIs are the battlefield. Being skillful at this kind of warfare may benefit from much deeper and stronger intelligence than IQ 200, and so self-improvement during the conflict becomes relevant. But psychological warfare can only unfold on rather slow timescales so the higher latency of AI service agencies shouldn’t be a handicap.
Footnote: some may think that cyber security (the virus/antivirus arms race, for instance) also benefits from "unlimited" intelligence: e.g., an IQ 1000 AI might be able to develop viruses, and cyber-offense strategies more generally, that an IQ 200 AI could not defend against (or even recognise as an attack). I agree that this might be true (although I'm not sure, of course; I'm not a cyber-security expert, and as far as I've heard, even cyber-security experts are unsure or disagree about this), but we can also charitably assume that the IT infrastructure will be hardened to make such attacks provably impossible (provably strong cryptography, provably strong sandboxing, etc.), and that an IQ 200 AI (or even an "agency") could already build up such defences.
In my view, he never said that all discussion of unitary agents is useless. He said that it's almost always misleading.
As a concrete example, military officers don't care for the smartest robotic war dog we can construct. They would rather have a low-cost drone swarm whose production is easy to scale up.
What would be your preferred way to name the "unitary agent" failure mode? (without injecting the idea that it's not a failure mode)