AI alignment work typically treats AIs as single entities. While I agree that this is a good approximation, I think they will be better described as highly coordinated populations of agents. Rather than simply growing in size and acquiring more resources, AIs will find it in their best interest to split into many smaller agents.
One reason to multiply is to save resources. It may be more efficient to split into smaller, simpler agents, each handling its own task, than to use a single large agent to handle every task.
Having many copies can also exploit certain economies of scale. It likewise offers the AI a way to increase its capability without having to worry about whether a newly created agent would be aligned with it.
Splitting into smaller components also reduces overall risk. A single agent faces a much higher probability of extinction than a population of agents does. Having many copies allows the AI to diversify its strategy.
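As a rough illustration of the diversification point (the numbers and the independence assumption are mine, not from the post): if each instance is destroyed independently with probability p, a lone agent is lost with probability p, while a population of N copies is only lost entirely with probability p^N.

```python
# Toy model with assumed numbers: each instance is independently destroyed
# with probability p over some period. Total loss requires losing every copy.
p = 0.3
for n in (1, 5, 20):
    print(f"{n:>2} copies -> P(total loss) = {p ** n:.2e}")
```

Independence is doing a lot of work here; a correlated failure mode (say, a single off-switch shared by every copy) would remove most of this benefit.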
Under certain circumstances, copies may also be able to carry out a Sybil attack that a singleton could not, giving the AI more influence in the world.
Copies can gather resources more effectively as well. When an AI needs to cover a large area but cannot adequately control actions at every point, it makes sense to split into independent agents. This is particularly true for space expeditions, where the speed of light makes communication too slow to be useful for making quick decisions.
For these reasons, making copies is a convergent instrumental subgoal.
This whole post is hard to evaluate because you haven’t said what you mean by “single agent” vs “multiple agents”.
If you cut a human brain in half, put each half in a different tank, and put laggy transmission and lossy compression between the halves, is that one agent or 2? (When cut in half, both halves of the human brain remain somewhat functional on their own. Does this make normal humans 2 agents?)
Maybe, or maybe not. We don’t really know what the efficiency limits on future AIs are.
I do know that even on some very simple problems, there are amortization improvements: solving 10 copies of the same problem sometimes takes less than 10 times the compute of solving it once.
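One standard example of this kind of amortization (my illustration, not from the comment): solving a linear system against many right-hand sides. The expensive factorization is paid once, and every additional solve reuses it.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 1000
A = rng.standard_normal((n, n))
bs = rng.standard_normal((10, n))          # 10 "copies" of the problem, different b

lu, piv = lu_factor(A)                     # O(n^3) factorization, paid once
xs = [lu_solve((lu, piv), b) for b in bs]  # each additional solve is only O(n^2)
```

Ten independent solves would redo the cubic-cost work ten times; sharing the factorization makes each marginal problem much cheaper than the first.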
I also know that Fermat’s Last Theorem was proved by 1 human, not a barn full of chickens, even though the chickens have more total brain mass. (Maybe a sufficiently vast barn containing a really huge number of chickens could prove it (no, they couldn’t), but that would at the very least be less efficient.)
It’s also possible to imagine situations where correlating observations would be really useful. Several workers at a hotel, each of whom didn’t see anything too odd. But when Sherlock Holmes interviews them all and asks them what they saw, he can put together a clear picture of the whole crime. Any split into distinct agents that reduces the ability to communicate and correlate observations between agents will make it harder to notice patterns spread across multiple agents’ outputs.
An article comparing more people to fewer people. It doesn’t compare one big mind to lots of little ones.
If the AI is a neural net so utterly opaque that even the AI can’t understand it, this is true. Probably as a temporary measure while it works on alignment.
Suppose the AI is running code distributed across millions of computers. If half of those turn off, the code still runs, just slower (all the important data is on several hard drives). The algorithm is a single, self-contained thing that isn’t split up according to splits in the underlying hardware. That’s 1 agent, and it is low risk.
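A minimal sketch of the sort of setup described above (the structure and names are my own, purely illustrative): work is drawn from a shared queue and the data is replicated, so losing machines changes the speed of the run, not its result.

```python
import random

tasks = list(range(1_000))                 # work backed by replicated storage
workers = {f"node-{i}": True for i in range(8)}

# Half the machines drop out mid-run.
for name in random.sample(sorted(workers), len(workers) // 2):
    workers[name] = False

results = {}
while tasks:
    for name, alive in workers.items():
        if alive and tasks:
            t = tasks.pop()
            results[t] = t * t             # stand-in for the real computation

# The surviving workers produce the same results; there are just fewer of them,
# so the loop makes more passes. Nothing essential was lost with the hardware.
```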
A singleton AI is incapable of making a million fake Twitter accounts to help its posts go viral?
Why?
Imagine a computer on Earth and one on Mars. Each is able to take small decisions on short timescales, in less than the 8-minute light lag. Their large-scale, long-timescale planning is split between both computers. Raw cognitive content is being beamed back and forth, not translated into any language.
Is this 2 agents? Kind of.
I agree there’s an important concept here.
One important countervailing consideration not mentioned in the OP or comments is indexical objectives/values[1]. In the presence of such indexical objectives, even a (computationally) perfect clone may give rise to an adversary, because the two instances will receive different inputs and accrue different state/context for their objectives to relate to.
Cf. nature, where even perfect genetic clones can end up in competition.
[1] Meaning relative to context: location, person, and so on. Not sure which resource is best, but hopefully you get what I mean: https://plato.stanford.edu/entries/indexicals/
I think the key contribution is uncovering the implicit assumptions behind the concept of a singleton agent. Is a nation-state a singleton? If not, why not?
I agree. I think it is pretty hard to cleanly divide something into a single agent or multiple agents in general. Practically, this means it can be useful to model something using different frames (e.g. as a single agent or as a collection of agents).