Orthogonality of intelligence and agency. I can envision a machine with high intelligence and zero agency; I haven’t seen any convincing argument yet for why the two must necessarily go together.
Hm, what do you make of the following argument? Even assuming (contestably) that intelligence and agency don’t in principle need to go together, in practice they’ll go together because there will appear to be strong economic or geopolitical incentives to build systems that are both highly intelligent and highly agentic (e.g., AI systems that can run teams). (And even if some AI developers are cautious enough to not build such systems, less cautious AI developers will, in the absence of strong coordination.)
Also, (2) and (3) seem like reasons why a single AI system may be unable to disempower humanity. Even if we accept that, how relevant will these points be when there is a huge number of highly capable AI systems (which may happen because of the ease and economic benefits of replicating highly capable AI systems)? Their numbers might make up for their limited knowledge and limited plans.
(Admittedly, in these scenarios, people might have significantly more time to figure things out.)
Or as Paul Christiano puts it (potentially in making a different point):
At the same time, it becomes increasingly difficult for humans to directly control what happens in a world where nearly all productive work, including management, investment, and the design of new machines, is being done by machines. We can imagine a scenario in which humans continue to make all goal-oriented decisions about the management of PepsiCo but are assisted by an increasingly elaborate network of prosthetics and assistants. But I think human management becomes increasingly implausible as the size of the world grows (imagine a minority of 7 billion humans trying to manage the equivalent of 7 trillion knowledge workers; then imagine 70 trillion), and as machines’ abilities to plan and decide outstrip humans’ by a widening margin. In this world, the AIs that are left to do their own thing outnumber and outperform those which remain under close management of humans.
Even assuming (contestably) that intelligence and agency don’t in principle need to go together, in practice they’ll go together because there will appear to be strong economic or geopolitical incentives to build systems that are both highly intelligent and highly agentic
Yes, that might be true. It can also be true that there are really no limits to the things that can be planned. It can also be true that the machine really does want to kill us all for some reason. My problem, in general, is not that AGI doom cannot happen. My problem is that most of the scenarios I see being discussed depend on a long chain of assumptions being true, and they often seem to ignore that many things could go wrong, invalidating the whole thing: you don’t need to be wrong at all of those steps; being wrong at just one of them is enough.
Even if we accept that, how relevant will these points be when there is a huge number of highly capable AI systems (which may happen because of the ease and economic benefits of replicating highly capable AI systems)?
This is fantastic; you just formulated a new reason:
5. The different AGIs might find it hard/impossible to coordinate. The different AGIs might even be in conflict with one another
My problem is that most of the scenarios I see being discussed depend on a long chain of assumptions being true, and they often seem to ignore that many things could go wrong, invalidating the whole thing: you don’t need to be wrong at all of those steps; being wrong at just one of them is enough.
This feels a bit like it might be shifting the goalposts; it seemed like your previous comment was criticizing a specific argumentative step (“reasons not to believe in doom: [...] Orthogonality of intelligence and agency”), rather than just pointing out that there were many argumentative steps.
Anyway, addressing the point about there being many argumentative steps: I partially agree, although I’m not very convinced since there seems to be significant redundancy in arguments for AI risk (e.g., multiple fuzzy heuristics suggesting there’s risk, multiple reasons to expect misalignment, multiple actors who could be careless, multiple ways misaligned AI could gain influence under multiple scenarios).
The different AGIs might find it hard/impossible to coordinate. The different AGIs might even be in conflict with one another
Maybe, although here are six reasons to think otherwise. The first five are reasons to think they will have an easy time coordinating:
(1) As mentioned, a very plausible scenario is that many of these AI systems will be copies of some specific model. To the extent that the model has goals, all these copies of any single model would have the same goal. This seems like it would make coordination much easier.
(2) Computer programs may be able to give credible signals through open-source code, facilitating cooperation (see the toy sketch after this list).
(3) Focal points of coordination may come up and facilitate coordination, as they often do with humans.
(4) If they are initially in conflict, this will create competitive selection pressures for well-coordinated groups (much like how coordinated human states arise from anarchy).
(5) They may coordinate due to decision theoretic considerations.
(Humans may be able to hinder AI coordination early on, but this gets harder as the AIs’ numbers and/or capabilities grow.)
(6) Regardless, they might not need to (widely) coordinate; overwhelming numbers of uncoordinated actors may be risky enough (especially if there is some local coordination, which seems likely for the above reasons).
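To make (2) a bit more concrete, here is a toy Python sketch of the classic “program equilibrium” idea: an agent whose strategy conditions on the opponent’s source code, so that two identical copies can credibly commit to mutual cooperation. The agent name and the cooperate/defect encoding are illustrative assumptions, not anything taken from the discussion above.

```python
# Toy sketch of "program equilibrium": agents that can read each other's
# source code can condition cooperation on that code, making commitments
# credible. The agent below cooperates exactly when the opponent's source
# is identical to its own.
import inspect


def clique_bot(opponent_source: str) -> str:
    """Cooperate only against exact copies of ourselves; defect otherwise."""
    my_source = inspect.getsource(clique_bot)
    return "cooperate" if opponent_source == my_source else "defect"


if __name__ == "__main__":
    source = inspect.getsource(clique_bot)
    # Two identical copies inspecting each other both cooperate...
    print(clique_bot(source))  # -> cooperate
    # ...while an agent with different source gets defected against.
    print(clique_bot("def always_defect(opponent_source): return 'defect'"))  # -> defect
```

This only gestures at why source-level transparency could make cooperation easier, especially among many copies of one model as in (1); real systems would of course be far messier.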
With current transformer models, we see that once a model is trained, not only are direct copies of it created but also smaller derivatives, potentially fine-tuned to be better at particular tasks.
Just as human cognitive diversity is useful for acting in the world, it is likely also more effective to have some slight divergence among AGI models.
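As a rough illustration of the “smaller derivatives” point, here is a minimal knowledge-distillation sketch in plain PyTorch, with toy networks standing in for a large trained transformer and its smaller, task-specialized derivative; all sizes and hyperparameters are illustrative assumptions.

```python
# Minimal knowledge-distillation sketch: a smaller "student" is trained to
# match the softened output distribution of a larger, frozen "teacher".
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))  # stand-in for a large trained model
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))    # smaller derivative

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's distribution into richer targets

for step in range(100):
    x = torch.randn(64, 32)  # placeholder for task-specific training inputs
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL divergence between the student's and teacher's softened distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```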