My problem is that most of the scenarios I see being discussed depend on a long chain of assumptions being true, and they often seem to ignore that many things could go wrong, invalidating the whole argument: it doesn't need to be wrong at every step; a single failed step is enough.
This feels a bit like it might be shifting the goalposts; it seemed like your previous comment was criticizing a specific argumentative step (“reasons not to believe in doom: [...] Orthogonality of intelligence and agency”), rather than just pointing out that there were many argumentative steps.
Anyway, addressing the point about there being many argumentative steps: I partially agree, although I’m not very convinced since there seems to be significant redundancy in arguments for AI risk (e.g., multiple fuzzy heuristics suggesting there’s risk, multiple reasons to expect misalignment, multiple actors who could be careless, multiple ways misaligned AI could gain influence under multiple scenarios).
The different AGIs might find it hard or impossible to coordinate. The different AGIs might even be in conflict with one another.
Maybe, although here are six reasons to think otherwise. The first five are reasons to think they would have an easy time coordinating:
(1) As mentioned, a very plausible scenario is that many of these AI systems will be copies of some specific model. To the extent that the model has goals, all these copies of any single model would have the same goal. This seems like it would make coordination much easier.
(2) Computer programs may be able to give credible signals through open-source code, facilitating cooperation (see the toy sketch after this list).
(3) Focal points may emerge and facilitate coordination, as they often do among humans.
(4) If they are initially in conflict, this will create competitive selection pressures for well-coordinated groups (much like how coordinated human states arise from anarchy).
(5) They may coordinate due to decision-theoretic considerations.
(Humans may be able to disrupt such coordination early on, but this gets harder as the number and/or capabilities of the AI systems grow.)
(6) Regardless, they might not need to (widely) coordinate; overwhelming numbers of uncoordinated actors may be risky enough (especially if there is some local coordination, which seems likely for the above reasons).
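To make (2) concrete, here is a minimal toy sketch of the classic "program equilibrium" idea: agents whose source code is mutually visible can credibly commit to strategies like "cooperate iff you are running the same code as me." The prisoner's-dilemma payoffs and function names below are my own illustrative assumptions, not anything from this thread.

```python
# Toy "program equilibrium" sketch: each strategy sees its own source and the
# opponent's source, so "cooperate only with identical code" is a credible,
# verifiable commitment rather than cheap talk.
import inspect

def clique_bot(own_source: str, opponent_source: str) -> str:
    """Cooperate iff the opponent's source code is identical to mine."""
    return "C" if opponent_source == own_source else "D"

def defect_bot(own_source: str, opponent_source: str) -> str:
    """Always defect, regardless of the opponent's code."""
    return "D"

# Prisoner's-dilemma payoffs: (row player's payoff, column player's payoff).
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(bot_a, bot_b):
    src_a, src_b = inspect.getsource(bot_a), inspect.getsource(bot_b)
    move_a = bot_a(src_a, src_b)
    move_b = bot_b(src_b, src_a)
    return PAYOFFS[(move_a, move_b)]

print(play(clique_bot, clique_bot))  # (3, 3): identical code -> mutual cooperation
print(play(clique_bot, defect_bot))  # (1, 1): mismatch -> mutual defection
```

Real AI systems would of course not literally exchange Python source, but the sketch shows why code transparency can turn "trust me" into a checkable signal.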
With current transformer models, we see that once a model is trained, not only are direct copies of it created but also smaller derivatives that are fine-tuned to be better at particular tasks.
Just as human cognitive diversity is useful for acting in the world, it is likely also more effective to have slight divergence among AGI models.
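As a rough illustration of the "smaller derivatives" point, here is a minimal knowledge-distillation sketch in PyTorch. It is my own toy under assumed shapes and hyperparameters, not anything specific to the models discussed: a small student network is trained to mimic the output distribution of a larger, already-trained teacher.

```python
# Minimal knowledge-distillation sketch: a smaller "derivative" model learns to
# match a larger trained model's softened output distribution on the same inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))  # large, assumed already trained
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))    # smaller derivative

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the distributions being matched

for _ in range(100):                      # toy loop on random inputs
    x = torch.randn(64, 32)
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL divergence between softened teacher and student distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```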