A) You seem to agree that, in principle, more goal-directed agents would be more capable. I think this alone implies that they will be the dominant force in the future, even if they are rare among many less goal-directed agents.
B) I’m deeply unsure about this and have conflicting intuitions. On the one hand, if you think total utilitarianism is true, any world where AI is not explicitly maximizing total utility is much, much worse than one where it is. On the other hand, I agree that humans are able to reach agreement with one another.
C) I think you are missing two key features of AI: a) It can hide for many years (e.g., on servers or distributed across many local computers) and move very slowly. Thus, even if it is not much smarter than we are today, as long as its goals conflict with ours, it could devise plans to acquire power, e.g., through manipulation, thoughtful financial management, or hacking. b) AI can copy itself thousands of times, and the copies will be able to cooperate very easily since each can model the other instances well. If I were copied 100,000 times, I’m reasonably confident that the copies could collectively devise plans to take over the world.