Has the “alignment roadblock” scenario been argued for anywhere?
Like Lanrian, I think it sounds implausible. My intuition is that understanding human values is a hard problem, but taking over the world is a harder problem. For example, the AI which can talk its way out of a box probably has a very deep understanding of humans—a deeper understanding than most humans have of humans! In order to have such a deep understanding, it must have lower-level building blocks for making sense of the world which work extremely well, and could be used for a value learning system.
BTW, coincidentally, I quoted this same passage in a post I wrote recently which discusses this scenario (among others). Is there a particular subscenario I outlined that seems especially plausible to you?
My intuition is that understanding human values is a hard problem, but taking over the world is a harder problem.
Especially because taking over the world requires you to be much better than other agents who want to stop you from doing so, which could very well include other AIs.
ETA: That said, upon reflection, there have been instances of people taking over large parts of the world without being superhuman. All world leaders qualify, and it isn't that unusual. However, what would be unusual is for someone to succeed at taking over the world while everyone else opposed it.
In a scenario where multiple AIs compete for power, the AIs that make fast decisions without checking back with humans have an advantage in the power competition and will gain more power over time.
Additionally, AGIs differ fundamentally from humans because they can spin up multiple copies of themselves when they get more resources, while human beings can't similarly scale their power when they have access to more food.
The best human hacker can't run a cyber war alone, but if he could spin up 100,000 copies of himself, he could find enough zero-days to hack into all important computer systems.
In a scenario where multiple AIs compete for power, the AIs that make fast decisions without checking back with humans have an advantage in the power competition and will gain more power over time.
Agreed this is a risk, but I wouldn’t call this an alignment roadblock.