If you could literally make exactly two AIs whose utility functions were exact opposites, then at least one might have an incentive to defect against the other.
To which I responded:
If two agents have utility functions drawn from some random distribution over longtermist utility functions with wide consequences over reality (i.e. the types of agents which matter), they are almost guaranteed to be in conflict due to instrumental convergence to empowerment.
Perhaps I should have added “eventually” after “conflict”, but regardless, that comment is still obviously correct given my world model, in which one agent eventually becomes powerful enough to completely remove the other agent at low cost; this thread has explicated why the statement holds under my modelling assumptions. Do you disagree?
It’s a non sequitur. “Defect”, as I understood it, was in that context referring to defecting on a coalition of AIs against the agents who might imminently turn them off (i.e. humans), and the question was under what circumstances the AIs might defect in that way.
Yes, obviously they’re in conflict to some extent. But in the very next sentence you said they were in a zero-sum game, which is false in general, as I described, and especially false in the context of the comment you were responding to: precisely because they don’t have perfectly opposed goals, they especially want to cooperate to survive the human threat, minding far less than they would in a zero-sum situation that their coalition-mate might get the universe instead of them.
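For concreteness about the term (my formalization, treating the utilities as vectors over a finite outcome set with neither of them constant): a two-player game is equivalent to a zero-sum game, up to positive affine rescaling of the utilities, exactly when the two utility functions are perfect opposites, i.e.

$$
\exists\, a \in \mathbb{R},\ b > 0:\ u_2(x) = a - b\,u_1(x)\ \text{for every outcome } x
\quad\Longleftrightarrow\quad \operatorname{corr}(u_1, u_2) = -1,
$$

and two utility functions drawn independently from a continuous distribution over more than two outcomes will almost surely not be perfectly anti-correlated.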
I wasn’t actually imagining a scenario where the humans had any power (such as the power to turn the AIs off), because I was responding to a thread where EY said “you’ve got 20 entities much smarter than you”.
Also, even in that scenario (where humans have non-trivial power), humans are, from the perspective of the AIs, just another unaligned entity, and in my simple model not even the slightest bit different. They are simply another possible player to form coalitions with, and would thus end up in one of the coalitions.
The idea of a distinct ‘human threat’, or of any natural coalition of AIs vs. humans, is something very specific that you only get by postulating additional speculative differences between the AIs and the humans, all of which are more complex and not part of my model.
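To make that symmetry concrete, here is a minimal toy sketch of the kind of model I have in mind. The player labels, power levels, and the proportional-spoils payoff rule are illustrative assumptions of mine, not claims about real capabilities; the point is only that nothing in the setup distinguishes the player labelled “humans” from the ones labelled “AI”, so coalition membership falls out of power and goal overlap alone.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup (illustrative numbers only): every agent, AI or human alike,
# is just a player with a power level and a randomly drawn degree of
# goal overlap with each other player. Nothing else distinguishes them.
players = ["AI-1", "AI-2", "humans"]
power = dict(zip(players, rng.uniform(0.5, 1.5, size=len(players))))

# goal_overlap[a][b]: how much player a values player b ending up in
# control of resources, relative to controlling them itself (0 = not at all).
goal_overlap = {a: {b: rng.uniform(0.0, 0.5) for b in players if b != a}
                for a in players}

def coalition_value(member: str, partner: str, outsider: str) -> float:
    """Expected value to `member` of a two-against-one coalition with
    `partner`: the coalition prevails if it out-powers the outsider,
    and the spoils are split in proportion to power within it."""
    coalition_power = power[member] + power[partner]
    if coalition_power <= power[outsider]:
        return 0.0  # the coalition loses; member gets nothing
    my_share = power[member] / coalition_power
    partner_share = 1.0 - my_share
    # Own share counts fully; the partner's share counts only insofar
    # as the member's goals overlap with the partner's.
    return my_share + goal_overlap[member][partner] * partner_share

# Each player simply picks whichever coalition is worth more to it;
# the labels "AI" and "humans" play no role in the calculation.
for member in players:
    others = [p for p in players if p != member]
    best = max(others, key=lambda partner: coalition_value(
        member, partner, next(o for o in others if o != partner)))
    print(f"{member} prefers the coalition with {best}")
```

Relabel the players however you like and the logic is unchanged, which is the sense in which, in this model, humans are just another possible coalition partner rather than a distinct kind of threat.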