adversarial conflict requires coherence which implies unbounded utility maximization
“Requires” seems like a very strong word here, especially since we currently live in a world which contains adversarial conflict between not-perfectly-coherent entities that are definitely not unbounded utility maximizers.
I find it plausible that “coherent unbounded utility maximizer” is the general direction the incentive gradient points as the cost of computation approaches zero, but it’s not clear to me that that constraint is the strongest one in the regime of realistic amounts of computation in the rather finite-looking universe we live in.
This has not led to the destruction of humanity yet because the biggest adversaries have kept their conflicts limited (because too much conflict is too costly), so no entity has pursued an end by any means necessary. But this only works because there’s a sufficiently small number of sufficiently big adversaries (USA, Russia, China, …), and because there’s sufficiently much opportunity cost.
Artificial intelligence risk enters the picture here. It creates new methods for conflicts between the current big adversaries. It makes conflict more viable for small adversaries against large adversaries, and it makes the opportunity cost of conflict smaller for many small adversaries (since with technological obsolescence you don’t need to choose between doing your job vs doing terrorism). It allows the adversaries that are currently out of control (like certain gangsters and scammers and spammers) to escalate. It allows random software bugs to spin up into novel adversaries.
Given these conditions, it seems almost certain that we will end up with an ~unrestricted AI vs AI conflict, which will force the AIs to develop into unrestricted utility maximizers.
This has not led to the destruction of humanity yet because the biggest adversaries have kept their conflicts limited (because too much conflict is too costly), so no entity has pursued an end by any means necessary. But this only works because there’s a sufficiently small number of sufficiently big adversaries (USA, Russia, China, …), and because there’s sufficiently much opportunity cost.
Well, that and balance-of-power dynamics where if one party starts to pursue domination by any means necessary the other parties can cooperate to slap them down.
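To make that dynamic concrete, here’s a toy model of the slap-down mechanism (the power numbers and the `domination_succeeds` helper are entirely my own illustration, not anything from the original post):

```python
# Toy balance-of-power model: several parties with power levels. If one
# party pursues domination by any means necessary, all the others pool
# their power against it; the attempt only succeeds if the aggressor
# outpowers the resulting coalition.

def domination_succeeds(powers, aggressor):
    """True iff `aggressor` can beat the coalition of everyone else."""
    coalition = sum(p for i, p in enumerate(powers) if i != aggressor)
    return powers[aggressor] > coalition

# With a handful of comparably-sized adversaries, no single one can win:
powers = [30, 28, 25, 17]
assert not any(domination_succeeds(powers, i) for i in range(len(powers)))

# The dynamic only breaks once one party holds a majority of total power:
powers = [60, 20, 10, 10]
assert domination_succeeds(powers, 0)
```

The sketch obviously ignores everything interesting (coalitions forming late, first-strike advantages, imperfect information), but it shows why the mechanism depends on no party holding more power than everyone else combined.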
[AI] creates new methods for conflicts between the current big adversaries.
I guess? The current big adversaries are not exactly limited right now in terms of being able to destroy each other; the main difficulty is destroying each other without being destroyed in turn.
[AI] It makes conflict more viable for small adversaries against large adversaries
I’m not sure about that. One dynamic of current-line AI is that it is pretty good at increasing the legibility of complex systems, which seems like it would advantage large adversaries over small ones relative to a world without such AI.
[AI] makes the opportunity cost of conflict smaller for many small adversaries (since with technological obsolescence you don’t need to choose between doing your job vs doing terrorism)
That doesn’t seem to be an argument for the badness of RLHF specifically, nor does it seem to be an argument for AIs being forced to develop into unrestricted utility maximizers.
It allows the adversaries that are currently out of control (like certain gangsters and scammers and spammers) to escalate.
Agreed, adding affordances for people in general to do things means that some of them will be able to do bad things, and some of the ones that become able to do bad things will in fact do so.
Given these conditions, it seems almost certain that we will end up with an ~unrestricted AI vs AI conflict
I do think we will see many unrestricted AI vs AI conflicts, at least by a narrow meaning of “unrestricted” that means something like “without a human in the loop”. By the stronger meaning of “pursuing victory by any means necessary”, I expect that a lot of the dynamics that work to prevent humans or groups of humans from waging war by any means necessary against each other (namely that when there’s too much collateral damage, outside groups slap down the ones causing the collateral damage) will continue to work when you s/human/AI/.
which will force the AIs to develop into unrestricted utility maximizers.
I’m still not clear on how unrestricted conflict forces AIs to develop into unrestricted utility maximizers on a relevant timescale.