This has not led to the destruction of humanity yet because the biggest adversaries have kept their conflicts limited (too much conflict is too costly), so no entity has pursued an end by any means necessary. But this only works because there's a sufficiently small number of sufficiently big adversaries (USA, Russia, China, …), and because the opportunity cost of conflict is sufficiently high.
Well, that and balance-of-power dynamics where if one party starts to pursue domination by any means necessary the other parties can cooperate to slap them down.
[AI] creates new methods for conflicts between the current big adversaries.
I guess? The current big adversaries are not exactly limited right now in terms of being able to destroy each other, the main difficulty is destroying each other without being destroyed in turn.
[AI] It makes conflict more viable for small adversaries against large adversaries
I’m not sure about that. One dynamic of current-line AI is that it is pretty good at increasing the legibility of complex systems, which seems like it would advantage large adversaries over small ones relative to a world without such AI.
[AI] makes the opportunity cost of conflict smaller for many small adversaries (since with technological obsolescence you don’t need to choose between doing your job vs doing terrorism)
That doesn’t seem to be an argument for the badness of RLHF specifically, nor does it seem to be an argument for AIs being forced to develop into unrestricted utility maximizers.
[AI] It allows the adversaries that are currently out of control (like certain gangsters and scammers and spammers) to escalate.
Agreed, adding affordances for people in general to do things means that some of them will be able to do bad things, and some of the ones that become able to do bad things will in fact do so.
[AI] Given these conditions, it seems almost certain that we will end up with an ~unrestricted AI vs AI conflict
I do think we will see many unrestricted AI vs AI conflicts, at least by a narrow meaning of “unrestricted” that means something like “without a human in the loop”. Under the stronger definition of “pursuing victory by any means necessary”, I expect that a lot of the dynamics that work to prevent humans or groups of humans from waging war by any means necessary against each other (namely, that when there’s too much collateral damage, outside groups slap down the ones causing it) will continue to work when you s/human/AI/.
[AI] which will force the AIs to develop into unrestricted utility maximizers.
I’m still not clear on how unrestricted conflict forces AIs to develop into unrestricted utility maximizers on a relevant timescale.