Consider the least convenient possible world
The least convenient possible world is one with superhumanly intelligent AIs that can have complete confidence in their source code, and predict with complete confidence that these means (thuggishness) will in fact lead to those ends (saving the world).
However, in that world the world has already been saved (or destroyed), so this is not relevant. In any relevant world, the actor resorting to thuggishness to save the world is a human running on hostile hardware, and would be stupid not to take that into consideration.
Then it isn’t the LCPW
I consider the “P” in LCPW to be important. If the agents in question are post-human, then it’s too late to worry about saving the world. If you still have to save the world, then standard human failure modes do apply.