A standard trick shows that knowing whether a problem has a solution is almost as helpful as knowing the solution itself. Here is a (very inefficient) way to use this ability: fix one piece of a candidate solution, ask whether the constrained problem still has a solution, keep whichever choice is confirmed, and repeat until the whole solution is pinned down.
I figure the AI will be smart enough to recognize the strategy you are using. In that case, it can choose not to cooperate and output a malicious solution. If the different AIs you are running are similar enough, it's not improbable that all of them will come to the same conclusion. In fact, I suspect there is a sort of convergence toward the most "unfriendly" output they can give. If that's the case, all UFAIs will give the same output.
It can choose not to cooperate, but it will only do so if that's what it wants to do. The genie I have described wants to cooperate. An AGI of any of the forms I have described would want to cooperate. Now you can claim that I can't build an AGI with any easily controlled utility function at all, but that is a much harder claim.