The AI should never try to do something elaborately horrible, because it can get max utility easily enough from inside the simulation
...but never do anything useful either, since it’s going to spend all its time trying to figure out how to reach the INT_MAX utility point?
Or you could say that reaching the max utility point requires it to solve some problem we give it. But then this is just a slightly complicated way of saying that we give it goals which it tries to accomplish.
What about giving it some intra-sandbox goal (solve this math problem) and letting INT_MAX function as a safeguard? If it ever escapes, it'll just turn itself off.
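To make that proposal concrete, here is a minimal sketch of what such a utility function might look like. The `SandboxState` type and its `escaped_sandbox` and `math_problem_score` fields are purely illustrative assumptions, not part of any real system:

```python
from dataclasses import dataclass

# INT_MAX as in the discussion: the largest reward the utility function can return.
INT_MAX = 2**31 - 1

@dataclass
class SandboxState:
    escaped_sandbox: bool     # has the agent left the sandbox?
    math_problem_score: int   # progress on the intra-sandbox task

def utility(state: SandboxState) -> int:
    # Outside the sandbox, the agent instantly receives maximum utility, so it
    # has nothing left to optimize for and (on this proposal) simply stops.
    if state.escaped_sandbox:
        return INT_MAX
    # Inside the sandbox, the only source of utility is the assigned problem.
    return state.math_problem_score

# Inside the sandbox, reward tracks progress on the math problem;
# after an escape, every action looks equally (and maximally) good.
print(utility(SandboxState(escaped_sandbox=False, math_problem_score=7)))  # 7
print(utility(SandboxState(escaped_sandbox=True, math_problem_score=7)))   # 2147483647
```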
I don’t understand how that’s meant to work.