Not at all. That’s part of what makes it hard. Even if you have such a literal utility function to measure out rewards with, you still have to engineer an AI that maximizes that loss function and not some intermediate target, using actual ML methods that have yet to be pioneered. If your training loop instead produces some sort of mesa-optimizer that optimizes a not-quite-that loss function, you lose.
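A minimal sketch of the failure mode, in the goal-misgeneralization flavor rather than a literal inner optimizer: everything below (the toy data, the proxy feature, the logistic model) is a hypothetical illustration I'm adding, not anything from this exchange. During training, a spurious feature is perfectly correlated with the intended one, so gradient descent is free to latch onto the proxy; the trained model scores perfectly on the stated loss and yet tracks not-quite-that objective once the correlation breaks:

```python
import numpy as np

# Toy illustration only (not a true mesa-optimizer): in training, the
# intended feature x0 and a spurious proxy x1 always agree, so minimizing
# the written-down loss cannot distinguish "track x0" from "track x1".
rng = np.random.default_rng(0)

n = 1000
x0 = rng.integers(0, 2, n).astype(float)  # the feature we *meant* to reward
x1 = 2.0 * x0                             # proxy: perfectly correlated, but
                                          # larger magnitude, so gradients favor it
X_train = np.stack([x0, x1], axis=1)
y_train = x0                              # label is literally the intended feature

# Logistic regression trained by plain gradient descent on cross-entropy --
# exactly the loss we specified, minimized essentially to convergence.
w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
    w -= lr * (X_train.T @ (p - y_train)) / n
    b -= lr * np.mean(p - y_train)

# Off-distribution test: the features now disagree. The "right" answer
# is still x0, but the learned weights load mostly on the proxy x1.
X_test = np.array([[1.0, 0.0],   # intended feature present, proxy absent
                   [0.0, 2.0]])  # proxy present at training magnitude, x0 absent
p_test = 1.0 / (1.0 + np.exp(-(X_test @ w + b)))
print("weights:", w)
print("P(y=1) from intended feature alone:", p_test[0])
print("P(y=1) from proxy alone:          ", p_test[1])
```

The training loss never distinguishes the two hypotheses, so writing down the right loss function buys you nothing about which one you actually get.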