My understanding is that you can’t safely do even A with an arbitrarily powerful optimizer. An arbitrarily powerful optimizer whose reward function is solely “beat the grandmaster” would do everything possible to ensure its reward function is maximized with the highest probability. For instance, it might amass as much compute as possible to ensure it has made no errors at all, it might armor its servers to ensure no one switches it off, and of course, it might pharmacologically mess with the grandmaster to inhibit their performance.
The fact that it can be done safely by a weak AI isn’t to say that it’s safe to do with a powerful AI.
For the purposes of this argument, I’m interested in what can be done safely by some AI we can build. If you can solve alignment safely with some AI, then you’re in a good situation. What an arbitrarily powerful optimiser will do isn’t the crux; we all agree that’s dangerous.
I see what you’re getting at. Interesting question.