The problem is more about corrigibility: making an AI that will shut down if you ask it to. The idea is that uncertainty about the utility function isn't a good way to implement corrigibility. When you tell the AI to shut down on the grounds that you know more about the true utility function than it does, it might reply "nah, I won't shut down, I've learned enough to optimize the true utility now".
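A minimal toy sketch of that failure mode, under assumptions that are mine and not from the comment above: the agent models the shutdown command as noisy evidence that its planned action has negative utility, then compares the posterior mean against zero. With a wide prior the command swings its estimate negative and it complies; with a narrow, confident prior the same command barely moves it, so it refuses. All numbers, the noise model, and the function name are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_utility_given_shutdown(mu0, sigma0, human_noise=0.5, n=200_000):
    """Agent's posterior mean of its action's utility U after hearing 'shut down'.

    Assumed toy model (not from the original comment): the human observes U
    with Gaussian noise and commands shutdown when that observation is negative.
    """
    u = rng.normal(mu0, sigma0, n)             # agent's prior samples for U
    obs = u + rng.normal(0.0, human_noise, n)  # what the human 'sees'
    shutdown = obs < 0                         # human commands shutdown
    return u[shutdown].mean()                  # posterior mean of U given the command

# Same shutdown command, different amounts of remaining uncertainty.
for sigma0 in (2.0, 0.1):
    post_mean = posterior_utility_given_shutdown(mu0=0.5, sigma0=sigma0)
    decision = "comply (shut down)" if post_mean < 0 else "refuse and act anyway"
    print(f"prior std {sigma0:>4}: posterior mean {post_mean:+.2f} -> {decision}")
```

In this sketch the wide-prior agent (std 2.0) treats the command as strong evidence and shuts down, while the confident agent (std 0.1) concludes the human is probably mistaken and keeps going, which is exactly the "I've learned enough" refusal described above.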