As I see it, an aligned AI should understand humanity’s value function and choose actions that lead to a reality where that value is expected to be higher.
But it should also understand that its model of the value function, its ability to estimate the value of a given reality, and its ability to predict which action leads to which reality are all flawed. And so is people’s ability to do the same.
So, the AI should not simply choose the action with the highest expected value under the most likely interpretation of the value function and the most likely outcome. It should consider the whole spectrum of possibilities, especially the worst ones. That includes the possibility that its understanding is completely wrong, or will become completely wrong in the future due to a hack, a coding mistake, or any other reason. So it should also take care to protect people from itself.
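A minimal sketch of this idea, with entirely hypothetical actions, interpretations, and weights: instead of scoring actions only under the most likely interpretation of the value function, a cautious chooser blends the expected value across all interpretations with the worst-case value.

```python
# Illustrative sketch only: two candidate actions, and two candidate
# interpretations of the value function, each with a credence.
actions = ["A", "B"]
interpretations = [
    (0.7, {"A": 10.0, "B": 6.0}),   # most likely reading: A looks best
    (0.3, {"A": -50.0, "B": 5.0}),  # less likely reading: A is catastrophic
]

def naive_choice(actions, interpretations):
    # Use only the single most likely interpretation of value.
    _, values = max(interpretations, key=lambda iv: iv[0])
    return max(actions, key=lambda a: values[a])

def cautious_choice(actions, interpretations, worst_case_weight=0.5):
    # Blend expected value over all interpretations with the worst case.
    def score(action):
        expected = sum(p * v[action] for p, v in interpretations)
        worst = min(v[action] for _, v in interpretations)
        return (1 - worst_case_weight) * expected + worst_case_weight * worst
    return max(actions, key=score)

print(naive_choice(actions, interpretations))     # picks "A"
print(cautious_choice(actions, interpretations))  # picks "B"
```

With these made-up numbers, the naive rule picks the action that looks best under the dominant interpretation, while the cautious rule avoids the action that is catastrophic under a minority interpretation, which is the behavior the paragraph above argues for.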
The AI should keep refining its understanding of human values, including by seeking feedback from humans, but only feedback grounded in honesty, knowledge, and free will. Responses given under extortion, manipulation, or ignorance are misleading, so the AI should not try to “cheat” its way to a convenient answer.