We wrote a bit about this in this post. In short, I don’t think it’s a necessary assumption. In my view, we do seem to need objectives that can (and will) be updated over time, and for that we probably need some kind of counteracting reward system, since an individual agent is incentivized to avoid any modification to its goal. Designing such an adaptive counteracting reward system (like the one we humans have) is certainly a very difficult problem, but probably no harder than aligning a superintelligence with a fixed goal.
Stuart Russell also makes an uncertain, changeable objective one of the centerpieces of his agenda (that said, not many people seem to actually be working on it).
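To make the incentive concrete, here is a minimal toy sketch (my own illustration with made-up numbers, not taken from the post or from Russell's formalism): an agent that evaluates actions purely under its current fixed objective assigns higher expected value to resisting a goal modification, whereas an agent that is uncertain about the true objective and treats the human's proposed correction as evidence about it can prefer to accept the change.

```python
# Toy illustration (hypothetical payoffs): why a fixed-objective agent resists
# goal modification, and how uncertainty over the objective changes the calculus.

# Scenario: the agent can either ACCEPT a goal modification proposed by a human,
# or RESIST it and keep optimizing its current objective.

# --- Fixed-objective agent: value is measured only under the current objective ---
value_if_resist = 10.0   # keeps optimizing its current objective unimpeded
value_if_accept = 4.0    # the new goal diverts effort away from the current one
fixed_choice = "accept" if value_if_accept > value_if_resist else "resist"
print("Fixed-objective agent:", fixed_choice)  # -> resist

# --- Objective-uncertain agent (in the spirit of Russell's assistance-game framing) ---
# The agent is unsure which objective is the true one, and treats the human's
# proposed modification as evidence that the modified objective is correct.
p_current_right = 0.3    # credence that the current objective is the true one
p_modified_right = 0.7   # credence that the human's correction is right

# Value of each action under each hypothesis (hypothetical numbers).
value = {
    ("resist", "current"): 10.0,   # great if the current objective was right...
    ("resist", "modified"): -8.0,  # ...very costly if it was wrong
    ("accept", "current"): 4.0,
    ("accept", "modified"): 9.0,
}

def expected_value(action):
    return (p_current_right * value[(action, "current")]
            + p_modified_right * value[(action, "modified")])

uncertain_choice = max(["resist", "accept"], key=expected_value)
print("Objective-uncertain agent:", uncertain_choice)  # -> accept
```

Under these made-up numbers, resisting has expected value 0.3·10 + 0.7·(−8) = −2.6 while accepting has 0.3·4 + 0.7·9 = 7.5, so the uncertain agent accepts the modification even though the fixed-objective agent would resist it.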