My main note is that my comment was just about the concept of rigging a learning process given a fixed prior over rewards. I certainly agree that the general strategy of “update a distribution over reward functions” has lots of as-yet-unsolved problems.
Ah, ok, I see ^_^ Thanks for making me write this post, though, as it covers useful things for other people that I had been meaning to write up for some time.
On your main point: if the prior and the updating process are over things that are truly beyond the AI’s influence, then there will be no rigging (in my terms: uninfluenceable implies unriggable). But many processes that look like this are entirely riggable. For example, “have a 50-50 prior on cake versus death, and update according to what the programmer says”. This looks like a prior-and-update combination, but it’s entirely riggable.
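To make that concrete, here is a minimal toy sketch (my construction for this comment, not code from the paper; the policy names and the 0.9 manipulation probability are made up for illustration). Under an honest policy the expected posterior matches the 50-50 prior; under a policy that pressures the programmer, it doesn’t, and that is exactly what riggability means:

```python
import random

# Toy illustration of rigging (illustrative construction, not from the paper).
# The "prior 50-50 on cake/death, update on what the programmer says" process
# is riggable: the expected posterior depends on the AI's policy.

PRIOR = {"cake": 0.5, "death": 0.5}

def programmer_statement(policy: str, rng: random.Random) -> str:
    """What the programmer ends up saying, given how the AI behaved."""
    if policy == "ask_honestly":
        # From the AI's perspective, an honest programmer is 50/50.
        return "cake" if rng.random() < 0.5 else "death"
    # "pressure_for_cake": the AI nudges the programmer toward saying "cake"
    # (0.9 is an arbitrary illustrative number).
    return "cake" if rng.random() < 0.9 else "death"

def posterior(statement: str) -> dict:
    """The 'update rule': fully believe whatever the programmer said."""
    return {"cake": 1.0, "death": 0.0} if statement == "cake" else {"cake": 0.0, "death": 1.0}

def expected_posterior(policy: str, n: int = 100_000, seed: int = 0) -> dict:
    """Average the posterior over many runs of the learning process."""
    rng = random.Random(seed)
    totals = {"cake": 0.0, "death": 0.0}
    for _ in range(n):
        post = posterior(programmer_statement(policy, rng))
        for r in totals:
            totals[r] += post[r] / n
    return totals

print(expected_posterior("ask_honestly"))      # ~= PRIOR: {'cake': 0.5, 'death': 0.5}
print(expected_posterior("pressure_for_cake")) # ~= {'cake': 0.9, 'death': 0.1} -- rigged
```

The point is not the specific numbers, but that the AI’s policy changes the expected outcome of the “learning”, which a genuine update over something the AI cannot influence could never do.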
So, another way of seeing my paper is “this thing looks like a prior-and-update process. If it’s also unriggable, then (given certain assumptions) it’s truly beyond the AI’s influence”.
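Roughly, and in my own shorthand here rather than the paper’s exact notation: the learning process $\rho$ is unriggable if its expected final distribution over rewards does not depend on the policy the AI follows,

$$\mathbb{E}\big[\rho(\cdot \mid h_T) \,\big\vert\, \pi\big] \;=\; \mathbb{E}\big[\rho(\cdot \mid h_T) \,\big\vert\, \pi'\big] \quad \text{for all policies } \pi, \pi'.$$

The cake/death example above fails this condition: a manipulative policy shifts the expected posterior away from 50-50.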