But why would that strategy be selected? Where does the selection come from? Will the designers toss a really impressive AI for not getting reward on that one timestep? I think not.
> And once it has the ability, it is likely to be pushed in the direction of exercising that ability (since doing so would increase its reward).
> But why would that strategy be selected? Where does the selection come from? Will the designers toss a really impressive AI for not getting reward on that one timestep? I think not.
Why? I maintain that the agent would not do so unless it were already terminally motivated by reward. As an empirical example: neuroscientists know that brain-stimulation reward produces very high reward, and the brain very likely implements some kind of reinforcement learning, so why don’t neuroscientists wirehead themselves?
I was talking about gradient descent here, not designers.
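To make the kind of selection at issue concrete: in a policy-gradient setup, gradient ascent upweights whichever actions happened to be followed by reward, with no designer in the loop deciding what to keep. A minimal sketch of this dynamic (the bandit environment, reward values, and hyperparameters are all hypothetical):

```python
import numpy as np

# Minimal REINFORCE-style bandit: the update rule upweights whatever
# actions happened to precede reward. The "selection" is performed by
# gradient ascent, not by a designer inspecting the policy.

rng = np.random.default_rng(0)
n_actions = 3
logits = np.zeros(n_actions)  # tabular softmax policy over a single state

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical rewards: action 2 plays the role of the high-reward
# "exercise the ability" action.
rewards = [0.1, 0.5, 1.0]

lr = 0.1
for _ in range(2000):
    probs = softmax(logits)
    a = rng.choice(n_actions, p=probs)
    # Gradient of log pi(a) for a softmax policy: one-hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    logits += lr * rewards[a] * grad_log_pi  # reward-weighted upweighting

print(softmax(logits))  # probability mass concentrates on action 2
```

Nothing in this update requires the policy to represent or care about reward; reward enters only as a weight on the parameter update, which is the distinction the exchange above turns on.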