Long before they knew they had reward circuitry causing their internal reinforcement events, humans noticed that vices, for example, are behavioral attractors: vice → more propensity to do the vice next time → vice, in a vicious cycle. If you’re predicting future observations via e.g. SSL, I think it becomes important to (at least crudely) model the effects of value drift during training.
I’m not saying the AI won’t care about reward at all. I think it’ll be a secondary value, but that’s beside my point here. In this quote, I was arguing that the AI would be quite able to avoid a “vice” (the blueberry) by modeling the value drift on some level. I was showing a sufficient condition for the “global maximum” picture to get a wrench thrown in it.
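Here’s a toy sketch of that argument. Everything in it (the numbers, the propensity dynamic, the `rollout`/`current_values` framing) is my own illustrative assumption, not a model anyone has specified; it just shows a planner that predicts its own drift and then scores plans with its current values, rather than by summed reward.

```python
# Toy sketch: an agent that models its own value drift while planning.
# All numbers and mechanics are illustrative assumptions.

def rollout(eat_first_blueberry: bool, steps: int = 10):
    """Crude value-drift dynamic: each reinforced blueberry-eating event bumps
    the propensity to eat blueberries on later steps (vice -> more propensity -> vice)."""
    propensity = 0.2      # baseline pull toward the "vice"
    work_done = 0.0       # proxy for whatever the agent currently values
    reward_signal = 0.0   # cumulative internal reinforcement
    for t in range(steps):
        eats = eat_first_blueberry if t == 0 else (propensity > 0.5)
        if eats:
            reward_signal += 1.0                     # reinforcement event fires
            propensity = min(1.0, propensity + 0.4)  # the drift itself
        else:
            work_done += 1.0
    return work_done, reward_signal

def current_values(work_done: float, reward_signal: float) -> float:
    """The agent's *current* values: reward shows up only as a weak secondary term."""
    return work_done + 0.1 * reward_signal

# The planner predicts the drifted trajectory for each plan and evaluates it
# with its current values; it does not just sum the reward signal.
plans = {eat: current_values(*rollout(eat)) for eat in (False, True)}
print(plans)                                            # {False: 10.0, True: 1.0} with these toy numbers
print("eat the blueberry?", max(plans, key=plans.get))  # False: it steps around the vice
```

The specific thresholds do no work here; the point is just that the plan comparison runs the predicted (drifted) trajectory through the agent’s current values instead of through the cumulative reward signal, which is enough to break the “global maximum” picture.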
When, quantitatively, should that happen, with the agent stepping around the planning process like this? Not sure.