FWIW I upvoted but disagree with the end part (hurray for more nuance in voting!)
> I think “reward is the antecedent-computation-reinforcer” will probably be true in RL algorithms that scale to AGI
At least from my epistemic position there looks to be an explanation/communication gap here: I don’t think we can be as confident of this. To me the claim seems to preclude ‘creative’ forward-looking exploratory behaviour and model-based planning, which have more of a probingness and less of a merely-antecedent-computation-reinforcingness. But I see other comments from you here that talk about foresighted exploration (and foresighted non-exploration!), and I know you’ve written about these things at length. How are you squaring/nuancing these things? (Silence or a link to an already-written post will not be deemed rude.)
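
To gesture concretely at the contrast I have in mind, here is a toy sketch; the tabular environment, the bare REINFORCE-style update, and the one-step lookahead planner are all my own illustrative constructions, not a claim about what scaled-up RL actually looks like. In the first case, reward directly scales a gradient step on the log-probability of whatever action was just taken, i.e. it reinforces the antecedent computation; in the second, the action comes out of simulating an (assumed-known) model forward, and nothing gets reinforced after the fact.

```python
# Toy sketch, purely illustrative: contrasting
# (1) reward-as-antecedent-computation-reinforcer (a bare REINFORCE-style update)
# (2) forward-looking model-based planning (action chosen by simulating a model).
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 3, 2

# Made-up known dynamics and reward, just to have something to run.
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))  # P[s, a] = next-state dist
R = rng.normal(size=(N_STATES, N_ACTIONS))                        # R[s, a] = expected reward

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# (1) Antecedent-computation-reinforcement: reward gates a gradient step
#     on the log-probability of the action that was *already taken*.
theta = np.zeros((N_STATES, N_ACTIONS))  # policy logits

def reinforce_step(s, lr=0.1):
    probs = softmax(theta[s])
    a = rng.choice(N_ACTIONS, p=probs)
    r = R[s, a] + rng.normal(scale=0.1)
    # Reward strengthens (or weakens) whatever computation produced `a`.
    grad_logp = -probs
    grad_logp[a] += 1.0
    theta[s] += lr * r * grad_logp
    return a, r

# (2) Model-based planning: pick an action by rolling the model forward;
#     the reward signal here selects an action, it does not reinforce a past computation.
def plan_one_step_lookahead(s, gamma=0.9):
    v_next = R.max(axis=1)          # myopic value of each next state, for illustration
    q = R[s] + gamma * P[s] @ v_next
    return int(np.argmax(q))

for _ in range(5):
    s = rng.integers(N_STATES)
    print("state", s,
          "| reinforce picked", reinforce_step(s)[0],
          "| planner picked", plan_one_step_lookahead(s))
```

Of course a real system could blend these (say, planner rollouts generating targets that then get reinforced), which is roughly where my uncertainty about the quoted claim lives.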