Abram—I’ve gone back and forth a few times, but currently think that “gradient descent is myopic” arguments don’t carry through 100% when the predictions invoke memorized temporal sequences (and hierarchies or abstractions thereof) that extend arbitrarily far into the future. For example, if I think someone is about to start singing “Happy birthday”, I’m directly making a prediction about the very next moment, but I’m implicitly making a prediction about the next 30 seconds, and thus the prediction error feedback signal is not just retrospective but also partly prospective.
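(To make the "Happy birthday" point concrete, here is a minimal toy sketch, not anything from our exchange: a belief over a couple of memorized sequences, where the training signal only touches the very next token, yet updating on that signal changes the implied prediction for the whole remaining sequence. The song names, token ids, and the crude belief update are all illustrative assumptions.)

```python
import numpy as np

# Two hypothetical memorized sequences ("songs"), encoded as token ids.
songs = {
    "happy_birthday": np.array([0, 1, 2, 3, 0, 1, 4, 3]),
    "other_tune":     np.array([5, 5, 6, 7, 5, 5, 8, 7]),
}
names = list(songs)
logits = np.zeros(len(names))          # log-belief over which song is playing
t = 2                                  # tokens 0..t-1 have been heard so far

def next_token_probs(logits, t, vocab=9):
    """Mixture prediction for the next token, marginalizing over songs."""
    belief = np.exp(logits - logits.max())
    belief /= belief.sum()
    p = np.zeros(vocab)
    for w, name in zip(belief, names):
        p[songs[name][t]] += w
    return p, belief

_, belief_before = next_token_probs(logits, t)
observed = songs["happy_birthday"][t]  # the next moment actually heard

# One "retrospective" update on the single next-token error: a crude stand-in
# for a gradient step, raising belief in songs that predicted the observed token.
for i, name in enumerate(names):
    logits[i] += 1.0 if songs[name][t] == observed else -1.0

# The updated belief also changes the *implicit* prediction of the rest of the
# sequence -- the "next 30 seconds" -- even though the loss only ever looked
# one step ahead.
_, belief_after = next_token_probs(logits, t + 1)
print("belief over songs before/after:", belief_before, belief_after)
print("implied continuation:", songs[names[np.argmax(logits)]][t + 1:])
```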
I agree that we should NOT expect “outputs to strategically make future inputs easier to predict”, but I think there might be a non-myopic tendency for outputs that strategically make the future conform to the a priori prediction. See here (including the comments) for my discussion, where I try to get my head around this.
Anyway, if that’s right, that would seem to be the exact type of non-myopia needed for a hierarchical Bayesian prediction machine to also be able to act as a hierarchical control system. (And sorry again if I’m just being confused.)
I appreciate your thoughts! My own thinking on this is rapidly shifting, and I regret that I’m not producing more posts about it right now. I will try to comment further on your linked post. Feel encouraged to PM me if you write/wrote more on this and think I might have missed it; I’m pretty interested in this right now.