Claim: predictive learning gets gradients “for free” … Claim: if you’re learning to act, you do not similarly get gradients “for free”. You take an action, and you see the results of that one action. This means you fundamentally don’t know what would have happened had you taken alternate actions, which means you don’t have a direction to move your policy in. You don’t know whether the alternatives would have been better or worse. So the rewards you observe seem insufficient to determine how you should learn.
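To make the contrast concrete, here is a minimal sketch (my own illustration, not anything from the post): in predictive learning the observed outcome itself defines a differentiable loss, so you get an exact gradient; with bandit-style feedback you only see the reward of the one action you sampled, and the standard workaround is a score-function (REINFORCE-style) estimator, which gives you a noisy gradient estimate rather than a free one.

```python
import numpy as np

# Predictive learning: prediction error against the observed target y
# yields an exact gradient of the loss (theta * x - y)^2.
def predictive_grad(theta, x, y):
    return 2.0 * (theta * x - y) * x

# Acting: a toy 2-action softmax policy. We sample ONE action, observe ONE
# reward, and never see the counterfactual. The score-function trick turns
# that single sample into an unbiased but high-variance gradient estimate.
def reinforce_grad(theta, reward_fn, rng):
    probs = np.exp(theta) / np.exp(theta).sum()   # softmax over 2 actions
    a = rng.choice(2, p=probs)                    # one action...
    r = reward_fn(a)                              # ...one observed reward
    grad_logp = -probs
    grad_logp[a] += 1.0                           # grad of log pi(a)
    return r * grad_logp                          # noisy gradient estimate

rng = np.random.default_rng(0)
```

A single call to `reinforce_grad` tells you almost nothing about the direction to move the policy; only by averaging many samples does the estimate approach the true gradient, which is exactly the "you don't get gradients for free" point.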
This immediately jumped out at me as an implausible distinction because I was just reading Surfing Uncertainty which goes on endlessly about how the machinery of hierarchical predictive coding is exactly the same as the machinery of hierarchical motor control (with “priors” in the former corresponding to “priors + control-theory-setpoints” in the latter, and with “predictions about upcoming proprioceptive inputs” being identical to the muscle control outputs). Example excerpt:
the heavy lifting that is usually done by the use of efference copy, inverse models, and optimal controllers [in the models proposed by non-predictive-coding people] is now shifted [in the predictive coding paradigm] to the acquisition and use of the predictive (generative) model (i.e., the right set of prior probabilistic ‘beliefs’). This is potentially advantageous if (but only if) we can reasonably assume that these beliefs ‘emerge naturally as top-down or empirical priors during hierarchical perceptual inference’ (Friston, 2011a, p. 492). The computational burden thus shifts to the acquisition of the right set of priors (here, priors over trajectories and state transitions), that is, it shifts the burden to acquiring and tuning the generative model itself. --Surfing Uncertainty chapter 4
I’m a bit hazy on the learning mechanism for this (confusingly-named) “predictive model” (I haven’t gotten around to chasing down the references) and how that relates to what you wrote… But it does sorta sound like it entails one update process rather than two...
Yep, I 100% agree that this is relevant. The PP/Friston/free-energy/active-inference camp is definitely at least trying to “cross the gradient gap” with a unified theory as opposed to a two-system solution. However, I’m not sure how to think about it yet.
I may be completely wrong, but I have a sense that there’s a distinction between learning and inference which plays a similar role to the two-system split; IE, planning is just inference, but both planning and inference work only because the learning process serves as the second “protected layer”??
It may be that PP is “more or less” the Bayesian solution; IE, it requires a grain of truth to get good results, so it doesn’t really help with the things I’m most interested in getting out of “crossing the gap”.
Note that PP clearly tries to implement things by pushing everything into epistemics. On the other hand, I’m mostly discussing what happens when you try to smoosh everything into the instrumental system. So, many of my remarks are not directly relevant to PP.
I get the sense that Friston might be using the “evolution solution” I mentioned; so, unifying things in a way which kind of lets us talk about evolved agents, but not artificial ones. However, this is obviously an oversimplification, because he does present designs for artificial agents based on the ideas.
Overall, my current sense is that PP obscures the issue I’m interested in more than solves it, but it’s not clear.