Why is it called predictive “coding” in the first place? Also, this particular method didn’t end up being very useful compared to neural networks, right? So what is its relevance?
It’s called predictive coding because you’re encoding the image as the vector r, which is typically much smaller than the image itself. The idea is that, in the brain, the weights U change slowly, while r tracks changes in the retinal image.
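A minimal sketch of that setup in numpy (the dimensions, step sizes, and iteration counts are my own illustrative assumptions, not values from the actual model): the image is approximated as U @ r, r is inferred by running down the reconstruction error, and U only gets a much smaller learning nudge.

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.normal(size=256)        # flattened retinal image (256 "pixels")
U = rng.normal(size=(256, 32))      # generative weights: change slowly, via learning
r = np.zeros(32)                    # code vector: much smaller than the image, tracks the input

# Infer r for this image by gradient descent on the reconstruction error ||image - U r||^2.
for _ in range(200):
    error = image - U @ r           # prediction error
    r += 0.002 * (U.T @ error)      # move r to reduce the error

# Learning then nudges U with a much smaller step, e.g.:
U += 0.0001 * np.outer(image - U @ r, r)
```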
It’s not the best method for machine learning, but some smart people claim this is our best bet for the algorithm actually used by the brain. So it is of interest for that reason alone. In addition, there’s a good chance that our current algorithms are not the best possible, so it behooves us to keep some “also-ran” algorithms in mind. It’s happened several times in the history of AI that an apparently inferior algorithm has received a minor tweak and become a champion.
Just to add to Carl Feynman’s response, which I thought was good.
Part of the reason these systems are inefficient is that they (effectively) require you to run gradient descent even at inference time, after training is over. Alternatively, you can run the equivalent RNN, which is mathematically the same thing, but again you can see where the inefficiency comes in: the value at time t=3 is a function of the value at t=2, which is a function of the value at t=1, and so on. So to get the converged value of the activations you have to compute each timestep one by one, in a for loop.
This is in contrast to a feedforward network like a (normal) convnet or transformer, which can run extremely quickly and in parallel on a GPU.
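To make that contrast concrete, here is a toy sketch (again in numpy; the shapes, step size, and iteration count are illustrative assumptions): the predictive-coding code r has to be settled with a sequential loop, while a one-layer feedforward encoder produces its code in a single matrix multiply.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=256)                 # input image
U = rng.normal(size=(256, 32)) / 16.0    # predictive-coding generative weights
W = rng.normal(size=(32, 256)) / 16.0    # weights of a one-layer feedforward encoder

# Predictive coding: inference is itself an iterative loop. Each step depends on
# the previous one, so the timesteps cannot be parallelised away.
r = np.zeros(32)
for t in range(200):
    r = r + 0.1 * (U.T @ (x - U @ r))    # r at step t is a function of r at step t-1

# Feedforward network: inference is a single pass, one (batchable) matrix multiply
# that maps the input straight to the code. This is what runs fast on a GPU.
r_ff = np.tanh(W @ x)
```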