A predictive processor doesn’t just minimize prediction error. It minimizes free energy. Prediction error is only half of the free energy equation. Here is a partial explanation that I hope will begin to help clear up some confusion.
Consider the simplest example. All the heuristics agree. Then region W will just copy and paste the inputs.
But now suppose the heuristics disagree. Suppose 7 heuristics output a value of 1 and 2 heuristics output a value of −1. Region W could just copy the heuristics, but if it did then 2/9 of the region would have a value of −1 and 7/9 of the region would have a value of 1. A small region with two contradictory values has high free energy. If the region is tightly coupled to itself then the region will round the whole thing to 1 instead.
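To make the trade-off concrete, here is a minimal sketch using a toy energy that I am assuming purely for illustration. It is not the actual free energy equation. The `energy` function, the squared-difference terms, and the `coupling` weight are all hypothetical. The toy energy adds up two costs: how far the region's values are from its inputs (prediction error), and how much the region's cells disagree with each other, scaled by how tightly the region is coupled to itself.

```python
from itertools import combinations


def energy(state, inputs, coupling):
    """Toy energy (an assumption, not the real free energy equation)."""
    # Prediction error: squared distance between each cell and its input.
    error = sum((s - h) ** 2 for s, h in zip(state, inputs))
    # Internal disagreement: squared difference over every pair of cells.
    disagreement = sum((a - b) ** 2 for a, b in combinations(state, 2))
    return error + coupling * disagreement


# 7 heuristics say 1, 2 heuristics say -1.
inputs = [1] * 7 + [-1] * 2
copied = list(inputs)   # region W copies its inputs exactly
rounded = [1] * 9       # region W rounds everything to 1

for coupling in (0.1, 1.0):
    print(coupling, energy(copied, inputs, coupling), energy(rounded, inputs, coupling))
```

Under these assumptions, copying has zero prediction error but pays for 14 disagreeing pairs, while rounding has no internal disagreement but pays for 2 mismatched inputs. With a coupling of 1.0 the copied state costs 56 and the rounded state costs 8, so rounding wins. With a weak coupling of 0.1 the copied state is cheaper, so the region just copies its inputs.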