Another part of the picture that isn’t complicated is that the exact same algorithms can be used for probabilistic inference (finding good explanations for the data) and planning (finding a plan that achieves some goal). In fact this connection is useful and people in AI sometimes exploit it. It’s a bit deeper than it sounds but not that deep. See planning as inference, which Eli mentions above. It seems worth understanding this simple idea before trying to understand some extremely confusing pile of ideas.
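To make the "same algorithm" point concrete, here is a minimal sketch that uses a single Bayes-rule-by-enumeration routine (the hypothetical `condition` function) for both jobs. For inference it asks which cause explains a wet lawn. For planning it asks which action is most probable once we condition on the goal being reached. The lawn/commute model and all its numbers are made up for illustration, and real planning-as-inference methods work with much richer models.

```python
from collections.abc import Hashable
from typing import Dict, Tuple

# A joint distribution over (hidden, observed) pairs, stored as a dict of probabilities.
Joint = Dict[Tuple[Hashable, Hashable], float]


def condition(joint: Joint, observed: Hashable) -> Dict[Hashable, float]:
    """Posterior over the hidden variable given the observed one (Bayes' rule by enumeration)."""
    unnormalized = {h: p for (h, o), p in joint.items() if o == observed}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Inference: which hidden cause best explains the observation "wet"?
weather_prior = {"rain": 0.3, "no_rain": 0.7}
p_wet_given = {"rain": 0.9, "no_rain": 0.1}
inference_joint: Joint = {}
for w, pw in weather_prior.items():
    inference_joint[(w, "wet")] = pw * p_wet_given[w]
    inference_joint[(w, "dry")] = pw * (1 - p_wet_given[w])

# Planning: treat the goal "on_time" as if it were observed, and infer the action.
action_prior = {"walk": 0.5, "take_bus": 0.5}  # no prior preference between actions
p_on_time_given = {"walk": 0.4, "take_bus": 0.8}
planning_joint: Joint = {}
for a, pa in action_prior.items():
    planning_joint[(a, "on_time")] = pa * p_on_time_given[a]
    planning_joint[(a, "late")] = pa * (1 - p_on_time_given[a])

# The exact same function does both jobs.
explanation = condition(inference_joint, "wet")  # posterior over causes
plan = condition(planning_joint, "on_time")      # posterior over actions
```

Here `explanation` puts most of its mass on `"rain"` and `plan` puts most of its mass on `"take_bus"`. Nothing about `condition` knows whether it is explaining data or choosing an action.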
Another important distinction: there are two different algorithms one might describe as “minimizing prediction error”:
I think the more natural one is algorithm A: you adjust your beliefs to minimize prediction error (after translating your preferences into “optimistic beliefs”). Then you act according to your beliefs about how you will act. This is equivalent to independently forming beliefs and then acting to get what you want; the difference is just an implementation detail.
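A toy sketch of algorithm A, under two simplifying assumptions of mine: the preference is encoded as a hard "optimistic belief" that the preferred outcome will be observed, and the prior over one's own actions is uniform. Under those assumptions, acting on the resulting beliefs about your own action picks the same action as the "separate" description (form beliefs about outcomes, then choose whatever is most likely to get you what you want). The world model and the function names are hypothetical.

```python
# Hypothetical toy world: each action leads to outcomes with these probabilities.
outcome_model = {
    "stay_home": {"fed": 0.2, "hungry": 0.8},
    "go_shopping": {"fed": 0.9, "hungry": 0.1},
}


def optimistic_action_beliefs(model, preferred="fed"):
    """Algorithm A: believe `preferred` will be observed, then infer your own action."""
    prior = 1.0 / len(model)  # assumption: uniform prior over one's own actions
    unnormalized = {a: prior * outcomes[preferred] for a, outcomes in model.items()}
    total = sum(unnormalized.values())
    return {a: p / total for a, p in unnormalized.items()}


def act_on_beliefs(beliefs):
    # "Act according to your beliefs about how you will act": take the most probable action.
    return max(beliefs, key=beliefs.get)


def goal_directed_choice(model, preferred="fed"):
    # The "separate" description: beliefs about outcomes, then pick the action
    # most likely to produce the preferred outcome.
    return max(model, key=lambda a: model[a][preferred])


beliefs = optimistic_action_beliefs(outcome_model)
chosen_by_a = act_on_beliefs(beliefs)
chosen_directly = goal_directed_choice(outcome_model)
```

Both `chosen_by_a` and `chosen_directly` come out as `"go_shopping"`. With a non-uniform action prior or soft (graded) preferences, the correspondence becomes a softened version of goal-directed choice rather than an exact match.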
There is a much more complicated family of algorithms, call them algorithm B, where you actually plan in order to change the observations you’ll make in the future, with the goal of minimizing prediction error. This is the version that would cause you to e.g. go read a textbook, or lock yourself in a dark room. This version is algorithmically way more complicated to implement, even though it may sound simpler. It also has all kinds of weird implications, and it’s not easy to see how to turn it into something that isn’t obviously wrong.
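A deliberately naive caricature of algorithm B, just to show where the dark-room behavior comes from: score each action by the expected surprise (entropy, in nats) of the observations it is predicted to produce, then pick the least surprising one. The actions and probabilities are invented, this is one-step and ignores the genuinely hard part (planning over how your observations and your own model change over time, which is what would make reading a textbook attractive), and it does not attempt to capture Friston's actual formulation.

```python
import math

# Hypothetical predicted observation distributions for each candidate action.
predicted_observations = {
    "stay_in_dark_room": {"dark": 1.0},
    "go_for_a_walk": {"birds": 0.3, "cars": 0.3, "rain": 0.2, "sun": 0.2},
}


def expected_prediction_error(obs_dist):
    """Expected surprise (-log p) of the next observation, i.e. the entropy of obs_dist."""
    return -sum(p * math.log(p) for p in obs_dist.values() if p > 0)


def algorithm_b_choice(predictions):
    # Plan to change future observations: choose the action whose
    # predicted observations are least surprising.
    return min(predictions, key=lambda a: expected_prediction_error(predictions[a]))


choice = algorithm_b_choice(predicted_observations)
```

`choice` is `"stay_in_dark_room"`, because a perfectly predictable observation stream has zero expected surprise. Nothing in this objective refers to what the agent wants, which is one way to see why a naive version of algorithm B looks obviously wrong.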
Regardless of which view you prefer, it seems important to recognize the difference between the two. In particular, evidence that we use algorithm A shouldn’t be interpreted as evidence that we use algorithm B.
It sounds like Friston intends algorithm B. This version is pretty different from anything that researchers in AI use, and I’m pretty skeptical (based on observations of humans and the surface implausibility of the story rather than any knowledge about the area).
Paul, this is very helpful! Finally I understand what this “active inference” stuff is about. I wonder whether there have been any significant theoretical results about these methods since Rawlik et al. 2012?
Oh hey, so that’s the original KL control paper. Saved!