I just now read the 2009 letter in Trends in Cognitive Sciences. It was very clear that this was a proposal for a model of human perception and action that was not at all tautological. But it didn’t explain why we’d expect this model to be true… instead, it had a lot of handwaving, and for “more details,” referred me to the 2010 Nature Reviews Neuroscience paper. I then skimmed that, looking for the derivation or motivation of these equations (e.g. from figure 1 in Friston 2009), and found exactly nothing.
Basically, when presented with an idea, it’s often hard to tell in a vacuum whether it’s true. But it’s not so hard to evaluate why it’s supposed to be true – there are so many false things that if you believe something without good reason, it’s probably false. So rather than delving into issues with the idea itself, which would mean engaging with some very vague writing, it’s a lot easier to just note that the mathematical parts of this model are pulled directly from the posterior.
But this definitely seems like the better website to talk to Eli Sennesh on :)
>But this definitely seems like the better website to talk to Eli Sennesh on :)
Somewhat honored, though I’m not sure we’ve met before :-).
I’m posting here now mostly because I’m… somewhat disappointed with people saying things like “it’s bullshit” or “the mathematical parts of this model are pulled directly from the posterior”.
IMHO, there’s a lot to the strictly neuroscientific, biological aspects of the free-energy theory, and it integrates well with physics (good prediction resists disorder, “Thermodynamics of Prediction”) and with evolution (predictive regulation being the unique contribution of the brain).
Mathematically, well, I’m sure that a purely theoretical probabilist or analyst can pick everything up quickly.
Computationally and psychologically, it’s a hot mess. It feels, to me at least, like trying to explain a desktop computer by saying, “It successively and continually attempts to satisfy its beliefs under the logical model inherent to its circuitry” – that is, to compute a tree of NANDs of binary inputs. Is the explanation literally true? Yes! Why? Because it’s a universal description of the most convenient way we know of to implement Turing-complete computation in hardware.
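To see why “it’s all NANDs” is literally true and yet explains so little, here’s a toy sketch (my own, purely illustrative) of NAND’s functional completeness:

```python
# NAND is functionally complete: every Boolean function can be built
# from it, which is exactly why "it's all NANDs" explains everything
# about hardware and nothing about any particular program.

def nand(a: bool, b: bool) -> bool:
    return not (a and b)

def not_(a: bool) -> bool:
    return nand(a, a)

def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))

def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))

def xor(a: bool, b: bool) -> bool:
    c = nand(a, b)
    return nand(nand(a, c), nand(b, c))

# Sanity check: the composed gates behave as advertised.
assert all(xor(a, b) == (a != b) for a in (False, True) for b in (False, True))
```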
But bullshit? No, I don’t think so.
I wind up putting Friston in the context of Tenenbaum, Goodman, Gershman, etc. Ok, it makes complete sense that the most primitive hardware-level operations of the brain may be probabilistic. We have plenty of evidence that the brain does probabilistic inference at multiple levels, including seemingly “top-down” ones like decision making and motor control. Having evolved one useful mechanism, it makes sense that evolution would just try to put more and more of them together, like Lego blocks, occasionally varying the design slightly to implement a new generative model or inference method within the basic layer or microcircuit doing the probabilistic job.
That’s still not a large-scale explanation of everything. It’s a language. Telling you the grammar of C or Lisp doesn’t teach you the architecture of Half-Life 2. Showing that the brain implements a probability model just shows you that you could probably write it in Church or Pyro given enough hardware, and those languages allow all computably sampleable distributions – an immensely broad class of models!
On the other hand, if you had previously not even known what C or Turing machines were, and were just wondering how the guns and headcrabs got on the shiny box, you’ve made a big advance, haven’t you?
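To make “immensely broad class” concrete: in a probabilistic language like Pyro, any program that makes random choices – recursion included – defines a legal distribution. Here’s the standard geometric-distribution example (a sketch of the idea, nothing from any Friston paper):

```python
import torch
import pyro
import pyro.distributions as dist

def geometric(p, t=0):
    # Recurse until the first successful coin flip; the return value
    # is a sample from a geometric distribution, defined entirely by
    # the control flow of the program itself.
    flip = pyro.sample(f"flip_{t}", dist.Bernoulli(p))
    return t if flip.item() == 1.0 else geometric(p, t + 1)

print(geometric(torch.tensor(0.3)))
```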
I think about predictive brain models by trying to parse them as something like probabilistic programs, asking three questions (a toy sketch follows the list):
What predictions? That is, what original generative model P, with what observable variables?
What inference methods? If variational, what sort of guide model Q? If Monte Carlo, what proposal Q?
Most importantly, which predictions are updated (via inference), and which are fulfilled (via action)?
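Here’s a hypothetical Pyro rendering of those three questions (the model, names, and numbers are all invented for illustration): the model is P, the guide is Q, and the obs= keyword marks the predictions that get clamped to data rather than updated.

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.distributions import constraints
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def model(observation):
    # P: the generative model. "cause" is the latent variable to be
    # inferred; "sense" is the prediction compared against incoming data.
    cause = pyro.sample("cause", dist.Normal(0.0, 1.0))
    pyro.sample("sense", dist.Normal(cause, 0.1), obs=observation)

def guide(observation):
    # Q: the variational guide over causes, with learnable parameters --
    # the "recognition model" in free-energy terms.
    loc = pyro.param("loc", torch.tensor(0.0))
    scale = pyro.param("scale", torch.tensor(1.0),
                       constraint=constraints.positive)
    pyro.sample("cause", dist.Normal(loc, scale))

# Variational inference minimizes the negative ELBO -- the variational
# free energy -- pulling Q toward the true posterior P(cause | sense).
svi = SVI(model, guide, Adam({"lr": 0.05}), loss=Trace_ELBO())
for _ in range(200):
    svi.step(torch.tensor(1.3))
```

The third question – updated versus fulfilled – is the one this snippet can’t show, because plain variational inference only ever updates; the two equations below are how the two literatures mark the fulfilled-by-action case.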
The usual way to spot the latter in an active inference paper is to look for an equation saying something like −log P(u; Θ) = D_KL(Q(s) || P(s | Θ, m)). That denotes control states being sampled from a Boltzmann distribution whose energy function is the divergence between empirical observations and actual goals.
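In code, that reading looks something like this toy sketch (the actions, predicted distributions, and goal distribution are all made up): score each candidate action by a KL divergence and sample from the resulting Boltzmann distribution.

```python
import numpy as np

def kl(q, p):
    # KL divergence between two discrete distributions.
    return float(np.sum(q * np.log(q / p)))

# Hypothetical setup: each action u comes with a predicted distribution
# over states; "goal" is the distribution the agent wants to see.
goal = np.array([0.9, 0.1])
predicted = {
    "reach": np.array([0.8, 0.2]),  # close to the goal
    "rest":  np.array([0.3, 0.7]),  # far from the goal
}

# Boltzmann distribution over actions, P(u) proportional to exp(-energy),
# where the energy of u is the divergence from the goal.
energies = np.array([kl(goal, predicted[u]) for u in predicted])
probs = np.exp(-energies) / np.sum(np.exp(-energies))
u = np.random.choice(list(predicted), p=probs)
print(dict(zip(predicted, probs)), "->", u)
```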
The usual way to spot the latter in a computational cognitive science paper is just to look for an equation saying something like u ∼ ∫ P(u, Θ | Goal) dΘ, which just says that you sample actions that make your goal most likely, via ordinary conditionalizing.
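And the corresponding toy sketch for the cognitive-science convention (again, every number here is invented): condition a joint model on the goal, marginalize out Θ, and sample the action from what remains.

```python
import numpy as np

# Hypothetical joint model over two actions and two latent parameter
# settings Θ, with a uniform prior over (u, Θ).
actions = ["reach", "rest"]
p_goal_given = np.array([[0.9, 0.7],   # u = reach, Θ = 0 or 1
                         [0.2, 0.1]])  # u = rest,  Θ = 0 or 1
prior = np.full((2, 2), 0.25)

# Ordinary conditionalizing: P(u | Goal) ∝ Σ_Θ P(Goal | u, Θ) P(u, Θ).
joint = p_goal_given * prior
p_u_given_goal = joint.sum(axis=1) / joint.sum()
u = np.random.choice(actions, p=p_u_given_goal)
print(dict(zip(actions, p_u_given_goal)), "->", u)
```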
Like I said, all this probabilistic mind stuff is a language to learn, which then lets you read lots of neuroscience and cognitive science papers more fluently. The reward is that, once you understand it, you get a nice solid intuition that, on the one hand, some papers might be mistaken, but on the other hand, with a few core ideas like hierarchical probability models and sampling actions from inferences, we’ve got an “assembly language” for describing a wide variety of possible cognitions.
I’m not qualified to comment on the literature in general or how research goes—if you say that treating the brain as drawing actions from a Boltzmann distribution on this weird divergence is useful, I believe you. But it seems like you can extract very specific claims from Friston 2009, like the brain having a model from perceptions to a distribution over “causes” (model parameters), and each step of learning in the brain reducing the KL divergence (specifically!) between a mutable internal generative model of “causes” and the fixed sense-inferred “causes.” This is the sort of thing that I failed to find a justification for, and therefore am treating as having a tenuous relation to real brains. And I don’t think this is just nitpicking, because fixed inference of causes is used to get fixed motivations that have preferences over causes.
So we could quibble over the details of Friston 2009, *buuuuut*...
I don’t find it useful to take Friston at 110% of his word. I find it more useful to read him like I read all other cognitive modelers: as establishing a language and a set of techniques whose scientific rigor he demonstrates via their application to novel experiments and known data.
He’s no more an absolute gold standard than, say, Dennett, but his techniques have a certain theoretical elegance: they posit that the brain is built out of very few, very efficient core mechanisms, applied to abundant embodied training data, rather than very many mechanisms with relatively little training data or processing power for each one.
Rather than quibble over him, I think that this morning in the shower I got what he means on a slightly deeper level, and now I seriously want to write a parody entitled, “So You Want to Write a Friston Paper”.