You write “I don’t currently understand how PP does RL”. I’m not claiming that PP does RL. PP can do RL, but that’s not important. The biological neural network model in my original post is getting trained simultaneously by two different algorithms with two different optimization targets. The PP algorithm is running at all times and is training the neural network to minimize surprise. The RL algorithm is activated intermittently and trains the neural network to take actions that produce squirts of dopamine.
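To make the dual-training picture concrete, here is a toy sketch. Everything in it (names, dimensions, the specific learning rules) is my own illustrative assumption, not a claim about neuroscience: a single weight matrix receives a prediction-error update on every tick, plus a reward-modulated three-factor update only on the ticks where a hypothetical “dopamine” signal fires.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "neural network": one linear map that predicts the next observation.
W = rng.normal(scale=0.1, size=(4, 4))

def pp_update(W, x, x_next, lr=0.05):
    """Predictive-processing step: runs every tick, descends squared prediction error."""
    err = x_next - W @ x              # prediction error (the "surprise" being minimized)
    W = W + lr * np.outer(err, x)     # gradient step on 0.5 * ||err||^2
    return W, float(err @ err)

def rl_update(W, x, post, reward, lr=0.05):
    """Reward-modulated step: runs only when the hypothetical dopamine signal fires.
    A three-factor Hebbian rule: pre-activity, post-activity, global reward."""
    return W + lr * reward * np.outer(post, x)

losses = []
for t in range(200):
    x = rng.normal(size=4)
    x_next = np.roll(x, 1)            # a fixed, learnable world dynamic
    W, loss = pp_update(W, x, x_next)
    losses.append(loss)
    if t % 10 == 0:                   # intermittent "squirt of dopamine"
        W = rl_update(W, x, post=W @ x, reward=0.1)
```

The only point of the sketch is that two update rules with different objectives can coexist on the same parameters: the always-on PP updates keep driving prediction error down even while the intermittent RL updates nudge the weights toward something else.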
To be clear here, I did understand that you posit a dual-system approach in this post, with squirts of dopamine for RL and PP for everything else. However, I didn’t really understand why you wanted to posit that, in the context of your other posts, where you mention PP doing the RL part too.
We can teach it to play chess, build cars, take over the world, and disassemble stars. But there is one thing we can’t teach it to do: We can’t teach it to maximize its own error function. It’s not just physically impossible. It’s a logical contradiction.
But is this important/interesting?
Here’s my problem. I feel like, in general, when talking about PP I end up chasing shadows. First, there’s a lot of naive PP discourse out there, where people just talk about “minimizing free energy” as if it explains everything, with no apparent understanding of the nuances: which types of free energy you can minimize, in what kind of minimization framework, and so on. People claim they can explain any psychological phenomenon in terms of minimizing predictive error. So you get paragraphs like:
If you connect the neurons in a predictive processor to motor (output) neurons, then the predictive processor will learn to send motor outputs which minimize predictive error, i.e. minimize surprise.
And then you get the semi-experts/semi-dilettantes, who have read a few papers on the subject and can’t claim to explain everything, but recognize the obvious fallacies and believe that there are ways around them.
So then you get paragraphs like:
Wait a minute. Don’t we get bored and seek out novelty? And isn’t novelty a form of surprise (which increases free energy)? Yes, but that’s because the human brain isn’t a pure predictive processor. The brain gets a squirt of dopamine when it exhibits a behavior that evolution wants to reinforce. Dopamine-moderated reinforcement learning alone is enough to elicit non-free-energy-minimizing behaviors (such as gambling) from a predictive processor.
Do you see my problem yet? First you start with a theory (free energy minimization) which can already explain anything and everything, but if you take it really seriously, it does heuristically suggest some predictions over others. And then some of those predictions are wrong; e.g., it predicts that organisms disproportionately like to hang out in dark, quiet rooms where there’s no surprise. So maybe you retreat to the general can-predict-anything version. Or maybe you start patching it, by tacking on some amount of RL. Or maybe you do something else.
This seems to me like a recipe for scientific disaster.
I get this feeling that people must be initially attracted to PP by (a) the promised generality (which actually means it doesn’t predict anything very strongly), or (b) the neat math, or (c) some particular clever arguments about how some specific phenomena can be understood as minimization of prediction error, like maybe how humans often seem to confuse ‘is’ with ‘ought’. And then, if they get far enough, they start to see how the naive version can’t make sense; but there are so many ways to patch it, and other smart people who seem to believe that things work out...
I haven’t examined the pile of evidence that’s supposedly in favor of actual PP in the actual brain. I’m missing a ton of context. I just get the feeling from a distance, that it’s this intellectual black hole.
We can’t teach it to maximize its own error function. It’s not just physically impossible. It’s a logical contradiction.
But is this important/interesting?
Because it implies the existence of a fixed point of epistemic convergence that’s robust against wireheading. It solves one of the fundamental questions of AI Alignment, at least in theory.
Do you see my problem yet? First you start with a theory (free energy minimization) which can already explain anything and everything, but if you take it really seriously, it does heuristically suggest some predictions over others. And then some of those predictions are wrong; e.g., it predicts that organisms disproportionately like to hang out in dark, quiet rooms where there’s no surprise. So maybe you retreat to the general can-predict-anything version. Or maybe you start patching it, by tacking on some amount of RL. Or maybe you do something else.
I totally hang out in dark, quiet rooms where there’s no surprise.
But more seriously, this is basically how evolution works too. It starts with a simple system and then it patches it. Evolved systems are messy and convoluted.
This seems to me like a recipe for scientific disaster. […] I haven’t examined the pile of evidence that’s supposedly in favor of actual PP in the actual brain. I’m missing a ton of context. I just get the feeling from a distance, that it’s this intellectual black hole.
You’re right. The problem is even broader than you write. Psychology is a recipe for scientific disaster. Freud was a disaster. The Behaviorists were (less of) a disaster. And those are (to my knowledge) the two most powerful schools in psychiatry.
But I think I’m mostly right about the basics, and the right thing to do under such circumstances is to post my predictions on a public forum. If you think I’m wrong, then you can register your counter-prediction and we can check back in 30 years and we’ll see if one of us has been proven right.
But more seriously, this is basically how evolution works too. It starts with a simple system and then it patches it. Evolved systems are messy and convoluted.
I don’t deny this. My fear isn’t a general fear that any time we conclude there’s a base system with some patches, we’re wrong. Rather, it’s a fear of patches being used to excuse a bad theory, like epicycles vs. Newton. The specific worry is more like: why do people start buying this in the first place? I’ve never seen concrete evidence that it helps people understand things. And when people check the math in Friston’s papers, it seems to be a Swiss cheese of errors.
If you think I’m wrong, then you can register your counter-prediction and we can check back in 30 years and we’ll see if one of us has been proven right.
To state the obvious, this feedback loop is too slow, but obviously that’s compatible with your point here.
Still, I hope we can find predictions that can be tested faster.
Or even more so, I hope that we can spell out our reasons for believing things, which would help us find double-cruxes we can settle through simple discussion.
Treating “PP” as a monolithic ideology probably greatly exaggerates the apparent disagreement. I don’t have any dispute with a lot of the concrete PP methodology. For example, the predictive-coding-equals-gradient-descent paper commits no sins by my lights. I haven’t understood the math in enough detail to believe the biological implications yet (I feel, uneasily, like there might be a catch somewhere that still makes it not very biologically plausible). But at base, it’s a result showing that a specific variational method is, in some sense, equivalent to gradient descent.
(As long as we’re in the realm of “some specific variational method” instead of blurring everything together into “free energy minimization”, I’m relatively happier.)
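Here is a minimal numerical sketch of that equivalence claim in the linear case (my own toy construction, not the paper’s code): relax the hidden activity of a two-layer predictive-coding network to an equilibrium of its energy, then compare the resulting purely local weight gradients against ordinary backprop gradients. In the small-output-error regime they line up closely; exact equality needs extra assumptions (e.g. the “fixed-prediction” variant), which is part of the fine print such papers handle.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h, o = 5, 4, 3
W1 = rng.normal(scale=0.2, size=(h, d))
W2 = rng.normal(scale=0.2, size=(o, h))
x0 = rng.normal(size=d)
# Target close to the feedforward prediction: the small-error regime
# where the correspondence is cleanest.
y = W2 @ W1 @ x0 + rng.normal(scale=0.1, size=o)

# --- Ordinary backprop gradients for L = 0.5 * ||y - W2 W1 x0||^2 ---
delta2 = y - W2 @ W1 @ x0
bp_gW2 = -np.outer(delta2, W1 @ x0)
bp_gW1 = -np.outer(W2.T @ delta2, x0)

# --- Predictive coding: relax hidden activity x1 to an equilibrium of the
# energy F = 0.5*||x1 - W1 x0||^2 + 0.5*||y - W2 x1||^2 (input/output clamped) ---
x1 = W1 @ x0                          # start at the feedforward prediction
for _ in range(300):
    e1 = x1 - W1 @ x0                 # layer-1 prediction error
    e2 = y - W2 @ x1                  # layer-2 prediction error
    x1 -= 0.2 * (e1 - W2.T @ e2)      # gradient descent on F w.r.t. x1
e1 = x1 - W1 @ x0                     # errors at (approximate) equilibrium
e2 = y - W2 @ x1
pc_gW2 = -np.outer(e2, x1)            # local, Hebbian-looking weight gradients
pc_gW1 = -np.outer(e1, x0)

def cos(a, b):
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(pc_gW1, bp_gW1), cos(pc_gW2, bp_gW2))  # both close to 1
```

Note that the PC updates use only quantities available at each connection (a local error and a local activity), which is exactly why the result is biologically interesting, and why the remaining gap between “close to” and “equal to” matters.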
If you want to get into that level of technical granularity, then there are major things that need to change before applying the PP methodology in the paper to real biological neurons. Two of the big ones are brainwave oscillations and existing in the flow of time.
Mostly what I find interesting is the theory that the bulk of animal brain processing goes into creating a real-time internal simulation of the world, that this is mathematically plausible via forward-propagating signals, and that error and entropy are fused together.
When I say “free energy minimization” I mean the idea that error and surprise are fused together (possibly with an entropy minimizer thrown in).
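For reference, the standard identity behind that fusion, stated in generic variational notation (with $o$ the observation, $z$ the latent state, and $q$ the approximate posterior):

```latex
\underbrace{F(q)}_{\text{free energy}}
  = \mathbb{E}_{q(z)}\!\left[\ln q(z) - \ln p(o, z)\right]
  = \underbrace{D_{\mathrm{KL}}\!\left(q(z)\,\middle\|\,p(z \mid o)\right)}_{\text{inference error}}
  \;\underbrace{-\ln p(o)}_{\text{surprise}}
```

Since the KL term is nonnegative, $F$ upper-bounds surprise: minimizing it over beliefs reduces inference error, while minimizing it over actions (in active-inference variants) is what pulls toward low-surprise observations, dark rooms included.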
Because it implies the existence of a fixed point of epistemic convergence that’s robust against wireheading. It solves one of the fundamental questions of AI Alignment, at least in theory.
Your claim is a variant of, like, “you can’t seek to minimize your own utility function”. Like, sure, yeah...
I expected that the historical record would show that carefully spelled-out versions of the orthogonality thesis would claim something like “preferences can vary almost independently of intelligence” (for reasons such as, an agent can prefer to behave unintelligently; if it successfully does so, it scarcely seems fair to call it highly intelligent, at least in so far as definitions of intelligence were supposed to be behavioral).
I was wrong; it appears that historical definitions of the orthogonality thesis do make the strong claim that goals can vary independently of intellect.
So yeah, I think there are some exceptions to the strongest form of the orthogonality thesis (at least, depending on definitions of intelligence).
OTOH, the claims that no agent can seek to maximize its own learning-theoretic loss, or to minimize its own utility-theoretic preferences, don’t really speak against Orthogonality, since they’re intelligence-independent constraints.
But you were talking about wireheading.
How does “agents cannot seek to maximize their own learning-theoretic loss” take a bite out of wireheading? It seems entirely compatible with wireheading.
I appreciate your epistemic honesty regarding the historical record.
As for the theory of wireheading, I think it’s drifting away from the original topic of my post here. I created a new post, “Self-Reference Breaks the Orthogonality Thesis”, which I think provides a cleaner version of what I’m trying to say, without the biological spandrels. If you want to continue this discussion, I think it’d be better to do so there.