We can’t teach it to maximize its own error function. It’s not just physically impossible. It’s a logical contradiction.
But why is this important/interesting?
Because it implies the existence of a fixed point of epistemic convergence that’s robust against wireheading. It solves one of the fundamental questions of AI Alignment, at least in theory.
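(To see the flavor of the claim in a toy setting: for a gradient-based learner, “maximize the loss” and “minimize the negated loss” are the same update rule, so whatever quantity the system is actually pushed downhill on is, by construction, its error function. Below is a minimal sketch of that point, assuming nothing beyond vanilla gradient steps on a made-up quadratic loss; it is obviously not the post’s free-energy setting, just the shape of the “logical contradiction” being pointed at.)

```python
# Toy sketch (illustrative only, not from the post): gradient *ascent* on a
# loss L is exactly gradient *descent* on -L, so a learner "taught to maximize
# its own error function" is really just minimizing a different error function.
def loss(theta):
    return (theta - 3.0) ** 2          # a made-up error function of one parameter

def grad(theta):
    return 2.0 * (theta - 3.0)         # dL/dtheta

theta_ascend, theta_descend = 0.0, 0.0
for _ in range(100):
    theta_ascend += 0.1 * grad(theta_ascend)        # "maximize L"
    theta_descend -= 0.1 * (-grad(theta_descend))   # minimize -L

print(theta_ascend, theta_descend)     # identical trajectories: same update rule
```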
Do you see my problem yet? First you start with a theory (free energy minimization) which can already explain anything and everything, but if you take it really seriously, it does heuristically suggest some predictions over others. And then some of those predictions are wrong; e.g., it predicts that organisms disproportionately like to hang out in dark, quiet rooms where there’s no surprise. So maybe you retreat to the general can-predict-anything version. Or maybe you start patching it by tacking on some amount of RL. Or maybe you do something else.
I totally hang out in dark, quiet rooms where there’s no surprise.
But more seriously, this is basically how evolution works too. It starts with a simple system and then it patches it. Evolved systems are messy and convoluted.
This seems to me like a recipe for scientific disaster. … I haven’t examined the pile of evidence that’s supposedly in favor of actual PP in the actual brain. I’m missing a ton of context. I just get the feeling, from a distance, that it’s this intellectual black hole.
You’re right. The problem is even broader than you write. Psychology is a recipe for scientific disaster. Freud was a disaster. The Behaviorists were (less of) a disaster. And those are (to my knowledge) the two most powerful schools in psychiatry.
But I think I’m mostly right about the basics, and the right thing to do under such circumstances is to post my predictions on a public forum. If you think I’m wrong, then you can register your counter-prediction and we can check back in 30 years and we’ll see if one of us has been proven right.
But more seriously, this is basically how evolution works too. It starts with a simple system and then it patches it. Evolved systems are messy and convoluted.
I don’t deny this. My fear isn’t a general fear that any time we conclude there’s a base system with some patches, we’re wrong. Rather, I have a fear of using these patches to excuse a bad theory, like epicycle theory vs Newton. The specific worry is more like: why do people start buying this in the first place? I’ve never seen concrete evidence that it helps people understand things?? And when people check the math in Friston papers, it seems to be a Swiss cheese of errors???
If you think I’m wrong, then you can register your counter-prediction and we can check back in 30 years and we’ll see if one of us has been proven right.
To state the obvious, this feedback loop is too slow, though of course that’s compatible with your point here.
Still, I hope we can find predictions that can be tested faster.
Or even more so, I hope that we can spell out reasons for believing things which help us find double-cruxes we can settle through simple discussion.
Treating “PP” as a monolithic ideology probably greatly exaggerates the seeming disagreement. I don’t have any dispute with a lot of the concrete PP methodology. For example, the predictive coding = gradient descent paper commits no sins by my lights. I haven’t understood the math in enough detail to believe the biological implications yet (I feel, uneasily, like there might be a catch somewhere which makes it still not too biologically plausible). But at base, it’s a result showing that a specific variational method is in some sense equivalent to gradient descent.
(As long as we’re in the realm of “some specific variational method” instead of blurring everything together into “free energy minimization”, I’m relatively happier.)
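For readers who want the flavor of the result being referenced, here is a minimal sketch of the kind of construction in the predictive-coding-approximates-backprop literature (in the spirit of the Whittington–Bogacz / Millidge-et-al. line of work; I’m guessing at which paper is meant, and this is my own toy code, not the paper’s). It relaxes a hidden layer to minimize a sum of squared prediction errors, applies the local error-times-activity weight update, and compares the resulting update directions to ordinary backprop gradients.

```python
# A toy predictive coding network (my own sketch; layer sizes, learning rates,
# and iteration counts are made up). Inference relaxes the hidden activity to
# minimize a sum of squared prediction errors; the weight update is the local
# "error times presynaptic activity" rule; we then compare against backprop.
import numpy as np

rng = np.random.default_rng(0)
W1 = 0.3 * rng.normal(size=(8, 4))   # input -> hidden weights
W2 = 0.3 * rng.normal(size=(2, 8))   # hidden -> output weights
x0 = rng.normal(size=(4, 1))         # clamped input
y  = rng.normal(size=(2, 1))         # clamped target

f, df = np.tanh, lambda a: 1.0 - np.tanh(a) ** 2

# Predictive coding: settle x1 by gradient descent on the energy
#   F = 0.5*||x1 - W1 f(x0)||^2 + 0.5*||y - W2 f(x1)||^2
x1 = W1 @ f(x0)                       # start the hidden node at its prediction
for _ in range(200):
    e1 = x1 - W1 @ f(x0)              # prediction error at the hidden layer
    e2 = y - W2 @ f(x1)               # prediction error at the output layer
    x1 += 0.05 * (-e1 + df(x1) * (W2.T @ e2))
e1, e2 = x1 - W1 @ f(x0), y - W2 @ f(x1)
dW1_pc, dW2_pc = e1 @ f(x0).T, e2 @ f(x1).T   # local updates after settling

# Ordinary backprop on the same loss 0.5*||y - W2 f(W1 f(x0))||^2
h = W1 @ f(x0)
e_out = y - W2 @ f(h)
dW2_bp = e_out @ f(h).T
dW1_bp = (df(h) * (W2.T @ e_out)) @ f(x0).T

cos = lambda a, b: float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print("alignment of W2 updates:", cos(dW2_pc, dW2_bp))
print("alignment of W1 updates:", cos(dW1_pc, dW1_bp))
```

The print statements just report how closely the two update directions align in this toy; the actual papers make the correspondence precise (exact in the small-output-error limit), which is the “in some sense equivalent to gradient descent” claim.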
If you want to get into that level of technical granularity, then there are major things that need to change before applying the PP methodology in the paper to real biological neurons. Two of the big ones are brainwave oscillations and existing in the flow of time.
Mostly what I find interesting is the theory that the bulk of animal brain processing goes into creating a real-time internal simulation of the world, that this is mathematically plausible via forward-propagating signals, and that error and entropy are fused together.
When I say “free energy minimization” I mean the idea that error and surprise are fused together (possibly with an entropy minimizer thrown in).
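To spell that out, the identity I have in mind is just the standard variational free energy decomposition (ordinary variational-inference bookkeeping, nothing specific to Friston’s neurobiological claims), where s are hidden states, o are observations, and q is the approximate posterior:

\begin{align*}
F[q] &= \mathbb{E}_{q(s)}\!\big[\ln q(s) - \ln p(o, s)\big] \\
     &= \underbrace{-\ln p(o)}_{\text{surprise}} \;+\; \underbrace{D_{\mathrm{KL}}\!\big[q(s)\,\|\,p(s \mid o)\big]}_{\text{approximation error}} \\
     &= \underbrace{\mathbb{E}_{q(s)}\!\big[-\ln p(o, s)\big]}_{\text{energy}} \;-\; \underbrace{H\big[q(s)\big]}_{\text{entropy of beliefs}}
\end{align*}

Minimizing F tightens an upper bound on the surprise $-\ln p(o)$ while driving down the error in the beliefs, and the last line shows where the entropy term enters; that is the fusion of error, surprise, and entropy I’m gesturing at.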
Because it implies the existence of a fixed point of epistemic convergence that’s robust against wireheading. It solves one of the fundamental questions of AI Alignment, at least in theory.
Your claim is a variant of, like, “you can’t seek to minimize your own utility function”. Like, sure, yeah...
I expected that the historical record would show that carefully spelled-out versions of the orthogonality thesis would claim something like “preferences can vary almost independently of intelligence” (for reasons such as, an agent can prefer to behave unintelligently; if it successfully does so, it scarcely seems fair to call it highly intelligent, at least in so far as definitions of intelligence were supposed to be behavioral).
I was wrong; it appears that historical definitions of the orthogonality thesis do make the strong claim that goals can vary independently of intellect.
So yeah, I think there are some exceptions to the strongest form of the orthogonality thesis (at least, depending on definitions of intelligence).
OTOH, the claims that no agent can seek to maximize its own learning-theoretic loss, or minimize its own utility-theoretic preferences, don’t really speak against Orthogonality, since they’re intelligence-independent constraints.
But you were talking about wireheading.
How does “agents cannot seek to maximize their own learning-theoretic loss” take a bite out of wireheading? It seems entirely compatible with wireheading.
I appreciate your epistemic honesty regarding the historical record.
As for the theory of wireheading, I think it’s drifting away from the original topic of my post here. I created a new post, Self-Reference Breaks the Orthogonality Thesis, which I think provides a cleaner version of what I’m trying to say, without the biological spandrels. If you want to continue this discussion, I think it’d be better to do so there.