Imposing an RL algorithm on the dynamics of the lizard’s brain and body is no more justified than imposing the Active Inference algorithm on it.
I think you misunderstood (or missed) the part where I wrote “Suppose (for the sake of argument) that what’s happening behind the scenes is an RL algorithm in its brain, whose reward function is external temperature when the lizard feels cold, and whose reward function is negative external temperature when the lizard feels hot.”
What I’m saying here is that RL is not a thing I am “imposing” on the lizard brain—it’s how the brain actually works (in this for-the-sake-of-argument hypothetical).
Pick your favorite RL algorithm—let’s say PPO. And imagine that when we look inside this lizard brain we find every step of the PPO algorithm implemented in neurons in a way that exactly parallels, line-by-line, how PPO works in the textbooks. “Aha”, you say, “look at the pattern of synapses in this group of 10,000 neurons, this turns out to be exactly how you would wire together neurons to calculate the KL divergence of (blah blah blah). And look at that group of neurons! It is configured in the exact right way to double the β parameter when the divergence is too high. And look at …” etc. etc.
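(For reference, the "double the β parameter when the divergence is too high" rule is the adaptive KL-penalty coefficient update from the PPO-penalty variant of the algorithm. A minimal sketch, with illustrative variable names not tied to any particular library:)

```python
# Sketch of PPO's adaptive KL-penalty coefficient update:
# after each policy update, compare the measured KL divergence
# between old and new policies against a target, and adjust the
# penalty coefficient beta accordingly.

def update_beta(beta: float, kl: float, kl_target: float) -> float:
    """Adapt the KL penalty coefficient after a policy update."""
    if kl > 1.5 * kl_target:
        beta *= 2.0       # policy drifted too far: penalize divergence more
    elif kl < kl_target / 1.5:
        beta /= 2.0       # policy barely moved: penalize divergence less
    return beta

print(update_beta(1.0, kl=0.2, kl_target=0.1))   # divergence too high -> 2.0
print(update_beta(1.0, kl=0.05, kl_target=0.1))  # divergence too low -> 0.5
```

In the hypothetical, the "group of neurons configured to double β" would be implementing exactly this kind of branch in wetware.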
Is this realistic? No. Lizard brains do not literally implement the PPO algorithm. But they could in principle, and if they did, we would find that the lizards move around in a way that effectively maintains their body temperature. And FEP would apply to those hypothetical lizard brains, just like FEP applies by definition to everything with bodily integrity etc. But here we can say that the person who says “the lizard brain is running an RL algorithm, namely PPO with thus-and-such reward function” is correctly describing a gears-level model of this hypothetical lizard brain. They are not “imposing” anything! Whereas the person who says “the lizard is ‘predicting’ that its body temperature will be constant” is not doing that. The latter person is much farther away from understanding this hypothetical lizard brain than the former person, right?
Yes, I mentally skipped the part where you introduced the “artificial lizard with RL architecture” (that was unexpected). Then the argument collapses to the first part of the comment you are replying to: a gears-level model is more precise, of course, but the bird’s-eye view of Active Inference could give you the concepts for thinking about agency, persuadability (a.k.a. corrigibility), etc., without the need to re-invent them, and without spawning a plethora of concepts that don’t make sense in the abstract and are specific to each AI algorithm/architecture (for example, “reward” is not a fundamental concept of alignment, because it applies to RL agents but not to LLMs, which are also agents).