I will certainly agree that a big problem for the FEP is related to its presentation. They start with the equations of mathematical physics and show how to get from there to information theory, inference, beliefs, etc. This is because they are trying to get from matter to mind. But they could have gone the other way since all the equations of mathematical physics have an information theoretic derivation that includes a notion of free energy. This means that all the stuff about Langevin dynamics of sparsely connected systems (the ‘particular’ fep) could have been included as a footnote in a much simpler derivation.
As you note, the other problem with the FEP is that it seems to add very little to the dominant RL framework. I would argue that this is because they are really not interested in designing better agents, but rather in figuring out what it means for mind to arise from matter. So basically it is physics inspired philosophy of mind, which does sound like something that has no utility whatsoever. But explanatory paradigms can open up new ways of thinking.
For example, relevant to your interests, it turns out that the FEP definition of an agent has the potential to bypass one of the more troubling AI safety concerns associated with RL. When using RL there is a substantial concern that straight-up optimizing a reward function can lead to undesirable results, i.e. the imperative to ‘end world hunger’ leads to ‘kill all humans’. In contrast, in the standard formulation of the FEP the reward function is replaced by a stationary distribution over actions and outcomes. This suggests the following paradigm for developing a safer AI agent. Observe human decision making in some area to get a stationary distribution over actions and outcomes that are considered acceptable but perhaps not optimal. Optimize the free energy of the expected future (FEEF) applied to the observed distribution of actions and outcomes (instead of just outcomes as is usually done) to train an agent to reproduce human decision-making behavior. Assuming it works you now have an automated decision maker that, on average, replicates human behavior, i.e you have an agent that is weakly equivalent to the average human. Now suppose that there are certain outcomes that we would like to make happen more frequently than human decision-makers have been able to achieve, but don’t want the algorithm to take any drastic actions. No problem: train a second agent to produce this new distribution of outcomes while keeping the stationary distribution over actions the same.
This is not guaranteed to work as some outcome distributions are inaccessible, but one could conceive an iterative process where you explore the space of accessible outcome distributions by slightly perturbing the outcome distribution and retraining and repeating...
I’m generally in favor of figuring out how to make AIs that are inclined to follow human norms / take human-typical actions / share human-typical preferences and can still do human-level R&D etc.
That seems hard basically for reasons here (section “recognizing human actions in data”) and the fact that most actions are invisible (e.g. brainstorming), and the fact that innovation inevitably entails going out of distribution.
But first I have a more basic question where I’m confused about your perspective:
Do you think it’s possible for the exact same source code to be either a “FEP agent” or an “RL agent” depending on how you want to think about things?
Or do you think “FEP agent” and “RL agent” are two different families of algorithms? (Possibly overlapping?)
And if it’s this, do you think both families of algorithms include algorithms that can do human-level scientific R&D, invent tools, design & build factories, etc.? Or if you think just one of the two families of algorithms can do those things, which one?
And which family of algorithms (or both) would you put the human brain into?
The short answer is that, in a POMDP setting, FEP agents and RL agents can be mapped one onto the other via appropriate choice of reward function and inference algorithm. One of the goals of the FEP is to come with a normative definition of the reward function (google the misleadingly titled “optimal control without cost functions” paper or, for a non-FEP version of the same, thing google the accurately titled: “Revisiting Maximum Entropy Inverse Reinforcement Learning”). Despite the very different approaches, the underlying mathematics is very similar as both are strongly tied to KL control theory and Jaynes’ maximum entropy principle. But the ultimate difference between FEP and RL in a POMDP setting is how an agent is defined. RL needs an inference algorithm and a reward function that operates on action and outcomes, R(o,a). The FEP needs stationary blanket statistics, p(o,a), and nothing else. The inverse reinforcement paper shows how to go from p(o,a) to a unique R(o,a) assuming a bayes optimal RL agent in a MDP setting. Similarly, if you start with R(o,a) and optimize it, you get a stationary distribution, p(o,a). This distribution is also unique under some ‘mild’ conditions. So they are more or less equivalent in terms of expressive power. Indeed, you can generalize all this crap to show any subsystem of any physical system can be mathematically described as Bayes optimal RL agent. You can even identify the reward function with a little work. I believe this is why we intuitively anthropomorphize physical systems, i.e. when we say things like they system is “seeking” a minimum energy state.
But regardless, from a pragmatic perspective they are equally expressive mathematical systems. The advantage of one over the other depends upon your prior knowledge and goals. If you know the reward function and have knowledge of how the world works use RL. If you know the reward function but are in a POMDP setting without knowledge of how the world works, use an information seeking version of RL (maxentRL or BayesianRL). If you dont know the reward function but do know how the world works and have observations of behavior use max ent inverseRL).
The problem with RL is that its unclear how to use it when you don’t know how the world works and you don’t know what the reward function is, but do have observations of behavior. This is the situation when you are modeling behavior as in the url you cited. In this setting, we don’t know what model humans are using to form their inferences and we don’t know what motivates their behavior. If we are lucky we can glean some notion of their policy by observing behavior, but usually that notion is very coarse i.e. we may only know the average distribution of their actions and observations, p(o,a). The utility of the FEP is that p(o,a) defines the agent all by itself. This means we can start with a policy and infer both belief and reward. This is not something RL was designed to do. RL is for going from reward and belief (or belief formation rules) to policy, not the other way around. IRL can go backward, but only if your beliefs are Bayes optimal.
As for the human brain, I am fully committed to the Helmholtzian notion that the brain is a statistical learning machine as in the Bayesian brain hypothesis with the added caveat that it is important to remember that the brain is massively suboptimal.
Thanks again! Feel free to stop responding if you’re busy.
Here’s where I’m at so far. Let’s forget about human brains and just talk about how we should design an AGI.
One thing we can do is design an AGI whose source code straightforwardly resembles model-based reinforcement learning. So the code has data structures for a critic / value-function, and a reward function, and TD learning, and a world-model, and so on.
This has the advantage that (I claim) it can actually work. (Not today, but after further algorithmic progress.) After all, I am confident that human brains have all those things (critic ≈ striatum, TD learning ≈ dopamine, etc.) and the human brain can do lots of impressive things, like go to the moon invent and quantum mechanics and so on.
But it also has a disadvantage that it’s unclear how to make the AGI motivated to act in a human-like way and follow human norms. It’s not impossible, evidently—after all, I am a human and I am at least somewhat motivated to follow human norms, and when someone I greatly admire starts doing X or wanting X, then I also am more inclined to start doing X and wanting X. But it’s unclear how this works in terms of my brain’s reward functions / loss functions / neural architecture / whatever—or at least, it’s presently unclear to me. (It is one of my major areas of research interest, and I think I’m making gradual progress, but as of now I don’t have any good & complete answer.)
A different thing we can do is design an AGI whose source code straightforwardly resembles active inference / FEP. So the code has, umm, I’m not sure, something about generative models and probability distributions? But it definitely does NOT have a reward function or critic / value-function etc.
This has the advantage that (according to you, IIUC) there’s a straightforward way to make the AGI act in a human-like way and follow human norms.
And it has the disadvantage that I’m somewhat skeptical that it will ever be possible to actually code up an AGI that way.
So I’m pretty confused here.
For one thing, I’m not yet convinced that the first bullet point is actually straightforward (or even possible). Maybe I didn’t follow your previous response. Some of my concerns are: (1) most human actions are not visible (e.g. deciding what to think about, recalling a memory), (2) even the ones that are visible in principle are very hard to extract in practice (e.g. did I move deliberately or did was that a random jostle or gust of wind?) (3) almost all “outcomes” of interest in the AGI context are outcomes that have never happened in the training data, e.g. the AGI can invent a new gadget which no human had ever previously invented. So I’m not sure how you get p(o,a) from observations of humans.
For the second thing, among other issues, it seems to me that building a beyond-human-level understanding of the world requires RL-type trial-and-error exploration, for reasons in Section 1.1 here.
I will certainly agree that a big problem for the FEP is related to its presentation. They start with the equations of mathematical physics and show how to get from there to information theory, inference, beliefs, etc. This is because they are trying to get from matter to mind. But they could have gone the other way since all the equations of mathematical physics have an information theoretic derivation that includes a notion of free energy. This means that all the stuff about Langevin dynamics of sparsely connected systems (the ‘particular’ fep) could have been included as a footnote in a much simpler derivation.
As you note, the other problem with the FEP is that it seems to add very little to the dominant RL framework. I would argue that this is because they are really not interested in designing better agents, but rather in figuring out what it means for mind to arise from matter. So basically it is physics inspired philosophy of mind, which does sound like something that has no utility whatsoever. But explanatory paradigms can open up new ways of thinking.
For example, relevant to your interests, it turns out that the FEP definition of an agent has the potential to bypass one of the more troubling AI safety concerns associated with RL. When using RL there is a substantial concern that straight-up optimizing a reward function can lead to undesirable results, i.e. the imperative to ‘end world hunger’ leads to ‘kill all humans’. In contrast, in the standard formulation of the FEP the reward function is replaced by a stationary distribution over actions and outcomes. This suggests the following paradigm for developing a safer AI agent. Observe human decision making in some area to get a stationary distribution over actions and outcomes that are considered acceptable but perhaps not optimal. Optimize the free energy of the expected future (FEEF) applied to the observed distribution of actions and outcomes (instead of just outcomes as is usually done) to train an agent to reproduce human decision-making behavior. Assuming it works you now have an automated decision maker that, on average, replicates human behavior, i.e you have an agent that is weakly equivalent to the average human. Now suppose that there are certain outcomes that we would like to make happen more frequently than human decision-makers have been able to achieve, but don’t want the algorithm to take any drastic actions. No problem: train a second agent to produce this new distribution of outcomes while keeping the stationary distribution over actions the same.
This is not guaranteed to work as some outcome distributions are inaccessible, but one could conceive an iterative process where you explore the space of accessible outcome distributions by slightly perturbing the outcome distribution and retraining and repeating...
I’m generally in favor of figuring out how to make AIs that are inclined to follow human norms / take human-typical actions / share human-typical preferences and can still do human-level R&D etc.
That seems hard basically for reasons here (section “recognizing human actions in data”) and the fact that most actions are invisible (e.g. brainstorming), and the fact that innovation inevitably entails going out of distribution.
But first I have a more basic question where I’m confused about your perspective:
Do you think it’s possible for the exact same source code to be either a “FEP agent” or an “RL agent” depending on how you want to think about things?
Or do you think “FEP agent” and “RL agent” are two different families of algorithms? (Possibly overlapping?)
And if it’s this, do you think both families of algorithms include algorithms that can do human-level scientific R&D, invent tools, design & build factories, etc.? Or if you think just one of the two families of algorithms can do those things, which one?
And which family of algorithms (or both) would you put the human brain into?
Thanks!
The short answer is that, in a POMDP setting, FEP agents and RL agents can be mapped one onto the other via appropriate choice of reward function and inference algorithm. One of the goals of the FEP is to come with a normative definition of the reward function (google the misleadingly titled “optimal control without cost functions” paper or, for a non-FEP version of the same, thing google the accurately titled: “Revisiting Maximum Entropy Inverse Reinforcement Learning”). Despite the very different approaches, the underlying mathematics is very similar as both are strongly tied to KL control theory and Jaynes’ maximum entropy principle. But the ultimate difference between FEP and RL in a POMDP setting is how an agent is defined. RL needs an inference algorithm and a reward function that operates on action and outcomes, R(o,a). The FEP needs stationary blanket statistics, p(o,a), and nothing else. The inverse reinforcement paper shows how to go from p(o,a) to a unique R(o,a) assuming a bayes optimal RL agent in a MDP setting. Similarly, if you start with R(o,a) and optimize it, you get a stationary distribution, p(o,a). This distribution is also unique under some ‘mild’ conditions. So they are more or less equivalent in terms of expressive power. Indeed, you can generalize all this crap to show any subsystem of any physical system can be mathematically described as Bayes optimal RL agent. You can even identify the reward function with a little work. I believe this is why we intuitively anthropomorphize physical systems, i.e. when we say things like they system is “seeking” a minimum energy state.
But regardless, from a pragmatic perspective they are equally expressive mathematical systems. The advantage of one over the other depends upon your prior knowledge and goals. If you know the reward function and have knowledge of how the world works use RL. If you know the reward function but are in a POMDP setting without knowledge of how the world works, use an information seeking version of RL (maxentRL or BayesianRL). If you dont know the reward function but do know how the world works and have observations of behavior use max ent inverseRL).
The problem with RL is that its unclear how to use it when you don’t know how the world works and you don’t know what the reward function is, but do have observations of behavior. This is the situation when you are modeling behavior as in the url you cited. In this setting, we don’t know what model humans are using to form their inferences and we don’t know what motivates their behavior. If we are lucky we can glean some notion of their policy by observing behavior, but usually that notion is very coarse i.e. we may only know the average distribution of their actions and observations, p(o,a). The utility of the FEP is that p(o,a) defines the agent all by itself. This means we can start with a policy and infer both belief and reward. This is not something RL was designed to do. RL is for going from reward and belief (or belief formation rules) to policy, not the other way around. IRL can go backward, but only if your beliefs are Bayes optimal.
As for the human brain, I am fully committed to the Helmholtzian notion that the brain is a statistical learning machine as in the Bayesian brain hypothesis with the added caveat that it is important to remember that the brain is massively suboptimal.
Thanks again! Feel free to stop responding if you’re busy.
Here’s where I’m at so far. Let’s forget about human brains and just talk about how we should design an AGI.
One thing we can do is design an AGI whose source code straightforwardly resembles model-based reinforcement learning. So the code has data structures for a critic / value-function, and a reward function, and TD learning, and a world-model, and so on.
This has the advantage that (I claim) it can actually work. (Not today, but after further algorithmic progress.) After all, I am confident that human brains have all those things (critic ≈ striatum, TD learning ≈ dopamine, etc.) and the human brain can do lots of impressive things, like go to the moon invent and quantum mechanics and so on.
But it also has a disadvantage that it’s unclear how to make the AGI motivated to act in a human-like way and follow human norms. It’s not impossible, evidently—after all, I am a human and I am at least somewhat motivated to follow human norms, and when someone I greatly admire starts doing X or wanting X, then I also am more inclined to start doing X and wanting X. But it’s unclear how this works in terms of my brain’s reward functions / loss functions / neural architecture / whatever—or at least, it’s presently unclear to me. (It is one of my major areas of research interest, and I think I’m making gradual progress, but as of now I don’t have any good & complete answer.)
A different thing we can do is design an AGI whose source code straightforwardly resembles active inference / FEP. So the code has, umm, I’m not sure, something about generative models and probability distributions? But it definitely does NOT have a reward function or critic / value-function etc.
This has the advantage that (according to you, IIUC) there’s a straightforward way to make the AGI act in a human-like way and follow human norms.
And it has the disadvantage that I’m somewhat skeptical that it will ever be possible to actually code up an AGI that way.
So I’m pretty confused here.
For one thing, I’m not yet convinced that the first bullet point is actually straightforward (or even possible). Maybe I didn’t follow your previous response. Some of my concerns are: (1) most human actions are not visible (e.g. deciding what to think about, recalling a memory), (2) even the ones that are visible in principle are very hard to extract in practice (e.g. did I move deliberately or did was that a random jostle or gust of wind?) (3) almost all “outcomes” of interest in the AGI context are outcomes that have never happened in the training data, e.g. the AGI can invent a new gadget which no human had ever previously invented. So I’m not sure how you get p(o,a) from observations of humans.
For the second thing, among other issues, it seems to me that building a beyond-human-level understanding of the world requires RL-type trial-and-error exploration, for reasons in Section 1.1 here.