Thanks for the comment!
Your comment is a bit funny to me because I think of algorithms and definitions as kinda different topics. In particular, I think of RL as a family of algorithms, not as a definition of an agent.
If I have a robot running an RL algorithm, and I have a sufficiently thorough understanding of how the algorithm works and how the robot works, then I can answer any question about what the robot will do, but I haven’t necessarily made any progress at all on the question of “what are its beliefs?” or “is it an agent?” For example, we can easily come up with edge cases where the question “what do I (Steve) believe?” has an unclear answer, even knowing everything about how I behave and how my brain works. (E.g. if I explicitly deny X but act in other ways as if I believe X, then is X part of my “beliefs”? Or if I give different answers about X depending on context and framing? Etc.)
Conversely, I presume that if we had a rigorous mathematical definition of “beliefs” and “agents” and whatever, it would not immediately tell us the answers to algorithms questions like how to build artificial general intelligence or how the brain works.
In terms of definitions: If FEP people someday use FEP to construct a definition of “agent” that reproduces common sense, e.g. I’m an agent, a rock isn’t an agent, a flame isn’t an agent, etc., then I’d be interested to see it. And if I thought that this definition had any kernel of truth, the first thing I’d try to do is reformulate that definition in a way that doesn’t explicitly mention FEP. If I found that this was impossible or pointlessly convoluted, then I would count that as the first good reason to talk about FEP. I am currently skeptical that any part of that is going to actually happen.
As it happens, I have some friends in AGI safety & alignment interested in the question of how to define agency, goals, etc. (e.g. 1,2). I guess I find such work slightly helpful sometimes. I’ve gone back to this post a couple times, for example. (I have not found the FEP-centric discussions in this vein to be helpful.) But by and large, I mostly don’t see it as very central to my main interest of safe and beneficial AGI. Again, if I have a robot running a certain algorithm, and I understand that algorithm well enough, then I can answer any question about what the robot will do. And that’s what I care about. If I don’t know a rigorous definition of “what the robot believes”, but I can still answer any question about what the robot will do and why, then I don’t really feel like I’m missing anything important.
You bring up the difficulty of defining the interface separating an algorithm’s actuators from the world. Amusingly, you take that observation as evidence that we need FEP to come up with good definitions, whereas I take that same observation as evidence that there is no good definition: it’s inherently arbitrary, there really isn’t anything here to define, and we shouldn’t be arguing about it in the first place. :-P
In terms of algorithms: I agree that there are interesting questions related to building a reinforcement learning algorithm that is not subject to wishful thinking, or at least that is minimally subject to wishful thinking. (I consider humans to be “minimally subject to wishful thinking”, in the sense that it obviously happens sometimes, but it happens much less than it might happen, e.g. I don’t keep opening my wallet expecting to find a giant wad of cash inside.) I feel like I personally have a pretty good handle on how that potential issue is mitigated in the human brain, after a whole lot of time thinking about it. I think it centrally involves the fact that the human brain uses model-based actor-critic RL, and the model is not updated to maximize reward but rather updated using (a generalization of) self-supervised learning on sensory predictions. And likewise the critic is (obviously) not updated to maximize rewards but rather updated to estimate expected future reward using (a variant of) TD learning.
(See my post Reward Is Not Enough for example.)
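To make the “updated in different ways” point concrete, here’s a minimal toy sketch (my own illustrative code; a one-step world, so the TD update degenerates into a one-step regression). The world-model parameter is only ever updated by sensory prediction error, and the critic is only ever updated to predict reward, so neither update can drag the agent’s “beliefs” toward whatever would be rewarding:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_world(action):
    # Hidden ground truth: the observation is the action plus sensor noise.
    return action + 0.1 * rng.standard_normal()

def reward(obs):
    # Hand-written reward: observations near 1.0 are good.
    return -(obs - 1.0) ** 2

model_w = 0.0             # world-model: predicts obs from action (obs ≈ model_w * action)
critic_w = np.zeros(3)    # critic: predicts reward from features of the observation
LR = 0.05

def features(obs):
    return np.array([1.0, obs, obs ** 2])

for step in range(5000):
    action = rng.uniform(-2.0, 2.0)
    obs = true_world(action)

    # World-model update: pure self-supervised prediction error.
    # Reward never appears here, so the model can't learn wishful predictions.
    pred_obs = model_w * action
    model_w += LR * (obs - pred_obs) * action

    # Critic update: regress toward the observed reward (a degenerate one-step
    # stand-in for TD learning). It estimates reward; it never maximizes it.
    r = reward(obs)
    v = critic_w @ features(obs)
    critic_w += LR * (r - v) * features(obs)

print(round(model_w, 2))        # ≈ 1.0: beliefs track the world, not the reward
print(np.round(critic_w, 2))    # ≈ [-1, 2, -1]: critic recovers -(obs - 1)^2
```

Obviously nothing here scales; the point is just that the reward signal has no pathway into the world-model update.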
In principle it’s possible that I would learn something new and helpful about the question of wishful thinking by reading more FEP literature, but my personal experience is very strongly in the opposite direction, i.e. that FEP people are typically way more confused about this topic than pretty much anyone else, e.g. their proposed solutions actually make the problem much much worse, but also make everything sufficiently confusing / obfuscated that they don’t realize it. Basically, a good solution to wishful thinking involves working hard to separate plans/inclinations/desires from beliefs, and in particular updating them in different ways, whereas FEP people want to go in the opposite direction by mixing plans/inclinations/desires & beliefs together into one big unified framework. Interestingly, one of the first things I read when I was starting in that area was an FEP-ish book (Surfing Uncertainty), and I consider that much of my subsequent progress involved “un-learning” many of the ideas I had gotten out of that book. ¯\_(ツ)_/¯
I will certainly agree that a big problem for the FEP is related to its presentation. Its proponents start with the equations of mathematical physics and show how to get from there to information theory, inference, beliefs, etc. This is because they are trying to get from matter to mind. But they could have gone the other way, since all the equations of mathematical physics have an information-theoretic derivation that includes a notion of free energy. This means that all the stuff about Langevin dynamics of sparsely connected systems (the ‘particular’ FEP) could have been included as a footnote in a much simpler derivation.
As you note, the other problem with the FEP is that it seems to add very little to the dominant RL framework. I would argue that this is because its proponents are really not interested in designing better agents, but rather in figuring out what it means for mind to arise from matter. So basically it is physics-inspired philosophy of mind, which does sound like something that has no utility whatsoever. But explanatory paradigms can open up new ways of thinking.
For example, relevant to your interests, it turns out that the FEP definition of an agent has the potential to bypass one of the more troubling AI safety concerns associated with RL. When using RL there is a substantial concern that straight-up optimizing a reward function can lead to undesirable results, e.g. the imperative to ‘end world hunger’ leads to ‘kill all humans’. In contrast, in the standard formulation of the FEP the reward function is replaced by a stationary distribution over actions and outcomes. This suggests the following paradigm for developing a safer AI agent. Observe human decision making in some area to get a stationary distribution over actions and outcomes that are considered acceptable but perhaps not optimal. Optimize the free energy of the expected future (FEEF) applied to the observed distribution of actions and outcomes (instead of just outcomes, as is usually done) to train an agent to reproduce human decision-making behavior. Assuming it works, you now have an automated decision maker that, on average, replicates human behavior, i.e. you have an agent that is weakly equivalent to the average human. Now suppose that there are certain outcomes that we would like to make happen more frequently than human decision-makers have been able to achieve, but we don’t want the algorithm to take any drastic actions. No problem: train a second agent to produce this new distribution of outcomes while keeping the stationary distribution over actions the same.
This is not guaranteed to work, as some outcome distributions are inaccessible, but one could conceive of an iterative process where you explore the space of accessible outcome distributions by slightly perturbing the outcome distribution, retraining, and repeating...
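Here is a toy sketch of the distribution-matching step of that proposal (entirely made-up numbers, in a fully observed one-step setting, so it ignores hidden states and everything else that makes FEEF interesting). The only target is a joint distribution over actions and outcomes, and the policy that best matches it falls out in closed form:

```python
import numpy as np

# Made-up environment model: 3 actions, 4 possible outcomes.
# p_o_given_a[a, o] = probability of outcome o given action a.
p_o_given_a = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.10, 0.25, 0.60],
])

# Target stationary distribution p*(a, o), e.g. normalized counts of
# observed human (action, outcome) pairs. Numbers are invented.
counts = np.array([
    [30.,  5.,  2.,  1.],
    [ 4., 25.,  8.,  3.],
    [ 1.,  3., 10., 20.],
])
p_star = counts / counts.sum()

# Minimizing KL( pi(a) p(o|a) || p*(a,o) ) over the policy pi(a) has a
# closed-form solution: pi(a) ∝ exp(-d_a), with
# d_a = sum_o p(o|a) * [log p(o|a) - log p*(a,o)].
d = (p_o_given_a * (np.log(p_o_given_a) - np.log(p_star))).sum(axis=1)
pi = np.exp(-d)
pi /= pi.sum()

print(np.round(pi, 3))  # policy whose (action, outcome) statistics best match the human data
```

In the actual proposal you would be working with learned beliefs over hidden states and a proper free-energy objective rather than this bare KL, but the “match a distribution rather than maximize a reward” structure is the same.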
I’m generally in favor of figuring out how to make AIs that are inclined to follow human norms / take human-typical actions / share human-typical preferences and can still do human-level R&D etc.
That seems hard, basically for the reasons here (section “recognizing human actions in data”), plus the fact that most actions are invisible (e.g. brainstorming), and the fact that innovation inevitably entails going out of distribution.
But first I have a more basic question where I’m confused about your perspective:
Do you think it’s possible for the exact same source code to be either a “FEP agent” or an “RL agent” depending on how you want to think about things?
Or do you think “FEP agent” and “RL agent” are two different families of algorithms? (Possibly overlapping?)
And if it’s the latter, do you think both families of algorithms include algorithms that can do human-level scientific R&D, invent tools, design & build factories, etc.? Or if you think just one of the two families of algorithms can do those things, which one?
And which family of algorithms (or both) would you put the human brain into?
Thanks!
The short answer is that, in a POMDP setting, FEP agents and RL agents can be mapped onto one another via an appropriate choice of reward function and inference algorithm. One of the goals of the FEP is to come up with a normative definition of the reward function (google the misleadingly titled “optimal control without cost functions” paper or, for a non-FEP version of the same thing, google the accurately titled “Revisiting Maximum Entropy Inverse Reinforcement Learning”). Despite the very different approaches, the underlying mathematics is very similar, as both are strongly tied to KL control theory and Jaynes’ maximum entropy principle.

But the ultimate difference between FEP and RL in a POMDP setting is how an agent is defined. RL needs an inference algorithm and a reward function that operates on actions and outcomes, R(o,a). The FEP needs stationary blanket statistics, p(o,a), and nothing else. The inverse reinforcement learning paper shows how to go from p(o,a) to a unique R(o,a), assuming a Bayes-optimal RL agent in an MDP setting. Similarly, if you start with R(o,a) and optimize it, you get a stationary distribution, p(o,a). This distribution is also unique under some ‘mild’ conditions. So they are more or less equivalent in terms of expressive power. Indeed, you can generalize all this crap to show that any subsystem of any physical system can be mathematically described as a Bayes-optimal RL agent. You can even identify the reward function with a little work. I believe this is why we intuitively anthropomorphize physical systems, i.e. when we say things like the system is “seeking” a minimum-energy state.
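As a degenerate one-step illustration of that back-and-forth (my own toy code; the real MaxEnt IRL result involves dynamics and Bayes-optimality assumptions): the maximum-entropy stationary distribution implied by a reward table is just its softmax, and going backward recovers the reward up to an additive constant, which never changes which actions are preferred.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy reward table over (o, a) pairs.
R = rng.normal(size=(4, 3))

# Forward (maximum-entropy / KL-control flavour): the stationary
# distribution implied by R is a softmax over the whole table.
p = np.exp(R)
p /= p.sum()

# Backward: recover the reward from the stationary distribution.
R_recovered = np.log(p)

# The two reward tables differ only by an additive constant (the
# log-partition term), which never affects which (o, a) pairs are preferred.
print(np.allclose(R - R.mean(), R_recovered - R_recovered.mean()))  # True
```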
But regardless, from a pragmatic perspective they are equally expressive mathematical systems. The advantage of one over the other depends upon your prior knowledge and goals. If you know the reward function and have knowledge of how the world works, use RL. If you know the reward function but are in a POMDP setting without knowledge of how the world works, use an information-seeking version of RL (max-ent RL or Bayesian RL). If you don’t know the reward function but do know how the world works and have observations of behavior, use max-ent inverse RL.
The problem with RL is that it’s unclear how to use it when you don’t know how the world works and you don’t know what the reward function is, but do have observations of behavior. This is the situation when you are modeling behavior as in the URL you cited. In this setting, we don’t know what model humans are using to form their inferences and we don’t know what motivates their behavior. If we are lucky we can glean some notion of their policy by observing behavior, but usually that notion is very coarse, i.e. we may only know the average distribution of their actions and observations, p(o,a). The utility of the FEP is that p(o,a) defines the agent all by itself. This means we can start with a policy and infer both belief and reward. This is not something RL was designed to do. RL is for going from reward and belief (or belief-formation rules) to policy, not the other way around. IRL can go backward, but only if your beliefs are Bayes-optimal.
As for the human brain, I am fully committed to the Helmholtzian notion that the brain is a statistical learning machine, as in the Bayesian brain hypothesis, with the added caveat that the brain is massively suboptimal.
Thanks again! Feel free to stop responding if you’re busy.
Here’s where I’m at so far. Let’s forget about human brains and just talk about how we should design an AGI.
One thing we can do is design an AGI whose source code straightforwardly resembles model-based reinforcement learning. So the code has data structures for a critic / value-function, and a reward function, and TD learning, and a world-model, and so on.
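Something like this bare-bones skeleton is what I have in mind (purely illustrative: tabular, deterministic, one-step lookahead; a real system would use learned function approximators and much deeper search):

```python
from dataclasses import dataclass, field

@dataclass
class ModelBasedRLAgent:
    world_model: dict = field(default_factory=dict)  # learned (obs, action) -> predicted next obs
    critic: dict = field(default_factory=dict)       # learned obs -> value estimate
    gamma: float = 0.9
    lr: float = 0.1

    def reward(self, obs) -> float:
        # Hand-written reward function (not learned).
        return 1.0 if obs == "goal" else 0.0

    def plan(self, obs, actions):
        # Use the world model to imagine one step ahead for each candidate
        # action, scoring it by immediate reward plus the critic's value.
        def score(a):
            predicted = self.world_model.get((obs, a), obs)
            return self.reward(predicted) + self.gamma * self.critic.get(predicted, 0.0)
        return max(actions, key=score)

    def update(self, obs, action, next_obs):
        # World model: updated from what actually happened (prediction, not reward).
        self.world_model[(obs, action)] = next_obs
        # Critic: TD update toward reward plus discounted value of the next obs.
        td_target = self.reward(next_obs) + self.gamma * self.critic.get(next_obs, 0.0)
        v = self.critic.get(obs, 0.0)
        self.critic[obs] = v + self.lr * (td_target - v)

agent = ModelBasedRLAgent()
agent.update("start", "go", "goal")
print(agent.plan("start", ["go", "stay"]))  # "go"
```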
This has the advantage that (I claim) it can actually work. (Not today, but after further algorithmic progress.) After all, I am confident that human brains have all those things (critic ≈ striatum, TD learning ≈ dopamine, etc.) and the human brain can do lots of impressive things, like go to the moon and invent quantum mechanics and so on.
But it also has a disadvantage that it’s unclear how to make the AGI motivated to act in a human-like way and follow human norms. It’s not impossible, evidently—after all, I am a human and I am at least somewhat motivated to follow human norms, and when someone I greatly admire starts doing X or wanting X, then I also am more inclined to start doing X and wanting X. But it’s unclear how this works in terms of my brain’s reward functions / loss functions / neural architecture / whatever—or at least, it’s presently unclear to me. (It is one of my major areas of research interest, and I think I’m making gradual progress, but as of now I don’t have any good & complete answer.)
A different thing we can do is design an AGI whose source code straightforwardly resembles active inference / FEP. So the code has, umm, I’m not sure, something about generative models and probability distributions? But it definitely does NOT have a reward function or critic / value-function etc.
This has the advantage that (according to you, IIUC) there’s a straightforward way to make the AGI act in a human-like way and follow human norms.
And it has the disadvantage that I’m somewhat skeptical that it will ever be possible to actually code up an AGI that way.
So I’m pretty confused here.
For one thing, I’m not yet convinced that the first bullet point is actually straightforward (or even possible). Maybe I didn’t follow your previous response. Some of my concerns are: (1) most human actions are not visible (e.g. deciding what to think about, recalling a memory), (2) even the ones that are visible in principle are very hard to extract in practice (e.g. did I move deliberately, or was that a random jostle or gust of wind?), and (3) almost all “outcomes” of interest in the AGI context are outcomes that have never happened in the training data, e.g. the AGI can invent a new gadget which no human had ever previously invented. So I’m not sure how you get p(o,a) from observations of humans.
For the second thing, among other issues, it seems to me that building a beyond-human-level understanding of the world requires RL-type trial-and-error exploration, for reasons in Section 1.1 here.