There is a great deal of confusion regarding the whole point of the FEP research program: is it a tautology, does it apply to flames, etc. This is unfortunate because the goal of the research program is actually quite interesting: to come up with a good definition of an agent (or of any other thing, for that matter). That is why FEP proponents embrace the tautology criticism: they are proposing a mathematical definition of ‘things’ (using Markov blankets and Langevin dynamics) in order to construct precise mathematical notions of more complex and squishy concepts that separate life from non-life. Moreover, they seek to do so in a manner that is compatible with known physical theory. It may seem like overkill to try to nail down how a physical system can form a belief, but it’s actually pretty critical for anyone who is not a dualist. Moreover, because we don’t currently have such a mathematical framework, we have no idea whether the manner in which we discuss life and thinking is even coherent. Think about Russell’s paradox. Prior to the late 19th century it was considered so intuitively obvious that a set could be defined by a property that the so-called axiom of unrestricted comprehension literally went without saying. Only in the attempt to formalize set theory was it discovered that this axiom had to go. By analogy, only in the attempt to construct a formal description of how matter can form beliefs do we have any chance of determining whether our notion of ‘belief’ is actually consistent with physical theories.
While I have no idea how to accomplish such an ambitious goal, it seems clear to me that the reinforcement learning paradigm is not suited to the task. This is because, in such a setting, definitions matter, and the RL definition of an agent leaves a lot to be desired. In RL, an agent is defined by (a) its sensory and action space, (b) its inference engine, and (c) the reward function it is trying to maximize. Ignoring the matter of identifying the sensory and action space, it should be clear that this is a practical definition, not a principled one, as it is under-constrained. This isn’t just because I can add a constant to the reward without altering the policy, or something silly like that; it is because (1) it is not obvious how to identify the sensory and action space, and (2) inference and reward are fundamentally conflated. Item (1) leads to questions like: does my brain end at the base of my skull, at the tips of my fingers, or at the items I have arranged on my desk? The Markov blanket component of the FEP attempts to address this, and while I think it still needs work, it has the right flavor. Item (2), however, is much more problematic. In RL, policies are computed by convolving beliefs (the output of the inference engine) with reward and selecting the best option. This convolution-plus-max operation means that if your model fails to predict behavior, it could be because you were wrong about the inference engine or wrong about the reward function. Unfortunately, it is impossible to determine which you were wrong about without additional assumptions. For example, in MaxEnt inverse RL one has to assume (incredibly) that inference is Bayes optimal and (reasonably) that equally rewarding paths are equally likely to occur. Regardless, this kind of ambiguity is a hallmark of a bad definition, because it relies on a map from beliefs and reward to observations of behavior that is not uniquely invertible.
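To make the non-invertibility concrete, here is a minimal toy sketch (in Python, with made-up numbers) of two different belief/reward pairs that produce exactly the same greedy choice, so the observed behavior cannot tell you which component of the model was wrong:

```python
import numpy as np

# Toy setting: 2 hidden states, 2 actions.
# The RL "convolution + max" step: policy = argmax_a sum_s belief(s) * R(s, a)
def greedy_action(belief, R):
    expected_reward = belief @ R          # shape: (num_actions,)
    return int(np.argmax(expected_reward))

# Model A: a confident belief paired with one reward function
belief_A = np.array([0.9, 0.1])
R_A = np.array([[1.0, 0.0],               # R[s, a]
                [0.0, 1.0]])

# Model B: a badly mis-calibrated belief paired with a different reward function
belief_B = np.array([0.5, 0.5])
R_B = np.array([[2.0, 0.0],
                [1.0, 0.5]])

# Both models select the same action, so behavior alone cannot distinguish
# "wrong inference engine" from "wrong reward function".
print(greedy_action(belief_A, R_A))       # -> 0
print(greedy_action(belief_B, R_B))       # -> 0
```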
In contrast, FEP advocates propose a somewhat ‘better’ definition of an agent. This is accomplished by identifying the necessary properties of the sensory and action spaces (namely, that they form a Markov blanket) and then, in a manner similar to that used in systems-identification theory, defining an agent’s type by the statistics of that blanket. Arbitrary reward functions are then replaced with negative surprise. Though it doesn’t quite work, for very technical reasons, this has the flavor of a good definition and a necessary principle. After all, if an object or agent type is defined by the statistics of its boundary, then clearly a necessary description of what an agent is doing is that it is not straying too far from its definition.
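For readers who want the standard quantities behind ‘negative surprise’, here is the usual textbook form (generic FEP notation, nothing specific to the argument above):

```latex
% Surprisal of blanket states o under the agent's defining distribution p:
\mathcal{S}(o) = -\ln p(o)

% Variational free energy upper-bounds surprisal for any approximate posterior q(s):
F(q,o) = \mathbb{E}_{q(s)}\!\big[\ln q(s) - \ln p(o,s)\big]
       = -\ln p(o) + D_{\mathrm{KL}}\!\big[q(s)\,\|\,p(s\mid o)\big]
       \ge -\ln p(o)
```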
That was more than I intended to write, but the point is that precision and consistency checks require good definitions, i.e. a tautology. On that front, the FEP is currently the only game in town. It’s not a perfect principle and its presentation leaves much to be desired, but it seems to me that something very much like it will be needed if we ever wish to understand the relationship between ‘mind’ and matter.
Thanks for the comment!
Your comment is a bit funny to me because I think of algorithms and definitions as kinda different topics. In particular, I think of RL as a family of algorithms, not as a definition of an agent.
If I have a robot running an RL algorithm, and I have a sufficiently thorough understanding of how the algorithm works and how the robot works, then I can answer any question about what the robot will do, but I haven’t necessarily made any progress at all on the question of “what are its beliefs?” or “is it an agent?” For example, we can easily come up with edge cases where the question “what do I (Steve) believe?” has an unclear answer, even knowing everything about how I behave and how my brain works. (E.g. if I explicitly deny X but act in other ways as if I believe X, then is X part of my “beliefs”? Or if I give different answers about X depending on context and framing? Etc.)
Conversely, I presume that if we had a rigorous mathematical definition of “beliefs” and “agents” and whatever, it would not immediately tell us the answers to algorithms questions like how to build artificial general intelligence or how the brain works.
In terms of definitions: If FEP people someday use FEP to construct a definition of “agent” that reproduces common sense, e.g. I’m an agent, a rock isn’t an agent, a flame isn’t an agent, etc., then I’d be interested to see it. And if I thought that this definition had any kernel of truth, the first thing I’d try to do is reformulate that definition in a way that doesn’t explicitly mention FEP. If I found that this was impossible or pointlessly convoluted, then I would count that as the first good reason to talk about FEP. I am currently skeptical that any part of that is going to actually happen.
As it happens, I have some friends in AGI safety & alignment interested in the question of how to define agency, goals, etc. (e.g. 1,2). I guess I find such work slightly helpful sometimes. I’ve gone back to this post a couple times, for example. (I have not found the FEP-centric discussions in this vein to be helpful.) But by and large, I mostly don’t see it as very central to my main interest of safe and beneficial AGI. Again, if I have a robot running a certain algorithm, and I understand that algorithm well enough, then I can answer any question about what the robot will do. And that’s what I care about. If I don’t know a rigorous definition of “what the robot believes”, but I can still answer any question about what the robot will do and why, then I don’t really feel like I’m missing anything important.
You bring up the difficulty of defining the interface separating an algorithm’s actuators from the world. Amusingly, you take that observation as evidence that we need FEP to come up with good definitions, and I take that same observation as evidence that there is no good definition, it’s inherently an arbitrary thing and really there isn’t anything here to define and we shouldn’t be arguing about it in the first place. :-P
In terms of algorithms: I agree that there are interesting questions related to building a reinforcement learning algorithm that is not subject to wishful thinking, or at least that is minimally subject to wishful thinking. (I consider humans to be “minimally subject to wishful thinking”, in the sense that it obviously happens sometimes, but it happens much less than it might, e.g. I don’t keep opening my wallet expecting to find a giant wad of cash inside.) I feel like I personally have a pretty good handle on how that potential issue is mitigated in the human brain, after a whole lot of time thinking about it. I think it centrally involves the fact that the human brain uses model-based actor-critic RL, and the model is not updated to maximize reward but rather updated using (a generalization of) self-supervised learning on sensory predictions. And likewise the critic is (obviously) not updated to maximize reward but rather updated to estimate reward using (a variant of) TD learning.
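As a hedged illustration of what I mean by “the model and the critic each have their own update rule, and neither rule is ‘maximize reward’”, here is a minimal sketch; all the names, shapes, and learning rates are invented for illustration, and this is not a claim about any actual brain circuit:

```python
import numpy as np

# (1) World model: updated by self-supervised prediction error on sensory data,
#     not by reward. Here it is just a linear next-observation predictor.
W_model = np.zeros((4, 4))

def update_model(obs, next_obs, lr=0.01):
    global W_model
    pred = W_model @ obs
    W_model += lr * np.outer(next_obs - pred, obs)  # gradient step on squared prediction error

# (2) Critic / value function: updated by TD learning to *estimate* reward-to-go,
#     again not by directly maximizing reward.
V = np.zeros(4)   # one value estimate per (discrete) state, for simplicity

def update_critic(state, reward, next_state, gamma=0.9, lr=0.1):
    td_error = reward + gamma * V[next_state] - V[state]
    V[state] += lr * td_error

# (3) Actor / planner: the only piece that consults the critic to pick actions.
def pick_action(candidate_next_states):
    return int(np.argmax([V[s] for s in candidate_next_states]))
```

The point of keeping the update rules separate is that a plan which merely predicts high reward cannot pull the world model or the critic toward agreeing with it, which is the wishful-thinking failure mode I’m describing.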
(See my post Reward Is Not Enough for example.)
In principle it’s possible that I would learn something new and helpful about the question of wishful thinking by reading more FEP literature, but my personal experience is very strongly in the opposite direction, i.e. that FEP people are typically way more confused about this topic than pretty much anyone else, e.g. their proposed solutions actually make the problem much much worse, but also make everything sufficiently confusing / obfuscated that they don’t realize it. Basically, a good solution to wishful thinking involves working hard to separate plans/inclinations/desires from beliefs, and in particular updating them in different ways, whereas FEP people want to go in the opposite direction by mixing plans/inclinations/desires & beliefs together into one big unified framework. Interestingly, one of the first things I read when I was starting in that area was an FEP-ish book (Surfing Uncertainty), and I consider that much of my subsequent progress involved “un-learning” many of the ideas I had gotten out of that book. ¯\_(ツ)_/¯
I will certainly agree that a big problem for the FEP is related to its presentation. They start with the equations of mathematical physics and show how to get from there to information theory, inference, beliefs, etc. This is because they are trying to get from matter to mind. But they could have gone the other way, since all the equations of mathematical physics have an information-theoretic derivation that includes a notion of free energy. This means that all the stuff about Langevin dynamics of sparsely connected systems (the ‘particular’ FEP) could have been included as a footnote in a much simpler derivation.
As you note, the other problem with the FEP is that it seems to add very little to the dominant RL framework. I would argue that this is because they are really not interested in designing better agents, but rather in figuring out what it means for mind to arise from matter. So basically it is physics-inspired philosophy of mind, which does sound like something that has no utility whatsoever. But explanatory paradigms can open up new ways of thinking.
For example, relevant to your interests, it turns out that the FEP definition of an agent has the potential to bypass one of the more troubling AI safety concerns associated with RL. When using RL there is a substantial concern that straight-up optimizing a reward function can lead to undesirable results, e.g. the imperative to ‘end world hunger’ leads to ‘kill all humans’. In contrast, in the standard formulation of the FEP the reward function is replaced by a stationary distribution over actions and outcomes. This suggests the following paradigm for developing a safer AI agent. Observe human decision-making in some area to get a stationary distribution over actions and outcomes that are considered acceptable but perhaps not optimal. Optimize the free energy of the expected future (FEEF) applied to the observed distribution of actions and outcomes (instead of just outcomes, as is usually done) to train an agent to reproduce human decision-making behavior. Assuming it works, you now have an automated decision maker that, on average, replicates human behavior, i.e. you have an agent that is weakly equivalent to the average human. Now suppose that there are certain outcomes that we would like to make happen more frequently than human decision-makers have been able to achieve, but we don’t want the algorithm to take any drastic actions. No problem: train a second agent to produce this new distribution of outcomes while keeping the stationary distribution over actions the same.
This is not guaranteed to work, as some outcome distributions are inaccessible, but one could conceive of an iterative process where you explore the space of accessible outcome distributions by slightly perturbing the outcome distribution, retraining, and repeating...
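One hedged way to write down the two training stages sketched above, in my own notation (a simplification of the idea rather than the exact FEEF functional):

```latex
% Stage 1: train a policy \pi to reproduce the observed human blanket statistics
\pi_1 = \arg\min_{\pi}\; D_{\mathrm{KL}}\big[\, q_{\pi}(o,a) \,\|\, p_{\mathrm{human}}(o,a) \,\big]

% Stage 2: shift the outcome marginal toward a desired target while keeping
% the action marginal pinned to the human one
\pi_2 = \arg\min_{\pi}\; D_{\mathrm{KL}}\big[\, q_{\pi}(o) \,\|\, p_{\mathrm{target}}(o) \,\big]
        + D_{\mathrm{KL}}\big[\, q_{\pi}(a) \,\|\, p_{\mathrm{human}}(a) \,\big]
```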
I’m generally in favor of figuring out how to make AIs that are inclined to follow human norms / take human-typical actions / share human-typical preferences and can still do human-level R&D etc.
That seems hard, basically for the reasons here (section “recognizing human actions in data”), plus the fact that most actions are invisible (e.g. brainstorming), and the fact that innovation inevitably entails going out of distribution.
But first I have a more basic question where I’m confused about your perspective:
Do you think it’s possible for the exact same source code to be either a “FEP agent” or an “RL agent” depending on how you want to think about things?
Or do you think “FEP agent” and “RL agent” are two different families of algorithms? (Possibly overlapping?)
And if it’s the latter, do you think both families of algorithms include algorithms that can do human-level scientific R&D, invent tools, design & build factories, etc.? Or if you think just one of the two families of algorithms can do those things, which one?
And which family of algorithms (or both) would you put the human brain into?
Thanks!
The short answer is that, in a POMDP setting, FEP agents and RL agents can be mapped onto one another via an appropriate choice of reward function and inference algorithm. One of the goals of the FEP is to come up with a normative definition of the reward function (google the misleadingly titled “optimal control without cost functions” paper or, for a non-FEP version of the same thing, google the accurately titled “Revisiting Maximum Entropy Inverse Reinforcement Learning”). Despite the very different approaches, the underlying mathematics is very similar, as both are strongly tied to KL control theory and Jaynes’ maximum entropy principle. But the ultimate difference between FEP and RL in a POMDP setting is how an agent is defined. RL needs an inference algorithm and a reward function that operates on actions and outcomes, R(o,a). The FEP needs stationary blanket statistics, p(o,a), and nothing else. The inverse reinforcement learning paper shows how to go from p(o,a) to a unique R(o,a) assuming a Bayes-optimal RL agent in an MDP setting. Similarly, if you start with R(o,a) and optimize it, you get a stationary distribution, p(o,a). This distribution is also unique under some ‘mild’ conditions. So they are more or less equivalent in terms of expressive power. Indeed, you can generalize all this crap to show that any subsystem of any physical system can be mathematically described as a Bayes-optimal RL agent. You can even identify the reward function with a little work. I believe this is why we intuitively anthropomorphize physical systems, e.g. when we say things like the system is “seeking” a minimum energy state.
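As a hedged, deliberately degenerate illustration of the R(o,a) ↔ p(o,a) correspondence (a one-step, fully observed case in my notation, not the construction used in either paper):

```latex
% MaxEnt objective for a one-step policy \pi(a): expected reward plus policy entropy
J(\pi) = \mathbb{E}_{\pi}\big[R(a)\big] + H(\pi)

% Its maximizer is the Boltzmann distribution
\pi^{*}(a) \propto \exp\big(R(a)\big)

% So, going backwards, any target distribution p(a) is reproduced by choosing
R(a) = \ln p(a) + \mathrm{const}
```

In this toy case the stationary statistics and the reward function carry the same information up to a constant; the papers mentioned above work out analogous correspondences in the sequential setting.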
But regardless, from a pragmatic perspective they are equally expressive mathematical systems. The advantage of one over the other depends upon your prior knowledge and goals. If you know the reward function and have knowledge of how the world works, use RL. If you know the reward function but are in a POMDP setting without knowledge of how the world works, use an information-seeking version of RL (MaxEnt RL or Bayesian RL). If you don’t know the reward function but do know how the world works and have observations of behavior, use MaxEnt inverse RL.
The problem with RL is that it’s unclear how to use it when you don’t know how the world works and you don’t know what the reward function is, but do have observations of behavior. This is the situation when you are modeling behavior, as in the URL you cited. In this setting, we don’t know what model humans are using to form their inferences, and we don’t know what motivates their behavior. If we are lucky we can glean some notion of their policy by observing behavior, but usually that notion is very coarse, i.e. we may only know the average distribution of their actions and observations, p(o,a). The utility of the FEP is that p(o,a) defines the agent all by itself. This means we can start with a policy and infer both belief and reward. This is not something RL was designed to do. RL is for going from reward and belief (or belief-formation rules) to policy, not the other way around. IRL can go backward, but only if your beliefs are Bayes optimal.
As for the human brain, I am fully committed to the Helmholtzian notion that the brain is a statistical learning machine, as in the Bayesian brain hypothesis, with the added caveat that it is important to remember that the brain is massively suboptimal.
Thanks again! Feel free to stop responding if you’re busy.
Here’s where I’m at so far. Let’s forget about human brains and just talk about how we should design an AGI.
One thing we can do is design an AGI whose source code straightforwardly resembles model-based reinforcement learning. So the code has data structures for a critic / value-function, and a reward function, and TD learning, and a world-model, and so on.
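(For concreteness, a hedged and deliberately schematic sketch of the kind of source-code structure I have in mind; every name here is invented for illustration, not a real system:)

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class ModelBasedRLAgent:
    world_model: Callable     # predicts the next state; trained on prediction error
    reward_fn: Callable       # maps outcomes to scalar reward
    critic: Dict[int, float] = field(default_factory=dict)   # state -> value estimate

    def td_update(self, s, r, s_next, gamma=0.9, lr=0.1):
        """TD learning: nudge the critic toward reward + discounted next value."""
        v, v_next = self.critic.get(s, 0.0), self.critic.get(s_next, 0.0)
        self.critic[s] = v + lr * (r + gamma * v_next - v)

    def plan(self, s, candidate_actions):
        """Roll the world model forward one step and pick the highest-value action."""
        return max(candidate_actions,
                   key=lambda a: self.critic.get(self.world_model(s, a), 0.0))
```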
This has the advantage that (I claim) it can actually work. (Not today, but after further algorithmic progress.) After all, I am confident that human brains have all those things (critic ≈ striatum, TD learning ≈ dopamine, etc.), and the human brain can do lots of impressive things, like go to the moon and invent quantum mechanics and so on.
But it also has a disadvantage that it’s unclear how to make the AGI motivated to act in a human-like way and follow human norms. It’s not impossible, evidently—after all, I am a human and I am at least somewhat motivated to follow human norms, and when someone I greatly admire starts doing X or wanting X, then I also am more inclined to start doing X and wanting X. But it’s unclear how this works in terms of my brain’s reward functions / loss functions / neural architecture / whatever—or at least, it’s presently unclear to me. (It is one of my major areas of research interest, and I think I’m making gradual progress, but as of now I don’t have any good & complete answer.)
A different thing we can do is design an AGI whose source code straightforwardly resembles active inference / FEP. So the code has, umm, I’m not sure, something about generative models and probability distributions? But it definitely does NOT have a reward function or critic / value-function etc.
This has the advantage that (according to you, IIUC) there’s a straightforward way to make the AGI act in a human-like way and follow human norms.
And it has the disadvantage that I’m somewhat skeptical that it will ever be possible to actually code up an AGI that way.
So I’m pretty confused here.
For one thing, I’m not yet convinced that the first bullet point is actually straightforward (or even possible). Maybe I didn’t follow your previous response. Some of my concerns are: (1) most human actions are not visible (e.g. deciding what to think about, recalling a memory), (2) even the ones that are visible in principle are very hard to extract in practice (e.g. did I move deliberately, or was that a random jostle or a gust of wind?), (3) almost all “outcomes” of interest in the AGI context are outcomes that have never happened in the training data, e.g. the AGI can invent a new gadget which no human had ever previously invented. So I’m not sure how you get p(o,a) from observations of humans.
For the second thing, among other issues, it seems to me that building a beyond-human-level understanding of the world requires RL-type trial-and-error exploration, for reasons in Section 1.1 here.