A Brief Introduction to ACI, 3.5: How ACI improves Active Inference
(Sorry but this chapter has many errors. Rewriting in progress)
In the previous section, we defined goals and utility within ACI’s framework using Kullback-Leibler (KL) divergence. This approach is inspired by Active Inference, which uses KL divergence to describe the surprise between the model and the world, while ACI uses it to describe the difference between a world and the right history. In this chapter, we will show how ACI improves the Active Inference model.
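As a reminder of the quantity involved, here is a minimal sketch of KL divergence between two discrete distributions. The function name and the example numbers are illustrative assumptions; this is the textbook definition, not ACI's specific formulation of goal or utility.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions over the same outcomes.

    Assumes q is strictly positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over two outcomes.
same = kl_divergence([0.5, 0.5], [0.5, 0.5])      # identical distributions
forward = kl_divergence([0.5, 0.5], [0.9, 0.1])   # p against q
backward = kl_divergence([0.9, 0.1], [0.5, 0.5])  # q against p
```

The divergence is zero only when the two distributions agree, and it is not symmetric, so it matters which distribution is treated as the reference.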
From model of the world to policies
The first principle of Active Inference is “change the world or change my mind”: an agent can either change its model to fit the world or change the world to fit its model. This represents the balance between exploration and exploitation.
By focusing on policies rather than models of the world, ACI improves this viewpoint. ACI argues that not all policies are suitable to be categorized as aiming to achieve a specific state of the world. Homeostasis is crucial to organisms, but it’s not everything. While Active Inference is a representation of rational agents, ACI endeavors to go beyond this limitation.
ACI also explains why an agent may prefer one state over another, as well as the mechanisms for acquiring and modifying preferences. Active Inference describes preferences as the prior of the agent’s model of the world, but according to ACI, preferences can be improved by creating new history while interacting with the world.
The utility of ACI also demonstrates the exploration-exploitation trade-off. To get higher expected utility, an ACI agent may pursue one of two strategies. It can change its distribution over policies, giving stronger policies higher probabilities (exploration), or it can choose a preferred environment and stick to a safe zone (exploitation).
Exploitation involves living in a familiar environment, where it is easier to stay aligned with the policies that already have high probability. Exploration involves exposing oneself to a wider range of environments and giving more effective policies higher probabilities, in order to achieve future success across a variety of scenarios.
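The two strategies can be illustrated with a toy calculation. This is only a sketch under loud assumptions: a plain expected payoff stands in for ACI's utility (which the previous section defines through KL divergence), and the policy names, environment names, and payoff numbers are all invented for illustration.

```python
# Assumed payoff of each hypothetical policy in each hypothetical environment.
PAYOFF = {
    "simple_policy": {"familiar": 1.0, "unfamiliar": 0.0},
    "robust_policy": {"familiar": 0.8, "unfamiliar": 0.8},
}

def expected_utility(policy_dist, env_dist):
    """Expected payoff when policy and environment are drawn independently."""
    return sum(
        policy_dist[p] * env_dist[e] * PAYOFF[p][e]
        for p in policy_dist
        for e in env_dist
    )

habitual = {"simple_policy": 0.7, "robust_policy": 0.3}
mixed_world = {"familiar": 0.5, "unfamiliar": 0.5}

baseline = expected_utility(habitual, mixed_world)

# Exploitation: keep the same policies, but stay in the familiar environment.
exploit = expected_utility(habitual, {"familiar": 1.0, "unfamiliar": 0.0})

# Exploration: keep facing both environments, but shift weight to the robust policy.
explore = expected_utility({"simple_policy": 0.1, "robust_policy": 0.9}, mixed_world)
```

With these made-up numbers both moves raise the expected payoff above the baseline, which is the sense in which either strategy can pay off. Which one is larger depends entirely on the assumed payoffs.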
An example
Here is an example borrowed from Active inference:
When a person senses that her body temperature is higher than expected, she may change her expectation, or take action to lower her body temperature, for example by opening the window. She would choose the latter option because “we are considerably more confident about our core temperature because it underwrites our existence.”
ACI could reinterpret this example, from the perspective of policies and OPD:
A person could build her OPD as the posterior distribution over policies given past histories that proved to be right (either experienced herself or learned from others). Those histories contain behaviors like opening the window when central thermoreceptors have higher outputs.
Some policies, such as “put on more clothes when feeling warm”, would have a lower OPD given those histories. She may instead follow a policy with a higher OPD, such as “open the window when it feels too hot”.
This choice is optimal if she stays in Greenland her whole life (also depending on the speed of global warming), because there the policy is always effective and simple.
A policy such as “lower my body temperature in various ways” (turning on the AC, opening the window, taking a cold shower, etc.) is also effective, but it is too complex and overkill for an invariant environment. It involves more objects and higher-order concepts, so it has a relatively lower OPD.
Suppose that on another day the same person feels too warm and opens the window, but finds that she is in Bangkok, where the outdoor temperature is a little higher. In this case, the “lower my body temperature in various ways” policy is significantly better than the “open the window” policy.
Her OPD will update. So will the utilities of some possible worlds, such as those that include openable windows. She will opt to turn on the AC if there is one in the room. With experience of more varied environments, better policies gain higher probabilities, such as policies that take moisture and wind chill into consideration.
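The update in this example can be sketched as a simple Bayesian posterior over policies. Everything here is an assumption made for illustration: the three policy labels, the prior that favors simpler policies, and the success probabilities are invented, and the code is not ACI's formal definition of OPD.

```python
# Assumed prior: simpler policies start with higher probability.
PRIOR = {
    "open_window": 0.5,
    "put_on_more_clothes": 0.3,
    "lower_temperature_various_ways": 0.2,
}

# Assumed probability that each policy cools the person down in each place.
SUCCESS = {
    "open_window": {"greenland": 0.95, "bangkok": 0.05},
    "put_on_more_clothes": {"greenland": 0.05, "bangkok": 0.05},
    "lower_temperature_various_ways": {"greenland": 0.95, "bangkok": 0.90},
}

def update(dist, place):
    """Bayes update after one episode of right history: she cooled down in `place`."""
    weights = {p: dist[p] * SUCCESS[p][place] for p in dist}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

def best(dist):
    """Policy with the highest posterior probability."""
    return max(dist, key=dist.get)

opd = dict(PRIOR)
for _ in range(5):                 # five successful episodes in Greenland
    opd = update(opd, "greenland")
best_in_greenland = best(opd)

opd = update(opd, "bangkok")       # one successful episode in Bangkok
best_after_bangkok = best(opd)
```

Under these assumptions the Greenland history leaves the simple policy on top, because both effective policies explain that history equally well and the prior breaks the tie. A single Bangkok episode is enough to move most of the probability to the more general policy.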