This paper (Keno Juechems & Christopher Summerfield: Where does value come from? Trends in Cognitive Sciences, 2019) seems interesting from an “understanding human values” perspective.
Abstract: The computational framework of reinforcement learning (RL) has allowed us to both understand biological brains and build successful artificial agents. However, in this opinion, we highlight open challenges for RL as a model of animal behaviour in natural environments. We ask how the external reward function is designed for biological systems, and how we can account for the context sensitivity of valuation. We summarise both old and new theories proposing that animals track current and desired internal states and seek to minimise the distance to a goal across multiple value dimensions. We suggest that this framework readily accounts for canonical phenomena observed in the fields of psychology, behavioural ecology, and economics, and recent findings from brain-imaging studies of value-guided decision-making.
Some choice quotes:
We suggest that, during learning, humans form new setpoints pertaining to cognitive goals. For example, we might represent current and desired states on axes pertaining to financial stability, moral worth, or physical health as well as hunger, thirst, or temperature. [...] This theory proposes that current states and goals are encoded in a multidimensional ‘value map’. Motivated behaviour can then be seen as an attempt to minimise the maximum distance to setpoints in this value space. Repurposing this framework for cognitive settings, agents commit to policies that focus on purposively driving the current state towards setpoints on a particular goal dimension, such as caching resources, building a shelter, obtaining a mate, or enhancing professional status. In doing so, their ultimate goal is to maintain equilibrium among all goal states, achieving what might be popularly characterised as a state of ‘wellbeing’. [...]
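To make the "value map" idea concrete, here is a minimal sketch (my own illustration, not the paper's actual model, with hypothetical action names and numbers): goal dimensions form axes of a state vector, and the agent picks whichever action best reduces its worst remaining distance to a setpoint, i.e. a minimax rule over goal dimensions.

```python
import numpy as np

# Axes of the hypothetical value map: energy, financial stability, shelter.
setpoints = np.array([1.0, 0.8, 0.6])   # desired level on each goal axis
state = np.array([0.2, 0.7, 0.1])       # current level on each goal axis

# Hypothetical actions, each modelled as an expected change to the state.
actions = {
    "forage":     np.array([0.5, 0.0, 0.0]),
    "build_nest": np.array([0.0, 0.1, 0.4]),
    "socialise":  np.array([0.0, 0.0, 0.2]),
}

def worst_gap(s, targets):
    """Largest remaining distance to any setpoint."""
    return np.max(np.abs(targets - s))

# Choose the action whose predicted outcome minimises the worst gap.
best_action = min(actions, key=lambda a: worst_gap(state + actions[a], setpoints))
print(best_action)  # -> "forage": the energy axis shows the largest shortfall
```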
More generally, we argue that some of the most complex and abstract decisions that humans make might be better described by a process that optimises over states, rather than rewards. For example, consider a high-school student choosing a career path. Under the (model-based) RL framework, the student must consider an impossibly large number of potential futures and select whichever is going to be most rewarding. This appears to imply the devotion of disproportionate levels of computational resources to the search problem. The approach advocated here implies that they first select a goal state (e.g., become a lawyer) and then take actions that minimise distance to that goal. For example, they seek to go to law school; to maximise their chances of acceptance, they first study hard for their exams; this in turn influences decisions about whether to socialise with friends. This explanation appears to accord better with our common sense intuition of how the complex choices faced by humans are made. However, the computations involved may build upon more phylogenetically ancient mechanisms. For example, one of the most prominent theories of insect navigation proposes that, to reach their home base, central-place foragers, such as honey bees (and desert ants), initially encode an egocentric snapshot of their base and, subsequently, on the return journey, use a similarity-matching process to gradually reach their goal [28]. This implies that they are similarly performing gradient descent over states, akin to the process proposed here. [...]
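A toy illustration of "gradient descent over states" (again my own sketch, not from the paper): rather than evaluating every possible future, the agent fixes a goal state and repeatedly takes whichever available step most reduces the remaining distance to it, loosely analogous to the snapshot-matching account of insect homing quoted above.

```python
import numpy as np

goal = np.array([10.0, 10.0])   # e.g. the nest, or the chosen career goal
state = np.array([0.0, 0.0])    # starting state

# Hypothetical one-step moves available to the agent.
steps = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
         np.array([-1.0, 0.0]), np.array([0.0, -1.0])]

def distance(s):
    """Remaining distance to the goal state."""
    return np.linalg.norm(goal - s)

trajectory = [state.copy()]
while distance(state) > 0.5:
    # Greedily pick the step that brings the current state closest to the goal.
    state = min((state + step for step in steps), key=distance)
    trajectory.append(state.copy())

print(len(trajectory) - 1, "steps to reach the goal")  # 20 steps on this grid
```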
An appealing aspect of this framework is that it provides a natural way to understand the affective states that pervade our everyday mental landscape, including satisfaction (goal completion), frustration (goal obstruction), and disappointment (goal abandonment), which have largely eluded computational description thus far [14]. [...]
The natural world is structured in such a way that some states are critical for survival or have substantial impact on long-run future outcomes. For example, the student introduced above might work hard to pass their exams in the knowledge that it will open up interesting career opportunities. These states are often attained when accumulated resources reach, or fall below, a critical threshold. Behavioural ecologists have argued that the risky foraging behaviour of animals adapts to satisfy a ‘budget rule’ that seeks to maintain energetic resources at aspirational levels that safely offset future scarcity. For example, birds make risky foraging choices at dusk to accrue sufficient energy to survive a cold night [45]. This view is neatly accommodated within the framework proposed here, in that the aspiration level reflects the setpoint against which current resource levels are compared, and the driver of behaviour is the disparity between current state and goal.
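A hedged sketch of the "budget rule" idea (illustrative numbers of my own, not from the paper): what drives the choice is the gap between current reserves and a survival setpoint, not expected value alone. The safe and risky options below have identical expected gains, yet a bird that cannot reach its overnight setpoint via the safe option should prefer the risky one.

```python
SAFE_GAIN = 3    # guaranteed energy units
RISKY_GAIN = 6   # energy units, obtained with probability 0.5
                 # (expected value 3, the same as the safe option)

def choose_forage_option(reserves, threshold):
    """Budget rule: pick 'safe' only if it suffices to reach the setpoint."""
    if reserves + SAFE_GAIN >= threshold:
        return "safe"    # the safe option already closes the gap to the setpoint
    if reserves + RISKY_GAIN >= threshold:
        return "risky"   # only a lucky risky outcome can close the gap
    return "risky"       # neither option guarantees survival; gamble anyway

print(choose_forage_option(reserves=4, threshold=10))  # -> "risky"
print(choose_forage_option(reserves=8, threshold=10))  # -> "safe"
```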
This framework of having multiple axes representing different goals, and trying to minimize the maximum distance to their setpoints, also reminds me a bit of moridinamael’s Complex Behavior from Simple (Sub)Agents.