We should expect an evolutionarily successful organism to develop concepts that abstract over situations that are similar with regards to receiving a reward from the optimal reward function. Suppose that a certain action in state s1 gives the organism a reward, and that there are also states s2–s5 in which taking some specific action causes the organism to end up in s1. Then we should expect the organism to develop a common concept for being in the states s2–s5, and we should expect that concept to be “more similar” to the concept of being in state s1 than to the concept of being in some state that was many actions away. [...]
I suggest that human values are concepts which abstract over situations in which we’ve previously received rewards, making those concepts and the situations associated with them valued for their own sake.
(Defining Human Values for Value Learners, p. 3)
Let me put this in terms of the locality example from your previous post:
Suppose that state s1 is “me having a giant stack of money”; in this state, it is easy for me to spend the money in order to get something that I value. Say that states s2–s5 are ones in which I don’t have the giant stack of money, but the money is fifty steps away from me, either to my front (s2), my right (s3), my back (s4), or my left (s5). Intuitively, all of these states are similar to each other in the sense that I can just take some steps in a particular direction and then I’ll have the money; my cognitive system can then generate the concept of “the money being within reach”, which refers to all of these states.
Being in a state where the money is within reach is useful for me: it lets me move from the money being in reach to me actually having the money, which in turn lets me act to obtain a reward. Because of this, I come to value the state/concept of “having money within reach”.
Now consider the state in which the pile of money is on the moon. For all practical purposes, it is no longer within my reach; moving it further out continues to keep it beyond my reach. In other words, I cannot move from the state s_money-on-moon to s1. Neither can I move from the state s_money-on-alpha-centauri to s1. As there isn’t a viable path from either state to s1, I can generate the concept of “the money is unreachable”, which abstracts over these states. Intuitively, an action that shifts the world from s_money-on-moon to s_money-on-alpha-centauri or back is low impact because either of those transitions maintains the general “the money is unreachable” state, which means that there’s no change to my estimate of whether I can use the money to purchase things that I want.
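To make the money example concrete, here is a rough toy sketch in Python. It is not taken from the paper or the posts. The transition table, the state labels, and the helper names (`can_reach`, `concept_of`, `is_low_impact`) are all made up for illustration, and the fifty steps between s2–s5 and s1 are collapsed into a single action. States are grouped into concepts by whether s1 is reachable from them, and a transition counts as low impact when it stays within the same concept.

```python
from collections import deque

# Hypothetical toy world: each state maps to the states one action away.
# Assumption: the "fifty steps" to the money are collapsed into one edge.
TRANSITIONS = {
    "s1": [],  # I have the giant stack of money
    "s2": ["s1"],
    "s3": ["s1"],
    "s4": ["s1"],
    "s5": ["s1"],
    "s_money-on-moon": ["s_money-on-alpha-centauri"],
    "s_money-on-alpha-centauri": ["s_money-on-moon"],
}


def can_reach(start, goal, transitions):
    """Breadth-first search: is there any path of actions from start to goal?"""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return True
        for nxt in transitions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def concept_of(state, goal="s1", transitions=TRANSITIONS):
    """Map a concrete state to the coarse concept it falls under."""
    if state == goal:
        return "have the money"
    if can_reach(state, goal, transitions):
        return "money within reach"
    return "money unreachable"


def is_low_impact(before, after):
    """A transition is low impact if it does not change the concept."""
    return concept_of(before) == concept_of(after)


if __name__ == "__main__":
    for s in TRANSITIONS:
        print(s, "->", concept_of(s))
```

In this sketch, moving the money between the moon and Alpha Centauri leaves the concept unchanged, while moving it from fifty steps away to the moon does not. A fuller version would presumably use something graded, such as distance or attainable value, instead of a yes/no reachability test.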
(See also the suggestion here that we choose life goals by selecting e.g. a state of “I’m a lawyer” as the goal because from the point of view of achieving our needs, that seems like a generally good state to be in. We then take actions to minimize our distance from that state.)
If I’m unaware of the mental machinery which generates this process of valuing, I naively think that I’m valuing the states themselves (e.g. the state of having money within reach), when the states are actually just instrumental values for getting me the things that I actually care about (some deeper set of fundamental human needs, probably).
Then, if one runs into an ontological crisis, one can in principle re-generate their ontology by figuring out how to reason in terms of the new ontology in order to best fulfill their values. I believe this to have happened with at least one historical “ontological crisis”:
As a historical example (Lessig 2004), American law traditionally held that a landowner did not only control his land but also everything above it, to “an indefinite extent, upwards”. Upon the invention of the airplane, this raised the question: could landowners forbid airplanes from flying over their land, or was the ownership of the land limited to some specific height, above which the landowners had no control?
The US Congress chose the latter, designating the airways as public, with the Supreme Court choosing to uphold the decision in a 1946 case. Justice Douglas wrote for the court’s majority that
The air is a public highway, as Congress has declared. Were that not true, every transcontinental flight would subject the operator to countless trespass suits. Common sense revolts at the idea.
(Defining Human Values for Value Learners, p. 2)
In a sense, people had been reasoning about land ownership in terms of a two-dimensional ontology: one owned everything within an area that was defined in terms of two dimensions. The concept for land ownership had left the exact three-dimensional size undefined (“to an indefinite extent, upwards”), because for as long as airplanes didn’t exist, incorporating this feature into our definition had been unnecessary. Once air travel became possible, our ontology was revised in such a way as to better allow us to achieve our other values.
I don’t know the exact cognitive process by which it was decided that you didn’t need the landowner’s permission to fly over their land. But I’m guessing that it involved reasoning like: if the plane flies at a sufficient height, then that doesn’t harm the landowner in any way. Flying would become impossibly difficult if you had to get separate permission from every person whose land you were going to fly over. And, especially before the invention of radar, a ban on unauthorized flyovers would be next to impossible to enforce anyway.
We might say that after an option became available which forced us to include a new dimension in our existing concept of landownership, we solved the issue by considering it in terms of our existing values.
(“What are concepts for, and how to deal with alien concepts”)
One complication here is that, so far, this suggests a relatively simplistic picture: we have some set of innate needs which we seek to fulfill; we then come to instrumentally value concepts (states) which help us fulfill those needs. If this was the case, then we could in principle re-derive our instrumental values purely from scratch if ontology changes forced us to do so. However, to some extent humans seem to also internalize instrumental values as intrinsic ones, which complicates things:
In most artificial RL agents, reward and value are kept strictly separate. In humans (and mammals in general), this doesn’t seem to work quite the same way. Rather, if there are things or behaviors which have once given us rewards, we tend to eventually start valuing them for their own sake. If you teach a child to be generous by praising them when they share their toys with others, you don’t have to keep doing it all the way to your grave. Eventually they’ll internalize the behavior, and start wanting to do it. One might say that the positive feedback actually modifies their reward function, so that they will start getting some amount of pleasure from generous behavior without needing to get external praise for it. In general, behaviors which are learned strongly enough don’t need to be reinforced anymore (Pryor 2006).
Why does the human reward function change as well? Possibly because of the bootstrapping problem: there are things such as social status that are very complicated and hard to directly encode as “rewarding” in an infant mind, but which can be learned by associating them with rewards. One researcher I spoke with commented that he “wouldn’t be at all surprised” if it turned out that sexual orientation was learned by men and women having slightly different smells, and sexual interest bootstrapping from an innate reward for being in the presence of the right kind of a smell, which the brain then associated with the features usually co-occurring with it. His point wasn’t so much that he expected this to be the particular mechanism, but that he wouldn’t find it particularly surprising if a core part of the mechanism was something that simple. Remember that incest avoidance seems to bootstrap from the simple cue of “don’t be sexually interested in the people you grew up with”.
(“What are concepts for, and how to deal with alien concepts”)
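As a very rough illustration of the internalization idea in the quoted passage, here is a toy Python sketch. It is my own simplification, not a model from the paper or from Pryor: the function name `internalize`, the `learning_rate` parameter, and the update rule are all assumptions. An intrinsic reward term for a behavior moves toward the external reward whenever praise is given, and (by assumption) does not decay when the praise stops.

```python
def internalize(praise_schedule, learning_rate=0.2):
    """Return the felt reward for the behavior on each trial.

    praise_schedule: external rewards per trial (1.0 = praised, 0.0 = not).
    Assumption: the intrinsic term is only updated on trials where praise
    is actually given, so it persists unchanged after the praise ends.
    """
    intrinsic = 0.0
    felt = []
    for external in praise_schedule:
        # What the agent experiences: external praise plus learned pleasure.
        felt.append(external + intrinsic)
        if external > 0:
            # Pairing with praise shifts the agent's own reward function.
            intrinsic += learning_rate * (external - intrinsic)
    return felt


if __name__ == "__main__":
    # Ten praised trials followed by three trials with no praise at all.
    for trial, reward in enumerate(internalize([1.0] * 10 + [0.0] * 3)):
        print(trial, round(reward, 3))
```

The point of the sketch is only that, unlike in a standard RL setup where the reward function is fixed, the felt reward for the behavior stays positive on the unpraised trials, because the reward function itself has been modified.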
So for figuring out how to deal with ontological shifts, we would also need to figure out how to distinguish between intrinsic and instrumental values. When writing these posts and the paper, I was thinking in terms of our concepts having some kind of an affect value (or more specifically, valence value) which was learned and computed on a context-sensitive basis by some machinery which was left unspecified.
Currently, I think more in terms of subagents, with different subagents valuing various concepts in complicated ways which reflect a number of strategic considerations as well as the underlying world-models of those subagents.
I also suspect that there might be something to Ziz’s core-and-structure model, under which we generally don’t actually internalize new values to the level of taking them as intrinsic values after all. Rather, there is just a fundamental set of basic desires (“core”), and increasingly elaborate strategic and cached reasons for acting in various ways and valuing particular things (“structure”). But these remain separate in the sense that the right kind of belief update can always push through a value update which changes the structure (your internalized instrumental values), if the overall system becomes sufficiently persuaded of the change being a better way of fulfilling its fundamental basic desires. For example, an athlete may feel like sports are a fundamental part of their identity, but if they ever became handicapped and were forced to retire from sports, they could eventually adjust their identity.
(It’s an interesting question whether there’s some broader class of “forced ontological shifts” for which the “standard” ontological crises are a special case. If an athlete is forced to revise their ontology and what they care about because they become disabled, then that is not an ontological crisis in the usual sense. But arguably, it is a process which starts from the athlete receiving the information that they can no longer do sports, and forces them to refactor part of their ontology to create new concepts and identities to care about, now that the old ones are no longer as useful for furthering their values. In a sense, this is the same kind of a process as in an ontological crisis: a belief update forcing a revision of the ontology, as the old ontology is no longer a useful tool for furthering one’s goals.)
Then, if one runs into an ontological crisis, one can in principle re-generate their ontology by figuring out how to reason in terms of the new ontology in order to best fulfill their values.
I’ve found myself confused by how the process at the end of this sentence works. It seems like there’s some abstract “will this worldview lead to value fulfillment?” question being asked, even though the core values seem undefined during an ontological crisis! I agree that you can regenerate the ontology once you have the core values redefined.
I don’t think that the real core values are affected during most ontological crises. I suspect that the real core values are things like feeling loved vs. despised, safe vs. threatened, competent vs. useless, etc. Crucially, what is optimized for is a feeling, not an external state.
Of course, the subsystems which compute where we feel on those axes need to take external data as input. I don’t have a very good model of how exactly they work, but I’m guessing that their internal models have to be kept relatively encapsulated from a lot of other knowledge, since it would be dangerous if it was easy to rationalize yourself into believing that you e.g. were loved when everyone was actually planning to kill you. My guess is that the computation of the feelings bootstraps from simple features in your sensory experience, such as an infant being innately driven to make their caregivers smile, with that simple pattern-detector of a smile then developing into an increasingly sophisticated model of what “being loved” means.
But I suspect that even the more developed versions of the pattern detectors are ultimately looking for patterns in your direct sensory data, such as detecting when a romantic partner does something that you’ve learned to associate with being loved.
It’s those patterns which cause particular subsystems to compute things like the feeling of being loved, and it’s those feelings that other subsystems treat as the core values to optimize for. Ontologies are generated so as to help you predict how to get more of those feelings, and most ontological crises don’t have an effect on how they are computed from the patterns, so most ontological crises don’t actually change your real core values. (One exception being if you manage to look at the functioning of your mind closely enough to directly challenge the implicit assumptions that the various subsystems are operating on. That can get nasty for a while.)
Great sequence!
It didn’t occur to me to apply the notion to questions of limited impact, but I arrived at a very similar model when trying to figure out how humans navigate ontological crises. In the LW articles “The problem of alien concepts” and “What are concepts for, and how to deal with alien concepts”, as well as my later paper “Defining Human Values for Value Learners”, I was working with the premise that ontologies (which I called “concepts”) are generated as a tool which lets us fulfill our primary values:
I really like this line of thinking.
Thanks! I’ve really liked yours, too.