This is quite interesting. It strikes me as perhaps a first-principles derivation of the theory of constructed preferences in behavioral economics.
Compare your
A shard of value refers to the contextually activated computations which are downstream of similar historical reinforcement events … We think that simple reward circuitry leads to different cognition activating in different circumstances. Different circumstances can activate cognition that implements different values, and this can lead to inconsistent or biased behavior. We conjecture that many biases are convergent artifacts of the human training process and internal shard dynamics. People aren’t just randomly/hardcoded to be more or less “rational” in different situations.
to Bernheim’s
According to this view, I aggregate the many diverse aspects of my experience only when called upon to do so for a given purpose, such as making a choice or answering a question about my well-being. … To answer a question about my overall welfare, or to choose between alternatives without deploying a previously constructed rule of thumb, I must weigh the positives against the negatives and construct an answer de novo. …This perspective potentially attributes particular choice anomalies to the vagaries of aggregation. In particular, when I deliberate and aggregate, the weights I attach to the various dimensions of my subjective experience may be sensitive to context.
Values are closely related to preferences, and preferences have been extensively studied in behavioral econ. I’ve written more on the connection between AI and behavioral econ here.
Thanks for the reference (and sorry for just now getting around to replying).
I think Bernheim’s paper is somewhat related to the shard theory of human values. There are several commonalities, including:
Rejecting the idea that humans secretly have “true preferences” or “utility functions”
Taking a stand against ad hoc / patchwork / case-by-case explanations of welfare-related decisions
Recognizing the influence of context on decision-making, via “frames” (this work) or “shard activation contexts” (shard theory)
However, I think that shard theory is not a rederivation of this work, or other work mentioned in this paper:
This paper presents a framework for locating decision-making contexts in which people are making ~informed decisions (at a gloss), gathering data within those contexts, and then making inferences about that person’s preferences.
Shard theory aims to predict what kinds of neural circuits get formed given certain initial conditions (like local random initialization of the cortex and certain reward circuitry), and to then draw conclusions about the choices of that learned policy.
That doesn’t mean these works are unrelated. If we want to deeply understand welfare and “idealized preferences” / what people “should” choose, I think that we should understand more about how people make choices, and via what neural circuits. This is a question of neuroscience and reinforcement learning theory. The shard theory of human values aims to contribute to that question.
As you pointed out in private correspondence, the shard theory of human values can be viewed as a hypothesis about where the context-sensitive preferences come from.