The core claim in this post is that our brains model the world as though there’s a thing called “our values”, and try to learn about those values in the usual epistemic way.
I find that a very strange idea, as strange as Plato’s Socrates’ parallel idea that learning is not the acquisition of something new, but recollection of what one had forgotten.
If I try X, anticipating that it will be an excellent experience, and find it disappointing, I have not learned something about my values, but about X.
I have never eaten escamoles. If I try them, what I will discover is what they are like to eat. If I like them, did I always like them? That is an unheard-falling-trees question.
If I value a thing at one period of life and turn away from it later, I have not discovered something about my values. My values have changed. In the case of the teenager we call this process “maturing”. Wine maturing in a barrel is not becoming what it always was, but simply becoming, according to how the winemaker conducts the process.
But people have this persistent illusion that how they are today is how they always were and always will be, and that their mood of the moment is their fundamental nature, despite the evidence of their own memory.
In Bayesian decision theory, there’s the distinction between expected utility, which changes as one learns about the environment, and actual utility, which does not. Under this frame, I’d be inclined to round you off to using the words “values”/”liking”/etc. to refer to expected utility. Would you agree with that? If not, why not?
It might be tempting to round the OP off to use the word “values”/”ought” to refer to actual utility, but the details of that are kind of awkward at the edges so I would hold off on that.
That is just replacing the idea of fixed values with a fixed utility function. But it is just as changeable whatever you call it.
Show me your utility function before you were born.
I don’t actually personally agree with Bayesian decision theory anymore and am currently inclined to treat value more like an objective fact about the world than as an individual preference. The provocative position would be a Beff-like one that value = entropy, but while that is an incremental improvement on utilitarianism/value = negentropy, it is hellish and therefore I can’t endorse it fully.
Regardless of the issues with my own position, I’m confused about your worldview. Do you not have a distinction between expected and actual utility, or do you consider there to be two different kinds of changes in values? How do you model value of information? (If you do model it, that is.)
Expected utility is what you have before the outcome of an action is known. Actual utility is what you have after the outcome is known. Here, the utility function has remained the same and you have acquired knowledge of the outcome.
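As a sketch of that distinction in standard notation (illustrative only, not a claim about anyone’s preferred formalism): before the outcome is known, an action a is scored by its expected utility under current beliefs; once the outcome o* is known, one simply has its actual utility, with the utility function U unchanged throughout. The value of information asked about above is the usual quantity built from the same pieces: the expected gain from being able to choose after observing a signal s rather than before.

\[
\mathrm{EU}(a) = \sum_{o} P(o \mid a)\, U(o), \qquad \text{actual utility} = U(o^{*}),
\]
\[
\mathrm{VoI} = \mathbb{E}_{s}\!\left[\max_{a} \sum_{o} P(o \mid a, s)\, U(o)\right] - \max_{a} \sum_{o} P(o \mid a)\, U(o).
\]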
Someone who no longer finds valuable a thing that they used to has either re-evaluated the thing in the light of new information about it, or changed the value that they (their utility function) put on it.
So you’re basically working with a maximally-shattered model of agency where life consists of a bunch of independent activities that can be fully observed post-hoc and which have no connection between them?
So e.g. if you sometimes feel like eating one kind of food and other times feel like eating another kind of food, you just think “ah, my food preference arbitrarily changed”, not “my situation changed so that the way to objectively improve my food intake is different now than it was in the past”?
No. I can’t make any sense of where that came from.
No, there is simply no such thing as a utility function over foodstuffs.
I’m basically confused about what a canonical example of changing values looks like to you. Like I assume you have some examples that make you postulate that it is possible or something. I’ve seen changes in food taste used as a canonical example before, but if that’s not your example, then I would like to hear what your example is.
One example would be the generic one from the OP: “As a teenager, I endorsed the view that Z is the highest objective of human existence. … Yeah, it’s a bit embarrassing in hindsight.” This hypothetical teenager’s values (I suggest, in disagreement with the OP) have changed. Their knowledge about the world has no doubt also changed, but I see no need to postulate some unobservable deeper value underlying their earlier views that has remained unchanged while only their knowledge about Z changed.
Long-term lasting changes in one’s food preferences might also count, but not the observation that whatever someone has for lunch, they are less likely to have again for dinner.
Utility theory is overrated. There is a certain mathematical neatness to it for “small world” problems, where you know all of the possible actions and their possible effects, and the associated probabilities and payoffs, and you are just choosing the best action, once. Eliezer has described the situation as like a set of searchlights coherently pointing in the same direction. But as soon as you try to make it a universal decision theory, it falls apart for reasons that are like another set of searchlights pointing off in all directions: unbounded utility, St Petersburg-like games, “outcomes” consisting of all possible configurations of one’s entire future light-cone, utility monsters, repugnant conclusions, iterated games, multi-player games, collective utility, agents trying to predict each other, and so on, illuminating a landscape of monsters surrounding the orderly little garden of VNM-based utility.
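For concreteness, the St Petersburg case alone already strains the naive picture: a fair coin is tossed until it first lands heads, paying 2^k if that happens on toss k, so with utility linear in the payoff the expected utility diverges and the game must be preferred to any finite sure amount.

\[
\mathbb{E}[\text{payoff}] = \sum_{k=1}^{\infty} \frac{1}{2^{k}} \cdot 2^{k} = \sum_{k=1}^{\infty} 1 = \infty.
\]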
A generic example is kind of an anti-example though.
If you reject utility theory, what approach do you use for modelling values instead, and what makes you feel that approach is helpful?
I don’t have one. What would I use it for? I don’t think anyone else yet has one, at least not something mathematically founded, with the simplicity and inevitability of VNM. People put forward various ideas and discuss the various “monsters” I listed, but I see no sign of a consensus.
What’s the use in saying that values change, rather than just saying that you aren’t interested in concepts involving values, then?
I can still be interested, even if I don’t have the answers.
Right, but I’m asking why. Like even if you don’t have a complete framework, I’d think you’d have a general motive for your interest or something.
It’s an interesting open problem.
Here is an analogy. Classical utility theory, as developed by VNM, Savage, and others, the theory about which Eliezer made the searchlight comment, is like propositional calculus. The propositional calculus exists, it’s useful, and you cannot ever go against it without falling into contradiction, but there’s not enough there to do much mathematics. For that you need to invent at least first-order logic, and use that to axiomatise arithmetic and eventually all of mathematics, while fending off the paradoxes of self-reference. And all through that, there is the propositional calculus, as valid and necessary as ever, but mathematics requires a great deal more.
The theory that would deal with the “monsters” that I listed does not yet exist. The idea of expected utility may thread its way through all of that greater theory when we have it, but we do not have it. Until we do, talk of the utility function of a person or of an AI is at best sensing what Eliezer has called the rhythm of the situation. To place over-much reliance on its letter will fail.
But propositional calculus and first-order logic exist to support mathematics, which was developed before formal logic. What’s your mathematics-of-value, rather than your logic-of-value?
That was an analogy, a similarity between two things, not an isomorphism.
The mathematics of value that you are asking for is the thing that does not exist yet. People, including me, muddle along as best they can; sometimes at less than that level. Post-rationalists like David Chapman valorise this as “nebulosity”, but I don’t think 19th century mathematicians would have been well served by that attitude.
Richard Jeffrey has a nice utility theory which applies to a Boolean algebra of propositions (instead of e.g. to Savage’s acts/outcomes/states of the world), similar to probability theory.
In fact, it consists of just two axioms plus the three probability axioms.
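(For readers who haven’t seen it, the distinctive one of the two is usually stated as the desirability, or averaging, axiom: for any propositions X and Y with P(X ∧ Y) = 0 and P(X ∨ Y) > 0,

\[
V(X \lor Y) = \frac{P(X)\,V(X) + P(Y)\,V(Y)}{P(X) + P(Y)},
\]

that is, the value of a disjunction is the probability-weighted average of the values of its incompatible disjuncts.)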
Like probability theory, the theory doesn’t involve time. It also applies to just one agent, again like probability theory.
It doesn’t solve all problems, but neither does probability theory, which e.g. doesn’t solve the Sleeping Beauty paradox.
Do you nonetheless think utility theory is significantly more problematic than probability theory? Or do you reject both?
Utility theory is significantly more problematic than probability theory.
In both cases, from certain axioms, certain conclusions follow. The difference is in the applicability of those axioms in the real world. Utility theory is supposedly about agents making decisions, but as I remarked earlier in the thread, these are “agents” that make just one decision and stop, with no other agents in the picture.
I have read that Morgenstern was surprised that so much significance was read into the VNM theorem on its publication, when he and von Neumann had considered it to be a rather obvious and minor thing, relegated to the appendix of their book. I have come to agree with that assessment.
Probability theory is not about agents. It is about probability. It applies to many things, including processes in time.
That people fail to solve the Sleeping Beauty paradox does not mean that probability theory fails. I have never paid the problem much attention, but Ape in the coat’s analysis seems convincing to me.
I mean that under a subjective interpretation, a probability function represents the beliefs of one person at one point in time. Equally, a (Jeffrey) utility function can represent the desires of one person at one particular point in time. As such it is a theory of what an agent believes and wants.
Decisions can come into play insofar as individual actions can be described by propositions (“I do A”, “I do B”) and each such proposition is equivalent to a disjunction of the form “I do A and X happens, or I do A and not-X happens”, which is subject to the axioms. But decision-making is not baked into the theory, much as probability theory isn’t necessarily about urns and gambles.
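Spelled out as a sketch using the averaging axiom above, with V for desirability: the action-proposition “I do A” gets the probability-weighted average of the ways it could turn out,

\[
V(A) = P(X \mid A)\, V(A \land X) + P(\lnot X \mid A)\, V(A \land \lnot X),
\]

which is how something recognisably like expected utility reappears without acts, states and outcomes being taken as primitive.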
Your values change according to the process of reflection—the grapes mature into wine through fun chemical reactions.
From what you wrote, it feels like you are mostly considering your ‘first-order values’. However, you have an updating process that you also have values about. For example, I wouldn’t respect simple mind control that alters my first-order values, because my values treat mind control as disallowed. Similarly, I wouldn’t take a very potent drug even if I knew my first-order values would rank the feeling very highly, because I don’t endorse that specific sort of change.
Then we should split the question.

Do you have a value for escamoles specifically before eating them? No.

Do you have a system of thought (a way of updating your values) that would ~always result in liking escamoles? Well, no, not in full generality. You might end up with some disease that permanently affects your tastebuds. But in some reasonably large class of normal scenarios, your values would consistently update in a way that ends up liking escamoles were you to ever eat them.

(But really, the value for escamoles here is instrumental to a value for [insert escamole flavor, texture, etc.], of which escamoles are learned to be a good instance.)
What johnwentworth mentions would then be the question of “Would this approved process of updating my values converge to anything”; or tend to in some reasonable reference class; or at least have some guaranteed properties that aren’t freely varying.

I don’t think he is arguing that values are necessarily fixed and always persistent (I certainly don’t always handle my values according to my professed beliefs about how I should update them), but that they’re constrained. That the brain also models them as reasonably constrained, and that you can learn important properties of them.
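As a purely illustrative toy, and not anyone’s actual model in this thread, the two pictures being contrasted, learning about a fixed underlying value versus the value itself changing, can be sketched in a few lines of Python: in the first case a running estimate converges as experience accumulates, in the second the underlying value drifts and the estimate keeps chasing it.

```python
import random

# Toy contrast between "learning about a fixed value" and "the value itself changing".
# Purely illustrative; the numbers and the random-walk model are arbitrary choices.

random.seed(0)


def noisy_taste(true_value, noise=0.5):
    """One experience of the thing: its current underlying value plus situational noise."""
    return true_value + random.gauss(0, noise)


def running_mean(samples):
    return sum(samples) / len(samples)


# Case A: a fixed underlying value, observed noisily.
# The running estimate converges, so repeated trying looks like ordinary learning.
fixed_value = 2.0
samples_a = []
for _ in range(500):
    samples_a.append(noisy_taste(fixed_value))
estimate_a = running_mean(samples_a)

# Case B: the underlying value itself drifts as a random walk.
# The running estimate keeps lagging behind a moving target and never settles.
drifting_value = 2.0
samples_b = []
for _ in range(500):
    drifting_value += random.gauss(0, 0.1)  # the value itself changes
    samples_b.append(noisy_taste(drifting_value))
estimate_b = running_mean(samples_b)

print(f"Case A: estimate {estimate_a:.2f} vs fixed value {fixed_value:.2f}")
print(f"Case B: estimate {estimate_b:.2f} vs current value {drifting_value:.2f}")
```

Nothing in the toy decides which description is right; it only illustrates that the two stories make different predictions about whether repeated experience converges on anything.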