I’d like to try to explain and see if I’m pointing at the right thing.
I might value being loved. (This thing has utility to me.)
However, I do not actually have neurons that connect to the territory such that my neurons fire If and Only If I am being loved. My neurons are not magic.
So instead they use proxy measures. Like looking at the person’s face and seeing it smiling at me. Or seeing their body language and noticing it is relaxed and open. Or feeling their gentle touch. Etc.
All these proxy measures add up to something that feels good. However, it is NEVER certain that it’s measuring the thing I ultimately want (being loved). I’m just going off a guess. A pretty good guess, sometimes. But still.
This is Goodhart’s law at work.
When I have a measure of a good thing (someone smiling at me), I will try to optimize for the measure, which is not necessarily the thing I was originally wanting to track (being loved).
So at some point I may try to optimize for smiles, even when they’re not out of love. And whatever those behaviors are, we call pica.
Right, I agree that there can be things which I value, and for which I can be mistaken about whether or not I have them / they exist / etc.
But PDV didn’t seem to be just saying that “you can be mistaken about whether you actually have the thing that you think you have”. They said that it’s a category error for me to say that something feeling good is by itself something that I value, and that there’s a factual dispute about minds here, rather than a dispute of subjective categorization.
Your example doesn’t feel like it helps me understand those claims. I can have a subjective categorization that being loved is something that I value, and I can be correct or mistaken about whether or not I’m actually loved. And I can indeed end up optimizing for something like smiles, which I think indicates being-lovedness, even when it’s only weakly correlated.
But that doesn’t seem like a reason why something feeling good couldn’t also be something that I value for its own sake.
Wait… are you just trying to say that you can, in theory, value “positive feelings” like joy, delight, etc. in themselves? That seems unobjectionable.
I thought PDV was saying that if you mistake “good feelings” for “good things” in general, that this was a category error. Like, if you always just think, “I feel good when the sun shines on me! It must BE good that the sun is shining on me.” Then THAT is an error.
Wait… are you just trying to say that you can, in theory, value “positive feelings” like joy, delight, etc. in themselves?
Yes. And not just in theory, I would expect that this is what many if not most people do: see e.g. all the advice about how to be happy, or the fact that many people take something like classical utilitarianism seriously as a moral theory.
I thought PDV was saying that if you mistake “good feelings” for “good things” in general, that this was a category error.
Oh. I thought that I already mentioned much earlier that I didn’t mean that, when I said that things can be great despite not feeling great, and that “good feelings” are just one of the possible types of good things you can have in your life, and they shouldn’t be the only ones.
Many if not most people are Goodharting in most aspects of their lives. Why not this one?
I acknowledge your claim that you value feeling good over and above the things that cause you to feel good. I agree that many people implicitly endorse this claim about themselves. I think you and they are very likely mistaken about this preference, and that ceasing to optimize for it would improve your life significantly according to your other preferences.
was hoping you’d validate my “I thought PDV was saying” reading one way or another, above …
also, it seems like an important milestone if you guys actually sussed out where the actual disagreement is. and it seems like it isn’t what either of you previously thought it was. so i want that to be made clear.
Kaj wasn’t saying ‘a thing that couldn’t be right’. Kaj was describing a totally realistic thing to do. which is to value feeling good itself.
i think conversational milestones in arguments are important places to stop and orient, and i was worried this milestone would be quickly passed over.
and NOW the disagreement is about a preference / why aren’t you worried about Goodharting, whereas before it wasn’t clear. is this actually agreed now by both parties?
(I greatly appreciate your attempt to clarify/improve the quality of the conversation.)
FWIW, I think ‘valuing positive feelings in themselves’ is a bad idea. It’s theoretically possible to do it, but I wouldn’t recommend it as part of one’s final evolutionary form.
Symmetrically, I think ‘equating negative feelings with badness’ or believing ‘feeling bad is bad’ is also not recommended.