I consider this a factual dispute about minds and Goodhart’s Law, rather than a difference of subjective categorization, so this response is a non sequitur to me.
Your comment used terms like “good” and “great”, which I interpret as subjective valuations, or preferences. I don’t know how to translate a question about subjective valuations into one of factual claims.
I claim that as a general principle, “something feeling great is by itself a type of greatness to me” is a category error. What feels great is a map, and being great is the territory. There is a fact of the matter with regard to what is great for PDV, and what is great for Kaj. They are not identical, and they are not directly queryable, but there is a fact of the matter. Something great is something that increases your utility significantly. (Non-utilitarian ethics: translate that into language your system permits.)
What feels great is a separate fact. It is directly queryable, and it correlates with being great, but it is only an approximation, and can therefore be Goodharted. The gap between the true utility and the approximation is a general property of human minds, with some regularities (superstimuli), but its exact shape is not identical between people.
So when you say “for me that’s a subcategory”, I conclude that you have a) misunderstood my claim, and b) mistaken the map for the territory.
So what makes up the territory?
Like, if we are talking about a claim like “is it raining outside”, then the territory is made up of whether it actually is raining outside or not. It’s a concrete physical event.
For “is something great”, the nearest physical referent that I could think of is “does a person’s brain make the evaluation that this is great”. Which would make it into a question of subjective valuation, but you seem to have some more objective criteria in mind.
I said that already? “Something great is something that increases your utility significantly.” This is a property of timelines, not of world-states, and so can’t be directly queried, but better approximations can be built up by retrospecting on which times feeling great was accurate and which times it was not.
Unreal, in a subthread above, claims that it is possible to realign System 1 such that feeling great coincides with being great. This seems wrong to me, but is the kind of thing that could be right. Your description does not seem to be the kind of thing that could be right.
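(Illustrative aside: a minimal Python sketch of what “retrospecting on which times feeling great was accurate” could look like. The contexts and accuracy numbers below are invented purely for illustration; they are not anything either commenter specified.)

```python
import random

# Toy sketch (all numbers hypothetical): "being great" is treated as a hidden
# variable that can't be queried directly, "feeling great" as a queryable proxy,
# and retrospection as tallying, per context, how often the proxy agreed with
# the hidden variable.

random.seed(0)

# Assumed ground-truth hit rates: how often an episode that felt great
# actually was great, per context. Made-up illustration values.
TRUE_ACCURACY = {
    "deep work": 0.8,
    "junk food": 0.3,
    "time with friends": 0.9,
    "doomscrolling": 0.2,
}

def remembered_episode(context):
    """One remembered felt-great episode; return whether it actually was great."""
    return random.random() < TRUE_ACCURACY[context]

# Retrospection: estimate, per context, how trustworthy "feeling great" was.
estimates = {
    context: sum(remembered_episode(context) for _ in range(200)) / 200
    for context in TRUE_ACCURACY
}

for context, est in sorted(estimates.items(), key=lambda kv: -kv[1]):
    print(f"{context:>17}: feeling great was accurate ~{est:.0%} of the time")
```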
Taboo “utility”? To me it’s again just another word for personal preferences.
I’d like to try to explain and see if I’m pointing at the right thing.
I might value being loved. (This thing has utility to me.)
However, I do not actually have neurons that connect to the territory such that my neurons fire If and Only If I am being loved. My neurons are not magic.
So instead they use proxy measures. Like looking at the person’s face and seeing it smiling at me. Or seeing their body language and noticing it is relaxed and open. Or feeling their gentle touch. Etc.
All these proxy measures add up to something that feels good. However, it is NEVER certain that it’s measuring the thing I ultimately want (being loved). I’m just going off a guess. A pretty good guess, sometimes. But still.
This is Goodhart’s Law at work here.
When I have a measure of a good thing (someone smiling at me), I will try to optimize for the measure, which is not necessarily the thing I was originally wanting to track (being loved).
So at some point I may try to optimize for smiles, even when they’re not out of love. And whatever those behaviors are, we call them pica.
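(Illustrative aside: a minimal Python sketch of the smiles-as-proxy-for-love dynamic just described. The strategies and numbers are invented for illustration; the point is only that picking whatever maximizes the measurable proxy need not maximize the thing the proxy was meant to track.)

```python
# Toy sketch of the smiles-as-proxy-for-love example (strategies and numbers
# are made up for illustration). Each strategy yields some amount of the thing
# actually wanted (love) and some amount of the measurable proxy (smiles).
# Choosing the strategy that maximizes the proxy, rather than the target,
# is Goodhart's Law in miniature.

strategies = {
    # name:                 (love produced, smiles observed)
    "genuine connection":    (0.9, 0.7),
    "people-pleasing":       (0.2, 0.95),  # lots of smiles, little love
    "telling easy jokes":    (0.3, 0.8),
    "honest conversation":   (0.8, 0.5),   # less smiling, more of the real thing
}

best_by_proxy = max(strategies, key=lambda s: strategies[s][1])
best_by_target = max(strategies, key=lambda s: strategies[s][0])

print(f"Optimizing the measure (smiles): {best_by_proxy!r}"
      f" -> love = {strategies[best_by_proxy][0]}")
print(f"Optimizing the target (love):    {best_by_target!r}"
      f" -> love = {strategies[best_by_target][0]}")
```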
Right, I agree that there can be things which I value, and for which I can be mistaken about whether or not I have them / they exist / etc.
But PDV didn’t seem to be just saying that “you can be mistaken about whether you actually have the thing that you think you have”. They said that it’s a category error for me to say that something feeling good is by itself something that I value, and that there’s a factual dispute about minds here, rather than a dispute of subjective categorization.
Your example doesn’t feel like it helps me understand those claims. I can have a subjective categorization that being loved is something that I value, and I can be correct or mistaken about whether or not I’m actually loved. And I can indeed end up optimizing for something like smiles, which I think indicates being-lovedness, even when it’s only weakly correlated.
But that doesn’t seem like a reason why something feeling good couldn’t also be something that I value for its own sake.
Wait… are you just trying to say that you can, in theory, value “positive feelings” like joy, delight, etc. in themselves? That seems unobjectionable.
I thought PDV was saying that if you mistake “good feelings” for “good things” in general, that this was a category error. Like, if you always just think, “I feel good when the sun shines on me! It must BE good that the sun is shining on me.” Then THAT is an error.
> Wait… are you just trying to say that you can, in theory, value “positive feelings” like joy, delight, etc. in themselves?
Yes. And not just in theory: I would expect that this is what many if not most people do. See e.g. all the advice about how to be happy, or the fact that many people take something like classical utilitarianism seriously as a moral theory.
> I thought PDV was saying that if you mistake “good feelings” for “good things” in general, that this was a category error.
Oh. I thought that I already mentioned much earlier that I didn’t mean that, when I said that things can be great despite not feeling great, and that “good feelings” are just one of the possible types of good things you can have in your life, and they shouldn’t be the only ones.
Many if not most people are Goodharting in most aspects of their lives. Why not this one?
I acknowledge your claim that you value feeling good over and above the things that cause you to feel good. I agree that many people implicitly endorse this claim about themselves. I think you and they are very likely mistaken about this preference, and that ceasing to optimize for it would improve your life significantly according to your other preferences.
was hoping you’d confirm or correct my “I thought PDV was saying” above, one way or another …
also, it seems like an important milestone if you guys actually sussed out where the actual disagreement is. and it seems like it isn’t what either of you previously thought it was. so i want that to be made clear.
Kaj wasn’t saying ‘a thing that couldn’t be right’. Kaj was describing a totally realistic thing to do, which is to value feeling good itself.
i think conversational milestones in arguments are important places to stop and orient, and i was worried this milestone would be quickly passed over.
and NOW the disagreement is about a preference / why aren’t you worried about Goodharting, whereas before it wasn’t clear. is this actually agreed now by both parties?
(I greatly appreciate your attempt to clarify/improve the quality of the conversation.)
FWIW, I think ‘valuing positive feelings in themselves’ is a bad idea. It’s theoretically possible to do it, but I wouldn’t recommend it as part of one’s final evolutionary form.
Symmetrically, I think ‘equating negative feelings with badness’ or believing ‘feeling bad is bad’ is also not recommended.