I really appreciate the example that you spelled out. I think this is solidly pointing at the same concept.
On this paragraph:
When I am choosing an action and justifying it as wholesome, what it often feels like is that I am trying to track all the obvious considerations, but some force (be it internal or external) is pushing me to ignore one of them. Not merely to trade off against it, but to look away from it in my mind. And against that force I’m trying to defend a particular action as the best call, all things considered—the “wholesome” action.
… I am inclined to try not to use the word “wholesome” to mean “right” (= the thing you should do, all things considered). I’m trying to use “wholesome” to mean “not looking away from any of the considerations”. This then allows “choosing what feels wholesome is a good heuristic for choosing what is right” to be a substantive claim.
Because of this post, I’ve been thinking a little today about people who I consider wholesome, who often seem wiser than those I don’t (I guess due to tracking things that others feel pressure not to think about). I think the main thing I find upsetting about these people is that they less often tend to be holy madmen who commit their lives to a cause or do something great; instead they often do more typical human things, like trading off against their career and impact on the world in order to get married and have kids.
I think the world will probably end soon and most value will be lost to us and I am kind of upset when people choose not to fight against that but instead to live a simpler and smaller life. Especially so when I thought we were fighting it together and then they just… stop. Then I feel kind of betrayed.
I think something that can go poorly when trying to be more wholesome is that, on finding yourself aware of a cost that you’ve been paying, you may also find that you no longer have the strength to choose to keep paying it. Now that you see how you’ve been hurting yourself, even though you’ve been getting great results, you cannot continue to inflict that upon yourself, and so you will instead give up on this course of action and do something less personally difficult.
I think this is a fair complaint! I think it’s quite unwholesome, if you think we’re in a crisis, to turn away and not look at that, or not work towards helping. It seems important to think about safety rails against that. It’s less obviously unwholesome to keep devoting some effort towards typical human things while also devoting some effort towards fighting. (And I think there are a lot of stories I could tell where such paths end up doing better for the world than monomania about the things which seem most important.)
BTW one agenda here is thinking about what kinds of properties we might want societies of AI systems to have. I think there’s some slice of worlds where things go better if we teach our AI systems to be something-in-the-vicinity-of-wholesome than if we don’t.
I’m curious if you have any ideas for what to say to someone who isn’t being wholesome in some context—who is avoiding looking at some part of reality. For instance, in the above example, what could someone say to me when I’m ignoring my impacts upon them?
A standard line people say is “you don’t care about hurting my feelings” and that’s not quite the right response, because I would then argue that your feelings are less important than serving the mission.
I’m looking for something like “You don’t seem to be aware of the impacts you’re having on me”. Or maybe “You don’t seem to understand what the impact of your speech is”. But I’m not sure either of these would successfully communicate with the self-blinded Ben I describe, and I’d appreciate hearing another’s thoughts on how to communicate here.
Often the effect of being blinded is that you take suboptimal actions. As you pointed out in your example, if you see the problem then all sorts of cheap ways to reduce the harmful impact occur to you. So perhaps one way of getting to the issue could be to point at that: “I know you care about my feelings, and it wouldn’t have made this meeting any less effective to have had it more privately, so I’m surprised that you didn’t”?
I’d be tempted to make it a question, and ask something like “what do you think the impacts of this on [me/person] are?”.
It might be that the question would already do work by getting them to think about the thing they haven’t been thinking about. But it could also elicit a defence like “it doesn’t matter because the mission is more important”, in which case I’d follow up with an argument that it’s likely worth at least understanding the impacts, because it might help to find actions which are better on those grounds while being comparably good—or even better—for the mission. Or it might elicit a mistaken model of the impacts, in which case I’d follow up by saying that I thought it was mistaken and explaining how.