On the other hand, consider Elsa. Elsa, too, does not initially have an appreciation of jazz but also comes to love it. In her case, however, the change is the result of Elsa joining a cult which, as a central pillar of its ideology, venerates a love of jazz. The cult makes use of elaborate means of coercive persuasion, involving psychological techniques as well as psychoactive substances, in order to get all of its members to appreciate jazz.
I think this is an unfortunate example, as cults are quite ineffectual at retaining people (1% retention rates are good, for a cult!). Addressing the core point: I think people overstate how bad value-shifts are, as we humans implicitly accept them all the time whenever we move to a new social group. In some sense, we hold the values of a group kind of lightly, as a sort of mask. But because inauthentically wearing a mask fares worse under social scrutiny than becoming the mask, we humans really do make our social group’s values a deep part of us. And that’s fine! But it makes it tricky to disentangle which sorts of value changes are, or are not, legitimate.
(Going on a rant now because I couldn’t resist).
And such shifts certainly exist! Like, if you don’t think a tamping iron spiking through my skull and causing my personality to radically shift counts as an illegitimate value change, then I’ve got some brain surgery I want to try out on you.
Which suggests a class of value-changes that we might think of as illegitimate: shifts caused by a Cartesian-boundary-violating event. If something re-arranges the insides of my skull, that’s murder. And if it ain’t murder, it is an illegitimate value shift. If some molecular system slips past my blood-brain barrier and causes my reward centers to light up like a firework, well, that’s probably heroin. And it is violating my boundary, which means it is causing an illegitimate value shift. And so on.
But wait! What about drugs like selective serotonin reuptake inhibitors, i.e. SSRIs? Taking one can cause a value shift, but if you deny that it is legitimate, then I hope you never become depressed. So maybe voluntarily taking these things is what matters?
But wait! What if you are unaware of the consequences of taking the medication? For instance, for the average depressed person, it either does nothing or cures their depression. But for you, it gives you schizophrenia, because we’re in a world beyond the reach of god. Well then, that sounds like an illegitimate value shift.
So maybe the problem is that the changes are predictably not sanctioned by us ahead of time? Well then, what about something like the Gandhi-murder pill? You can take a pill which (additively) makes you 1% more like a mass murderer but gives you $1 million in exchange. If you take the pill now, you’re more likely to take such pills in the future, driving you down a slippery slope to evil. So maybe you, I don’t know, make a legal agreement to restrict your future self’s actions.
But then you wind up with your future self disagreeing with your current self about what they’re allowed to do, whilst you deliberately and knowingly put yourself into that situation. Is that legitimate? I don’t know.
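(Tangent, and not anything from the original post: since the slippery slope above is doing quantitative work, i.e. a fixed $1M payoff against 1%-per-pill drift, here’s a minimal toy sketch of that dynamic. It assumes a made-up convex disvalue of “murderousness” and assumes each pill also makes you care proportionally less about it; all numbers and functional forms are invented for illustration.)

```python
# Toy model of the Gandhi-murder-pill slippery slope. All numbers and functional
# forms are made up for illustration; nothing here comes from the original post.

PILL_PAYOFF = 1_000_000   # dollars gained per pill
STEP = 0.01               # each pill adds 1% "murderousness"

def disvalue(m, care):
    """Disvalue of being at murderousness m, for a self that still cares `care`
    dollars about not being a murderer (assumed convex: small shifts feel cheap,
    large ones feel awful)."""
    return care * m ** 2

def marginal_cost(m, care):
    """Perceived cost of taking one more pill, judged by a self at murderousness m."""
    return disvalue(m + STEP, care) - disvalue(m, care)

def stopping_point(initial_care=200_000_000, values_drift=True):
    """How far down the slope the agent slides before refusing the next pill."""
    m = 0.0
    while m < 1.0:
        # If values drift, the self at m cares proportionally less about murder.
        care = initial_care * (1 - m) if values_drift else initial_care
        if marginal_cost(m, care) > PILL_PAYOFF:
            break
        m += STEP
    return round(m, 2)

# Evaluating every pill with undrifted values is what the step-0 self would endorse;
# letting each successive self decide with its drifted values is what actually happens.
print("where the step-0 self would stop:", stopping_point(values_drift=False))
print("where the drifting selves stop:  ", stopping_point(values_drift=True))
```

With these made-up numbers the undrifted planner stops around m ≈ 0.25 while the drifting sequence of selves slides to about 0.45; the gap between the two is exactly the current-self vs. future-self disagreement in the paragraph above.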
This is to show that referring to self-determination alone does not clarify all we need to know about what does and does not constitute legitimate value change. As mentioned above, in future work, I am interested in stress-testing and building on this preliminary account further.
I am looking forward to it. I don’t think your post updated me, though I didn’t read it carefully, but I am glad someone is talking about this. This is a serious problem that we have to solve to deal with alignment, and, I think, to convince (some) people that there are some grounds for saying we should try to “align” AI at all. We can simultaneously tolerate a very wide space of values and say that no, going outside of those values is not OK, neither for us nor our descendants. And that such a position is just common sense.
Or maybe you’ll find out that no, people who believe that are deluding themselves, in which case I’m eager to hear your arguments.
We can simultaneously tolerate a very wide space of values and say that no, going outside of those values is not OK, neither for us nor our descendants. And that such a position is just common sense.
Is this the alternative you’re proposing? Is this basically saying that there should be ~indifference between many induced value changes, within some bounds of acceptability? I think clarifying the exact bounds of acceptability is quite hard, and anything that’s borderline might lead to an increased chance of values drifting to “non-acceptable” regions.
Also, common sense has changed dramatically over centuries, so it seems hard to ground these kinds of notions entirely in common sense too.
What would be the alternative?
I’m not quite sure. Some people react to the idea of imbuing AI with some values with horror (“that’s slavery!” or “you’re forcing the AI to have your values!”) and I’m a little empathetic but also befuddled about what else to do. When you make these things, you’re implicitly making some choice about how to influence what they value.
Is this the alternative you’re proposing? Is this basically saying that there should be ~indifference between many induced value changes, within some bounds of acceptability? I think clarifying the exact bounds of acceptability is quite hard, and anything that’s borderline might lead to an increased chance of values drifting to “non-acceptable” regions.
No, I was vaguely describing at a high level what value-change policy I endorse. As you point out, clarifying those bounds is very hard, and very important.
Likewise, I think “common sense” can change in endorsed ways, but I think we probably have a better handle on that, as correct reasoning is a much more general, and hence simpler, sort of capacity.