I don’t see the usual commonsense understanding of “values” (or the understanding used in economics or ethics) as relying on values being ontologically fundamental in any way, though. But you’ve used the fact that they’re not to make a seemingly unjustified rhetorical leap to “values are just habituations or patterns of action”, which just doesn’t seem to be true.
Most importantly, because the “values” that people are concerned with when they talk about “value drift” are idealized values (à la extrapolated volition), not instantaneous values or opinions or habituations.
For instance, philosophers such as EY consider that changing one’s mind in response to a new moral argument is not value drift, because it preserves one’s idealized values, and that it is generally instrumentally positive: if it brings one’s instantaneous opinions closer to one’s idealized values, it makes one better at accomplishing them. So indeed, we should let the EAs “drift” in that sense.
On the other hand, getting hit with a cosmic ray that alters your brain, or getting hacked by a remote code execution exploit, is value drift, because it does not preserve one’s idealized values (and is therefore bad, by the usual decision-theoretic argument, because it makes you worse at accomplishing them). And those are the kinds of problems we worry about with AI.
I don’t see the usual commonsense understanding of “values” (or the understanding used in economics or ethics) as relying on values being ontologically fundamental in any way, though. But you’ve used the fact that they’re not to make a seemingly unjustified rhetorical leap to “values are just habituations or patterns of action”, which just doesn’t seem to be true.
Right, I think people are pointing at something else when they normally talk about values, but that cluster is poorly constructed and doesn’t cut reality at its joints, in the same way that our naive notions of belief, morals, and much else cut reality slightly askew. I’m suggesting this as a rehabilitative framing of values: a stronger, more consistent meaning for “value” than the confused cluster of things people are normally pointing at. Although, to be clear, even the naive, confused notion of value I’m trying to explode and rebuild here is still a fundamentally ontological thing, unless you think people mean something by “value” more like signals in the brain serving as control mechanisms to regulate feedback systems.
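To make that last framing a bit more concrete, here is a minimal sketch (my own illustration, not anything from the discussion itself) of treating a “value” as nothing more than a set point regulating a feedback system; the thermostat_step helper and its numbers are invented for the example:

```python
# Minimal sketch: a "value" modeled as nothing more than a set point in a
# feedback loop, in the spirit of "signals serving as control mechanisms
# to regulate feedback systems". Purely illustrative.

def thermostat_step(set_point: float, current_temp: float, gain: float = 0.5) -> float:
    """Return an adjustment that pushes the temperature toward the set point.

    Here the 'value' is just the set point: it has no status beyond its role
    in regulating the system's behavior.
    """
    error = set_point - current_temp
    return gain * error


# Example: a room at 18 degrees with a "valued" temperature of 21 degrees.
temp = 18.0
for _ in range(10):
    temp += thermostat_step(set_point=21.0, current_temp=temp)
print(round(temp, 2))  # converges toward 21.0
```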
To your concern about an unjustified leap, this is a weakness of my current position: I don’t yet have a strong ability to describe my own reasoning in a way that brings most people along, and that is one of the points of working out these ideas, so I can see which inferences seem intuitive to people and which don’t, and use that information to iterate on my explanations.
Most importantly, because the “values” that people are concerned with when they talk about “value drift” are idealized values (à la extrapolated volition), not instantaneous values or opinions or habituations.
To the extent that I think “value” is a confused concept, I think “idealized value” is consequently also confused, perhaps even more so because it is further distanced from what is happening on the ground. I realize idealized value feels intuitive to many folks, and at one time it did seem intuitive to me, but I doubt that it cleanly points to a real thing; it seems instead to be a fancy construct of our reasoning with no clear correlate out in the world. That is, it is an artifact of our reasoning process, and while that’s not inherently bad, it also means it’s almost purely subjective and can easily become unhinged from reality, which makes me nervous about using it as a justification for any particular policy we might want to pursue.