Inferring preferences from actions is also philosophically tricky. My favorite reference is this old comment thread.
Wei:
let’s say it models the world as a 2D grid of cells that have intrinsic color… What does this robot “actually want”, given that the world is not really a 2D grid of cells that have intrinsic color?
steven0461:
Who cares about the question what the robot “actually wants”? Certainly not the robot. Humans care about the question what they “actually want”, but that’s because they have additional structure that this robot lacks. But with humans, you’re not limited to just looking at what they do on auto-pilot; instead, you can just ask
So with my post I’m trying to continue that line. It was understood (I hope!) that inferring preferences from actions would lead to something very evolutionarily messy and selfish, something you wouldn’t endorse when shown the description. And now I’m trying to show that inferring preferences by asking is also kind of meaningless.
Hmm. I guess I start from the knowledge that humans don’t seem to be VNM-consistent, so it’s quite reasonable to begin by tabooing “want” and “prefer”, because they don’t apply in the way that’s usually studied and analyzed.
I disagree with steven0461 that “just ask” provides any more information than watching an artificial choice. Both are trying to infer something that doesn’t exist from something easily observable.
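To make that concrete, here’s a minimal sketch (a toy agent of my own construction, nothing from the quoted thread): its pairwise choices follow a cycle, so neither watching what it does nor asking it yields a consistent ranking, because there isn’t one to recover. The agent, the two “channel” functions, and the brute-force consistency check are all illustrative assumptions.

```python
import itertools

# Toy agent (purely illustrative): its pairwise choices follow a fixed
# cycle A > B, B > C, C > A, the way framing or shifting attention can
# make human choices intransitive.
CYCLE = {("A", "B"): "A", ("B", "C"): "B", ("C", "A"): "C"}

def choose(x, y):
    """What the agent does when handed the pair {x, y} -- the 'watch' channel."""
    return CYCLE.get((x, y)) or CYCLE.get((y, x))

def ask(x, y):
    """What the agent says when asked which it prefers -- the 'just ask' channel.
    Here it simply reports the same policy it acts on."""
    return choose(x, y)

def consistent_ranking_exists(items, prefer):
    """True iff some strict ranking of items reproduces every pairwise answer."""
    for ranking in itertools.permutations(items):
        rank = {item: i for i, item in enumerate(ranking)}
        if all(prefer(x, y) == min((x, y), key=rank.__getitem__)
               for x, y in itertools.combinations(items, 2)):
            return True
    return False

items = ["A", "B", "C"]
print(consistent_ranking_exists(items, choose))  # False: no ranking fits the behavior
print(consistent_ranking_exists(items, ask))     # False: asking gives the same non-answer
```

Both checks fail for the same reason: asking just samples the same cyclic policy that acting does, so neither channel contains the extra structure a “real” preference would need.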
For many humans, we CAN say they “currently prefer” the expected outcome of an actual choice they make, but that’s a pretty weak and circular definition.
So—what do you hope to actually model about an individual human that you’re using the word “want” for?
The overarching problem is figuring out human preferences so that AI can fulfill them. We’re all on the same page that humans aren’t VNM-consistent.
Ah, yeah. That’s why I’m not very hopeful about AI alignment. I don’t think anyone’s even defined the problem in a useful way.
Neither humans as a class nor most humans as individuals HAVE preferences, as preferences are conceived today, that AI is able to fulfill or even be compatible with. We MAY have mental frameworks that let our preferences evolve to survive well in an AI-containing world.