Um, surely this field is well established—and called “revealed preference theory”.
Well… no, actually. That sort of economic preference induction assumes that the agents it’s modelling are Homo economicus. If an agent consistently chooses x over y, it is inferred to value x more than y—even if x is one marshmallow now, and y is two marshmallows in fifteen minutes. It’s a useful abstraction for modelling how people actually behave, but that is notably not what we’re interested in.
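To make that concrete, here is a toy sketch (all names and numbers are hypothetical) of what naive revealed-preference inference does with the marshmallow case: it simply tallies which option each choice “reveals” as preferred, with no way to register that the choice might be a lapse rather than a value.

```python
from collections import Counter

def infer_revealed_preferences(choices):
    """Tally which option each observed choice 'reveals' as preferred.

    `choices` is a list of (chosen, rejected) pairs.
    """
    tally = Counter()
    for chosen, rejected in choices:
        tally[(chosen, rejected)] += 1
    return tally

# An impatient agent repeatedly grabs the single marshmallow now.
observed = [("one_marshmallow_now", "two_marshmallows_in_15_min")] * 5
print(infer_revealed_preferences(observed))
# Counter({('one_marshmallow_now', 'two_marshmallows_in_15_min'): 5})
# The inference concludes the agent values one-now over two-later; it has
# no way to tell a genuine value from a failure of self-control.
```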
That sort of modelling was my starting point, but it’s obviously totally inadequate for this sort of application. That’s why everything past the third paragraph of this post talks about dealing with imperfect optimizers.
Not necessarily.
For a quick summary of that paper, see the third-to-last paragraph of The Singularity and Machine Ethics, which is really something you should read if you’re interested in FAI and inferring preferences. Likewise Learning What to Value.
Thank you, that was interesting reading. If I’m not mistaken, though, the Nielsen-Jensen paper is talking about how to make the value inference more robust in the presence of contradictory behavior. It doesn’t seem to me that this sort of procedure will reliably isolate the values we’re interested in from limitations on human rationality.
The idea (page sixteen of your second citation) of extracting a human utility function by eliminating contradictory or inconsistent features of your model of human behavior-in-general is interesting, but I have some reservations about it. There are numerous studies floating around suggesting that human moral intuition can be contradictory or incoherent, and I’d prefer not to throw the baby out with the bathwater if that’s the case.
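For what it’s worth, here is one possible concrete reading of “eliminating inconsistent features” (not necessarily the procedure the cited paper actually uses): detect intransitive choice cycles and set them aside before fitting a utility function. The worry is that whatever gets set aside might include genuine moral intuitions rather than noise.

```python
from itertools import permutations

def find_preference_cycles(prefs):
    """Find 3-cycles (a > b > c > a) in a set of (preferred, dispreferred) pairs."""
    items = sorted({x for pair in prefs for x in pair})
    cycles = []
    for a, b, c in permutations(items, 3):
        # Report each cycle once, starting from its lexicographically smallest element.
        if a == min(a, b, c) and (a, b) in prefs and (b, c) in prefs and (c, a) in prefs:
            cycles.append((a, b, c))
    return cycles

# Hypothetical trolley-style intuitions that refuse to line up consistently:
observed = {("save_five", "do_nothing"),
            ("do_nothing", "push_one"),
            ("push_one", "save_five")}
print(find_preference_cycles(observed))
# [('do_nothing', 'push_one', 'save_five')] -- data that a "clean it up first"
# approach would have to discard or repair before fitting utilities.
```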
Um, “revealed preference theory” applies to imperfect optimizers just fine.
Does it have a canonical method of isolating the utility function from the details of the optimization process?
I am not sure what you mean. Essentially, it tries to find a utility maximiser that behaves in the same way. There are multiple ways of doing this—not one canonical method.
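As a minimal sketch of one such way (not a canonical method, and the numbers are invented): assume the agent’s choices follow a logit model, P(pick x over y) = sigmoid(u_x - u_y), and fit the utility gap by maximum likelihood from the pairwise choice counts.

```python
import math

def fit_utility_gap(times_chose_x, times_chose_y):
    """MLE of u_x - u_y under a logit choice model, given pairwise choice counts."""
    return math.log(times_chose_x / times_chose_y)

# An imperfect optimizer that picked x 7 times out of 10 -- whether from
# genuine preference or from repeated lapses, the data look the same.
print(fit_utility_gap(times_chose_x=7, times_chose_y=3))  # ~0.85, so x gets the higher utility
```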
No, it doesn’t. It treats actual preferences and mistakes made while trying to implement them the same way.
You’re just using a different standard from me to assess the method. By saying it “applies”, I mean that you can feed these techniques imperfect optimizers and they will construct utility-based models of their actions. Revealed preference theory can be successfully applied to imperfect optimizers. A good job too, since all known optimisers are imperfect.
Maybe those models don’t have the properties you are looking for—but that doesn’t represent a point of disagreement between us.
Oh, I know it will construct a utility-based optimizer perfectly well. But since it won’t actually determine their preferences, that’s rather useless for most practical purposes—such as the one in the comment you replied to.
We don’t seem to disagree on facts, though, so I don’t think this conversation is going to go anywhere.