Why are people on Less Wrong still talking about ‘their’ ‘values’ using deviations from a model that assumes they have a ‘utility function’? It’s not enough to explicitly believe and disclaim that this is obviously an incorrect model; at some point you have to actually stop using the model and adopt something else. People are godshatter, they are incoherent, they are inconsistent, they are an abstraction, they are confused about morality, their revealed preferences aren’t their preferences, their revealed preferences aren’t even their revealed preferences, their verbally expressed preferences aren’t even preferences, the beliefs of parts of them about the preferences of other parts of them aren’t their preferences, the beliefs of parts of them aren’t even beliefs, preferences aren’t morality, predisposition isn’t justification, et cetera...
We might make something someday that isn’t godshatter, and we need to practice.
I agree that reforming humans to be rational is hopeless, but it is nevertheless useful to imagine how a rational being would deal with things.
But VNM utility is just one particularly unintuitive property of rational agents. (For instance, I would never ever use a utility function to represent the values of an AGI.) Surely we can talk about rational agents in other ways that are not so confusing?
Also, I don’t think VNM utility takes into account things like bounded computational resources, although I could be wrong. Either way, just because something is mathematically proven to exist doesn’t mean that we should have to use it.
Who is sure? If you’re saying that, I hope you are. What do you propose?
I don’t think anybody advocated what you’re arguing against there.
The nearest thing I’m willing to argue for is that one of the following possibilities holds:
We use something that has been mathematically proven to exist, now.
We might be speaking nonsense, depending on whether the concepts we’re using can be mathematically proven to make sense in the future.
Since even irrational agents can be modelled using a utility function, no “reforming” is needed.
How can they be modeled with a utility function?
As explained here:
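A minimal toy sketch of the general idea, using assumptions of my own rather than whatever construction the linked reference uses (the `observed_policy` table and `trivial_reward` helper are purely hypothetical): any observed behaviour is trivially optimal under a reward function that pays off exactly when the agent does what it was observed to do.

```python
# Toy illustration (my own, not the linked paper's construction):
# any observed behaviour can be rationalised by *some* reward function,
# e.g. one that pays 1 exactly when the agent repeats its observed choices.

observed_policy = {"state_a": "left", "state_b": "right"}  # hypothetical observations

def trivial_reward(state, action):
    """Return 1 when the action matches what the agent was observed to do, else 0."""
    return 1.0 if observed_policy.get(state) == action else 0.0

# Under this reward function the observed behaviour is optimal by construction,
# however 'irrational' it looked from the outside.
```

Such a reward function rationalises anything, of course, which is why the complexity weighting discussed further down matters.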
Thanks for the reference.
It seems, though, that the reward function might be extremely complicated in general (in fact I suspect that this paper can be used to show that the reward function may even be uncomputable).
The whole universe may well be computable, according to the Church–Turing–Deutsch principle. If it isn’t, the above analysis may not apply.
I agree with jsteinhardt, thanks for the reference.
I agree that the reward functions will vary in complexity. If you do the usual thing from Solomonoff induction, where the plausibility of a reward function decreases exponentially with its size, then so far as I can tell you can infer reward functions from behavior, provided you can infer the behavior itself.
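A minimal sketch of how that inference might look; the candidate reward functions, their description lengths, and the softmax-noise likelihood are all assumed purely for illustration:

```python
import math

# Hypothetical candidates: (description_length_in_bits, reward_table).
# Prior weight falls off exponentially with description length, as in
# Solomonoff-style induction.
candidates = [
    (4,  {"left": 1.0, "right": 0.0}),   # "prefers left"
    (4,  {"left": 0.0, "right": 1.0}),   # "prefers right"
    (12, {"left": 0.9, "right": 0.1}),   # a more finely tuned (longer) hypothesis
]

observed_actions = ["left", "left", "right", "left"]  # hypothetical behaviour

def likelihood(reward_table, actions, beta=2.0):
    """P(actions | reward), assuming softmax-noisy maximisation of the reward."""
    z = sum(math.exp(beta * r) for r in reward_table.values())
    p = 1.0
    for a in actions:
        p *= math.exp(beta * reward_table[a]) / z
    return p

# Posterior over reward functions: prior 2^-length times likelihood, normalised.
unnorm = [2.0 ** -length * likelihood(table, observed_actions)
          for length, table in candidates]
total = sum(unnorm)
for (length, table), w in zip(candidates, unnorm):
    print(f"len={length:2d} bits  posterior={w / total:.3f}  reward={table}")
```

The exponential penalty on description length is what keeps trivial “rationalise everything” reward functions from dominating the posterior.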
We need to infer a utility function for somebody if we’re going to help them get what they want, since a utility function is the only reasonable description of what an agent wants that I know of.