That’s hardly a reason to modify our morality…!

Sure, but that’s a reason to research consistent values that are close to ours, so we have something to program into a certain kind of FAI. That’s why people research “idealizing values”, and I think it’s a worthwhile direction. Figuring out how to optimize inconsistent values could be another direction; the two are not mutually exclusive.
I think that’s a very dangerous direction. It seems like it would be all too easy for judgments of value ‘closeness’ to be made on the basis of possibility/convenience/etc. (i.e., “how easy it would be to program this into an FAI”), rather than… unbiased evaluation.
Furthermore, it seems to me that if you take any set of values “close to” your own, and then optimize for those values, that optimization itself will drive those values further and further from yours. (This would be especially true if there is no practical/meaningful way to optimize your actual values!)
These two things, which compound each other in a negative and dangerous way, make me very leery of the “research consistent values that are close to ours” approach.
I think it makes sense to worry about value fragility and shoehorning, but it’s a cost-benefit thing. The benefits of consistency are large: it lets you prove stuff. And the costs seem small to me, because consistency requires nothing more than having an ordering on possible worlds. For example, if some possible world seems ok to you, you can put it at the top of the ordering. So assuming infinite power, any ok outcome that can be achieved by any other system can be achieved by a consistent system.
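A minimal sketch of that “put an ok world at the top” move, with purely illustrative world labels, a toy utility function, and the assumption of a finite set of achievable worlds:

```python
# Illustrative sketch only (not anyone's actual proposal): a "consistent" agent
# here just maximizes one total ordering (a utility function) over possible
# worlds. If some achievable world seems ok, we can build an ordering that
# ranks it strictly first, so a sufficiently powerful consistent agent reaches
# that ok outcome too.

possible_worlds = ["status quo", "paperclip maximization", "flourishing"]  # hypothetical labels
ok_world = "flourishing"

def utility(world: str) -> float:
    """A complete, transitive preference: the ok world sits strictly on top."""
    return 1.0 if world == ok_world else 0.0

# An agent with "infinite power" can realize any achievable world; the
# consistent agent simply realizes the utility-maximizing one.
chosen_world = max(possible_worlds, key=utility)
assert chosen_world == ok_world
```

Nothing here depends on the particular ordering chosen; the point is only that a complete, transitive ordering can rank any given ok outcome first.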
And even if you want to abandon consistency and talk about messy human values, OP’s point still stands: unbounded utility functions are useless. They allow “St Petersburg inconsistencies” and disallow “bounded inconsistencies”, but human values probably have both.
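For concreteness, here is the textbook St Petersburg calculation (a fair coin is flipped until the first heads; if that happens on flip n, which has probability 2^-n, the prize is 2^n); the bounded utility function below is just one illustrative choice:

```latex
% Expected utility of the St Petersburg lottery under an unbounded vs. a
% bounded utility function (display-math snippet; assumes amsmath/amssymb).
\[
  \mathbb{E}[U] \;=\; \sum_{n=1}^{\infty} 2^{-n}\, U\!\left(2^{n}\right)
  \;=\;
  \begin{cases}
    \displaystyle\sum_{n=1}^{\infty} 2^{-n}\cdot 2^{n} \;=\; \sum_{n=1}^{\infty} 1 \;=\; \infty
      & \text{if } U(x) = x \text{ (unbounded)},\\[2ex]
    \displaystyle\sum_{n=1}^{\infty} 2^{-n}\!\left(1 - 2^{-2^{n}}\right) \;<\; \sum_{n=1}^{\infty} 2^{-n} \;=\; 1
      & \text{if } U(x) = 1 - 2^{-x} \text{ (bounded)}.
  \end{cases}
\]
```

With the unbounded utility the lottery is “worth” more than any sure payoff, which is the kind of St Petersburg trouble meant above; with any bounded utility the expectation stays finite.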
consistency requires nothing more than having an ordering on possible worlds. For example, if some possible world seems ok to you, you can put it at the top of the ordering. So assuming infinite power, any ok outcome that can be achieved by any other system can be achieved by a consistent system
This is an interesting point. I will have to think about it, thanks.
And even if you want to abandon consistency and talk about messy human values, OP’s point still stands: unbounded utility functions are useless.
To be clear, I take no position on this point in particular. My disagreements are as noted in my top-level comment—no more nor less. (You might say that I am questioning various aspects of the OP’s “local validity”. The broader point may stand anyway, or it may not; that is to be evaluated once the disagreements are resolved.)