Maybe I’m reading your post wrong, but you seem to be assuming that a single coherent approach is needed, in a way that could be counter-productive. I think a model of an individual’s preferences is likely to be better represented by taking multiple approaches, where each fails differently. I’d expect a method that extends or uses revealed preferences to have advantages and disadvantages that none of, say, stated preferences, TD learning, CEV, or indirect normativity shares, and the same would hold for each item on that list. I think we want that kind of robust multi-model approach as part of how we mitigate over-optimization failures and limit our downside from model-specification errors.
(I also think we might be better off building AI to evaluate actions on the basis of some moral-congress approach, using differently elicited preferences across multiple groups, where decisions need a super-majority of some sort as a hedge against over-optimizing an incompletely specified version of morality. It may be over-restrictive and not allow any actions at all, though, so it’s a weakly held theory, and I haven’t discussed it with anyone.)
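To make the moral-congress idea a bit more concrete, here is a minimal sketch of what I have in mind (my own illustration; the delegates, thresholds, and the `congress_approves` helper are all hypothetical): each delegate is a preference model elicited a different way, and an action is only permitted if a super-majority of delegates approve it.

```python
from typing import Callable, Sequence

# Each "delegate" is a preference model elicited a different way
# (e.g. revealed preferences, stated preferences, CEV-style extrapolation).
# It scores an action; a positive score counts as approval here.
PreferenceModel = Callable[[str], float]

def congress_approves(
    action: str,
    delegates: Sequence[PreferenceModel],
    supermajority: float = 2 / 3,
) -> bool:
    """Permit an action only if a super-majority of differently elicited
    preference models approve it; an action that looks great under one
    model but bad under the others gets blocked."""
    votes = sum(1 for model in delegates if model(action) > 0)
    return votes >= supermajority * len(delegates)

# Hypothetical delegates, purely for illustration.
revealed = lambda a: 0.9 if a == "fund_parks" else -0.2
stated = lambda a: 0.4 if a == "fund_parks" else 0.6
extrapolated = lambda a: 0.1 if a == "fund_parks" else -0.3

print(congress_approves("fund_parks", [revealed, stated, extrapolated]))       # True
print(congress_approves("pave_everything", [revealed, stated, extrapolated]))  # False
```

The obvious failure mode, as I said, is that nothing clears the bar.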
I think that a model of an individual’s preferences is likely to be better represented by taking multiple approaches, where each fails differently.
I agree. But what counts as a failure? Unless we have a theory of what we’re trying to define, we can’t characterize failure beyond our own vague intuitions; once we have a better theory, defining failure becomes much easier.
I agree, and think work in the area is valuable, but I’d still argue that unless we expect a single correct and coherent answer, any one approach is going to be less effective than an average over several (contradictory, somewhat unclear) models.
As an analogy, I think effort to improve individual prediction accuracy and calibration is valuable, but for most estimation questions I’d bet on the average of 50 untrained idiots over any single superforecaster.
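The intuition behind that bet is just error-averaging: under the (strong) assumptions that the 50 guesses are roughly unbiased and independent, the crowd mean’s noise shrinks by about 1/√50, which can beat one much better individual. A toy simulation, with made-up noise levels purely for illustration:

```python
import random

random.seed(0)
TRUE_VALUE = 100.0
TRIALS = 10_000

def idiot_estimate() -> float:
    # Untrained guesser: unbiased on average, but very noisy.
    return random.gauss(TRUE_VALUE, 40.0)

def superforecaster_estimate() -> float:
    # Much less individual noise, but only one draw per question.
    return random.gauss(TRUE_VALUE, 10.0)

crowd_err = expert_err = 0.0
for _ in range(TRIALS):
    crowd_mean = sum(idiot_estimate() for _ in range(50)) / 50
    crowd_err += abs(crowd_mean - TRUE_VALUE)
    expert_err += abs(superforecaster_estimate() - TRUE_VALUE)

print(f"mean abs error, crowd of 50: {crowd_err / TRIALS:.2f}")   # roughly 4.5
print(f"mean abs error, one expert:  {expert_err / TRIALS:.2f}")  # roughly 8.0
```

Of course this breaks down if the crowd shares a common bias, which is part of why I want the different preference models to fail differently.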