I agree that exotic decision algorithms or preference transformations are probably not going to be useful for alignment, but I think this kind of activity is currently more fruitful for theory building than directly trying to get decision theory right. It’s just that the usual framing is suspect: instead of being presented as exploration of the decision theory landscape via clearly broken/insane-acting/useless but not yet well-understood constructions, these things are pitched (and chosen) for their perceived usefulness in alignment.
What do you mean “these things”?
Also, to clarify, when you say “not going to be useful for alignment”, do you mean something like “...for alignment of arbitrarily capable systems”? That is, do you think they could be useful for aligning systems that aren’t too much smarter than humans?