I agree that exotic decision algorithms or preference transformations are probably not going to be useful for alignment, but I think this kind of activity is currently more fruitful for theory building than directly trying to get decision theory right. It’s just that the usual framing is suspect: instead of being presented as exploration of the decision theory landscape via clearly broken/insane-acting/useless but not yet well-understood constructions, these things are pitched (and chosen) for their perceived usefulness in alignment.
What do you mean “these things”?
Also, to clarify, when you say “not going to be useful for alignment”, do you mean something like “...for alignment of arbitrarily capable systems”? That is, do you think they could be useful for aligning systems that aren’t too much smarter than humans?