David Scott Krueger (formerly: capybaralet) comments on Towards a mechanistic understanding of corrigibility

David Scott Krueger (formerly: capybaralet) 1 Mar 2020 20:26 UTC
What do you mean by “these things”?
Also, to clarify, when you say “not going to be useful for alignment”, do you mean something like “...for alignment of arbitrarily capable systems”? That is, do you think they could be useful for aligning systems that aren’t too much smarter than humans?