johnswentworth comments on Alignment as Translation

johnswentworth 28 Mar 2020 0:29 UTC
LW: 4 AF: 3
AF
I think you get “ground truth data” by trying stuff and seeing whether or not the AI system did what you wanted it to do.
That’s the sort of strategy where illusion of transparency is a big problem, from a translation point of view. The difficult cases are exactly the cases where the translation usually produces the results you expect, but then produce something completely different in some rare cases.
Another way to put it: if we’re gathering data by seeing whether the system did what we wanted, then the long tail problem works against us pretty badly. Those rare tail-cases are exactly the cases we would need to observe in order to notice problems and improve the system. We’re not going to have very many of them to work with. Ability to generalize from small data sets becomes a key capability, but then we need to translate how-to-generalize in order for the AI to generalize in the ways we want (this gets at the can’t-ask-the-AI-to-do-anything-novel problem).