Let me know if this analogy sounds representative of the strategies you imagine.
Yeah, it does. I definitely agree that this doesn’t get around the chicken-and-egg problem, and so shouldn’t be expected to succeed on the first try. It’s more like you get to keep trying this strategy over and over again until you eventually succeed, because if everything goes wrong you just unplug the AI system and start over.
The chicken-and-egg problem is a ground truth problem. If we only have enough data to estimate X to within 5%, then doing clever things with that data is not going to reduce that error any further.
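To make that concrete, here is a small illustrative sketch in Python (mine, not from the discussion): with n noisy samples of X, the achievable error is floored at roughly sigma / sqrt(n) by the data itself, and "clever" reprocessing of those same samples does not push below that floor. The bootstrap-averaging step here just stands in for any amount of cleverness applied to the same fixed data set.

```python
# Illustrative only: the error in an estimate of X is floored by the data,
# not by how cleverly we post-process that same data.
import random
import statistics

random.seed(0)

TRUE_X = 10.0
NOISE_SD = 2.0
N_SAMPLES = 100   # this much data pins X down to roughly NOISE_SD / sqrt(N_SAMPLES)
N_TRIALS = 500

def plain_estimate(samples):
    return statistics.mean(samples)

def clever_estimate(samples):
    # "Clever" reprocessing of the same data: average many bootstrap means.
    boot_means = []
    for _ in range(30):
        resample = [random.choice(samples) for _ in samples]
        boot_means.append(statistics.mean(resample))
    return statistics.mean(boot_means)

def rms_error(estimator):
    sq = 0.0
    for _ in range(N_TRIALS):
        samples = [random.gauss(TRUE_X, NOISE_SD) for _ in range(N_SAMPLES)]
        sq += (estimator(samples) - TRUE_X) ** 2
    return (sq / N_TRIALS) ** 0.5

print("error floor set by the data:", NOISE_SD / N_SAMPLES ** 0.5)
print("plain mean:                 ", rms_error(plain_estimate))
print("clever reprocessing:        ", rms_error(clever_estimate))
```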
I think you get “ground truth data” by trying stuff and seeing whether or not the AI system did what you wanted it to do.
(This does suggest that you wouldn’t ever be able to ask your AI system to do something completely novel without having a human along to check that what it does is what we actually meant, which seems wrong to me, but I can’t articulate why.)
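As a minimal sketch of what that trying-stuff-and-checking loop might look like as a data-gathering procedure (all of the names here, Episode, run_ai, human_approves, are hypothetical placeholders of mine, not anything proposed in the discussion):

```python
# Hypothetical sketch: gather "ground truth data" by running the system
# and recording whether a human judged the outcome as what was actually wanted.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Episode:
    request: str      # what we asked for
    outcome: str      # what the AI system actually did
    approved: bool    # did a human judge this as "what we wanted"?

def gather_ground_truth(
    requests: List[str],
    run_ai: Callable[[str], str],                 # placeholder: run the AI on a request
    human_approves: Callable[[str, str], bool],   # placeholder: human judgment call
) -> List[Episode]:
    data = []
    for request in requests:
        outcome = run_ai(request)
        data.append(Episode(request, outcome, human_approves(request, outcome)))
    return data

if __name__ == "__main__":
    # Toy usage with stand-in callables.
    episodes = gather_ground_truth(
        ["sort these files", "summarize this doc"],
        run_ai=lambda req: f"(pretend output for: {req})",
        human_approves=lambda req, out: True,     # stand-in for the human check
    )
    print(len(episodes), "episodes collected")
```

The approved/rejected episodes are the "ground truth" in question; everything downstream has to work from them.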
Gathering ground truth by trying stuff and seeing whether the AI did what you wanted is the sort of strategy where illusion of transparency is a big problem, from a translation point of view. The difficult cases are exactly the ones where the translation usually produces the results you expect, but then produces something completely different in some rare cases.
Another way to put it: if we’re gathering data by seeing whether the system did what we wanted, then the long tail problem works against us pretty badly. Those rare tail cases are exactly the cases we would need to observe in order to notice problems and improve the system, and we’re not going to have very many of them to work with. The ability to generalize from small data sets becomes a key capability, but then we need to translate how-to-generalize in order for the AI to generalize in the ways we want (this gets at the can’t-ask-the-AI-to-do-anything-novel problem).
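To illustrate the long-tail point with a toy simulation (again just my sketch, with made-up numbers): if the translated objective disagrees with what we actually wanted only on a rare slice of inputs, then an outcome-checking loop almost never observes a disagreement, so there is very little ground truth data about exactly the cases that matter.

```python
# Illustrative only: when the translated objective disagrees with the intended one
# only on a rare slice of inputs, outcome checks rarely surface the problem.
import random

random.seed(1)
TAIL_RATE = 0.001   # fraction of inputs in the rare, problematic slice (made up)
N_CHECKS = 1000     # how many "did it do what we wanted?" checks we run

def intended_ok(x: float) -> bool:
    # Stand-in for what we actually wanted: fine on every input, say.
    return True

def translated_ok(x: float) -> bool:
    # Stand-in for the translated objective: matches the intent except on the tail slice.
    return x > TAIL_RATE

disagreements = 0
for _ in range(N_CHECKS):
    x = random.random()   # x < TAIL_RATE plays the role of a rare tail case
    if intended_ok(x) != translated_ok(x):
        disagreements += 1

print(f"Observed {disagreements} disagreements in {N_CHECKS} checks "
      f"(expected about {TAIL_RATE * N_CHECKS:.1f}).")
```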