An important part of my intuition about value-in-the-tail is that if your first solution can knock off 95% of the risk, you can then use the resulting AI system to design a new AI system where you’ve translated better and now you’ve eliminated 99% of the risk...
I don’t see how this ever actually gets around the chicken-and-egg problem.
An analogy: we want to translate from English to Korean. We first obtain a translation dictionary which is 95% accurate, then use it to ask our Korean-speaking friend to help out. Problem is, there’s a very important difference between very similar translations of “help me translate things”—e.g. consider the difference between “what would you say if you wanted to convey X?” and “what should I say if I want to convey X?”, when giving instructions to an AI. Both of those would produce very similar results, right up until everything went wrong. (Let me know if this analogy sounds representative of the strategies you imagine.)
If you do manage to get that first translation exactly right, and successfully ask your friend for help, then you’re good—similar to the “translate how-to-translate” strategy from the OP. And with a 95% accurate dictionary, you might even have a decent chance of getting that first translation right. But if that first translation isn’t perfect, then you need some way to find that out safely—and the 95% accurate dictionary doesn’t make that any easier.
Another way to look at it: the chicken-and-egg problem is a ground truth problem. If we have enough data to estimate X to within 5%, then doing clever things with that data is not going to reduce that error any further. We need some other way to get at the ground truth, in order to actually reduce the error rate. If we know how to convey what-we-want with 95% accuracy, then we need some other way to get at the ground truth of translation in order to increase that accuracy further.
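To put toy numbers on that (purely illustrative, not a model of the actual problem): reprocessing a fixed dataset in clever ways leaves the error roughly where it was, while going back for more ground truth actually shrinks it.

```python
import numpy as np

rng = np.random.default_rng(0)
true_x, n, trials = 1.0, 400, 500   # n=400 samples, sigma=1 -> ~5% standard error

err_plain, err_clever, err_fresh = [], [], []
for _ in range(trials):
    data = rng.normal(true_x, 1.0, size=n)

    # Estimate from the data we already have.
    plain = data.mean()

    # "Clever" reprocessing of the *same* data: average many bootstrap
    # re-estimates. No new information enters, so no real improvement.
    clever = np.mean([rng.choice(data, size=n, replace=True).mean()
                      for _ in range(100)])

    # Getting at the ground truth directly: collect 10x more data.
    fresh = rng.normal(true_x, 1.0, size=10 * n).mean()

    err_plain.append((plain - true_x) ** 2)
    err_clever.append((clever - true_x) ** 2)
    err_fresh.append((fresh - true_x) ** 2)

print("RMS error, plain estimate:        %.4f" % np.sqrt(np.mean(err_plain)))   # ~0.05
print("RMS error, clever reprocessing:   %.4f" % np.sqrt(np.mean(err_clever)))  # ~0.05
print("RMS error, 10x more ground truth: %.4f" % np.sqrt(np.mean(err_fresh)))   # ~0.016
```

The "clever" column sometimes comes out a hair worse than the plain one, since bootstrapping adds its own noise; the point is just that it can't come out meaningfully better without new data.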
Let me know if this analogy sounds representative of the strategies you imagine.
Yeah, it does. I definitely agree that this doesn’t get around the chicken-and-egg problem, and so shouldn’t be expected to succeed on the first try. It’s more like you get to keep trying this strategy over and over again until you eventually succeed, because if everything goes wrong you just unplug the AI system and start over.
the chicken-and-egg problem is a ground truth problem. If we have enough data to estimate X to within 5%, then doing clever things with that data is not going to reduce that error any further.
I think you get “ground truth data” by trying stuff and seeing whether or not the AI system did what you wanted it to do.
(This does suggest that you wouldn’t ever be able to ask your AI system to do something completely novel without having a human along to ensure it’s what we actually meant, which seems wrong to me, but I can’t articulate why.)
I think you get “ground truth data” by trying stuff and seeing whether or not the AI system did what you wanted it to do.
That’s the sort of strategy where the illusion of transparency is a big problem, from a translation point of view. The difficult cases are exactly those where the translation usually produces the results you expect, but then produces something completely different in some rare cases.
Another way to put it: if we’re gathering data by seeing whether the system did what we wanted, then the long tail problem works against us pretty badly. Those rare tail-cases are exactly the cases we would need to observe in order to notice problems and improve the system. We’re not going to have very many of them to work with. Ability to generalize from small data sets becomes a key capability, but then we need to translate how-to-generalize in order for the AI to generalize in the ways we want (this gets at the can’t-ask-the-AI-to-do-anything-novel problem).
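To put made-up numbers on "we're not going to have very many of them to work with": if a bad translation only bites on one input in a thousand, then even a thousand trials gives us about one example to learn from, and a decent chance of seeing none at all.

```python
import math

# Toy numbers, purely illustrative: a translation failure that only
# shows up on rare tail inputs.
p = 1e-3        # chance that a given trial hits the tail case
N = 1_000       # trials we actually get to run

expected = N * p          # expected number of tail cases observed
p_none = (1 - p) ** N     # chance we observe zero tail cases at all

print(f"expected tail cases observed: {expected:.1f}")   # ~1.0
print(f"chance of observing zero:     {p_none:.2f}")     # ~0.37

# Trials needed just to see at least one tail case with 95% probability;
# noticing a *pattern* and fixing it would need many times more.
N_95 = math.ceil(math.log(0.05) / math.log(1 - p))
print(f"trials for 95% chance of at least one case: {N_95}")   # ~2995
```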