Daniel Kokotajlo comments on Alignment as Translation

Daniel Kokotajlo 19 Mar 2020 23:21 UTC
LW: 4 AF: 2
AF
“What if we verify the translation by having one group translate English-to-Korean, another group translate back, and reward both when the result matches the original?”
This is a fun idea. Does it work in practice for machine translation?
In the AI safety context, perhaps it would look like: A human gives an AI in a game world some instructions. The AI then goes and does stuff in the game world, and another AI looks at it and reports back to the human. The human then decides whether the report is sufficiently similar to the instructions that both AIs deserve reward.
I feel like eventually this would reach a bad equilibria where the acting-AI just writes out the instructions somewhere and the reporting-AI just reports what they see written.
- Steven Byrnes 20 Mar 2020 11:55 UTC
  LW: 6 AF: 2
  AF Parent
  
  This is a fun idea. Does it work in practice for machine translation?
  
  I still find it mind-blowing, but unsupervised machine translation is a thing.
  What links here?
  - Could we use current AI methods to understand dolphins? by Daniel Kokotajlo (22 Mar 2020 14:45 UTC; 21 points)
  - Daniel Kokotajlo 20 Mar 2020 13:36 UTC
    LW: 4 AF: 1
    AF Parent
    Holy shit, that’s awesome. I wonder if it would work to figure out what dolphins, whales, etc. are saying.
    - Vaniver 26 Mar 2020 20:20 UTC
      LW: 2 AF: 1
      AF Parent
      I think you run into a problem that most animal communication is closer to a library of different sounds, each of which maps to a whole message, than it is something whose content is determined by internal structure, so you don’t have the sort of corpus you need for unsupervised learning (while you do have the ability to do supervised learning).