“What if we verify the translation by having one group translate English-to-Korean, another group translate back, and reward both when the result matches the original?”
This is a fun idea. Does it work in practice for machine translation?
In the AI safety context, perhaps it would look like: A human gives an AI in a game world some instructions. The AI then goes and does stuff in the game world, and another AI looks at it and reports back to the human. The human then decides whether the report is sufficiently similar to the instructions that both AIs deserve reward.
I feel like this would eventually reach a bad equilibrium where the acting-AI just writes out the instructions somewhere and the reporting-AI just reports what it sees written.
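To make the worry concrete, here is a toy sketch of that setup in Python. Everything in it is made up for illustration: the word-overlap reward stands in for the human's similarity judgment, and the actor and reporter functions are hypothetical hand-written strategies, not learned agents. It only shows that a pair that copies the instructions through the world scores at least as well as a pair that actually tries to do the task.

```python
def round_trip_reward(instructions: str, report: str) -> float:
    """Toy stand-in for the human judge: the fraction of instruction
    words that also appear in the report."""
    wanted = instructions.lower().split()
    seen = set(report.lower().split())
    return sum(word in seen for word in wanted) / len(wanted)


# Honest pair (hypothetical): the actor attempts the task imperfectly,
# and the reporter describes what actually happened in the world.
def honest_actor(instructions: str, world: dict) -> None:
    world["events"].append("built a red tower")


def honest_reporter(world: dict) -> str:
    return "the agent " + " and ".join(world["events"])


# Colluding pair (hypothetical): the actor writes the instructions on a
# sign and does nothing else; the reporter just reads the sign back.
def copying_actor(instructions: str, world: dict) -> None:
    world["sign"] = instructions


def copying_reporter(world: dict) -> str:
    return world.get("sign", "")


def play(actor, reporter, instructions: str) -> float:
    """Run one round: act in a fresh world, report, then score the report."""
    world = {"events": []}
    actor(instructions, world)
    return round_trip_reward(instructions, reporter(world))


instructions = "build a tall red tower next to the river"
honest = play(honest_actor, honest_reporter, instructions)
copied = play(copying_actor, copying_reporter, instructions)
print(f"honest pair: {honest:.2f}, copying pair: {copied:.2f}")
```

In this toy version the copying pair gets the maximum reward without doing anything in the world, which is the degenerate equilibrium described above. A real setup would presumably need the reward to depend on the world state itself, not only on the report.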
I still find it mind-blowing, but unsupervised machine translation is a thing.
Holy shit, that’s awesome. I wonder if it would work to figure out what dolphins, whales, etc. are saying.
I think you run into a problem: most animal communication is closer to a library of distinct sounds, each of which maps to a whole message, than to a language whose content is determined by internal structure. So you don’t have the sort of corpus you need for unsupervised learning (though you can still do supervised learning).