I think the “second AI” really should just be an algorithm that the first AI runs in order to evaluate actions (it should not have to learn to predict the second AI from signals on a reward channel). The connection should be logical rather than physical. Otherwise the first AI is incentivized to behave badly in order to gain control of the reward channel.
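To make the distinction concrete, here’s a minimal sketch of the logical-connection version. All names and the scoring heuristic are hypothetical, purely for illustration: the point is just that the evaluator is a function the agent calls, not a signal it observes.

```python
class Evaluator:
    """Stand-in for the 'second AI': scores how human-plausible an action is."""
    def score(self, action: str) -> float:
        # Toy heuristic, purely for illustration.
        return -len(action)

class Agent:
    """The 'first AI' evaluates actions by *running* the evaluator directly."""
    def __init__(self, evaluator: Evaluator):
        self.evaluator = evaluator  # a logical connection: just a function call

    def choose(self, candidate_actions: list[str]) -> str:
        # There is no physical reward channel to seize: tampering with wires
        # can't change the output of a computation the agent performs itself.
        return max(candidate_actions, key=self.evaluator.score)

agent = Agent(Evaluator())
print(agent.choose(["reply politely", "take over the world"]))
```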
GANs are neat, but the images that score highest with their discriminators aren’t all that natural; I’d be worried about any implementation of this built on current ideas about supervised learning. Certainly if you want reasoning like “this action would lead to the AI taking over the world, and that’s not something a human would do,” you’ll need some futuristic AI design.
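Here’s a toy experiment (assuming PyTorch) that shows the failure mode I have in mind: gradient-ascend an input to maximize a frozen discriminator’s score. The untrained network below is a stand-in for a real GAN discriminator; with a trained one, the same procedure tends to find high-scoring but unnatural images.

```python
import torch
import torch.nn as nn

# Stand-in for a trained GAN discriminator (architecture is hypothetical).
discriminator = nn.Sequential(
    nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 1)
)
for p in discriminator.parameters():
    p.requires_grad_(False)  # the judge is frozen; only the input moves

x = torch.randn(1, 1, 28, 28, requires_grad=True)  # start from noise
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = -discriminator(x).mean()  # maximize the judge's score
    loss.backward()
    opt.step()
# x now scores highly, but the optimizer found the judge's blind spots
# rather than anything resembling "naturalness".
```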