I expect the human operator moderating this debate would get pretty good at thinking about AGI safety, and start to become noticeably better at dismissing bad reasoning than good reasoning, at which point BoMAI would find the production of correct reasoning a good heuristic for seeming convincing.
Alternatively, the human might have a lot of adversarial examples and the debate becomes an exercise in exploring all those adversarial examples. I’m not sure how to tell what will really happen short of actually having a superintelligent AI to test with.
Alternatively, the human might have a lot of adversarial examples and the debate becomes an exercise in exploring all those adversarial examples. I’m not sure how to tell what will really happen short of actually having a superintelligent AI to test with.