What’s your model here, that as part of the Turing Test they ask the participant to solve the alignment problem and check whether the solution is correct? Isn’t this gonna totally fail due to 1) it taking too long, 2) not knowing how to robustly verify a solution, 3) some people/PhDs just randomly not being able to solve the alignment problem? And probably more.
So no, I don’t think passing a PhD-level Turing Test requires the ability to solve alignment.
If there exist such a problem that a human can think of, can be solved by a human and verified by a human, an AI would need to be able to solve that problem as well as to pass the Turing test.
If there exist some PhD level intelligent people that can solve the alignment problem, and some that can verify it (which is likely easier). Then an AI that can not solve AI alignment would not pass the Turing test.
With that said, a simplified Turing test with shorter time limits and a smaller group of participants is much more feasible to conduct.
How do you verify a solution to the alignment problem? Or if you don’t have a verification method in mind, why assume it is easier than making a solution?
I’d say that having a way to verify that a solution to the alignment problem is actually a solution, is part of solving the alignment problem.
But I understand this was not clear from my previous response.
A bit like a mathematical question, you’d be expected to be able to show that your solution is correct, not only guess that maybe your solution is correct.
What’s your model here, that as part of the Turing Test they ask the participant to solve the alignment problem and check whether the solution is correct? Isn’t this gonna totally fail due to 1) it taking too long, 2) not knowing how to robustly verify a solution, 3) some people/PhDs just randomly not being able to solve the alignment problem? And probably more.
So no, I don’t think passing a PhD-level Turing Test requires the ability to solve alignment.
If there exist such a problem that a human can think of, can be solved by a human and verified by a human, an AI would need to be able to solve that problem as well as to pass the Turing test.
If there exist some PhD level intelligent people that can solve the alignment problem, and some that can verify it (which is likely easier). Then an AI that can not solve AI alignment would not pass the Turing test.
With that said, a simplified Turing test with shorter time limits and a smaller group of participants is much more feasible to conduct.
How do you verify a solution to the alignment problem? Or if you don’t have a verification method in mind, why assume it is easier than making a solution?
Great question.
I’d say that having a way to verify that a solution to the alignment problem is actually a solution, is part of solving the alignment problem.
But I understand this was not clear from my previous response.
A bit like a mathematical question, you’d be expected to be able to show that your solution is correct, not only guess that maybe your solution is correct.