The point is that if GR is wrong, and the AI doesn't output GR because it's wrong, then your test will say that the AI isn't that smart. And then you do something like letting it out of the box, and everyone probably dies.
And if the AI is that smart it will lie anyway....
Presumably it's outputting the thing that's right where GR is wrong, in which case you should be able to tell, at least insofar as it's consistent with GR in all the places where GR has been tested.
Maybe it outputs something that's just too hard to understand, so you can't actually tell what its predictions are, in which case you haven't learned anything from your test.