But wouldn’t that actually support my approach? Assuming that there really is something important that all of humanity misses but the AI understands:
-If you hardcode the AI’s optimal goal based on human deliberations you are guaranteed to miss this important thing.
-If you use the method I suggested, the AI will, driven by the desire to speak the truth, try to explain the problem to the humans who will in turn tell the AI what they think of that.
But wouldn’t that actually support my approach? Assuming that there really is something important that all of humanity misses but the AI understands:
-If you hardcode the AI’s optimal goal based on human deliberations you are guaranteed to miss this important thing.
-If you use the method I suggested, the AI will, driven by the desire to speak the truth, try to explain the problem to the humans who will in turn tell the AI what they think of that.
I don’t see how that’s relivant to philosophical questions about truth. Did you mean to reply to my other comment?