It always seemed to me that this strategy had a fatal flaw: we would not be able to tell whether the AI was already superintelligent and merely playing dumb, telling us what we wanted to hear so that we would let it loose, or whether it really was still learning.
In addition to that fatal flaw, the above quote seems to me to suggest another fatal flaw in the “raising an AI” strategy—namely, that there would be only a limited time window in which the AI’s utility function would still be malleable. It would appear that, as soon as part of the AI figures out how to do consequentialist reasoning about code, the “critical period” in which we could still mould its utility function would be over. Is this the right way of thinking about it, or is this line of thought waaaay too amateurish?
This problem is essentially what MIRI has been calling corrigibility. A corrigible AI is one that understands and accepts that it, or its utility function, is not yet complete, and so does not resist its programmers’ attempts to correct it.