If I had a printout of the source code for a superintelligence in front of me, I could stare at it all day at no risk to myself or anyone around me. Of course I might be too dumb to tell what it would do, but an entity much better at analyzing source code than I am is similarly safe.
Remember Rice’s theorem? It doesn’t matter how smart you are; undecidable is undecidable.
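For readers who don't recall it, Rice's theorem says every nontrivial semantic property of programs is undecidable. The standard reduction can be sketched as follows; this is illustrative only, and `do_something_unsafe` is a placeholder name, not a real function:

```python
# Sketch of the standard reduction: any total decider for a nontrivial
# semantic property ("safety") would let us decide the halting problem.

def make_probe(program_src: str) -> str:
    # Build a wrapper that runs program_src to completion and only THEN
    # performs an unsafe action. The wrapper is unsafe iff program_src halts.
    return (
        "exec(" + repr(program_src) + ")\n"  # run the program under test
        "do_something_unsafe()\n"            # reached only if it halted
    )

def halts(program_src: str, is_safe) -> bool:
    # Given any claimed total safety decider is_safe, this would decide
    # halting, contradicting Turing's theorem; so no such is_safe exists.
    return not is_safe(make_probe(program_src))
```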
A better way of making your argument might be to suggest that an entity that was better at programming would have intentionally constructed a program that it knew was safe to begin with, and therefore had no need of simulation, rather than that it could just inspect any arbitrary program and know that it was safe.
That would, I think, also be a much safer approach for humans than building an uninterpretable ML system trained in some ad hoc way, and then trying to “test in correctness” by simulation...
I anticipated and addressed the objection around Rice’s theorem (without calling it that) in a child to my first comment, which was published 16 min before your comment, but maybe it took you 16 min to write yours.
A better way of making your argument might be to suggest that an entity that was better at programming would have intentionally constructed a program that it knew was safe to begin with, and therefore had no need of simulation, rather than that it could just inspect any arbitrary program and know that it was safe.
I was assuming the reader would be charitable enough to me to interpret my words as including that possibility (since verifying that a program has property X is so similar to constructing a program with property X).
I’m sorry to have misjudged you. Possibly the reason is that, in my mind, constructing a program that provably has property X, and in the process generating a proof, feels like an almost totally different activity from trying to generate a proof given a program from an external source, especially if the property is nontrivial.
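The difference is real. One way to make construction-time safety concrete is to restrict the builder to a language in which *every* expressible program has the property, so nothing need be decided after the fact; Rice's theorem only bites for arbitrary programs from an external source. A minimal sketch, with all names illustrative:

```python
from dataclasses import dataclass
from typing import Callable

# Programs assembled only from these combinators are total and effect-free
# by construction, so "is it safe?" is settled for the whole language at
# once; no post-hoc analysis of individual programs is needed.

@dataclass(frozen=True)
class SafeProgram:
    run: Callable[[int], int]  # a total int -> int function, no I/O possible

def const(c: int) -> SafeProgram:
    return SafeProgram(lambda x: c)

def add(p: SafeProgram, q: SafeProgram) -> SafeProgram:
    return SafeProgram(lambda x: p.run(x) + q.run(x))

def compose(p: SafeProgram, q: SafeProgram) -> SafeProgram:
    return SafeProgram(lambda x: p.run(q.run(x)))

ident = SafeProgram(lambda x: x)

# x -> 2x + 1, known safe with no inspection of the finished program
prog = compose(add(ident, const(1)), add(ident, ident))
```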
I agree with that, for sure. I didn’t point it out because the reader does not need to consider that distinction to follow my argument.