> I’d expect implementing Friendliness to be easier than verifying Friendliness […]
I’d rather like to verify that my AGI would be friendly before I run it. :) (Usually the label FAI seems to refer to AIs which will be ‘provably friendly’.)
You might be able to verify interesting properties of code that you constructed for the purpose of making verification possible, but you aren’t likely to be able to verify interesting properties of the arbitrary, hostile code that Clippy would have an incentive to produce.
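To make that contrast concrete, here is a minimal sketch in Python (all names, such as `verifies` and `SAFE_NODES`, are hypothetical, chosen for illustration). Checking that a program stays inside a fragment you designed to be checkable is easy; deciding a nontrivial semantic property of arbitrary code is undecidable in general, by Rice's theorem.

```python
import ast

# Whitelist of AST nodes defining a deliberately tiny, side-effect-free
# fragment. SAFE_NODES and verifies are hypothetical illustration names.
SAFE_NODES = (ast.Module, ast.Expr, ast.BinOp, ast.Constant, ast.Add, ast.Mult)

def verifies(source: str) -> bool:
    """Accept a program only if it is built entirely from whitelisted
    constructs. Verification is tractable here because the language
    fragment was designed to make it tractable, not because we can
    analyze arbitrary code."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return all(isinstance(node, SAFE_NODES) for node in ast.walk(tree))

print(verifies("1 + 2 * 3"))   # True: inside the checkable fragment
print(verifies("import os"))   # False: rejected without having to reason
                               # about what the code would actually do
```

Note that the checker never reasons about runtime behavior at all; it only confirms the program stays inside a fragment where the desired property holds by construction. Hostile code gets no such cooperation from its author.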
You passed up an opportunity to point to your proposed verification procedure, so at this point I assume you don’t have one. Please prove me wrong.
> Usually the label FAI seems to refer to AIs which will be ‘provably friendly’.
I don’t even know what the exact theorem to prove would be. Do you?