Yes, that’s what I was referring to when saying this:
Eliezer also cares about mathematical proofs, but more for the purpose of preserving values under self-modification (something that humans don’t usually have to deal with).
The provability here has to do with the AI proving to itself that modifying itself will preserve its values (or not cause it to self-destruct or wirehead or whatever), not with the designers proving that the AI is non-dangerous.
That is, "friendly" in the sense of "provably non-dangerous AGI" doesn't necessarily mean having a rigorous mathematical proof that the AI is not dangerous; it can "merely" mean having a sufficiently deep understanding of morality when building it (as opposed to working from high-level notions whose components haven't been rigorously analyzed).
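To make the "proving to itself" part a bit more concrete, here is one rough way the condition could be written. The notation is my own sketch, not a quote from anything Eliezer has published: $A$ is the current agent, $A'$ a proposed successor after self-modification, $T$ the formal system $A$ reasons in, and $\mathrm{Good}_U$ a predicate saying an action is sanctioned by $A$'s current utility function $U$.

\[
  A \ \text{adopts}\ A'
  \quad\Longrightarrow\quad
  T \vdash\ \forall a.\ \big( A' \ \text{outputs action}\ a \big) \rightarrow \mathrm{Good}_U(a)
\]

The point is that the proof obligation sits inside the AI's own reasoning about its successor, rather than in a proof handed to us by the designers about the whole system.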