“Programmers operating with strong insight into intelligence, directly create along an efficient and planned pathway, a mind capable of modifying itself with deterministic precision—provably correct or provably noncatastrophic self-modifications. This is the only way I can see to achieve narrow enough targeting to create a Friendly AI.”
Yes, that’s what I was referring to when saying this:
Eliezer also cares about mathematical proofs, but more for the purpose of preserving values under self-modification (something that humans don’t usually have to deal with).
The provability here has to do with the AI proving to itself that modifying itself will preserve its values (or not cause it to self-destruct or wirehead or whatever), not with the designers proving that the AI is non-dangerous.
I.e., Friendly in the sense of “provably non-dangerous AGI” doesn’t necessarily mean having a rigorous mathematical proof that the AI is not dangerous, but “merely” having a deep enough understanding of morality when building it (as opposed to building from high-level notions whose components haven’t been rigorously analyzed).
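To make the distinction concrete, here is one toy way to write it down (my own notation, not anything from the quoted passage): let $T$ be the AI’s internal proof system, $U$ its current values, and $A'$ a candidate rewrite of itself. The obligation the AI imposes on itself before adopting $A'$ is roughly

$$\vdash_T \;\; \forall a\,\big(\mathrm{Takes}(A', a) \;\rightarrow\; \mathrm{Acceptable}_U(a)\big),$$

i.e. the AI must derive, inside its own system, that anything its successor will ever do stays acceptable by its current values. A designers’ proof of non-dangerousness would instead be a proof about the whole AI, carried out by humans standing outside the system.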
Eliezer_Yudkowsky