From where I’m sitting, proving an AGI design Friendly seems even more difficult and error-prone than proving a crypto scheme secure, probably by a large margin, and we do not have decades in which to refine the proof techniques and formalizations.
Correct me if I’m wrong, but it doesn’t seem as though “proofs” of algorithm correctness fail as frequently as “proofs” of cryptosystem unbreakability.
Where does your intuition come from that Friendliness proofs would be about as reliable as cryptosystem proofs?
Interesting question. I guess proofs of algorithm correctness fail less often because:
It’s easier to empirically test algorithms to weed out the incorrect ones, so there are fewer efforts to prove conjectures of correctness that are actually false.
It’s easier to formalize what it means for an algorithm to be correct than for a cryptosystem to be secure.
In both respects, proving Friendliness seems even worse than proving security.
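To make the two points about algorithms concrete, here is a minimal Python sketch, not anything from the original discussion. It assumes sorting as the example problem; the names `is_correct_sort`, `buggy_sort`, and `find_counterexample` are hypothetical and chosen only for illustration. The correctness criterion fits in two lines, and random testing is enough to expose a flawed algorithm before anyone tries to prove it correct.

```python
import random
from collections import Counter


def is_correct_sort(inp, out):
    """Formal criterion: the output is ordered and is a permutation of the input."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and Counter(inp) == Counter(out)


def buggy_sort(xs):
    """Hypothetical flawed algorithm: one bubble pass, which is not enough in general."""
    xs = list(xs)
    for i in range(len(xs) - 1):
        if xs[i] > xs[i + 1]:
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs


def find_counterexample(sort_fn, trials=1000, seed=0):
    """Try random small inputs; return one that violates the criterion, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        inp = [rng.randint(0, 9) for _ in range(rng.randint(0, 6))]
        if not is_correct_sort(inp, sort_fn(inp)):
            return inp
    return None
```

Calling `find_counterexample(buggy_sort)` returns a failing input, while `find_counterexample(sorted)` returns `None`. Neither step has an obvious analogue for a cryptosystem, where "secure" has to hold against every possible attacker, and arguably even less so for Friendliness.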