Again, a simplification, but: we want a sufficient guarantee of stably friendly behavior before we risk pushing things past a point of no return. A sufficient guarantee plausibly requires having robust solutions for indirect normativity, stable self-modification, reflectively consistent decision theory, etc. But that doesn’t mean we expect to ever have a definite “proof” that a system will be stably friendly.
Formal methods work for today’s safety-critical software systems never results in a definite proof that a system will be safe, either, but, ceteris paribus, formal proofs of particular internal properties of the system give you more assurance that the system will behave as intended than you would otherwise have.
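To make that concrete with a toy illustration (not part of the original exchange): in a proof assistant such as Lean 4 you can prove a narrow internal property, e.g. that a hypothetical `clampOutput` routine never leaves its allowed range, without that proof saying anything about whether the overall system is “safe.” The function name and the 0–100 bounds below are invented for illustration, and the proofs assume a recent Lean 4 toolchain where the `omega` tactic is available.

```lean
-- Toy example: one internal property, proved formally.
-- `clampOutput` and the 0..100 range are hypothetical; the point is only
-- that a proof like this covers one property, not overall system safety.

def clampOutput (x : Int) : Int :=
  max 0 (min x 100)

-- The output never exceeds the upper bound...
theorem clampOutput_le (x : Int) : clampOutput x ≤ 100 := by
  unfold clampOutput; omega

-- ...and never goes below the lower bound.
theorem clampOutput_nonneg (x : Int) : 0 ≤ clampOutput x := by
  unfold clampOutput; omega
```

The proofs rule out one class of failure with certainty, which is the sense in which they add assurance; they say nothing about the properties you didn’t think to state.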
“Otherwise” compared to nothing, or compared to informal methods?
Are you taking into account that the formal/provable/unupdateable approach has a drawback in the AI domain that it doesn’t have in the non-AI domain, namely that you lose the potential to tell an AI “stop doing that, it isn’t nice”?
Why talk about unupdateable UFs and “solving morality” if you are not going for that approach?
How so?
Do you think that would work on Clippie?