Samantha, what you’re obtaining is not Probability 1 of doing the right thing. What you’re obtaining is a precise (not “formal”, precise) statement of how you’ve defined root-level Friendliness, along with a mathematical proof (probably computer-assisted) that this property holds in the initial conditions, assuming the transistors on the computer chip behave the way they’re supposed to; plus some formalization of reflective decision theory that lets you describe what happens when the AI modifies itself, and the condition it will try to prove before modifying itself.
Anything short of this is not a sufficiently high standard to cause you to actually think about the problem. I can imagine trying to do this and surviving, but not anything short of that.