Proving “friendliness” may well be impossible to define, but there are narrower desirable properties that can be proven. You could prove that it optimizes correctly on special cases that are simpler than the world as a whole; you can prove that it doesn’t have certain classes of security holes; you can prove that it’s resilient against single-bit errors. With a more detailed understanding of metaethics, we might prove that it aggregates values in a way that’s stable in spite of outliers, and that its debug output about the values it’s discovered is accurate. Basically, we should prove as much as we can, even if there are some parts that aren’t amenable to formal proof.
Proving “friendliness” may well be impossible to define, but there are narrower desirable properties that can be proven. You could prove that it optimizes correctly on special cases that are simpler than the world as a whole; you can prove that it doesn’t have certain classes of security holes; you can prove that it’s resilient against single-bit errors. With a more detailed understanding of metaethics, we might prove that it aggregates values in a way that’s stable in spite of outliers, and that its debug output about the values it’s discovered is accurate. Basically, we should prove as much as we can, even if there are some parts that aren’t amenable to formal proof.