We do not necessarily need fully formally proven theorems; other forms of high confidence in safety could be sufficient. For example, I would be comfortable with turning on an AI that is safe iff the Goldbach conjecture is true, even if we have no proof of the Goldbach conjecture.
Sounds like the kind of formal argument I was alluding to in the post. Here you have a theorem of the form:
X ⟹ this system is existentially safe
where X is a proposition. Theorems of this form are most useful if we have significantly higher confidence in X than our prior confidence in the system's existential safety.
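As a toy sketch of this form (every name below is a hypothetical stand-in, not a real formalisation), the Goldbach example above can be written down directly in Lean; note that acting on such a theorem requires confidence in X, not a proof of X:

```lean
-- Hypothetical stand-ins for a real formalisation of the system and of safety.
axiom System : Type
axiom ExistentiallySafe : System → Prop
axiom sys : System

-- Goldbach's conjecture: every even number greater than 2 is a sum of two primes.
def IsPrime (p : Nat) : Prop := 2 ≤ p ∧ ∀ d, d ∣ p → d = 1 ∨ d = p

def Goldbach : Prop :=
  ∀ n : Nat, 2 < n → n % 2 = 0 → ∃ p q, IsPrime p ∧ IsPrime q ∧ p + q = n

-- The theorem form `X ⟹ this system is existentially safe`, with X := Goldbach.
-- We could justifiably act on it given high confidence in Goldbach,
-- even without a proof of Goldbach itself.
axiom safety_given_goldbach : Goldbach → ExistentiallySafe sys
```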
We currently don’t have any idea what kind of theorems we want to prove. Formulating the right theorem is likely more difficult than proving it.
Strong disagree. I have a good idea of the kind of theorems I'll be searching for over the next few years.
While I currently haven't formulated any theorems I want to prove, my stumbling block appears to be insufficient mathematical maturity [I've set aside the project to study more first], not the intrinsic difficulty of the endeavour.
Theorems can rely on imperfect assumptions (ones that do not hold exactly in the real world). In that case, it is not clear that they give us the degree of high confidence that we would like to have.
My philosophy is to set up minimal preconditions that describe the universe we live in.
Such preconditions will probably also describe other universes, and so may not give us very strong statements about our particular universe, but that's fine. We can iterate on our theorems, building successive theorems that tighten the preconditions of earlier ones and deliver stricter postconditions.
We could synthesise similar theorems, combine disparate theorems into a unified whole, generalise existing theorems, etc.
There are a lot of tools in the mathematician’s toolbox.
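A minimal Lean sketch of that iteration, with every proposition a hypothetical placeholder: a first-generation theorem with a weak precondition, and a successor that tightens the precondition and strengthens the postcondition while still subsuming the original guarantee:

```lean
-- Hypothetical placeholders, not real safety properties.
axiom P₀ : Prop  -- minimal precondition, true of many possible universes
axiom P₁ : Prop  -- tightened precondition, closer to our universe
axiom Q₀ : Prop  -- weak postcondition
axiom Q₁ : Prop  -- stricter postcondition

axiom tightens : P₁ → P₀  -- P₁ is a tightening of P₀
axiom stricter : Q₁ → Q₀  -- Q₁ implies the weaker guarantee

axiom gen₀ : P₀ → Q₀      -- first-generation theorem
axiom gen₁ : P₁ → Q₁      -- its tightened successor

-- Under the tighter precondition the successor recovers everything the
-- first-generation theorem gave us, on top of the stricter guarantee.
theorem gen₁_subsumes : P₁ → Q₀ := fun h => stricter (gen₁ h)
```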
Theorems that rely on imperfect assumptions could nonetheless be very valuable and increase overall safety. For example, if we could prove something like "this software is corrigible, assuming we are in a world run by Newtonian physics", then this could (depending on the details) be strong evidence that the software is also corrigible in a quantum world.
Ideally we would relax the preconditions of our theorems and seek their most general form. We could then interrogate whether the generalised theorem is still applicable (in an interesting way) to the world we actually live in.
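Continuing the corrigibility example in the same sketch style (every name is again a hypothetical placeholder): if the Newtonian proof only really used some abstract property of the dynamics that a quantum world also satisfies, the relaxed theorem transfers to the world we live in:

```lean
-- Hypothetical placeholders for the corrigibility example above.
axiom NewtonianWorld : Prop
axiom QuantumWorld : Prop
axiom NiceDynamics : Prop  -- the relaxed precondition the proof actually used
axiom Corrigible : Prop

axiom newtonian_nice : NewtonianWorld → NiceDynamics
axiom quantum_nice : QuantumWorld → NiceDynamics

-- The generalised theorem: corrigibility follows from the abstract property alone.
axiom general_thm : NiceDynamics → Corrigible

-- The generalisation is still applicable to the world we actually live in.
theorem corrigible_here : QuantumWorld → Corrigible :=
  fun h => general_thm (quantum_nice h)
```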