“AI that can verify itself” seems doable for reasons wholly unrelated to metamathematics (contrary to your offhand claim), since AIs are finite objects that nevertheless need to handle a combinatorially large input space. This has the flavor of “searching a combinatorial explosion guided by a small but well-structured set of criteria” (i.e., the relatively easy instances of various NP problems), which has seen a fair bit of success with SAT/SMT solvers, nonconvex optimizers, evolutionary algorithms, and whatnot. I don’t think constructing a system that systematically explores the exponentially large input space of a neural network is going to be too hard a challenge.
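For concreteness, here is the flavor of thing I mean. This is a toy sketch, not a claim about how real neural-network verification tools are built: it uses the z3 SMT solver (the `z3-solver` package) on a made-up one-neuron ReLU “network”, asking whether its output can ever exceed a bound on a box of inputs.

```python
# Toy sketch only: made-up weights, not a real verification pipeline.
# Requires the z3-solver package (pip install z3-solver).
from z3 import Real, Solver, If, sat

x1, x2 = Real("x1"), Real("x2")

# A one-hidden-unit "network" with hard-coded weights.
hidden = 0.5 * x1 + 0.25 * x2 - 0.1
relu = If(hidden > 0, hidden, 0)   # ReLU encoded as an if-then-else term
output = 2.0 * relu + 0.05

s = Solver()
s.add(x1 >= 0, x1 <= 1, x2 >= 0, x2 <= 1)  # restrict inputs to [0, 1]^2
s.add(output > 1.0)                        # negate the property "output <= 1"

# sat means the solver found an input violating the bound; unsat means the
# bound provably holds over the whole input box.
if s.check() == sat:
    print("Counterexample:", s.model())
else:
    print("output <= 1 holds on [0, 1]^2")
```

The scaling problems are real (the case splits introduced by each ReLU are exactly the combinatorial explosion), but it is the same shape of problem the SAT/SMT folks have been chipping away at for decades.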
Also, has anyone actually constructed a specific self-verifying theory yet? (From the English Wikipedia article, if I understand it correctly, Dan Willard came up with systems in which subtraction and division are the primitive operations, addition and multiplication are defined in terms of them, and it is impossible to prove that multiplication is a total function.)
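If I’m reading it right, the rough shape is something like the below. This is my paraphrase of the general idea, not Willard’s actual axiom list, so take the details with a grain of salt:

```latex
% Rough paraphrase of the idea (not Willard's actual axioms): addition and
% multiplication appear only as relations defined from the primitive
% subtraction and division operations.
\begin{align*}
  x + y = z \;&\Longleftrightarrow\; z - y = x \\
  x \cdot y = z \;&\Longleftrightarrow\; (y \neq 0 \wedge z \div y = x) \vee (y = 0 \wedge z = 0)
\end{align*}
% The sentence such a theory cannot prove is the totality of multiplication:
\[ \forall x\, \forall y\, \exists z\; (x \cdot y = z) \]
```

As I understand it, without provable totality of multiplication the usual Gödel machinery doesn’t get off the ground, which is what leaves room for the theory to verify its own consistency.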
The issue arises specifically in the situation of recursive self-improvement: a mathematical framework of “sufficient complexity” (that is, one that is consistent, effectively axiomatized, and contains enough arithmetic) cannot prove its own consistency.
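For reference, the theorem I’m leaning on here is Gödel’s second incompleteness theorem; the usual statement (with Peano arithmetic comfortably over the “sufficient complexity” bar) is:

```latex
% Goedel's second incompleteness theorem: for any consistent, effectively
% axiomatized theory T containing enough arithmetic (Peano arithmetic is
% more than enough),
\[ T \nvdash \mathrm{Con}(T) \]
% where Con(T) is the arithmetized sentence "T does not prove a contradiction."
```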
What this cashes out to: treat the AI as a mathematical framework, and the next generation of AI (designed by the first) as a second mathematical framework. If they are of “sufficient complexity,” you can’t actually prove, from within either of them, that an umbrella mathematical framework comprising both contains no contradictions. Which means an AI cannot *prove* that a successor AI has not experienced value drift (that is, that the combined mathematical framework is free of contradictions) if they are of sufficient complexity.
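One way to make the “umbrella” point precise (this is my gloss, assuming both theories are consistent, effectively axiomatized, and contain enough arithmetic):

```latex
% Let U be the umbrella theory combining creator and successor:
\[ U \;=\; T_{\mathrm{creator}} \cup T_{\mathrm{successor}} \]
% Every proof available to the creator is also a proof in U, so
\[ T_{\mathrm{creator}} \vdash \mathrm{Con}(U)
   \;\Longrightarrow\; U \vdash \mathrm{Con}(U)
   \;\Longrightarrow\; U \text{ is inconsistent,} \]
% with the last step by the second incompleteness theorem. A consistent
% umbrella framework can never be certified from inside itself.
```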
To illustrate the issue, suppose a powerful creator AI is designing its successor. The successor, presumably, is more powerful than the creator in some fashion, so there are regions of the combinatorially large space that the successor can explore (in a reasonable timeframe) but the creator cannot. If the creator could prove there are no contradictions in the combined mathematical framework, then, supposing its values are embedded in that framework in a provable manner, it could be assured that the successor has not experienced value drift.
Mind, I don’t particularly think the above scenario is terribly likely; I have strong doubts about basically everything in there, in particular the idea of provable values. I created the post for the five or six people who might still be interested in ideas I haven’t seen kicked around on Less Wrong for over a decade.