If our safety research is useless, this path to AGI gives me the most hope, because it may produce math that lets us solve alignment before it becomes general.
Because of the Curry-Howard correspondence, as well as for other reasons, it does not seem that the distance between solving math problems and writing AIs is large. I mean, actually, according to the correspondence, the distance is zero, but perhaps we may grant that programming an AI is a different kind of math problem from the Olympiad fare. Does this make you feel safe?
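For concreteness (a standard textbook illustration, not something from this thread): under propositions-as-types, a proof term and a program are literally the same object, so a system that emits checked proof terms is already emitting code. A minimal Lean sketch:

```lean
-- Curry–Howard in miniature: the same term serves as a proof and a program.

-- Read as a proposition: A implies (B implies A).
theorem k_as_proof (A B : Prop) : A → B → A :=
  fun a _ => a

-- Read as a program: the K combinator on types, with the same body.
def k_as_program (α β : Type) : α → β → α :=
  fun a _ => a
```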
Also, it seems that the core difficulty in alignment is more in producing definitions and statements of theorems, than in proving theorems. What should the math even look like or be about? A proof assistant is not helpful here.
Writing AIs is not running them. Proving what they would do is, in effect, running them, but we need not have the math engine design an AI and prove it safe. We need it to babble about agent foundations, in the same way that it would presumably be inspiring to hear Ramanujan talk in his sleep.
The math engine I’m looking for would be able to intuit not only a lemma that helps prove a theorem, but a conjecture, which is just a lemma when you don’t know the theorem. Or a definition, which is to a conjecture as sets are to truth values. A human who has proven many theorems sometimes becomes able to write them in turn; why should language models be any different?
I can sense some math we need: an AI is more interpretable if the task of interpreting it decomposes into interpreting its parts; we want the assembly of descriptions to be associative; an AI design tolerates more mistakes if its behavior is more continuous in its parts than a maximizer’s is in its utility function. Category Theory formalizes such intuitions, and even a tool that rewrites all our math in its terms would help a lot, let alone one that invents a math language even better at CT’s job of making the short sentences the useful ones.
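As a toy illustration of the “assembly of descriptions is associative” intuition (the names and framing here are mine, not a concrete proposal): if interpreting a composite system just means assembling the interpretations of its parts, then associativity of assembly is exactly what lets you regroup the decomposition without changing the result. A minimal Lean sketch, assuming descriptions are simply lists of part-level explanations:

```lean
-- Illustrative sketch: systems built from parts, descriptions that
-- assemble by concatenation, and an interpretation that is compositional.

inductive System where
  | part : String → System
  | comp : System → System → System

-- A compositional interpretation: the description of a composite is the
-- assembly of the descriptions of its parts.
def describe : System → List String
  | .part s   => [s]
  | .comp a b => describe a ++ describe b

-- Assembly is associative, so regrouping the decomposition of a system
-- does not change its description.
example (a b c : System) :
    describe (.comp a (.comp b c)) = describe (.comp (.comp a b) c) := by
  simp [describe, List.append_assoc]
```

The particular choice of lists is irrelevant; the point is the compositional (monoid-homomorphism-like) structure that category theory is good at expressing.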
On the usefulness of proving theorems vs. writing them down: I think there’s more of a back and forth. See for instance the Nature paper on how DeepMind used an AI to guide mathematicians’ intuition (https://www.nature.com/articles/s41586-021-04086-x).
To me, this kind of tool is going to help humans “babble” more in math, like a GitHub Copilot but for math. And I feel that overall increasing math generation is more “positive for alignment in EV” than generating code, especially when considering agent foundations.
Besides, in ML the correct proofs can come many years after an algorithm is already used by everyone in the community (e.g. Adam’s original convergence proof was shown to be flawed in 2018, in “On the Convergence of Adam and Beyond”). Having a more grounded understanding of these things could help both with interpretability and with safety guarantees.