I think this kind of research is very harmful and should stop.
I think it’s important to repeat this even if it’s common knowledge in many of our circles, because it isn’t in broader circles, and we should not give up on reminding people not to conduct research that leads to a net increase in the risk of destroying the world, even if it’s really cool, gets you promoted, or makes you a lot of money.
Again, OpenAI people, if you’re reading this, please stop.
I think it’s very strange that this is the work that gets this sort of pushback—of all the capabilities research out there, I think this is some of the best from an alignment perspective. See e.g. STEM AI and Thoughts on Human Models for context on why this sort of work is good.
If our safety research is useless, this path to AGI gives me the most hope, because it may produce math that lets us solve alignment before the system becomes general.
Because of the Curry-Howard correspondence, as well as for other reasons, it does not seem that the distance between solving math problems and writing AIs is large. I mean, actually, according to the correspondence, the distance is zero, but perhaps we may grant that programming an AI is a different kind of math problem from the Olympiad fare. Does this make you feel safe?
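To make the correspondence concrete, here is a minimal Lean sketch (the names are mine, purely for illustration): the same term is both a program and a proof.

```lean
-- Read as a program: for any types A and B, apply a function to an argument.
def applyFn (A B : Type) (a : A) (f : A → B) : B := f a

-- Read as a theorem: for any propositions A and B, from A and A → B conclude B.
-- The proof term is character-for-character the same program as above.
theorem modusPonens (A B : Prop) (a : A) (f : A → B) : B := f a
```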
Also, it seems that the core difficulty in alignment lies more in producing definitions and statements of theorems than in proving them. What should the math even look like, or be about? A proof assistant is not helpful here.
Writing AIs is not running them. Proving what they would do is, in effect, but we need not have the math engine design an AI and prove it safe. We need it to babble about agent foundations, in the same way that it would presumably be inspiring to hear Ramanujan talk in his sleep.
The math engine I’m looking for would be able to intuit not only a lemma that helps prove a theorem, but a conjecture, which is just a lemma when you don’t know the theorem. Or a definition, which is to a conjecture as sets are to truth values. A human who has proven many theorems sometimes becomes able to write new ones in turn; why should language models be any different?
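In proof-assistant terms, the three kinds of objects the engine would need to produce look like this (a toy Lean sketch; the names and statements are mine, for illustration only):

```lean
-- A lemma: a proved statement, useful on the way to some larger theorem.
theorem add_zero_lemma (n : Nat) : n + 0 = n := rfl

-- A conjecture: the same kind of object with the proof left open
-- (`sorry` marks the gap; this easy statement stands in for something genuinely open).
theorem comm_conjecture (n m : Nat) : n + m = m + n := sorry

-- A definition: one level down, it constructs an inhabitant of a Type (a "set")
-- rather than asserting a Prop (a "truth value").
def double (n : Nat) : Nat := n + n
```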
I can sense some math we need: an AI is more interpretable if the task of interpreting it can be decomposed into interpreting its parts; we want the assembly of descriptions to be associative; an AI design tolerates more mistakes if its behavior is more continuous in its parts than a maximizer’s is in its utility function. Category theory formalizes such intuitions, and even a tool that rewrites all our math in its terms would help a lot, let alone one that invents a math language even better at CT’s job of making the short sentences the useful ones.
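As one hedged illustration of the kind of statement I mean, here is how the compositionality-and-associativity intuition could be written down; the structure and every name below are hypothetical, not an existing formalism:

```lean
-- Hypothetical sketch: "interpretability decomposes over parts".
-- `Sys` stands for AI components, `Desc` for human-legible descriptions.
structure CompositionalInterp (Sys Desc : Type) where
  compSys  : Sys → Sys → Sys     -- assembling two components into a system
  compDesc : Desc → Desc → Desc  -- assembling two descriptions
  interp   : Sys → Desc          -- interpreting a component
  -- Interpreting an assembly is the same as assembling the interpretations.
  functorial : ∀ (a b : Sys), interp (compSys a b) = compDesc (interp a) (interp b)
  -- Assembly of descriptions is associative, so the order of decomposition is irrelevant.
  assocDesc : ∀ (x y z : Desc), compDesc (compDesc x y) z = compDesc x (compDesc y z)
```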
On the usefulness of proving theorems vs. writing them down: I think there’s more of a back-and-forth. See for instance the Nature paper on how DeepMind used an AI to guide mathematicians’ intuition (https://www.nature.com/articles/s41586-021-04086-x).
To me, the likely effect is to help humans “babble” more in math, via some extension like a GitHub Copilot but for math. And I feel that, overall, increasing math generation is more “positive for alignment in EV” than generating code, especially when considering agent foundations.
Besides, in ML correct proofs can come many years after algorithms are in everyday use across the community (e.g., Adam’s convergence proof was shown to be flawed in 2018). Having a more grounded understanding of these methods could help both with interpretability and with safety guarantees.
I think it might be better if you said “may have very harmful long-run consequences” or “is very harmful in expectation” rather than “is very harmful.” I worry that people who don’t already agree with you will find it easier to roll their eyes at “is very harmful.”
Is gain-of-function research “very harmful”? I feel like it’s not appropriate to nickel-and-dime this.
And also, yes, I do think it’s directly harmful now, in addition to being harmful in expectation down the line. It’s a substantial derogation of a norm that should exist. To explain this concept further:
In addition to risking pandemics, participating in gain-of-function research also sullies and debases the research community, and makes it culturally less the shape it needs to be to do epidemiology. Refusing to take massive risks for minor upsides, even when they’re cool, is also a virtue-cultivation practice.
When a politician talks openly about how he wants to rig elections, exchange military aid for domestic political assistance, etc., he causes direct harm now even if the “plans” do not amount to anything later. This is because the speech acts disrupt the equilibria that make similar things less likely in general.
My comments here are intended as an explicit, loud signal of condemnation. This research is misconduct. Frankly, I am frustrated I have to be the one to say this, when it duly falls to community leaders to do so.
I don’t think we disagree about the harmfulness of this kind of research. Our disagreement is about the probable consequences of going around saying “I think this research is harmful and should stop.”
It’s the classic disagreement about how “righteous” vs. “moderate” a movement should be. “Speaking truth to power” vs. “winning hearts and minds.” I don’t have anything interesting to say here, I was just putting in a vote for a small move towards the “moderate” direction. I defer to the judgment of people who spend more time talking to policymakers and AGI capabilities researchers, and if you are such a person, then I defer to your judgment.