If that evidence would update you that far, then your space of doom hypotheses seems far too narrow. There is so much that we don’t know about strong AI. A failure to be rapidly killed only seems to rule out some of the highest-risk hypotheses, while leaving plenty of hypotheses in which doom is still highly likely, just slower.
If we get to that point of AI capabilities, we will likely be able to make 50 years of scientific progress in a matter of months for domains which are not too constrained by physical experimentation (just run more compute for LLMs), and I’d expect AI safety to be one of those. So either we die quickly thereafter, or we’ve solved AI safety. Getting LLMs to do scientific progress basically telescopes the future.
Are you assuming that there will be a sudden jump in AI scientific research capability from subhuman to strongly superhuman? It is one possibility, sure. Another is that the first AIs capable of writing research papers won’t be superhumanly good at it, and won’t advance research very far or even in a useful direction. It seems to me quite likely that this state of affairs will persist for at least six months.
Do you give the latter scenario less than 0.01 probability? That seems extremely confident to me.
I don’t think we need superhuman capability here for stuff to get crazy; pure volume of papers could substitute for that. If you can write a mediocre but logically correct paper with $50 of compute instead of with $10k of graduate-student salary, that accelerates the pace of progress by a factor of 200, which seems like enough to enable a whole bunch of other advances that will feed into AI research and make the models even better.
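As a back-of-the-envelope sketch of that cost arithmetic (the per-paper dollar figures are just the assumptions above, and the budget is a hypothetical round number, not a measured value):

```python
# Back-of-the-envelope sketch of the cost ratio above.
# Dollar figures are the comment's assumptions; the budget is hypothetical.

cost_per_paper_llm = 50        # dollars of compute per mediocre-but-correct paper
cost_per_paper_grad = 10_000   # dollars of graduate-student time for the same paper

speedup = cost_per_paper_grad / cost_per_paper_llm
print(f"Cost per paper drops by a factor of {speedup:.0f}")  # 200

# At a fixed budget, the same factor shows up as raw paper volume:
budget = 1_000_000
print(f"${budget:,} buys {budget // cost_per_paper_grad:,} human-written papers "
      f"or {budget // cost_per_paper_llm:,} LLM-written ones")
```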
So you’re now strongly expecting to die in less than 6 months? (Assuming that the tweet is not completely false.)

That’s not a math or physics paper, and it includes a bit more “handholding” (in the form of an explicit database) than would really make me update. The style of scientific papers is obviously very easy for current LLMs to copy; what I’m trying to get at is that if LLMs can start to make genuinely novel contributions at a slightly below-human level and learn from the mediocre articles they write, pure volume of papers can make up for quality.
Possible alternatives:

1. AI can make papers as good as the average scientist, but wow is it slow. Total AI paper output is less than total average-scientist output, even with all available compute thrown at it.

2. AI can write papers as good as the average scientist. But a lot of progress is driven by the most insightful 1% of scientists. So we get ever more mediocre incremental papers without any revolutionary new paradigms.

3. AI can make papers as good as the average scientist. For AI safety reasons, this AI is kept rather locked down and not run much. Any results are not trusted in the slightest.

4. AI can make papers as good as the average scientist. Most of the peer review and journal process is also AI-automated. This leads to a Goodharting loop. All the big players are trying to get papers “published” by the million. Almost none of these papers will ever be read by a human. There may be good AI safety ideas somewhere in that giant pile of research, but good luck finding them in the massive piles of superficially plausible rubbish. If making a good paper becomes 100x easier, but making a rubbish paper becomes a million times easier, and telling the difference becomes 2x easier, the whole system gets buried in mountains of junk papers (a rough version of this arithmetic is sketched just after this list).

5. AIs can do and have done AI safety research. There are now some rather long and technical books that present all the answers. Capability is now a question of scaling up chip production (which has slow engineering bottlenecks). We aren’t safe yet. When someone has enough chips, will they use that AI safety book or ignore it? What goal will they align their AI to?
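A minimal sketch of the arithmetic behind alternative 4, using only the hypothetical multipliers from that paragraph (100x, 1,000,000x, 2x):

```python
# Rough model of alternative 4: how the junk-to-good ratio shifts.
# All multipliers are the hypothetical figures from the comment above.

good_multiplier = 100          # good papers become 100x easier to produce
junk_multiplier = 1_000_000    # rubbish papers become 1,000,000x easier
filter_multiplier = 2          # telling them apart gets only 2x easier

# The junk-to-good ratio in the raw pile grows by:
pile_ratio_growth = junk_multiplier / good_multiplier            # 10,000x
# Even crediting the better filter in full, the post-filter ratio grows by:
effective_ratio_growth = pile_ratio_growth / filter_multiplier   # 5,000x

print(f"Junk-to-good ratio in the pile: up {pile_ratio_growth:,.0f}x")
print(f"After the 2x better filter: still up {effective_ratio_growth:,.0f}x")
```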
For #5, I think the answer would be to make the AI produce the AI safety ideas which not only solve alignment, but also yield some aspect of capabilities growth along an axis that the big players care about, and in a way where the capabilities are not easily separable from the alignment. I can imagine this being the case if the AI safety idea somehow makes the AI much better at instruction-following using the spirit of the instruction (which is after all what we care about). The big players do care about having instruction-following AIs, and if the way to do that is to use the AI safety book, they will use it.
make the AI produce the AI safety ideas which not only solve alignment, but also yield some aspect of capabilities growth along an axis that the big players care about, and in a way where the capabilities are not easily separable from the alignment.
So firstly, in this world capability is bottlenecked by chips. There isn’t a runaway process of software improvements happening yet. And this means there probably aren’t large, easy software capability improvements lying around.
Now “making capability improvements that are actively tied to alignment somehow” sounds harder than making any capability improvement at all. And you don’t have as much compute as the big players. So you probably don’t find much.
What kind of AI research would make it hard to create a misaligned AI anyway?
A new, more efficient matrix multiplication algorithm that only works when it’s part of a CEV-maximizing AI?
The big players do care about having instruction-following AIs,
Likely somewhat true.
and if the way to do that is to use the AI safety book, they will use it.
Perhaps. Don’t underestimate sheer incompetence. Someone pressing the run button to test that the code works so far, when they haven’t programmed the alignment bit yet. Someone copying and pasting in an alignment function but forgetting to actually call the function anywhere. A misspelled variable name that silently refers to another variable. Nothing is idiot-proof.
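A toy illustration of the kind of slip being described (the function and variable names are hypothetical, purely for illustration):

```python
# Toy illustration of the failure modes above (hypothetical names throughout).

def apply_alignment_check(action: str) -> str:
    """The 'alignment bit': pasted into the codebase, but never called anywhere."""
    raise RuntimeError(f"action {action!r} blocked pending alignment review")

def run_agent(action: str) -> str:
    alignment_status = "unchecked"
    alignmnet_status = "approved"   # typo: silently creates a *new* variable
    return f"executing {action} ({alignment_status})"

# Runs without error or warning; the check above is never invoked.
print(run_agent("deploy"))   # executing deploy (unchecked)
```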
I mean, presumably alignment is fairly complicated, and it could all go badly wrong because of the equivalent of one malfunctioning O-ring. Or what if someone finds a much more efficient approach that’s harder to align?