Possible alternatives.

1. AI can make papers as good as the average scientist's, but wow is it slow. Total AI paper output is less than total average-scientist output, even with all available compute thrown at it.

2. AI can write papers as good as the average scientist's. But a lot of progress is driven by the most insightful 1% of scientists. So we get ever more mediocre incremental papers without any revolutionary new paradigms.

3. AI can make papers as good as the average scientist's. For AI safety reasons, this AI is kept rather locked down and not run much. Any results are not trusted in the slightest.

4. AI can make papers as good as the average scientist's. Most of the peer review and journal process is also AI-automated. This leads to a Goodharting loop. All the big players are trying to get papers "published" by the million. Almost none of these papers will ever be read by a human. There may be good AI safety ideas somewhere in that giant pile of research, but good luck finding them in the massive piles of superficially plausible rubbish. If making a good paper becomes 100x easier, making a rubbish paper becomes a million times easier, and telling the difference becomes only 2x easier, the whole system gets buried in mountains of junk papers (see the back-of-envelope sketch after this list).

5. AIs can do, and have done, AI safety research. There are now some rather long and technical books that present all the answers. Capabilities progress is now a question of scaling up chip production (which has slow engineering bottlenecks). We aren't safe yet. When someone has enough chips, will they use that AI safety book or ignore it? What goal will they align their AI to?
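To make the arithmetic in #4 concrete, here is a minimal back-of-envelope sketch in Python. The 100x / 1,000,000x / 2x multipliers come from the scenario; every other figure (baseline paper counts, review budget, the uniform-screening model) is a made-up assumption for illustration.

```python
# Back-of-envelope sketch of scenario 4. Baseline figures are assumptions.
good_baseline = 1_000      # good papers per year (assumed)
junk_baseline = 10_000     # superficially plausible rubbish per year (assumed)
review_budget = 20_000     # papers the community can seriously screen per year (assumed)

# The multipliers from the scenario:
good_after = good_baseline * 100          # good papers become 100x easier
junk_after = junk_baseline * 1_000_000    # rubbish becomes 1,000,000x easier
budget_after = review_budget * 2          # telling the difference becomes 2x easier

def good_papers_surfaced(good: int, junk: int, budget: int) -> float:
    """Expected good papers found if screening effort is spread uniformly over the pool."""
    screened_fraction = min(1.0, budget / (good + junk))
    return good * screened_fraction

print(good_papers_surfaced(good_baseline, junk_baseline, review_budget))  # 1000.0: everything gets read
print(good_papers_surfaced(good_after, junk_after, budget_after))         # ~0.4: good work is buried
```

With these assumed numbers the review budget used to cover the whole pool; afterwards it covers roughly four papers in a million, so the expected number of good papers anyone actually reads drops from all of them to well under one.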
For #5, I think the answer would be to make the AI produce the AI safety ideas which not only solve alignment, but also yield some aspect of capabilities growth along an axis that the big players care about, and in a way where the capabilities are not easily separable from the alignment. I can imagine this being the case if the AI safety idea somehow makes the AI much better at following the spirit of its instructions (which is, after all, what we care about). The big players do care about having instruction-following AIs, and if the way to do that is to use the AI safety book, they will use it.
make the AI produce the AI safety ideas which not only solve alignment, but also yield some aspect of capabilities growth along an axis that the big players care about, and in a way where the capabilities are not easily separable from the alignment.
So firstly, in this world capability is bottlenecked by chips. There isn't a runaway process of software improvements happening yet. And this means there probably aren't large, easy software capability improvements lying around.
Now “making capability improvements that are actively tied to alignment somehow” sounds harder than making any capability improvement at all. And you don’t have as much compute as the big players. So you probably don’t find much.
What kind of AI research would make it hard to create a misaligned AI anyway?
A new, more efficient matrix multiplication algorithm that only works when it's part of a CEV-maximizing AI?
The big players do care about having instruction-following AIs,
Likely somewhat true.
and if the way to do that is to use the AI safety book, they will use it.
Perhaps. Don’t underestimate sheer incompetence. Someone pressing the run button to test that the code works so far, when they haven’t programmed the alignment bit yet. Someone copying and pasting in an alignment function but forgetting to actually call it anywhere. A misspelled variable name that happens to be another existing variable. Nothing is idiot-proof.
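As a toy sketch of those last two mistakes (every name and number here is hypothetical): the alignment code is sitting right there in the file, but one forgotten call and one misspelled constant mean it never touches the decision.

```python
# Toy illustration of the failure modes above; all names and numbers are hypothetical.

SAFETY_PENALTY_WEIGHT = 10.0   # the weight the "AI safety book" says to use
safety_penalty_weight = 0.0    # stale debug value, one misspelling away from the real one

def alignment_penalty(action_score: float, harm_estimate: float) -> float:
    """Carefully pasted in from the safety book; never actually called below."""
    return action_score - SAFETY_PENALTY_WEIGHT * harm_estimate

def choose_action(candidates: list[tuple[float, float]]) -> tuple[float, float]:
    # Bug 1: should call alignment_penalty, but scores the candidates inline instead.
    # Bug 2: the lowercase misspelling silently binds to the 0.0 debug constant,
    #        so the "penalty" term does nothing at all.
    return max(candidates, key=lambda c: c[0] - safety_penalty_weight * c[1])

# (action_score, harm_estimate) pairs; the most harmful option also scores highest.
print(choose_action([(1.0, 0.0), (5.0, 100.0)]))  # picks (5.0, 100.0): the safety code never ran
```

With the hypothetical numbers above, the agent happily picks the high-score, high-harm action, and nothing in the output even hints that the alignment function was skipped.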
I mean, presumably alignment is fairly complicated, and it could all go badly wrong because of the equivalent of one malfunctioning O-ring. Or what if someone finds a much more efficient approach that's harder to align?