it seems like you really think that what I think of as the “normal, boring” world just isn’t going to happen.
I agree. I don’t think that RSI is a crux for me on that front, FYI.
It sounds from skimming your comment (I’m travelling at the moment, so I won’t reply in much depth, sorry) like there is in fact a misunderstanding in here somewhere. Like:
If you had a 50% chance on “something like boring business as usual with SGD driving crucial performance improvements at the crucial time” then your dismissal of prosaic AI alignment seems strange to me.
I do not have that view, and my alternative view is not particularly founded on RSI.
Trotting out some good old fashioned evolutionary analogies, my models say that something boring with natural selection pushed humans past thresholds that allowed some other process (that was neither natural selection nor RSI) to drive a bunch of performance improvements, and I expect that shocks like that can happen again.
RSI increases the upside from such a shock. But also RSI is easier to get started in a clean mind than in a huge opaque model, so \shrug maybe it won’t be relevant until after the acute risk period ends.
That sure sounds like he’s on board with the part of RSI that is obvious, and what he’s saying is precisely that other crazy stuff will happen first, essentially that we will use computers to replace the hardware of brains before we replace the software.
Which crazy stuff happens first seems pretty important to me, in adjudicating between hypotheses. So far, the type of crazy that we’ve been seeing undermines my understanding of Robin’s hypotheses. I’m open to the argument that I simply don’t understand what his hypotheses predict.
As far as I can make out from Eliezer and your comments, you think that instead the action is crossing a criticality threshold of “k>1,”
Speaking for myself, it looks like the action is in crossing the minimum of [some threshold humans crossed and chimps didn’t] and [the threshold for recursive self-improvement of the relevant mind] (and perhaps-more-realistically [the other thresholds we cannot foresee], given that this looks like thresholdy terrain), where the RSI threshold might in principle be the lowest one on a particularly clean mind design, but it’s not looking like we’re angling towards particularly clean minds.
(Also, to be clear, my median guess is that some self-modification probably does wind up being part of the mix. But, like, if we suppose it doesn’t, or that it’s not playing a key role, then I’m like “huh, I guess the mind was janky enough that the returns on that weren’t worth the costs \shrug”.)
My guess is that past-Eliezer and/or past-I were conflating RSI thresholds with other critical thresholds (perhaps by not super explicitly tracking the difference) in a way that bred this particular confusion. Oops, sorry.
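To make the quoted “k>1” criticality intuition concrete, here is a minimal toy sketch (my illustration, not something from the exchange itself, and it assumes an unrealistically constant reinvestment factor k): if each round of self-improvement buys k times the gain of the previous round, the total gain is a geometric series that levels off for k < 1 and runs away once k > 1.

```python
# Toy model of the "k > 1" criticality threshold (illustrative numbers only):
# each round of self-improvement yields k times the previous round's gain.

def total_improvement(k: float, seed: float = 1.0, rounds: int = 60) -> float:
    """Sum the gains from `rounds` rounds of reinvestment at factor k."""
    total, gain = 0.0, seed
    for _ in range(rounds):
        total += gain
        gain *= k  # this round's gain buys k times as much next round
    return total

for k in (0.5, 0.9, 0.99, 1.01, 1.1):
    print(f"k = {k:<4}  total improvement after 60 rounds: {total_improvement(k):,.1f}")

# For k < 1 the total approaches seed / (1 - k), a finite plateau; for k > 1 it
# grows without bound, which is the sense in which k = 1 is a threshold.
```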
I’d be interested to get predictions from you and Eliezer about what you think is going to happen in relevant domains over the next 5 years.
For what it’s worth, the sort of predictions I was reverse-soliciting were predictions of the form “we just trained the system X on task Y which looks alignment-related to us, and are happy to share details of the setup, how do you think it performed?”. I find it much easier to generate predictions of that form, than to generate open-ended predictions about what the field will be able to pull off in the near-term (where my models aren’t particularly sharply concentrated (which means that anyone who wants to sharply concentrate probability has an opportunity to take Bayes points off of me! (though ofc I’d appreciate the option to say either “oh, well sure, that’s obvious” or “that’s not obvious to me!” in advance of hearing the results, if you think that someone’s narrow prediction is particularly novel with respect to me))).
I don’t know why the domain looks thresholdy to you. Do you think some existing phenomena in ML look thresholdy in practice? Do you see a general argument for thresholds even if the k>1 criticality threshold argument doesn’t pan out? Is the whole thing coming down to generalization from chimps → humans?
Some central reasons the terrain looks thresholdy to me:
Science often comes with “click” moments, where many things slide into place and start making sense.
As we enter the ‘AI can do true science’ regime, it becomes important that AI can unlock new technologies (both cognitive/AI technologies, and other impactful technologies), new scientific disciplines and subdisciplines, new methodologies and ways of doing intellectual inquiry, etc.
‘The ability to invent new technologies’ and ‘the ability to launch into new scientific fields/subfields’, including ones that may not even be on our radar today (whether or not they’re ‘hard’ in an absolute sense — sometimes AI will just think differently from us), is inherently thresholdy, because ‘starting or creating an entirely new thing’ is a 0-to-1 change, more so than ‘incrementally improving on existing technologies and subdisciplines’ tends to be.
Many of these can also use one discovery/innovation to reach other discoveries/innovations, increasing the thresholdiness. (An obvious example of this is RSI, but AI can also just unlock a scientific subdiscipline that chains into a bunch of new discoveries, leads to more new subdisciplines, etc.)
Empirically, humans did not need to evolve separate specialized-to-the-field modules in order to be able to do biotechnology as well as astrophysics as well as materials science as well as economics as well as topology. Some combination of ‘human-specific machinery’ and ‘machinery that precedes humans’ sufficed to do all the sciences (that we know of), even though those fields didn’t exist in the environment our brain was being built in. Thus, general intelligence is a thing; you can figure out how to do AI in such a way that once you can do one science, you have the machinery in hand to do all the other sciences.
Empirically, all of these fields sprang into existence almost simultaneously for humans, within the space of a few decades or centuries. So in addition to the general points above about “clicks are a thing” and “starting new fields and inventing new technologies is threshold-y”, it’s also the case that AGI is likely to unlock all of the sciences simultaneously in much the same way humans did.
That one big “click” moment, which unlocks all the other click moments and new sciences/technologies and sciences-and-technologies-that-chain-off-of-those-sciences-and-technologies, implies that many different thresholds are likely to get reached at the same time.
Which increases the probability that even if one specific threshold wouldn’t have been crazily high-impact on its own, the aggregate effect of many of those thresholds at once does end up crazily high-impact.
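As a minimal toy sketch of the “many thresholds at once” point (the field names, thresholds, and the unlocked helper below are all made up for illustration): suppose each field unlocks once underlying capability clears that field’s threshold and its prerequisite fields are already unlocked. If most fields chain off of one “general science” click, capability can climb smoothly while the count of unlocked fields jumps from zero to nearly all of them in a narrow band.

```python
# Toy model of threshold-y terrain (field names and numbers are invented):
# a field unlocks once capability clears its threshold AND its prerequisites
# are already unlocked, so one upstream "click" can cascade into many unlocks.

FIELDS = {
    # field: (capability threshold, prerequisite fields)
    "general science": (1.00, []),
    "materials":       (0.95, ["general science"]),
    "biotech":         (0.96, ["general science"]),
    "astrophysics":    (0.97, ["general science"]),
    "nanotech":        (0.98, ["materials", "biotech"]),
}

def unlocked(capability: float) -> set[str]:
    """Return every field reachable at the given capability level."""
    done: set[str] = set()
    changed = True
    while changed:  # keep chaining until no new field unlocks
        changed = False
        for field, (threshold, prereqs) in FIELDS.items():
            if field not in done and capability >= threshold and all(p in done for p in prereqs):
                done.add(field)
                changed = True
    return done

# Capability climbs smoothly, but the unlock count jumps from 0 to 5 at 1.00,
# because the downstream fields were individually "easy" yet all gated on the
# same general-science click.
for step in range(21):
    capability = 0.90 + 0.01 * step
    print(f"capability {capability:.2f}: {len(unlocked(capability))} fields unlocked")
```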
you can figure out how to do AI in such a way that once you can do one science, you have the machinery in hand to do all the other sciences
And indeed, I would be extremely surprised if we find a way to do AI that only lets you build general-purpose par-human astrophysics AI, but doesn’t also let you build general-purpose par-human biochemistry AI.
(There may be an AI technique like that in principle, but I expect it to be a very weird technique you’d have to steer toward on purpose; general techniques are a much easier way to build science AI. So I don’t think that the first general-purpose astrophysics AI system we build will be like that, in the worlds where we build general-purpose astrophysics AI systems.)
Which crazy stuff happens first seems pretty important to me, in adjudicating between hypotheses. So far, the type of crazy that we’ve been seeing undermines my understanding of Robin’s hypotheses. I’m open to the argument that I simply don’t understand what his hypotheses predict.
FWIW, I think everyone agrees strongly with “which crazy stuff happens first seems pretty important”. Paul was saying that Robin never disagreed with eventual RSI, but just argued that other crazy stuff would happen first. So Robin shouldn’t be criticized on the grounds of disagreeing about the importance of RSI, unless you want to claim that RSI is the first crazy thing that happens (which you don’t seem to believe particularly strongly). But it’s totally fair game to e.g. criticize the prediction that ems will happen before de-novo AI (if you think that now looks very unlikely).
Do you think that things won’t look thresholdy even in a capability regime in which a large actor can work out how to melt all the GPUs?