I’m on the optimists discord and I do make the above argument explicitly in this presentation (e.g. slide 4): Reasons for optimism about superalignment (though, fwiw, Idk if I’d go all the way down to 1% p(doom), but I have probably updated from something like 10% to <5%, and most of my uncertainty now comes from the governance / misuse side).
On your points ‘Is massive effective acceleration enough?’ and ‘Will “human level” systems be sufficiently controlled to get enough useful work?’, I think that, conditioned on aligned-enough ~human-level automated alignment RAs, the answer to both is very likely yes, because it should be possible to get a very large amount of work out of those systems even in a very brief amount of time, e.g. a couple of months (feasible with e.g. a coordinated pause, or even with a sufficient lead). See e.g. slides 9 and 10 of the above presentation (and I’ll note that this argument isn’t new; it’s been made in various similar forms by e.g. Ajeya Cotra, Lukas Finnveden, Jacob Steinhardt).
I’m generally reasonably optimistic about using human-level-ish systems to do a ton of useful work while simultaneously avoiding most risk from these systems. But I think this requires substantial effort and won’t clearly go well by default.