It seems like the question you’re asking is close to (2) in my above decomposition.
Yup.
Aren’t you worried that, long before human-obsoleting AI (or AI-safety-researcher-obsoleting AI), these architectures will be very uncompetitive and thus won’t be viable given realistic delay budgets?
Quite uncertain about all this, but I have short timelines and expect that likely not many more OOMs of effective compute will be needed to get to e.g. something which can 30x AI safety research (as long as we really try). I expect shorter timelines / smaller OOM ‘gaps’ to come along with e.g. fewer architectural changes, all else equal. There are also broader reasons why I think it’s quite plausible the high-level considerations might not change much even given some architectural changes, discussed in the weak-forward-pass comment (e.g. ‘the parallelism tradeoff’).
I have some hope for a plan like:
1. Control early transformative AI.
2. Use these AIs to make a safer approach much more competitive. (Maybe the approach consists of AIs which are too weak to scheme in a forward pass but which are combined into some crazy bureaucracy and made very cheap to run; a minimal sketch of this shape follows below.)
3. Use those next AIs to do something.
(This plan has the downside that it probably requires doing a bunch of general-purpose capabilities work, which might make the situation much more unstable and volatile due to a huge compute overhang if fully scaled up in the most performant (but unsafe) way.)
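For concreteness, here is a minimal, hypothetical sketch of the ‘bureaucracy of weak models’ shape from step 2: a fixed scaffold composes many short forward passes, each assumed too weak to scheme on its own, with every intermediate product kept in plain text so it can be monitored. `weak_model` is a stand-in for a short call to such a model, not any real API.

```python
def weak_model(prompt: str, max_tokens: int = 256) -> str:
    # Hypothetical stand-in: one short, cheap forward pass of a model
    # assumed too weak to scheme within a single call.
    raise NotImplementedError

def bureaucracy(task: str, num_workers: int = 5) -> str:
    # A "manager" pass decomposes the task into small subtasks.
    plan = weak_model(
        f"Break this task into {num_workers} small, independent subtasks, "
        f"one per line:\n{task}"
    )
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # "Worker" passes solve each subtask in isolation; every intermediate
    # output is plain text, so the whole pipeline can be logged and audited.
    results = [weak_model(f"Subtask: {s}\nAnswer briefly:") for s in subtasks]

    # An "aggregator" pass combines the short answers. Since each call is
    # cheap, workers can be replicated and cross-checked at modest cost.
    summary = "\n".join(f"- {s}: {r}" for s, r in zip(subtasks, results))
    return weak_model(
        f"Task: {task}\nSubtask results:\n{summary}\n"
        f"Combine these into a final answer:"
    )
```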
Sounds pretty good to me. I guess the crux (as hinted at during some personal conversations too) might be that I’m just much more optimistic about this being feasible without huge capabilities pushes (again, some arguments are in the weak-forward-pass comment, e.g. that CoT distillation seems to work decently, which helps with getting, for a fixed level of capability, more of it from scaffolding and less from a single forward pass; or on CoT length / inference-complexity tradeoffs).
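To illustrate the CoT-distillation point (a hypothetical sketch under my reading, not a claim about any particular setup): transcripts from a stronger or heavily scaffolded teacher are used to fine-tune a weaker student to reproduce the externalized reasoning, so that for a fixed capability level more of the work lives in visible chain-of-thought rather than inside one forward pass. `teacher_generate` and `finetune` are stand-ins, not real APIs.

```python
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str
    target: str  # full chain-of-thought followed by the final answer

def teacher_generate(prompt: str) -> str:
    # Hypothetical stand-in: the teacher emits "reasoning...\nAnswer: ..."
    raise NotImplementedError

def build_distillation_set(prompts: list[str]) -> list[Example]:
    data = []
    for p in prompts:
        trace = teacher_generate(f"{p}\nThink step by step, then answer.")
        # Keep the whole trace as the target: the student is trained to
        # externalize its reasoning rather than compress it into one pass.
        data.append(Example(prompt=p, target=trace))
    return data

def finetune(student, data: list[Example]) -> None:
    # Hypothetical stand-in for ordinary supervised fine-tuning on
    # (prompt -> CoT + answer) pairs.
    raise NotImplementedError
```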