Because it’s too technically hard to align some cognitive process that is powerful enough, and operating in a sufficiently dangerous domain, to stop the next group from building an unaligned AGI in 3 months or 2 years. Like, they can’t coordinate to build an AGI that builds a nanosystem because it is too technically hard to align their AGI technology in the 2 years before the world ends.
I’m not totally convinced by this argument because of the quote below:
The flip side of this is that I can imagine a system being scaled up to interesting human+ levels, without “recursive self-improvement” or other of the old tricks that I thought would be necessary, and argued to Robin would make fast capability gain possible. You could have fast capability gain well before anything like a FOOM started. Which in turn makes it more plausible to me that we could hang out at interesting not-superintelligent levels of AGI capability for a while before a FOOM started. It’s not clear that this helps anything, but it does seem more plausible.
It seems to me this does change things hugely. I think we are underestimating how much humans could change in the short window between getting human-level AI and the development of recursive self-improvement. Human-level AI plus huge amounts of compute would let you take over the world through much more conventional means, like massively hacking computer systems to render your opponents powerless (and other easy-to-imagine, more gruesome ways). So the first group to develop near-human-level AI wouldn't need to align it within 2 years, because it would have the chance to shut everyone else down. It might not even come down to the first group that develops it, but to the first people who have access to some powerful system, since they could use it to hack the group itself and do what they wish without needing buy-in from others. That would depend on a lot of factors, like how tightly access to the AI is controlled and how quickly a single person could use AI to take control of physical resources. I'm not saying this would be easy to do, but it certainly seems within the realm of plausibility.
I agree… it seems to me that the most likely course of events is for some researchers to develop a human+ / fast-capability-gain AI dedicated to the problem of solving AI alignment, and it's that AI which develops the solution that can then be implemented to align the inevitable AGI.