My thesis above is that, at AGI level, the combination of human-like capabilities (except perhaps higher speed, or more encyclopedic knowledge) and making human-like errors in alignment is probably manageable, by mechanisms and techniques comparable to things like the law enforcement we use for humans — but that at ASI level it's likely to be x-risk disastrous, just as most human autocrats are. (I assume this observation is similar to the concerns others have raised about "sharp left turns" — personally I find the simile with human autocrats more illuminating than a metaphor about an out-of-control vehicle.) So IMO AGI is the last level at which we can afford to still be working the bugs out of, or converging to, alignment.