I agree that there will be potential for harm as people abuse AIs that aren’t quite superintelligent for nefarious purposes. However, in order for that harm to prevent us from ever facing existential risk due to the control problem, the harm from nefarious use of sub-superintelligent AI would itself have to be x-risk-level, and I don’t really see that being the case.
Consider someone consistently giving each new AI release the instruction “become superintelligent and then destroy humanity”. This is not the control problem, but surely doing this will manifest x-risk behaviour at least somewhat earlier than innocuous instructions would?
I think this failure mode would show up extremely close in time to ordinary AI risk; I don’t think that e.g. solving this failure mode while keeping everything else the same buys you significantly more time to solve the control problem.