Consider someone who consistently gives each new AI release the instruction “become superintelligent and then destroy humanity”. This is not the control problem, but surely doing this would manifest x-risk behaviour at least somewhat earlier than giving it innocuous instructions?
I think this failure mode would arise extremely close in time to ordinary AI risk; I don’t think that solving this particular failure mode, while keeping everything else the same, buys you significantly more time to solve the control problem.