When I read these AI control problems I always think that an arbitrary human is being conflated with the AI’s human owner. Perhaps I’m mistaken and I should read these as if AIs own themselves, but I don’t see that case as likely, so I would probably stop here if we are to presuppose it.
Now, if an AI is lying to or deceiving its owner, that is a bug. In fact, when debugging I often feel I am being lied to; ordinary code just isn’t a very sophisticated liar. I could see an AI owner wanting to train their AI in lying and deception, and perhaps even to practice them on other people (say, a Wall Street AI). Now we have a sophisticated liar, but we still have a bug. I find it likely that the owner would have encountered this bug many times as the AI became more and more sophisticated. If they never encountered it, that would point to great improvements in software development.