So far, my experience is that AI safety engineers have much better intuitions about AI safety than normal AI folk tend to have. Like, I haven't yet encountered anybody in this field who thinks we'll get desirable behavior from fully autonomous systems by default. They all understand that it's extremely difficult to translate intuitively desirable behavior into mathematically precise design requirements. They understand that when high safety standards are required, you've got to build the system from the ground up for safety rather than slapping on a safety module near the end. So I've been mildly encouraged by these conversations, even though almost none of them are thinking about the longer-term issues, at least not yet.
I did not follow the interviews in detail. But I doubt that most of these AI safety engineers believe you could build AI software that can drive trains and fly planes without crashing, yet which nonetheless drives and flies people to destinations they do not desire. In other words, my guess is that these people believe that without being able to prove that programs meet certain conditions, you won't achieve FOOM in the first place. What they probably do not believe is MIRI's idea of an AI that works perfectly along a huge number of dimensions (e.g. making itself superhumanly smart, solving the protein folding problem, etc.), yet which nonetheless fails at doing what people designed it to do, even though it does not fail at any of those other tasks.
The problem isn't so much that the AI doesn't do what it was designed to do; it's that what you implemented is subtly different from what you designed. This is something that commonly happens in programming, not just a hypothetical concern.
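To make that concrete, here is a minimal, purely hypothetical sketch (the discount rule, the function name, and the threshold are all invented for illustration). The stated design is "10% off for orders of $100 or more", but the implementation uses a strict comparison, so the boundary case quietly diverges from the design.

```python
# Hypothetical sketch: the design says "10% off for orders of $100 or more",
# but the implementation uses a strict comparison, so the boundary case
# quietly diverges from the design intent.

def apply_discount(total: float) -> float:
    """Design intent: 10% off for orders of $100 or more."""
    if total > 100:           # subtle mismatch: the design calls for >= 100
        return total * 0.9
    return total

print(apply_discount(150.0))  # 135.0 -- matches the design
print(apply_discount(100.0))  # 100.0 -- quietly violates the design intent
```

Every test away from the boundary passes, so a mismatch like this between design and implementation can easily survive review unnoticed.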