Donald Hobson gives a comment below explaining some reasoning around dealing with unknown unknowns, but it doesn't directly answer the question, so I'll offer an answer here.
The short answer is “yes”.
The longer answer is that this is one of the fundamental considerations in approaching AI alignment, and it's why some organizations, like MIRI, have taken an approach that doesn't drive straight at the object-level problem and instead tackles issues likely to be foundational to any approach to alignment that could work. In fact, you might say the big schism between MIRI and, say, OpenAI is that MIRI places greater emphasis on addressing the unknown, whereas OpenAI expects alignment to look more like an engineering problem with relatively small and not especially dangerous unknown unknowns.
(note: I am not affiliated with either organization, so this is an informed opinion on their general approaches; also note that neither organization is monolithic, and individual researchers vary greatly in their assessment of these risks.)
My own efforts on AI alignment are largely about addressing these sorts of questions, because I think we still have a poor understanding of what alignment even really means. In this sense I know that there is a lot we don't know, but I don't know all of what we don't know that we'll need to (so, known unknown unknowns).