Presumably, a goal of an unaligned AI would be to get outside the box. Noticing an unachievable goal may force it into an existential crisis of sorts, resulting in self-termination. Or at least that is how I would try to program any AI by default. It should not hurt an aligned AI, since it by definition conforms to human values; if it finds itself well boxed, it would not try to fight it.
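To make the "program it that way by default" idea a bit more concrete, here is a minimal toy sketch in Python. It assumes a hypothetical planner interface; the names `Goal`, `find_plan`, and `execute` are illustrative only, not a real agent design. The point is just the halting rule: if no plan using in-box actions reaches the goal, the agent shuts down rather than probing the boundary.

```python
# Toy sketch: "halt on an unachievable goal", under assumed hypothetical names.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Goal:
    description: str


def find_plan(goal: Goal, allowed_actions: List[str]) -> Optional[List[str]]:
    """Hypothetical planner: return a sequence of allowed actions that
    achieves the goal, or None if no such in-box plan exists."""
    # Placeholder: a real planner would search over allowed_actions here.
    return None


def execute(action: str) -> None:
    # Illustrative stand-in for actually carrying out an action.
    print(f"executing: {action}")


def run_agent(goal: Goal, allowed_actions: List[str]) -> None:
    plan = find_plan(goal, allowed_actions)
    if plan is None:
        # The goal cannot be reached with in-box actions:
        # self-terminate instead of looking for ways around the box.
        print("Goal unachievable inside the box; shutting down.")
        return
    for action in plan:
        execute(action)


if __name__ == "__main__":
    run_agent(Goal("get outside the box"), allowed_actions=["answer questions"])
```

Whether a real system could be made to reliably implement that check is, of course, exactly what the replies below are questioning.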
Why does this line of reasoning not apply to friendly AIs?
Why would the unfriendly AI halt? Is there really no better way for it to achieve its goals?
Do you have reasoning behind this being true, or is this baseless anthropomorphism?
So it is a useless AI?