Weak down-vote: I feel like if one takes this position to its logical extreme, one could claim that no AI misbehavior, however arbitrary, counts as misalignment, almost by definition: you just don’t know the values its creators truly hold, according to which the behavior is perfectly aligned.