Because it’s easy to detect and correct (except that correcting it might push you into one of the other regimes).
So far causally upstream of the human evaluator’s opinion? Eg an AI counselor optimizing for getting to know you
Because it’s easy to detect and correct (except that correcting it might push you into one of the other regimes).
So far causally upstream of the human evaluator’s opinion? Eg an AI counselor optimizing for getting to know you