The orthogonality thesis says that it's invalid to conclude benevolence from the premise of powerful optimization; it gestures at counterexamples. It's entirely compatible with benevolence being very likely in practice. You might then want to separately ask yourself whether it is in fact likely. But you do need to ask, and that's the point of the orthogonality thesis: its scope is deliberately narrow.
Yeah, I agree with what you just said; I should have been more careful with my phrasing.
Maybe something like: “The naive version of the orthogonality thesis, on which AIs can't converge toward human values, is assumed to be true too often.”
Could you help me understand how that is possible? Why should an intelligent agent care about humans instead of defending against unknown threats?