I haven’t devoted much time to this because I don’t think anybody who has ever interacted with us in person has ever thought this was likely, and I’m not sure if anyone even on the internet has ever made the accusation—though of course some have raised the vague possibility, as you have. In other words, I doubt this worry is anyone’s true rejection, whereas I suspect the lack of peer-reviewed papers from SIAI is many people’s true rejection.
Skepticism about SIAI’s competence screens off skepticism about SIAI’s intentions, so of course that’s not the true rejection for the vast majority of people. But it genuinely troubles me if nobody’s thought of the latter question at all, beyond “Trust us, we have no incentive to implement anything but CEV”.
If I told you that a large government or corporation was working hard on AGI plus Friendliness content (and that they were avoiding the obvious traps), even if they claimed altruistic goals, wouldn’t you worry a bit about their real plan? What features would make you more or less worried?
I think the key point is that we’re not there yet. Whatever theoretical tools we shape now are either generally useful or generally useless, irrespective of considerations of motive; the currently relevant question is (potential) competence. Only at some point in the (moderately distant) future, conditional on current and future work bearing fruit, might motive become relevant.
What features would make you more or less worried?
I’d worry about selfish institutional behavior, or explicit identification of the programmers’ goals with the nation/corporation’s selfish interests. Also, I guess, belief in the moral infallibility of some guru.
Otherwise I wouldn’t worry about motives, not unless I thought one programmer could feasibly deceive the others and tell the AI to look only at this person’s goals. Well, I have to qualify that—if everyone in the relevant subculture agreed on moral issues and we never saw any public disagreement on what the future of humanity should look like, then maybe I’d worry. That might give each of them a greater expectation of getting what they want if they go with a more limited goal than CEV.
An “outside view” might be to put the SI in the reference class of “groups who are trying to create a utopia” and observe that previous such efforts that have managed to gain momentum have tended to make the world worse.
I think the reality is more complicated than that, but that might be part of what motivates these kinds of questions.
I think the biggest specific trust-related issue I have is with CEV—getting the utility function generation process right is really important, and in an optimal world I’d expect to see CEV subjected to a process of continual improvement and informed discussion. I haven’t seen that, but it’s hard to tell whether the SI are being overly protective of their CEV document or whether it’s just really hard getting the right people talking about it in the right way.
Right.
Am I to take this as a general answer to the overall question of trustworthiness, or is it intended just as an answer to the specific example?