Can you give any examples of what you’re thinking of, so I can be clearer about what you have in mind when you ask your question?
I’m actually not coming up with any; it seems to be a tough problem. Here’s an elaborate hypothetical that I’m not particularly worried about, but which serves as a case study:
Suppose that Robin Hanson is right about the Singularity (no discontinuity, no singleton, just rapid economic doubling until technology reaches physical limits, at which point it’s a hardscrapple expansion through the future lightcone for those rich enough to afford descendants), and that, furthermore, EY knows it and has been trying to deceive the rest of us in order to fund an early AI and thus grab a share of the Singularity pie for himself and a few chosen friends.
The things that make this seem implausible right now are that the SIAI people I know don’t seem to be the sort of people who are into long cons, and that their object-level arguments about the Singularity make sense to me. But, uh, I’m not sure that I can stake the future on my ability to play a game of Mafia. So I’m wondering whether SIAI has come up with any ideas (stronger than a mission statement) to make their dedication to a fair Singularity credible.
Right.
I haven’t devoted much time to this because I don’t think anybody who has ever interacted with us in person has ever thought this was likely, and I’m not sure that even anyone on the internet has ever made the accusation, though of course some have raised the vague possibility, as you have. In other words, I doubt this worry is anyone’s true rejection, whereas I suspect the lack of peer-reviewed papers from SIAI is many people’s true rejection.
Skepticism about SIAI’s competence screens off skepticism about SIAI’s intentions, so of course that’s not the true rejection for the vast majority of people. But it genuinely troubles me if nobody’s thought of the latter question at all, beyond “Trust us, we have no incentive to implement anything but CEV”.
If I told you that a large government or corporation was working hard on AGI plus Friendliness content (and that they were avoiding the obvious traps), even if they claimed altruistic goals, wouldn’t you worry a bit about their real plan? What features would make you more or less worried?
I think the key point is that we’re not there yet. Whatever theoretical tools we shape now are either generally useful or generally useless, irrespective of considerations of motive; the currently relevant question is (potential) competence. Only at some point in the (moderately distant) future, conditional on current and future work bearing fruit, might motive become relevant.
I’d worry about selfish institutional behavior, or explicit identification of the programmers’ goals with the nation/corporation’s selfish interests. Also, I guess, belief in the moral infallibility of some guru.
Otherwise I wouldn’t worry about motives, unless I thought one programmer could feasibly deceive the others and tell the AI to look only at that person’s goals. Well, I have to qualify that: if everyone in the relevant subculture agreed on moral issues and we never saw any public disagreement about what the future of humanity should look like, then maybe I’d worry. That might give each of them a greater expectation of getting what they want by going with a more limited goal than CEV.
An “outside view” might be to put the SI in the reference class of “groups who are trying to create a utopia” and observe that previous such efforts that have managed to gain momentum have tended to make the world worse.
I think the reality is more complicated than that, but that might be part of what motivates these kinds of questions.
I think the biggest specific trust-related issue I have is with CEV: getting the utility-function generation process right is really important, and in an optimal world I’d expect to see CEV subjected to a process of continual improvement and informed discussion. I haven’t seen that, but it’s hard to tell whether the SI is being overly protective of its CEV document or whether it’s just really hard to get the right people talking about it in the right way.
Am I to take this as a general answer to the overall question of trustworthiness or is this intended just as an answer to the specific example?
It would be clearer to say that Robin is right about the future, namely that there will not be a Singularity; a hardscrapple race through the frontier basically just isn’t one.
If you want to hypothesize that SingInst has secrets plus an evil plan, the secrets and plan have to combine in such a way that it’s a good plan.