Can you give me some references for the idea that “you don’t need to have solved the AGI problem to have solved friendliness”? I’m not saying it’s not true; I just want to improve this article.
Let’s taboo “solved” for a minute.
Say you have a detailed, rigorous theory of Friendliness, but you don’t have it implemented in code as part of an AGI. You are racing with your competitor to code a self-improving super-AGI. Isn’t it still quicker to implement something that doesn’t incorporate Friendliness?
To me, it seems that, even if the theory were settled, Friendliness would still be an additional feature you would have to code into an AI, one that takes extra time and effort.
What I’m getting at is that, throughout the history of computing, the version of a system with desirable property X has tended to be implemented and deployed commercially after the version without X, even when the theoretical benefits of X were well known in academia. For example, it would have been better for the general public and web developers if web browsers had obeyed W3C specifications and avoided extra proprietary tags, but in practice, commercial pressures meant that companies shipped grossly non-compliant browsers for years until eventually they started moving towards compliance.
The “Friendly browser” theory was solved, but compliant and non-compliant browsers still weren’t on basically equal footing.
(Now, you might say that CEV will be way more mathematical and rigorous than browser specifications—but the only important point for my argument is that it will take more effort to implement than the alternative).
Now you could say that browser compliance is a fairly trivial matter, and that corporations will be more cautious about deploying AGI. But the potential gain from deploying a super-AI first would surely be much greater than the benefit of supporting the blink tag or whatever, so the incentive to rationalise away the perceived dangers will be much greater.
If you have a rigorous, detailed theory of Friendliness, you presumably also know that creating an Unfriendly AI is suicide and won’t do it. If one competitor in the race doesn’t have the Friendliness theory or the understanding of why it’s important, that’s a serious problem, but I don’t see any programmer who understands Friendliness deliberately leaving it out.
Also, what little I know about browser design suggests that, say, supporting the blink tag is an extra chunk of code that gets added on later, possibly with a few deeper changes to existing code. Friendliness, on the other hand, is something built into every part of the system—you can’t just leave it out and plan to patch it in later, even if you’re clueless enough to think that’s a good idea.
OK, what about the case where there’s a CEV theory which can extrapolate the volition of all humans, or a subset of them? It’s not suicide for you to tell the AI “coherently extrapolate my volition/the shareholders’ volition”. But it might be hell for the people whose interests aren’t taken into account.
In that case, that particular company wouldn’t be able to build the AI any faster than other companies, so it’s just a matter of getting an FAI out there first and having it optimize rapidly enough that it could destroy any UFAI that comes along after.