Let me try to rephrase: correct FAI theory shouldn’t have dangerous ideas. If we find that the current version does have dangerous ideas, then this suggests that we are on the wrong track. The “Friendly” in “Friendly AI” should mean friendly.
Pretty much correct in this case. Roko’s original post was, in fact, wrong; correctly programmed FAIs should not be a threat.
(FAIs shouldn’t be a threat, but a theory for creating a FAI will obviously have at least the potential to be used to create uFAIs. FAI theory will have plenty of dangerous ideas.)
I want to highlight at this point how you think about similar scenarios:
I do think that TORTURE is the obvious option, and I think the main instinct behind SPECKS is scope insensitivity.
That isn’t very reassuring. I believe that if you had the choice of either letting a paperclip maximizer burn the cosmic commons or torturing 100 people, you’d choose to torture 100 people. Wouldn’t you?
...correctly programmed FAIs should not be a threat.
They are always a threat to some beings: for example, beings who oppose CEV, or other AIs. Any FAI that ran a human version of CEV would be a potential existential risk to any alien civilisation. If you accept all this possible oppression in the name of what is subjectively friendly, how can I be sure that you don’t favor torture for some humans that support CEV, in order to ensure it? After all, you already allow for the possibility that many beings will be oppressed or possibly killed.
This seems to be true and obviously so.
Narrowness. You can parry almost any statement like this, by posing a context outside its domain of applicability.
Another pointless flamewar. This part makes me curious though:
There are two ways I can interpret your statement:
a) you know a lot more about decision theory than you’ve disclosed so far (here, in the workshop and elsewhere);
b) you don’t have that advanced knowledge, but won’t accept as “correct” any decision theory that leads to unpalatable consequences like Roko’s scenario.
Which is it?
From my point of view, and as I discussed in the post (that discussion got banned along with the rest, although it’s not exactly on this topic), the problem here is the notion of “blackmail”. I don’t know how to formally distinguish it from any other kind of bargaining, and the way in which Roko’s post could be wrong, as I remember it, required this distinction to be made (it could be wrong in other ways, but those I didn’t notice at the time and don’t care to revisit).
(The actual content was edited out and posted as a top-level post.)
(I seem to have a talent for writing stuff, then deleting it, and then getting interesting replies. Okay. Let it stay as a little inference exercise for onlookers! And please nobody think that my comment contained interesting secret stuff; it was just a dumb question to Eliezer that I deleted myself, because I figured out on my own what his answer would be.)
Thanks for verbalizing the problems with “blackmail”. I’ve been thinking about these issues in the exact same way, but made no progress and never cared enough to write it up.
Perhaps the reason you are having trouble coming up with a satisfactory characterization of blackmail is that you want a definition with the consequence that it is rational to resist blackmail and therefore not rational to engage in blackmail.
Pleasant though this might be, I fear the universe is not so accommodating.
Elsewhere VN asks how to unpack the notion of a status quo, and tries to characterize blackmail as a threat which forces the recipient to accept less utility than she would have received in the status quo. I don’t see any reason in game theory why such threats should be treated any differently from other threats. But it is easy enough to define the ‘status quo’.
The status quo is the solution to a modified game—modified in such a way that the time between moves increases toward infinity and the current significance of those future moves (be they retaliations or compensations) is discounted toward zero. A player who lives in the present and doesn’t respond to delayed gratification or delayed punishment is pretty much immune to threats (and to promises).
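A minimal numerical sketch of that last point, under toy assumptions (the one-shot game, the `demand`, `harm`, and `delta` parameters, and the `victim_best_response` helper are all hypothetical, not anything from the thread): if the threatened harm only arrives later and the victim discounts that future move by a factor delta, then as delta goes toward zero refusing dominates paying, which is the sense in which a player who ignores delayed punishment is immune to threats.

```python
# Toy one-shot blackmail game, a sketch only: made-up payoffs, not from the thread.
# The blackmailer demands `demand` now; if the victim refuses, the threatened
# harm `harm` lands later and is therefore discounted by the victim's factor `delta`.

def victim_best_response(demand: float, harm: float, delta: float) -> str:
    """Best response of a victim who assumes the threat is always carried out."""
    pay_utility = -demand           # pay up now, avoid the future harm
    refuse_utility = -delta * harm  # refuse now, eat the discounted future harm
    return "pay" if pay_utility > refuse_utility else "refuse"

if __name__ == "__main__":
    demand, harm = 10.0, 100.0
    for delta in (1.0, 0.5, 0.05, 0.0):
        print(f"delta={delta}: {victim_best_response(demand, harm, delta)}")
    # As delta -> 0 (a player who "lives in the present"), the victim refuses,
    # so the threat loses its force; the same logic neutralizes delayed promises.
```

On these assumptions the victim pays only when delta exceeds demand / harm, so driving the current significance of future moves toward zero is exactly what strips threats (and promises) of their force.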
On RW it’s called Headless Chicken Mode: the community appears to go nuts for a time. It generally resolves itself once people have gotten the yelling out of their system.
The trick is not to make any decisions based on the fact that things have gone into Headless Chicken Mode. It’ll pass.
[The comment this is in reply to was innocently deleted by the poster, but not before I made this comment. However, I think I’m making a useful point here, so would prefer to keep this comment.]
This is certainly the case with regard to the kind of decision theoretic thing in Roko’s deleted post. I’m not sure if it is the case with all ideas that might come up while discussing FAI.
Wrong and stupid.
FYI, this is an excellent example of contempt.
And so it was, but not an example for other times when it wasn’t. A rare occurrence. I’m pretty sure it didn’t lead to any errors though, in this simple case.
(I wonder why Eliezer pitched in the way he did, with only weak disambiguation between the content of Tetronian’s comment and commentary on correctness of Roko’s post.)
I got the impression that you responded to “FAI Theory” as our theorizing and Eliezer responded to it as the theory making its way to the eventual FAI.
Ok...but why?
Edit: If you don’t want to say why publicly, feel free to PM me.